201 Comments
Comment deleted

Did you read OBA (One Billion Americans)? You might disagree with Matt Yglesias’s conclusions and prescriptions, but he’s certainly not a moron.

Comment deleted

Have you read "Coming Apart" by C. Murray? You might like it.


While I don’t have strong opinions on immigration, I do think you’re glossing over some of the hard parts of breaking the cycle of poverty.

First off, you suggest getting a good K-12 education. This is itself a hard problem. Many schools serving predominantly slave-descended African American students are not good educational institutions, and turning them into good ones is difficult at the level of an individual school and hard to scale up. We could hypothetically just pay teachers far more, but that probably wouldn’t increase the supply of good teachers, mainly because we don’t really know how to produce them.

Secondly, the results of a K-12 education are heavily shaped by home life. If you come from a low-income household, then at the very least both your parents will probably be busy and tired from working their jobs. Obviously there are much worse cases, but I don’t know the statistics and would rather not resort to stereotyping. From my experience in a lower-income, rural, predominantly white school, at least a third of students had a home life that was detrimental to their education.

Finally, you’re ignoring the financial pressures behind dropping out of school. Some people can’t wait until graduation to start working because they have to help support their families. Until we eliminate this cause, low-income students will still be forced out at higher rates.

Also, as a question: from my impression of history, many of the ethnic groups you mentioned, such as the Irish or Chinese immigrants, were also poor, uneducated, and despised at about the same level as the black population. Why wouldn’t they just split the number of available jobs and all stay in relative poverty?

Comment deleted

I agree with all of the points you just made. To be clear, I did not mean to imply slavery only exists in the US; I was only trying to use your terminology.

My core point was that you referred to the long-term solution of better education as “easy.” I was raising the point that while it is important, it certainly does not appear to be easy.

Comment deleted

Okay, makes sense. Thanks for clarifying your point.


What if we paid parents money if their children did well in school (in hard-to-game ways)?

Comment deleted

"Get C's or above" is too easy to game, once the subsidy-financed PTA figures out how to kick back some of that loot to the teachers who always give C's or above.


I generally support paying students to go to school; to me it's a possible model for combating a lot of poverty-related schooling issues. I do think giving the money to either the students or the parents carries serious risks (kids will waste it, but low-quality parents will waste it too). I do worry that it would be an expensive solution though...


You need customers to create jobs. One billion people is a lot of customers. And if you read Yglesias's book, you'll see he's just as welcoming of the ultimate uneducated immigrant, aka, American babies, as he is of child or adult immigrants from other parts of the world.

Comment deleted

You can make the same point without adding that last sentence, which drags down the quality of your comments a lot.


Well, for the record, I struggled pretty hard to find any "quality" at all -- unless you're judging quality by the Trumpian standard of which loudmouth at the end of the bar calls the most people stupid.

"Only I can fix it." Yeah, right.

Comment deleted

Reminds me of a woman in the grocery store last spring when I asked that she pull up her mask from around her neck. She complied, confused, then turned back to announce, "I'm not even from around here!" (As if realizing for the first time that she was surrounded by big-city-liberal zombies.)

Say whatever you want, Amy (this place is certainly not predominantly big-city liberals), but you might get some shit for it if it amounts to little more than fist-pounding, along with (to my ears) a bullyish torrent of absolutist declarations.

BTW, does Joe Rogan still hang out with Alex Jones?


'There are currently 330 million customers; however, Black American unemployment is consistently around 11% to 16%. What's gonna happen when you have one billion customers? Do you think that the labor force will stay at the same number of workers, so everyone will be able to find a job and the unemployment rate will go down to 1-2%?' To defend pro-immigration policies from the view that they'll lead to unemployment problems getting worse, you don't need to think that immigration will *lower* unemployment, but just that the rate of employment will remain the same.

Comment deleted

I think there is also a clear division of personalities - traders/portfolio managers need to be comfortable taking risks; most researchers/analysts I've met are not (so much).


Yet... https://www.cnn.com/2014/10/15/us/iraq-chemical-weapons/index.html 🤷🏻‍♂️


They, like you, downplay this and every other report as “not counting” and yet, eppur si muove...


What's the argument that dormant Iraqi weapons programs were a reasonable and generally acknowledged casus belli?


To repeatedly claim that no WMD were ever found in Iraq and thus all allegations were false is itself false. Additionally, components for the production of WMD were found, and there is strong circumstantial evidence that stockpiled Iraqi WMDs were evacuated prior to the invasion. That doesn’t change the fact that major intel agencies also had bad information, but it doesn’t support the imaginary narrative that Scott promulgates over and over again that no WMD were ever found and it was all a lie/false pretext.


Since the casus belli was "Saddam has an active WMD program" and the reality was "Saddam had no active WMD program," it seems "Saddam has no WMD" is a reasonable shorthand for a true claim, though I suppose it's always better to spell things out to avoid pedants.


It has been a long time since I’ve thought about this issue but I thought the rationale was that the first Iraq war had been suspended on certain conditions, including particular inspection-related obligations. I think it was more or less undisputed that Iraq had violated those obligations by refusing access to inspectors. If that’s right then the rationale was that the conditions on which the suspension of hostilities had been predicated were no longer operative. Then the argument would be that the impact of uncertainty regarding current WMD activity under conditions where inspections obligations had been violated should not fall on the non-breaching parties. And intelligence assessments about ongoing activities were more related to the urgency of action rather than the justification for taking it. But as I say, this is all from faded memory. I understood the operative resolutions and agreements at the time but would have to re-educate myself if I wanted to remember the detailed arguments.


The legality of the Iraq war has its own wikipedia article, which I think is a reasonable tool for figuring out the general consensus on reality.

https://en.wikipedia.org/wiki/Legality_of_the_Iraq_War

Your argument seems to also be mentioned in a related wikipedia page more concisely, which I'll quote from: https://en.wikipedia.org/wiki/Iraq_and_weapons_of_mass_destruction#Legal_justification

> Most member governments of the United Nations Security Council made clear that after resolution 1441 there still was no authorization for the use of force. Indeed, at the time 1441 was passed, both the U.S. and UK representatives stated explicitly that 1441 contained no provision for military action

So yes, it was determined that they were in breach of inspection obligations, but that alone was not enough to "resume hostilities". The previous conflict was not put on pause but ended; those conditions were imposed, though not under penalty of the war restarting.

That's what I glean from the wikipedia article at least, which I expect to be a little less correct than an expert on the subject, and much more trustworthy/correct than an internet commenter with no specific relevant credentials.


There were weapons inspectors on the ground in Iraq at the start of the second Iraq war. These inspectors had to be airlifted out as the invasion started.


The casus belli had a few parts.

1) Saddam was working towards WMDs, and would be a serious risk if he ever got them.

2) Saddam was in violation of various UN mandates, sanctions regimes, etc., which meant there was a legal casus belli under international law.

3) Saddam was a human rights disaster, and sufficiently evil that his removal was justified on grounds of "good riddance" (to paraphrase).

Side note: #1 isn't the same as saying that he *already had* massive stockpiles. Heck, by the de facto settlement with North Korea, we can see that a big WMD stockpile would have meant that he'd have been safe from invasion. Clearly, nobody thought he had functioning nukes yet, they just worried that he was trying to acquire them in the future.

#3 was very obviously true, and not disputed by anyone serious - the only argument there is if it's sufficient to be a casus belli. #2 was almost as obvious at the time - even if he wasn't really trying to build WMDs, he was trying to create strategic ambiguity around his programs to intimidate rivals, and that ambiguity was created by dodging inspectors rather frequently.

#1 is the one that's in dispute. From what we've seen since the invasion, it's clear that he had no significant stocks of functioning WMDs, just a few leftovers in forgotten bunkers. There were at least a few attempts to maintain the capability to develop them in future if the sanctions ever lifted, but from what I can tell, they weren't terribly effective attempts. Burying centrifuges in the backyards of your nuclear physicists is at least worthy of mention, but you aren't going to want to rely on them still spinning years later.

I supported the war at the time, and for some years thereafter, though not any more (tl;dr, I had way too much faith in the ability of the US to stand up a functioning Iraqi state and society once the war was over). And from what I can tell, I think that there was enough truth in #1 to be within the margin of confirmation bias (i.e. that it was honestly believed by decision-makers), but there's not enough truth that it really stands up as a casus belli with 20/20 hindsight. It's only kinda true, and that's not enough to start a war over. And if that's not true enough for a war, then #2 is a pretty weak reed of a casus belli too.

#3 is still entirely apt, and watching that bastard hang was the single best moment of the whole war. But seeing Saddam Hussein in a noose doesn't justify a war all by itself. It was good, but not *that* good.


I'm a little sad. I thought Bush in Iraq was a good non-controversial example with which to explain the overall point. I clearly failed.

It's hard to explain a concept without using any examples, but then the examples somehow always eclipse the point...


If #3 were adequate as a casus belli, then when are the bombs going to start raining down on Saudi Arabia?


A casus belli is a justification for war.

But having a justification for something does not mean you actually *want* to do that thing. I may be justified in firing an employee who screws up; that doesn't mean I will actually do so if they are an otherwise good employee. I may be justified in breaking off a friendship with someone who insults me, but no one would find it strange if I forgave them instead.

Ukraine has a crystal clear casus belli against Russia. What could be a better justification for war than "Well, they actually invaded us and stole half our country"? They still didn't go to war with Russia. It needs no explaining why.

By the way, #3 is generally not seen as sufficient for a casus belli. Whether it should be is a question of philosophy. For myself, I'll just say that it would be nice to live in a world where most countries were well behaved enough that it could be used as one.


I am aware of what a casus belli is. My point is that if being a "human rights disaster" were sufficient as a casus belli, it is telling that we have chosen not to make war on by far the worst human rights violator in the region.


The other problem with #3 is that a single Big Bad Evil Guy firmly in charge might be preferable to a power vacuum and lots of Smaller Bad Evil Guys all fighting each other. See: Libya after Qaddafi fell.


A *dormant* Iraqi chemical weapons program would have been a poor cause for war. An *active* Iraqi chemical weapons program would have been a direct and explicit violation of the agreement that ended the *last* war. Deliberately violating peace treaties is generally considered a legitimate reason to go to war, and if the term of the peace treaty that one is violating is the "no forbidden superweapons" part then it is reasonable to believe they have designs involving the use of those weapons.

Iraq didn't have an active chemical weapons program in 2003, but the leaders on both sides were doing their best to convince everyone that they did, so we got a war.


I thought the U.S. administration was trying to convince everyone that Iraq had an active nuclear weapons program — not that they had made them yet but were working on them. We knew they had had chemical weapons, although not whether they were still producing them. "Weapons of mass destruction" is conveniently ambiguous between the two.


The article explicitly states that these weapons were old, made before 1991, and not from a then-active program of building them. Yes, if this were a prediction market and it were phrased as "chemical weapons will be found in Iraq", it would pay out, but the issue under contention was always whether there was an active program of building these (because that was the claimed casus belli), not whether there might be old ones lying around. So, you know, hopefully such a prediction market would have *not* phrased it in such a way!


I 100% agree about hoping a prediction market would structure the proposition differently, but what I am taking issue with is Scott’s and so many others’ retcon that no WMD, nor evidence of their production or stockpiling, were found after the invasion of Iraq. Even CNN dissembles here and tries to turn it around on the US. If we were foolish enough to hand them to the Hussein regime, make the attack about that, but to paint it as “NBD, these chems don’t count because we say so” is disingenuously avoiding the issue.


> I am taking issue with is Scott and so many others’ retcon that there were no WMD

This is ridiculous. The "retcon" is that the existence of *any* WMDs was relevant to the discussion at hand. Scott's reference to this issue is a criticism of a pundit class that justified a war on WMDs. Everyone who lived through that period knows, in general, what he's referring to even if the specifics of "active WMD program" vs. "decades-old stockpiles of WMDs" elude them.

Your criticism amounts to, "You should've spent more words elaborating in order to be technically correct, even though everyone already knew what you meant and the technicalities that I'm upset about are irrelevant to your overall point." Which, fair enough if that's your thing, but calling it a "retcon" is absurd.


These are leftovers from the Iraq/Iran war. Not some new weapons program.

Sometimes Belgian farmers unearth 100 year old mustard gas shells with their ploughs (Google "iron harvest"). Are we going to war with Belgium next?


I feel like we’d have to take a number after the handful of Euro countries that haven’t gone to war with Belgium yet. Snark aside, there was and still remains evidence that the Hussein regime didn’t destroy everything and likely had dual-use facilities available (since many chemical weapons are small steps off of widely used agricultural chemicals). https://www.pbs.org/wgbh/pages/frontline/shows/gunning/etc/arsenal.html <— VX doesn’t “keep” forever after mixing, but its precursors are relatively stable and can be handled surprisingly safely in transport. If I could put a small sum on “predictions about future declassification,” I would predict we will find out that there were indeed CW stockpiles, but that they were removed to Syria (in 2003 much less solidly aligned with Iran) prior to the US invasion. Even a grossly incompetent Republican Guard could move some trucks into friendly territory.


Any day now, the truth will come out! All we need is some evidence!


On top of what everyone else mentioned, it's worth pointing out that the term "WMDs" itself is a (probably deliberate) equivocation that conflates small-scale chemical weapons with nuclear bombs. The Bush Administration clearly wanted the public to be thinking in terms of nuclear weapons, using "mushroom cloud" imagery and talking about Iraq looking to buy uranium ore from Africa.

Given all that, I'm a lot more comfortable rounding off to "WMD was a lie" even if technically there were some old chemical warheads lying around.


The only WMDs that mattered were nuclear. Chem and bio are in a completely different league.


Chemical, sure. I'm not sure about bio, though. More people have died of COVID-19 than died from the two atomic bombs dropped on Japan. How many people could someone kill by infecting themselves with smallpox and hanging around in airports?


A few thousand. Millions over a year or two if you were really lucky, and if your opposition was really incompetent. COVID has killed barely half a million people in the USA.

You could kill millions in forty-five minutes with only a dozen primitive atomic bombs and ICBMs. With 1960s tech and an economy the size of modern Iraq (which had a higher GDP per capita than 1950s America, btw), you could probably kill a hundred million Americans in 45 minutes after a twenty-year program.


Covid-19 has about a 1% fatality rate at most; smallpox had a 30% one. A hypothetical virus as contagious as SARS-COV-2 and as lethal as smallpox, with no pre-existing vaccine, would have killed 15 million Americans instead of 500,000, and 60 million worldwide. HIV has killed approximately 30 million people worldwide so far. Estimated deaths from the 1918 flu pandemic are 50 million worldwide. Potential deaths from bioterrorism don't seem to be that different from nuclear terrorism.


Another point of comparison: a simulation of a massive nuclear attack by Israel against Iran resulted in about 20 million fatalities.

Source: https://conflictandhealth.biomedcentral.com/articles/10.1186/1752-1505-7-10/tables/4


Only in technothrillers and conspiracy theories does "biological warfare" normally mean releasing an actively infectious disease and hoping it infects only the enemy. That's the biological-warfare equivalent of the cobalt bomb; nasty and theoretically possible but nobody actually does it because it's too easy to score an own goal. Actual biological warfare programs are almost always targeted at diseases whose vector can be more directly controlled, e.g. Anthrax whose spores can be cultivated en masse in your production facility but whose victims are unlikely to go on and infect other victims. So, functionally equivalent to chemical weapons but with a lower LD50 and a longer time of action.


Agreed; most actually existing biological warfare programs weren't trying to make "If I go down I'm taking you with me" doomsday devices to deter attacks. But they could have.


Prior to Operation Iraqi Freedom, physicist Richard Muller argued that the findings of inspections after the first Gulf War made it impossible to trust supposed negative evidence from UN inspector teams: they had completely missed the main Hussein nuclear program, which used Calutrons. He predicted an invasion was inevitable.

https://www.technologyreview.com/2002/02/07/235255/springtime-taxes-and-the-attack-on-iraq/


Thanks!

Also, maybe this explains why the US dropped not just one but two nuclear bombs: to compare their effects!


0% implied chance of UER between 4 and 5% feels disqualifying to me!

Also worth noting that some of the markets listed (e.g. inflation, sports) have deeper and better existing alternatives than prediction markets (namely, TIPS and sports betting).


I think you misread; the implied chance of UER between 4% and 5% is >=60%.


Whupps, you are correct. I hereby disqualify myself!


On his own Substack, the misreadings of this particular pair of predictions generated some very large comment threads.


Slight correction: That should say 60%, not >=60%. I mean, it is >=60%, but that's being under-specific. For some reason I thought we couldn't say more than that, but we absolutely can.


In response to "we should cherish people who are often extremely wrong but occasionally right when nobody else is; these people would lose a lot of money": I think it would be easy to make the case that they would be quite profitable, or at least break even, as the payoff for their successes would be significantly more than their losses. Given that everyone disagrees with them, they would theoretically be able to find extremely favorable odds to bet on, so even if they are right only 5% of the time, that could easily be a >=20x payout when they are. Indeed, quite a few investing strategies, both in prediction markets and traditional markets, do just this: take small losses for months on end until some improbable event occurs, then make it all back plus some. In many cases this is basically what hedging is as well.
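
To make that arithmetic concrete, here's a minimal sketch; the 5% hit rate and 20x payout are the hypothetical numbers from the comment above, not real market data:

```python
# Hypothetical contrarian strategy: stake 1 unit per bet, win with
# probability 0.05, and collect a 20x gross payout (stake included) on a win.
p_win, payout, stake = 0.05, 20.0, 1.0

ev_per_bet = p_win * payout - stake  # expected value of a single bet
print(ev_per_bet)                    # 0.0 -> exactly break-even at 20x
```

At exactly 20x the strategy breaks even in expectation; any better odds than that, which is plausible when the whole market disagrees with you, turn a profit, and anything worse bleeds money.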


That’s what the “overconfident” bid is; if the market thinks something is 85% likely and you think it’s 80% likely, trade at 80% even if it means buying the 20% against.

If you’re right, 80% of the time you’ll lose money on the proposition.


I believe this is exactly how Nassim Taleb got rich.


I was gonna say the same thing: that's the old Taleb strategy.


This is what I was thinking when I was reading the original post. Indeed, which is the more important signaling mechanism here? Would we rather get a little more certainty about an event everyone already agrees is likely, or get some insight into an event that nobody thinks is likely?


In fact, I'd argue this hypothetical person MUST be able to make money on some hypothetical prediction market, or else they aren't actually adding any value and shouldn't be cherished. (A random number generator is occasionally right when nobody else is. This does not make listening to it valuable.)

Scott's example (from the older SSC post he linked) of a magic box that generates elegant scientific hypotheses that are sometimes right but sometimes wrong is ONLY valuable as long as it's right *more often than chance*. A box that generates ALL hypotheses is just the Library of Babel ( https://en.wikipedia.org/wiki/The_Library_of_Babel ) and its information-theoretical payload would be zero.

If you are right *more often than chance* about things that no one else is right about, then you should be able to make money on a prediction market that robustly covers those topics. (Of course, there's no guarantee that such a prediction market actually exists. But you'd have at least a hypothetical money-maker.)


There are plenty of ways of adding value that don't have any direct cash value in terms of predictions. The classic example from the history and philosophy of science is Copernicus - the Copernican model of the planets (including the Earth) moving in circles (with epicycles) around the Sun was *worse* for predicting the positions of the planets in the sky than the Ptolemaic model where the planets and Sun move in circles (with epicycles) around the Earth. The big thing Copernicus had going for him at the time was that he could explain why the major planets (Mars, Jupiter, Saturn) all go into retrograde precisely when they were in opposition to the sun (because that's when the Earth's orbit pulls ahead of the planet's own orbit) while the Ptolemaic system just needed one specific epicycle for each, that happened to be perfectly aligned with the period of the sun's orbit. And of course, about 70 years later, it turned out that Copernicus's idea of putting the sun at the center was extremely helpful when Kepler replaced the circles-plus-epicycles with ellipses, and finally got better predictions than the Ptolemaic model.

Saying that we shouldn't cherish someone until they are already able to make money in their predictions is saying we shouldn't cherish Copernicus.


I don't think you are thinking of "predictions" in the same way as me (or as Scott, judging from his example of the hypothesis-generating box).

If you are claiming to add value to the discussion by proposing that the planets orbit the sun, then I would say that the thing you are predicting is "the planets orbit the sun". If you were right about THAT, then that represents a winning bet in some hypothetical prediction market, whether or not you can predict the planets' exact positions. (Just like a correct prediction that "it will rain on Thursday" could win money without predicting exactly where every raindrop will fall.)

If you can say novel things like "the planets orbit the sun" and be right more often than chance, then there is some hypothetical prediction market where you could make money (on average) based off that talent.

(The practical difficulties in creating and running that prediction market might be substantial; I am not suggesting that we *actually* use prediction markets for *everything*. My point is just that your average accuracy needs to outperform a uniform credence over all possible hypotheses, or else you are contributing no more than a random number generator.)


Right - I'm thinking of predictions as operationalizable things that someone can make a finite-time bet on, which I think is essential to the basic motivation for prediction markets.


That 20x is slightly reminiscent of the https://xkcd.com/882/ spoof of p-hacking (vis-à-vis the "traditional" 95%-confidence threshold).

From the perspective of one bidder: if they made many "contrarian" bets at odds of 1/20 and came to win only 1 bet for every 21 such bets they'd made, then that one win may be nice, but the payout still doesn't cover their losses.

From the perspective of those who skim the vast sea of bidders in order to identify those that are making various 1/20 bets *and* winning them more often than one time in twenty: odds are that they *will* succeed in finding such bidders. But what will be the basis for asserting they did not simply find "green jelly beans" in the vast sea of bidders?
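
For a sense of scale, here's a quick simulation (all parameters hypothetical): a crowd of zero-skill bettors each makes 100 bets with a true 5% win rate, and we count how many post a track record that looks impressive anyway.

```python
# Zero-skill bettors making 1-in-20 bets: how many look like stars by luck?
import random

random.seed(0)
n_bettors, n_bets, p = 1000, 100, 0.05

def wins(n: int, p: float) -> int:
    """Number of wins for one bettor making n independent bets."""
    return sum(random.random() < p for _ in range(n))

# Expected wins per bettor is 5; count those who double that by luck alone.
stars = sum(wins(n_bets, p) >= 10 for _ in range(n_bettors))
print(stars)  # on the order of a few dozen "green jelly beans" out of 1,000
```

With these numbers, roughly 3% of pure-chance bettors clear twice the expected win count, which is exactly the jelly-bean worry.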


Well, I think if you scan to find those with a historical record of beating the odds, and make the assumption that some of these are 'green jelly beans' who got lucky and some have some genuine additional information to contribute... you then have to have an observation period. Over the observation period, the lucky jelly beans will lose their good track records and the actually-useful-contrarians will mostly maintain their track records. The longer you observe before trusting/betting on the apparent contrarians, the more lucky jelly beans will fall out and the purer your sample will be. You will, by misfortune, lose a few useful contrarians from the batch too, but a large enough sample to start with should tolerate this effect ok.


This all assumes that people are static. Someone might've been genuinely good at picking out contrarian takes ten years ago and then lost their skill.


Not sure if you've seen this before, but for good measure: TipRanks analyzes stock analysts recommendations. https://www.tipranks.com/


The Brier score, Brier skill score, and concept of climatologic forecasts having accuracy but no skill are all relevant here.

In essence: suppose you know that it rains 1 in 5 days in a given place. Then, in the absence of any other information, you should predict a 20% chance of rain for every date. This forecast will be completely accurate but not skillful. Accuracy here is defined as: of all your 20% forecasts, on how many days did it rain? If it's 20%, you were perfectly accurate. But since the climate in this case is known, your skill as a forecaster depends on how accurately you deviate from the climatologic prediction. (There's also power, which is how close your forecasts are to 0 or 1.)

EU Met Cal used to have a spectacularly good set of articles explaining all of this (at eg https://www.eumetcal.org/resources/ukmeteocal/temp/msgcal/www/english/msg/ver_prob_forec/uos2/), but it seems to be off the web with no Archive. If I can find a working link, I will post.
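
A tiny illustration of the accuracy-without-skill idea (simulated data for a hypothetical place where it rains 1 day in 5):

```python
# A forecaster who always says "20% chance of rain" in a climate where rain
# truly falls on 20% of days is perfectly calibrated, yet shows no skill:
# the forecast never deviates from climatology.
import random

random.seed(1)
rained = [random.random() < 0.2 for _ in range(10_000)]  # True = rain that day

forecast = 0.2                       # the constant climatological forecast
observed = sum(rained) / len(rained)
print(forecast, round(observed, 3))  # the 20% forecasts verify ~20% of the time
```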


To elaborate: use of the appropriate scoring tool avoids the sunrise-prediction method of running up one's score. The climate value would simply be set at 100%, making the prediction that the sun will rise tomorrow accurate, powerful, but utterly unskillful.


How do you establish the "climate values" for some arbitrary prediction like "Joe Biden ends the year with his approval rating above 50%"?


Whatever the historical average approval rating is for all presidents at the end of their first year for as long as it's been measured.


How are you choosing the reference class? Why just presidents, rather than all politicians? Are you limiting it to US presidents? Why not limit it only to Democrat presidents, or only to presidents in a certain age range?

Why base things on the *absolute* approval rating rather than the *change* in approval rating compared to the start of their term? The prediction "Joe Biden's approval rating will increase by at least X" would have been logically equivalent, for some value of X (possibly negative)--would the baseline "climate value" have been different for that prediction?

If the historical data has an obvious trend over time, are we supposed to take that into account? If Joe Biden has a history of higher approval ratings than his average peer, are we supposed to take that into account?

What about predictions with no obvious reference class at all, like "Apple releases new iMacs powered by Apple silicon" or "The EU ends the year with more confirmed Covid-19 deaths than the US"?


If there's no historical referent, probably impossible to use a weather/climate analogy.

On the others, weather prediction faces similar barrier questions, they're just a bit easier to answer. Why choose to ask if it will rain in a 24 hr period, and not over 2 days? Or a week? Why ask a yes/no rain question instead of >0.25" or >0.5"? What geographical boundaries to draw around the dataset - should we look only at that single weather station, or include stations within 10 miles? 50 miles? 1000 miles?

Some of these questions are answered by forecast utility (we care if it rains on a given day because days are meaningful to how we go about our lives) and some by prediction utility.

At the end of the day, the goal is to have a baseline to score skill against. So if there's a basic flaw in the climate value (like a rising baseline over time), savvy forecasters will observe that and it will show as skill in their scores. This even happens with weather - predicting climate value +0.1 degree for daily temperature each day in 2022 is likely to have some positive skill for most of the globe!


Your questions don't seem similar to mine. Yours are about what sort of prediction we ought to make; mine are about how you determine the baseline "climate value" for a *given* prediction.

I'm not asking whether someone ought to predict Joe Biden's approval rating after 1 year or 2 years or 6 months, I'm asking GIVEN that the prediction was for 1 year, how do we get the baseline "zero skill" prediction that we're scoring it against?

> "So if there's a basic flaw in the climate value (like a rising baseline over time), savvy forecasters will observe that and it will show as skill in their scores"

That's exactly the problem. If your goal is to prevent people from running up their scores by making easy predictions like "the sun will rise tomorrow", then you need to somehow discount ALL easy predictions, not just a few of them. If there's a basic flaw in your "climate value", then your forecasters can exploit it to score free points. The top-scoring predictors will just be whoever made the largest number of predictions that exploited that flaw, rather than the ones with the greatest actual insight.


Do any of those other things, then. I wasn't coming up with some carefully thought-out proposal for how to estimate a base rate, just pointing out that any event with precedent has a base rate. These aren't time-invariant processes, so we're always stuck with a flawed estimate, but it's something. Weather isn't time-invariant, either.


Harder to do than with weather.

One answer is the same: set boundaries on a historical record. Adam's answer is where I'd start: what was the approval rating for all presidents at the end of the first year of their term? Setting the boundaries is where it gets difficult. First terms only, or count the year after reelection too? How far back to go?

Alternative: use the prediction market! Let the market make a prediction through buying and selling for 1-2 months. Then close bets, let pundits see the market value, and let the market value be the climate. Skill for the pundit is when they differ from the market.


I'm confused. The Brier score is just the sum of the squares of the errors of your predictions. But what you seem to be describing is more related to a "calibration" score, which *doesn't* incentivize people to make more extreme predictions the way the Brier score does (except in the parenthetical about "power"). I'd definitely be interested in learning more if you have it, but I'm used to both the Brier (quadratic) scoring rule and the logarithmic scoring rule.

I'm not aware of a scoring rule that has a way to compare people who make predictions on different sets of claims, or a way to "objectively" measure the "difficulty" of claims, so that we can give more points to the people that are closer on the "harder" ones.


The Brier skill score is one way to compare the accuracy of predictions. It's essentially a difference in Brier scores between two models.

Let's use the approval rating question. Over the last 20 years, Presidents have had a mean of about 48% - I'm going to assume that this maps to the median, though I haven't checked the actual numbers. Let's posit that the distribution of answers to the question "Is the President's approval rating greater or lower than 50% on a given day?" is 55% no, 45% yes. So our climate prediction is 45%.

Now, let's say the outcome is No. The Brier score for the climatic prediction is (0.45-0)^2=0.2025.

If I predicted 45% yes, my Brier score for this prediction is the same (0.2025), and my Brier skill score is *0* ((0.2025-0.2025)/0.2025) - I predicted the same thing the climate did, so I demonstrated no skill.

If I predicted 10% yes, my Brier score is (0.1-0)^2=0.01. My Brier skill score is now (0.2025-0.01)/0.2025 = *0.95*. Because my prediction was better than the climate, I get a good Brier skill score.

If I predicted 70% yes, my Brier score is (0.7-0)^2=0.49. (NB: better Brier scores are smaller.) My Brier skill score is (0.2025-0.49)/0.2025 = *-1.42*. Because my prediction was worse than the climate, I get a bad Brier skill score.

Brier skill scores can be combined for sets of predictions. Predictions that add no information to the baseline (the sun will come up tomorrow) generate skill scores of 0, so do not move the needle either way. Predictions the wrong way vs. baseline reduce Brier skill scores. Predictions the right way increase them.

Setting baseline can be tough, but it's a problem similar (identical?) to that faced by sports bookmakers in setting spreads.

The Brier skill score is far from perfect and it's also not the only option for measuring forecast skill. I mention it to illustrate that measures of skill exist, and that they can address at least the simple concerns raised by Mr. Alexander. The broader point is that measuring forecast skill to avoid the obvious pitfalls of juking the stats is a problem considered in great depth by meteorology, so those who want to measure pundit performance on prediction would do well to start with weather.
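
To make the worked example above easy to reproduce, here's a minimal sketch of the same calculation for a single binary forecast (real verification schemes also handle multi-category and continuous outcomes):

```python
# Brier score and Brier skill score for one binary forecast.
# prob = probability assigned to the event; outcome = 1 (yes) or 0 (no).

def brier(prob: float, outcome: int) -> float:
    """Brier score for a single binary forecast (lower is better)."""
    return (prob - outcome) ** 2

def brier_skill(forecast: float, climate: float, outcome: int) -> float:
    """Skill relative to the climate baseline (positive = beat the baseline)."""
    reference = brier(climate, outcome)
    return (reference - brier(forecast, outcome)) / reference

# Reproducing the numbers above: climate says 45% yes, the outcome is No.
print(brier_skill(0.45, 0.45, 0))  # 0.0    -> no skill
print(brier_skill(0.10, 0.45, 0))  # ~0.95  -> beat the baseline
print(brier_skill(0.70, 0.45, 0))  # ~-1.42 -> worse than the baseline
```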


I think this solves the issue of comparing forecasters of Phoenix weather against forecasters of Dallas weather against forecasters of New York weather, and forecasters of presidential approval rating against forecasters of prime ministerial approval rating. But I don't think it solves the bigger problem of comparing forecasters of economic variables against forecasters of political variables, since there are easy and hard variables within each, and many of them are one-off events that don't have natural reference classes.


I agree that comparisons across types of predicted events is not addressed with forecast skill metrics. I think it may be impossible to compare an economist's skill at forecasting inflation with a political scientist's skill at forecasting approval rating.

Brier skill scores are also not comparable for different predictions even in the same category. Because the climate prediction is in the denominator, the values are heavily influenced by the baseline rarity of events: for rare events, the climate prediction is more accurate (as measured by Brier score) and it is much harder to show skill.

I'm also concerned about the sample sizes: meteorological forecasting metrics usually involve prediction of at least hundreds of events (say, temperature every day for a year). A single prediction, or small set of them, will be hard to evaluate for skill.


My half-finished prediction market awards extra points for being right when others were wrong, and also for being first to be right - that was my attempt to adjust for everything, anyway.

In light of the above article: what do you (meaning fellow commenters) think of the relevance of those two criteria?


A real-money prediction market like PredictIt accounts for those criteria implicitly. Why make a more complicated system that tries to replicate that effect?


Letting the prediction market see the pundit is actually a fatal flaw. Pundits whose reliability the market accurately assesses will, by sharing their position and justification, make an efficient market better at predicting than they are.

It’s one thing to be able to beat the market; it’s another thing entirely to be able to consistently beat the market while also sharing your reasoning and bets.

Maybe you could compare the prediction market before and after a post by a pundit whose prediction differs from the market, to see which direction it updates, and then evaluate whether it more often updated in the direction of the eventual resolution, in conjunction with whether the market adjusted towards or away from the pundit’s estimate.

If the market updates away from the pundit consistently and improves accuracy that way, then they are providing useful analyses but are bad at prediction. If the market updates in a way that reduces accuracy, then the analyses are of negative value, even if the prediction is more accurate than the market.

But there is no way to hold all other things equal.


Good point, maybe the steps for a conscientious pundit should be:

1. Make prediction.

2. Check to see if there are existing predictions like it. If there are none create a new bet on the prediction.

3. Compare your prediction with the public prediction. If it's the same, no new info is really added, so skip writing a post.

4. If it is different write up your post explaining why you think it is different.

5. At the end of the year, tally up the differences between the public market predictions and your personal predictions.


You could publish a commitment to your predictions, too. For a simple one, take a random number of 256 bits, prepend it to a text file of your prediction, hash the whole thing with SHA256, and publish the hash. (If you want this to hide your prediction, you have to use a really random number, not just something you make up off the top of your head or get from a non-cryptographic RNG.)


To open the commitment, reveal your random number and text file.
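
A minimal sketch of that commit-and-reveal flow, using only the Python standard library (the prediction text is a made-up example):

```python
# Commitment scheme: publish a hash now, reveal the nonce and text later.
import hashlib
import secrets

def commit(prediction: bytes) -> tuple[bytes, str]:
    """Return (secret nonce, publishable hex digest) for a prediction."""
    nonce = secrets.token_bytes(32)  # 256 bits from a cryptographic RNG
    digest = hashlib.sha256(nonce + prediction).hexdigest()
    return nonce, digest

def verify(nonce: bytes, prediction: bytes, digest: str) -> bool:
    """Check a revealed (nonce, prediction) pair against the published digest."""
    return hashlib.sha256(nonce + prediction).hexdigest() == digest

nonce, digest = commit(b"Biden approval above 50% on Dec 31")
print(digest)  # publish this today
# ...months later, reveal the nonce and text; anyone can verify:
print(verify(nonce, b"Biden approval above 50% on Dec 31", digest))  # True
```

The random nonce is what keeps the committed text hidden; without it, anyone could brute-force likely predictions against the published hash.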


Zeynep isn't wrong often. I'd love to see her vs. an aggregate punditry. Have you talked to the Good Judgment folks about running something like this for pundits that were willing to jump in? Have a parallel version for Superforecasters. Keep track of results.


Most of what I read comes from the investing community. On that note, that's where my income comes from too.

Currently, my largest investment is in AT&T. The investment thesis here runs along the lines of:

HBO Max will do well.

Even without HBO Max AT&T is fine.

What I want to point out here is that the bet is two-sided.

That is, I'm not saying [HBO good: 70%]. In fact, I have no specific number. It's more like [HBO good > Debt dangerous: >50%]. This same logic can be easily applied to politicians, media, and bureaucrats as well. So, Bush said Saddam had WMD, and that we should go to war to solve this.

The two sides of this prediction are:

Iraq war will prevent the nuclear annihilation of New York.

Iraq war will be quick and easy.

If the first half had been proven true, I would have given Bush lots of points for that, sure. But if only the second half had been proven true, I still would have given Bush lots of points. If Saddam did have WMD that he planned to use on New York, but our military was crushed in a failed invasion (perhaps because of help from China?), I would have penalized Bush by a ton.

Of course, both halves proved false, and I despise Bush for that and other reasons. But anyway, the point is that most policies have an implicit [potential cost]:[potential return]. In investing, it's not actually important to be right about anything; what matters is basically to not be wrong. At least, the investors I tend to look to for advice make lots of investments with the potential for profits and little in the way of potential losses. I think the same can be asked of people like Trump, Fauci, and Tucker Carlson.


https://metaculusextras.com/top_comments?start_date=2021-03-01&end_date=2021-03-08

Week before last's top comments:

- misha rounds up different forecasting platforms' takes on the Olympics (https://www.metaculus.com/questions/5555/rescheduled-2020-olympics/?invite=tLxPdB#comment-56906)

- There were no terrorist attacks in OEDC founder states between the 3-Nov-20 and 1-Mar-21. (But Jgalt points out there were attacks on either side) (https://www.metaculus.com/questions/5441/terrorist-attack-in-oecd-founder-state/?invite=tLxPdB#comment-57003)

- F for SN10 (Jgalt) (https://www.metaculus.com/questions/6498/will-sn10-land/?invite=tLxPdB#comment-57038)

- ege_erdil and I look at the distribution for bitcoin's peak in 2021 (https://www.metaculus.com/questions/6666/maximum-price-of-bitcoin-in-2021/?invite=tLxPdB#comment-56828, https://www.metaculus.com/questions/6666/maximum-price-of-bitcoin-in-2021/?invite=tLxPdB#comment-56829)

- ege_erdil calculates the base rate for politicians resigning after accusations of misconduct (https://www.metaculus.com/questions/6693/will-ny-governor-andrew-cuomo-resign-soon/?invite=tLxPdB#comment-57116)

https://metaculusextras.com/top_comments?start_date=2021-03-07&page=1

Last week's top comments:

- ThirdEyeOpen's application to be a moderator (https://www.metaculus.com/questions/6805/2021-spring-equinox-moderator-election/?invite=9I6hgw#comment-57728)

- Various meta-points on resolution (https://www.metaculus.com/questions/353/will-someone-born-before-2001-live-to-be-150/?invite=9I6hgw#comment-57513, https://www.metaculus.com/questions/6145/brent-crude-oil-to-exceed-70-in-2021/?invite=9I6hgw#comment-57508, https://www.metaculus.com/questions/5555/rescheduled-2020-olympics/?invite=9I6hgw#comment-57601)

- chudetz shares his docuseries on the BEAR question (https://www.metaculus.com/questions/6087/when-will-ai-understand-i-want-my-hat-back/?invite=9I6hgw#comment-57466)

- Cory points out US semi manufacturing might be on the rise (https://www.metaculus.com/questions/6249/november-2021-production-of-semiconductors/?invite=9I6hgw#comment-57456)

- EvanHarper identifies a source on how other NY officials feel about Cuomo's position (https://www.metaculus.com/questions/6693/will-ny-governor-andrew-cuomo-resign-soon/?invite=9I6hgw#comment-57783)


I caution against getting too far into the predictions game, because it draws attention away from substance -- as do the horse-race elements of election coverage. It's usually not important to predict stuff like Biden's end-of-year public approval rating. What counts is: (1) whether a pundit was on the mark in warning about an under-protected risk (Protect against pandemic, dummy); (2) whether a commentator was farsighted in revealing options that most others were ignorant about (e.g., Farrington Daniels' early book on solar power); and (3) whether an expert revealed hidden facets or complexities in ways that deepened readers'/viewers' thinking (perhaps "Think Like an Economist"). There no doubt are more such major categories, but not many.


I agree with this sentiment. In scientific research (specifically speaking from working with folks in hydrology and water management), prediction certainly has its place (predicting annual rainfall, forecasting streamflow, etc.); however, what's really useful for water managers is thinking through scenarios, assessing options (if this, then that), and grappling with decisions under deep uncertainty. The deep uncertainty part is important, since it means the decision context is no longer within quantifiable probability estimates.

Attempting to make predictions in such contexts is a fool's errand.


I agree whole-heartedly and would add one more category: (4) pointing out that something which is currently attracting a great deal of Deep Concern is in fact a storm in a teacup.

A good pundit doesn't think for me, but directs my thinking towards what matters. This concept should make more than a little sense to a professional psychiatrist!


I don't know if I view the primary job of a pundit as prediction. I see them as primarily there to provide context and perspective on issues, of which prediction should be a part, but how harshly should we judge bad predictions?

In the Iraqi WMDs example, pundits were making their assertions based on the available evidence, which strongly indicated such weapons did exist - the US government, the US intel community, foreign governments and intel communities, Saddam Hussein himself... all widely agreed that Iraq did have WMDs. How often do we have events with such a large amount of misleading info and so little publicly available truth?


Dishonesty seems rather common in politics. Here's Greg Cochran on how he knew the intelligence agencies were wrong about Iraq:

https://westhunt.wordpress.com/2019/02/20/beating-the-spooks/


He's talking about nuclear weapons - at the time, the thrust of the argument was around chem/bio weapons and perhaps old Russian warheads. I was a sucker for the Iraq War lies, but I never thought Iraq was making nukes. (I definitely thought the Russians probably gave them some at the time.)


https://westhunt.wordpress.com/2019/02/20/beating-the-spooks/#comment-126334

"Calling something like mustard gas, or even nerve gas, “WMD” is just a lie. Mind you, Iraq wasn’t producing either one in 2003."

From another comment there:

"There were gas shells buried in the mud or left in an abandoned bunker by mistake."

He discussed bio war in the main post:

"This was demonstrated again later, when the CIA and DIA concluded that we had found ‘mobile biological labs’ in Iraq. Which were actually vans with portable hydrogen generators – hydrogen for balloons intended to measure high-altitude winds, for increasing the accuracy of artillery. Pinheads."

Overall, he didn't think Iraq had the money or technical capacity to be much of a threat.


Calling mustard gas a WMD is plain and literal truth, because as long as we've had the term "WMD", war gasses even of the crude WWI variety were one of the three central and definitive examples. The term "WMD" means "war gasses, biological weapons, nuclear weapons, and anything some future boffin invents that's as lethal and indiscriminate as those". If you want to talk about nuclear weapons because you don't think chemical and biological weapons are lethal enough to be in the same league, we've got a perfectly good term to talk about just nuclear weapons. We've got a term for "weapons", another term for "very bad indiscriminate weapons", and a third term for "the one type of really apocalyptically bad indiscriminate weapon we know of so far", and that's a pretty good set of linguistic tools.

Believing in 2003 that Iraq possessed WMD was reasonable, due to the Iraqi regime's deliberate attempt to convince everybody including its own army that the regime was totally badass because of the hidden bunkers full of Sarin and Anthrax. Believing in 2003 that Iraq possessed or soon would possess nuclear weapons, was foolish. Most of the people I was following at the time, seem to have been reasonable and not foolish.

Expressing one's reasonable belief in Iraqi weapons of mass destruction in terms likely to leave the lay public (or random congressman) believing that Iraq had or would soon have *nuclear* weapons, was malicious and deceptive. I saw a lot of that in 2003, even from people who were carefully couching their explicit claims in the more restrictive and technically accurate sense.


Mustard gas wasn't even used in WW2. I wouldn't lump biological warfare in with it. The current pandemic has done far more damage than all chemical warfare put together. Infections can spread much further than gas while retaining potency (this also means they can come back to hit the country using them, so it matters how reckless a user is).


Then you might want to use the somewhat cumbersome term, "nuclear and biological weapons". You will fail at communication if you use the term "WMD" to mean "only nuclear and biological weapons".


Indeed. My recollection from back then was that every Iraqi commander we debriefed had no chemical weapons in his unit, but had been told by his superiors that such weapons existed and were held by certain elite units.

Eventually we figured out that those commanders had been lied to, and there's a good chance they were the source of the assurance from pretty much every European intel service at the time that "Saddam had chemical weapons."

I don't recall ever hearing anyone credible assert that Saddam had a nuclear weapons program, but that's over twenty years ago now.


Colin Powell's testimony before the UN claimed that Iraq had ordered aluminum tubes that could only be for building a centrifuge to refine uranium, IIRC. And the standard line I remember from various politicians was "Our first warning could be a mushroom cloud rising over an American city." Joseph Wilson (Valerie Plame's husband) was sent off to find evidence that Iraq was buying uranium ores ("yellowcake") from someplace in Africa. And so on. The claim, and probably the belief, included nuclear weapons.


To serious people, the question of whether Iraq was working on nuclear weapons was just about the only WMD question that mattered. Chemical weapons are World War I technology that Hitler got along without. Biological weapons are a good way to make yourself sick, not an offensive weapon of much use.

Nuclear weapons, however, can be big trouble.

In October 2002, Cochran explained why Iraq didn't have the money or brains anymore to develop nukes.


There are perfectly effective offensive biological weapons. Tularemia and anthrax work fine. Smallpox would be pretty effective now too, since nobody is vaccinated against it anymore.

The commies were dropping tularemia on German units outside Stalingrad, trying to slow down advances. It worked.

And a really nasty biological weapon would be worse than a nuke. I remember reading some Soviet work on engineering human myelin protein into a fairly innocuous germ: you get sick, mild pneumonia, then a week or two after you clear the infection, major autoimmune response and you get induced multiple sclerosis.


Ah... forgot that piece of it, though I remembered l'affaire Plame and how silly it seemed at the time. Twenty years and all that.


I've linked this above as well. In 2002, physicist Richard Muller, who had been involved in the post-Gulf War assessment of Iraqi nuclear programs, predicted that an invasion of Iraq was inevitable. He based this on the evidence found previously: entire secret enrichment programs, using technology nobody in the West predicted Saddam would use, had been in operation under the nose of both intelligence and inspector surveillance.

https://www.technologyreview.com/2002/02/07/235255/springtime-taxes-and-the-attack-on-iraq/


You can find Gregory Cochran's original October 14, 2002 explanation of why the Iraqis couldn't be progressing toward a lot of nuclear weapons here on Jerry Pournelle's blog:

https://www.jerrypournelle.com/archives2/archives2mail/mail227.html


Wow, that article is shockingly racist. Calling the population "morons" (unlike the "smart" North Koreans), saying no one in the Muslim world has "invented or discovered anything in 700 years", etc.. This author is disgusting, and even if it should have been obvious that the Iraqis weren't making nuclear weapons it's still crazy how much bias he's bringing to his post.


It's a blog post rather than an article published in some larger publication. The author, Greg Cochran (whose paper on Ashkenazi intelligence was cited in the SSC Manhattan Project post), could perhaps have a bias causing him to underestimate the number of inventions or scientific discoveries coming out of the Muslim world. But, not being an expert, I'm not aware of any. People with more expertise are invited to provide counter-examples.


I suspect we have a slider bar we can manipulate w.r.t. our sources of information and commentary. On one end of the slider bar, we have very polite and never offensive commentary, but we lose a lot of insights that involve upsetting or offensive ideas or facts or claims. On the other end of the slider bar, we have impolite and often offensive commentary, but also more accurate information and thus better decisions.

In this particular case, the decision we made as a country led to something like a hundred thousand people dying, a million or so displaced, and a several-year civil war along religious and ethnic lines that I imagine will leave its mark on Iraqi politics for decades to come. We also spent billions of dollars, got several thousand of our people killed, and had tens of thousands of our people maimed.

I'm thinking we would have done a lot better to push that slider bar *all the way* toward impolite and offensive and right.

Expand full comment

In contrast with TGGP's Cochran example—which relies on technical expertise vis-a-vis nuclear weapons pathways—here's another blog post on how to outperform the mass consensus, from a very different angle.

https://blog.danieldavies.com/2004_05_23_d-squareddigest_archive.html

Expand full comment

I worry about over-reliance on prediction as a metric for assessing the quality of journalism. Even apart from the methodological challenges discussed in the post, I think that success in particularized factual forecasting, while an important skill, is often unrelated to the value of a piece of political writing or analysis. Few of the ideas I find most valuable are susceptible to short-term, isolated verification or falsification. Synthesizing disparate facts into an interesting theory, etc., can rarely be tested by a "wagerable" near-term market result, and I am skeptical that facility in the one is particularly relevant to the usefulness of the other.

Expand full comment

Perhaps prediction isn't the best metric, or doesn't capture everything, but I'm not sure that producing "an interesting theory" is any better. Maybe I'm just taking you too literally, and if so, oops, my bad, but what is the value in such a theory? I see two possible ways it could be valuable: being informative, and being entertaining. Entertainment is all well and good, but I think it best to make the distinction clear. The value of something being informative is that it gives us an idea that might be true, and the value comes from the chance that it might be true. There may be truths that one with skill can glean from reading a situation, and which don't cleanly translate into a prediction about an outcome that can be uncontroversially measured/evaluated in the near future.

But, the beliefs should be useful for taking action in some way, right?

If the beliefs led to better choices, shouldn't it be possible to, like, look at how they influenced the decisions, and therefore---

well, I suppose one doesn't have access to what the outcomes would have been if one had taken other choices.

But, one can somewhat evaluate how well things turned out at least, and make some attempt at tracing how things turned out to what predictions were used.

Expand full comment

In my view, theories like that typically have actionable consequences and should lead to better choices. Unfortunately, I am skeptical that those empirical consequences are subject to definitive testing via a simple declarative sentence that will be objectively provable within a period of several years.

Expand full comment

For factual reporting, what I'd like to see is something quite different: how much confidence does the newspaper have that the facts of the case as reported in this story will not be substantially revised in, say, five years, or by an independent investigation? How much are you willing to bet? (One thing I've seen NPR do right here is, when there's a breaking story involving some tragedy like a terrorist attack or mass shooting or other disaster, they'll have a disclaimer up front saying something like "This is early reporting, and it's very much subject to change as we get better data in.")

The best way to implement that would be to actually have someone honestly revisit the stories five years later and try to work out what really happened.

Expand full comment

MSNBC does this routinely as well (and, yes, I'm talking about their evening shows AND daytime programming). And they routinely correct their factual mistakes. Still, they are not PERCEIVED by "the heat map" as being careful or beholden to the facts. They are perceived as a mirror image of Fox News (which is neither careful nor beholden to the facts -- until their lawyers start fretting about lawsuits from manufacturers of voting equipment!).

What MSNBC is "guilty" of, however, is what everyone else is guilty of -- even NPR and the NYT -- and that is choosing which set of particular facts to talk about. What they (along with punditry in general) offer is context (by choosing from all the gazillions of things they could be devoting their limited time and space to).

Fox News viewers are culturally oriented NOT to balk when their preferred network spends all day ludicrously diverting attention from, say, a major stimulus bill on the brink of becoming law, with heated nonstop "controversy" over Dr Seuss. Meanwhile, MSNBC's viewers are, indeed, outraged over the latest attacks on voting rights -- but these attacks are ACTUAL, documented, large, and current. Compare that to the Fox/GOP heatmap strategy of ginning up outrage so they can then plausibly claim to be addressing widespread "concerns" -- even when such concerns did not arise from facts. (In the case of voting rights, the manufactured perception crisis over "election security" was born from only one distantly related fact: simply that the GOP lost the last election).

So I too am having a hard time wrapping my brain around how this prediction-model project is helpful for "punditry" (since today's pundits are only nominally concerned about predictions anymore anyway). Like with a broker who advertises a 90% success rate, the wise consumer knows to evaluate the context of that success (like how tiny were all those wins they advertise?)

Likewise, the job of a self-respecting pundit is to provide well-thought-out context (e.g., to convince readers that they ought to be paying more attention to this than that). It is an elite leadership role, let's face it -- not to be confused with pandering to heat-map driven populism! (Eugene Robinson vs Tucker Carlson.)

If the project is a mere toy (when applied to politics, news, and punditry, as opposed to financial markets), then fine. But if the gamers are actually lobbying to replace elite human wisdom then I do worry that, by default, they will wind up supporting the anti-democratic populist manipulation of both facts and context, which has been screwing with the health of our body politic since we abandoned, say, the Fairness Doctrine. Maybe some of the smartest people (e.g. here) are giving short shrift to the unlimited appetites of the wicked (e.g. Roger Stone).

Expand full comment

Related to Decius's objection below (and alluded to in the post): over the long term, properly functioning prediction markets should always beat any individual pundit. The easy way to see this is: if the prediction market isn't beating the pundit, then somebody can make a ton of money just by doing whatever the pundit says. Now the prediction market does as well as the pundit.

More generally, pundits (and every other part of the "discourse") seem like an important component in the machine that makes prediction markets work. This is good, but it makes "judge pundits by how they stack up against the prediction market" sort of circular. It's not clear that any direct comparison will be stable in the long run. (Is this another instance of Goodhart's law?)

But, I think we can quantify the value of pundits even when they don't beat the market. As a very coarse measure: does a pundit writing about topic X cause the market to shift? If every time Yglesias writes a post, the relevant prediction markets suddenly change by 5 points, that means that Yglesias's posts have some serious information content. That's valuable even if Matt couldn't correctly predict whether or not the sun would rise tomorrow.
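A rough sketch of how one might quantify that, assuming you had a time series of market probabilities and the timestamps of the pundit's relevant posts (the function and parameters here are hypothetical illustrations, not an existing tool):

from datetime import timedelta
from statistics import mean

def market_move_after_posts(prices, post_times, window=timedelta(hours=6)):
    # prices: list of (timestamp, probability) pairs, sorted by time
    # post_times: datetimes at which the pundit published on the topic
    moves = []
    for t in post_times:
        before = [p for ts, p in prices if ts <= t]
        after = [p for ts, p in prices if t < ts <= t + window]
        if before and after:
            # absolute shift from the last pre-post price to the last price in the window
            moves.append(abs(after[-1] - before[-1]))
    return mean(moves) if moves else None

You'd also want to compare against the average move in randomly chosen windows with no post, so ordinary market volatility doesn't get credited to the pundit.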

Expand full comment

>The easy way to see this is: if the prediction market isn't beating the pundit, then somebody can make a ton of money just by doing whatever the pundit says.

Given an arbitrarily large number of pundits, though, won't there always be a large number who have beaten the market up to now by chance, and won't do so in the future? So, if you pick a pundit that's beaten the market in the past, you may still lose money following them in the future.

It seems like that would be a factor against people blindly following a pundit rather than following their own broader research, which creates an environment where a pundit *could* continue beating the market long-term.

Expand full comment

Ah, yeah, I don't think I really phrased the argument right. Maybe the better argument is: if you think you have a way to reliably identify good pundits, the prediction market should use that information, causing the prediction market to do at least as well as you can do with that method. This isn't usually a problem --- after all, the prediction market is /supposed/ to do just about as well as possible --- but it means that you can't demand "pundit beats prediction market" as a measure of quality. If that were a reliable measure of quality, then the prediction market would be exploiting it to make it no longer true.

... I'm pretty sure I'm doing a terrible job of explaining this. In the scenario you mention, it's true that a pundit can be beating the market long-term, but only because his/her continued success isn't obvious (after all, it could be a fluke). So the prediction market can't use the expectation that the pundit continues to succeed, and therefore the pundit can continue to beat the prediction market. But that also means that /we/ can't tell that "pundit beats prediction market" is anything other than a fluke. So it's true that pundits can in principle reliably do better than the market, but only in situations that, from the outside, look exactly like statistical flukes.

Expand full comment

It's not uncommon for pundits to become famous for being right about one thing, then getting hired to opinionate in general, at which point they tend to regress toward the mean.

Expand full comment

Warren Buffett got rich off of Benjamin Graham's theory of "Value Investing." But eventually he got so rich and he so articulately extolled Graham that everybody else who was anybody read Graham's book too.

So then Buffett had to move on to other ways to beat the market, such as methods that only work if you are already rich, such as by lending $10 billion to Goldman Sachs at a solid interest rate in the week of the 2008 meltdown.

Expand full comment

"There would already be good prediction markets in which ones will or won't pan out. There would be a few teams, people, and companies who are known for being great at trading in them, and who have expertise in knowing which people are real experts who should be consulted."

Assuming this occurs, these people's careers would be made or broken by the quality of their predictions. Doesn't this result in a scenario where there's a perverse incentive to influence any outcomes they predict?

For example if a pundit predicts an antidepressant will not be approved, then raises their concerns about problems identified by studies on the drug with friends involved in the FDA approval process (or the public to generate pressure) to make it more likely to be denied?

It seems this would be the easiest way to "beat the market" but would generally have negative real-world consequences.

Expand full comment

sounds sorta like the prediction market version of "insider trading". Except backwards, actually. In insider trading, the issue is that the trader has special inside knowledge about a company and he bets on that. Here, the prediction comes first and then the pundit uses their special position as an insider to try to make it come true. But my point is it seems like the same sort of arbitrage as insider trading and that we'd probably want it to be illegal the same way insider trading is.

Expand full comment

That was a problem with Cantor Fitzgerald's attempt to turn their "Hollywood Stock Exchange" website, in which hobbyists attempt to predict movies' box office revenues, into a real-money futures market. Much of the value the non-monetary version has is due to insiders playing it (e.g., pool guys overhearing movie moguls talking about the dailies). But would the government really want to crack down on inside information in Hollywood? To the extent that good movies get made anymore, much of the reason is because everybody is constantly exchanging inside information (this famous director is now zonked on opioids, this child star has made a big leap forward, you really need to read this new screenwriter).

Expand full comment

Wait, so there's a 20-year fascination with prediction markets because it took a while to sink in on everyone that a bunch of people, for their own various and incompatible reasons, lied the country into a war? That, of all things, the lesson out of this - "humans hurt us once, so let's build economic robots and trust those instead"?

Expand full comment

Humans hurt us a lot more often than just that. Hence all the effort put into self-driving cars!

Expand full comment

Built by humans!

Expand full comment

Prediction markets, corporations, Soylent Green... they're all made out of people!

Expand full comment

I think this aspect of the history is a bit mistaken. It can feel this way to people of my generation (including Matt Y and Scott, who are both a little younger than me), because the Iraq War was the thing pundits were constantly bloviating about when we came of age.

The idea of prediction markets is a synthesis of a bunch of things from economics, computer science, philosophy, and psychology, all developing over the past century, and coming together at this period.

Expand full comment

Isn't there value in punditry that makes predictions with rational explanations? Even if they can't be validated? Or is it the rationalizing itself that would be more useful with validation?

Expand full comment

I don't know if you're saying Yglesias' posts are paywalled just to warn readers or because you genuinely can't read them yourself. If it's the latter ... come on, Substack! Obviously Substack should give all its authors free subscriptions to each other, so they'll link to each other and do free cross promotion.

Just buy subscriptions for them if you didn't think to set it up ahead of time. Maybe restrict it to authors with X+ followers if you're worried about people making their own substacks just to read for free and not posting anything. Or maybe don't? And let the lure of free subscriptions entice people to overcome the setup hurdle, after which they'll feel compelled to post and promote.

Expand full comment

I think we should distinguish between pundits who make good predictions and pundits who make accurate statements about the world as it is (in part based on what they are pontificating about). Other than punishing people who predict 12 out of the last 10 recessions and the like (speculating on the impact of bills), most of the issues with punditry are people making factually incorrect or bullshit claims (using bullshit to mean that there isn't even a desire to care about truth). In the case of the Iraq war there are lots of predictions that would have been good to make (if we invade, what is the chance Iraq is a democracy within X years), but I don't think those would be as valuable as holding people accountable for spreading obvious lies like links between Al Qaeda and Saddam.

Expand full comment

On January 13, 2003 I introduced a fairly novel argument for why a U.S. invasion of Iraq wouldn't convert Iraq into a post-WWII West German-style successful democracy: the level of cousin marriage in Iraq was vastly too high for Western political arrangements to work well:

https://www.theamericanconservative.com/articles/cousin-marriage-conundrum/

I'd rather get one big prediction right for an important reason that almost no Americans had ever thought about before than get 10 small predictions right because I'd read up on them on Wikipedia or whatever.

Expand full comment

Would "as succesful at democracy as Lebanon" have been a reasonable goal?

Expand full comment

Yes, please. I'd like a way to find people who post accurate information while it's relevant.

Better yet, a way to find people who consistently post accurate information while it's relevant, and whose posts don't turn out after the fact to have notable omissions that would have made their conclusions/predictions seem questionable at the time.

For me at least, this feels more urgent than trying to find accurate predictors.

As with prediction, it's important to correct for people who post lots of information, a small part of which happens to be accurate.

Expand full comment

I think my 'ideal world' fantasy is a little different from what Scott describes. I'd rather see a meter on the side of the screen next to his blog - one that looks like the audio output graphics on old stereos. Give it a column for politics, economics, psychiatry, etc. Then give me the overall grade (weighted more heavily for more recent/high-impact predictions) with a link to the specific predictions.

I like Scott's idea of placing the predictions on each article/post itself, but my concern is that nobody would ever go back and judge whether the predictions made actually panned out. I'd also be concerned that the response to this kind of accountability would be for pundits to stop making testable predictions, even as they continued to make sweeping overconfident statements. My sense is that given how difficult it is to truly make predictions based on current trends, most pundits would try to game the system by making low-impact, high probability predictions as Scott discussed to pad their numbers.

What's that heuristic about how, when a measure becomes a target, it ceases to be a good measure? Goodhart's law, I think. Still, it would be nice to have a kind of FICO score for pundits. Then I wouldn't have to do the hard work of underwriting their predictions.

Expand full comment

As has been pointed out around here a lot, for the minority of cases for which there's a prediction market at all, they can be pretty crappy. Some coauthors and I are working on the problem of "this prediction market says X will happen with probability P; can we quantify the extent to which that's garbage?" Alternatively: if we have no time or energy to figure out if "the market" is being idiotic or who we might trust to be less idiotic, can we combine all the bid-ask spreads in the various markets and determine the one canonical number that we can agree is the Official Market Probability? Things like http://electionbettingodds.com/ try to do this but they're definitely doing it incorrectly, just averaging all the bid/asks across markets indiscriminately.

This is not my main research area these days and I don't expect to build a better prediction market aggregator/assessor any time soon but I'd love to discuss this problem with people who may be inspired to build such tools.
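To make the aggregation problem concrete, here is the kind of crude first cut that is one step up from indiscriminate averaging: weight each market's midpoint by the tightness of its bid-ask spread. This is only an illustrative sketch, not the method we're actually developing:

def aggregate_markets(quotes):
    # quotes: list of (bid, ask) pairs from different markets, with 0 <= bid <= ask <= 1
    eps = 1e-3  # floor so a zero-width spread doesn't get infinite weight
    weights = [1.0 / (ask - bid + eps) for bid, ask in quotes]
    midpoints = [(bid + ask) / 2.0 for bid, ask in quotes]
    return sum(w * m for w, m in zip(weights, midpoints)) / sum(weights)

# the wide, uncertain market barely moves the estimate:
print(aggregate_markets([(0.55, 0.57), (0.54, 0.56), (0.30, 0.90)]))

Even this ignores market size, fees, and correlated participants, which is part of why the real problem is hard.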

Expand full comment

Maybe it makes sense to work on identifying questions where the information is so low quality that there's no point in making investment predictions.

Expand full comment

FWIW, the stuff about "Apple silicon" is likely referring to how Apple is transitioning away from Intel processors to ones they make themselves: https://www.cnbc.com/2020/11/10/why-apple-is-breaking-a-15-year-partnership-with-intel-on-its-macs-.html

Expand full comment

Apple Silicon is Apple's line of in-house ARM chips, which lets them unify in-house design of mobile systems-on-a-chip (which they already did) with non-mobile laptops and desktops, which they previously farmed out to Intel. So far, they have released a Mac Mini, MacBook Air, and 13-inch MacBook Pro with the M1 chip. We know for sure they're eventually releasing every model they offer on an M-series chip, so this is just asking whether that will happen for certain remaining models this year.

They haven't publicly revealed any release timeline for the remaining models. As for why these models came first: one of the reasons they decided to do this was to design systems with a much lower thermal draw than Intel chips, and these are the models that can benefit most (the 13-inch Pro's fan being tiny, and the Mac Mini and MacBook Air having no active cooling at all).

Expand full comment

Thanks for the elaboration.

Expand full comment

In the Apple case, the prediction might have been made based on having contacts within Apple who were violating their confidentiality agreements and leaking trade secret information.

I'm unclear whether "predictor has off-the-record contacts willing to risk being fired in industry X" is a good predictor of that predictor being more accurate than chance in other areas.

OTOH, the prediction could have been made after Apple announced there would be some Apple Silicon Macs, at WWDC 2020 IIRC, or even after Apple had announced and begun to ship e.g. an Apple Silicon MacBook Air. In that case it might well have been based on that information plus long experience of Apple's introductions of new and improved products. (Though that still doesn't make me expect the pundit to predict anything more accurately outside of tech, or maybe even outside of the Apple product ecosystem.)

Expand full comment

This has always existed for some classes of journalism. Jim Cramer's stock picks are all there for posterity to see forever. Sports journalism is the most obvious example. Every preseason, midseason, postseason, big playoff game, involves all of the various network and publication talking heads making public selections of just about everything you can bet on. In sports journalism, it's even pretty common for groups of pundits to keep a running tally each year of who was more right and who was more wrong, usually for no actual stakes, just bragging rights, but it's still a scored record. And they all stick to what they claim is their area of expertise.

So the model for something like this definitely exists.

Of course, as many others have pointed out, I don't think making maximally accurate predictions is really the purpose or value of most journalism. We'd do well with more detailed investigative work and pure research and less horse-race crap.

Expand full comment

> The current interest in forecasting grew out of Iraq-War-era exasperation with the pundit class. <

Were they mis-predicting, or were they simply lying, aka going along to get along?

The powers-that-be in the US had declared that WMD existed, which appears to be sufficient reason for many people to make the same declaration - often deceiving themselves rather than consciously lying.

Expand full comment

Of course, we should score institutions too.

The attempts to replicate scientific papers are a good example.

Expand full comment

I feel like there’s a way to use machine learning to parse out all the predictions (you’ll need to distinguish between real predictions and sarcasm) someone has made over a period and then process a bunch of news stories and spit out an accuracy percentage.

I can imagine a lot of ways for that idea to fail, but I am curious if anyone is trying it out.
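For what it's worth, the extraction step could be prototyped with an off-the-shelf zero-shot classifier; the labels and threshold below are arbitrary guesses, and real prediction-mining would surely need something more careful:

from transformers import pipeline

# zero-shot classification scores arbitrary candidate labels against a sentence
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def extract_predictions(sentences, threshold=0.8):
    labels = ["a prediction about the future", "sarcasm or rhetoric", "other"]
    kept = []
    for sentence in sentences:
        result = classifier(sentence, candidate_labels=labels)
        # keep only sentences where "prediction" is the top label with high confidence
        if result["labels"][0] == labels[0] and result["scores"][0] >= threshold:
            kept.append(sentence)
    return kept

Matching the extracted predictions against later news stories is the genuinely hard part, and I'd guess that needs humans in the loop for a long while yet.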

Expand full comment

The true Chad move is never to make predictions about specific events but always to make confident statements about the arc of history.

Expand full comment

My arc of history prediction is that arc welding keeps being used in fabrication for the next 100 years.

Expand full comment

Mine is that the world will be destroyed by a flood and terrestrial life will be saved by members of each species entering a ship made of gopher wood.

Expand full comment

Seems to me that predictions are only useful if you are going to be placing a bet (investing). If you don't have a vested interest in the outcome, predictions are pointless, and the questions you have listed are especially pointless to me. "Are things going to get better for the majority of people in the USA?" might be a good question, but I am pretty sure the answer is no. A better question is: does anyone have any ideas on how things might be improved? Of course good ideas are a dime a dozen; you have to be able to sell them, and I don't see anybody selling any good ideas in the current climate.

Expand full comment

There is a lot of discussion of reasons that it is unfair to compare experts to prediction markets. I agree with this sentiment and I think that the solution is to hold experts to a standard of being well calibrated rather than a standard of trying to beat the market. There are several reasons I think this:

1. Trying to beat a well-functioning market is essentially impossible.

2. You get penalized for admitting you don't know something, when in reality we should be rewarding experts for this behavior.

3. If someone makes a very confident prediction, knowing that their very confident predictions almost always come true (i.e. they are well calibrated) is important. Knowing that they are generally able to be as confident as the market is less important. Beating the market requires being well calibrated AND being very confident. Being more confident makes your predictions more useful in general, but it is irrelevant to evaluating any particular prediction.

There is a caveat with this approach: you should only trust a prediction if its predictor is well calibrated on similar predictions, where similar means similar in terms of confidence, topic, and difficulty. Assuming that condition is met, whether or not a predictor beats the market consistently should not be relevant.
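Checking calibration is mechanically cheap once predictions resolve; here's a minimal sketch (the ten-bucket layout is an arbitrary choice):

from collections import defaultdict

def calibration_table(predictions, bins=10):
    # predictions: list of (stated_probability, outcome) pairs, with outcome 0 or 1
    buckets = defaultdict(list)
    for p, outcome in predictions:
        buckets[min(int(p * bins), bins - 1)].append(outcome)
    for b in sorted(buckets):
        outcomes = buckets[b]
        lo, hi = b / bins, (b + 1) / bins
        # a well-calibrated predictor's actual frequency falls inside each bucket's range
        print(f"{lo:.0%}-{hi:.0%}: actual {sum(outcomes) / len(outcomes):.0%} (n={len(outcomes)})")

The hard part, per the caveat above, is grouping by topic and difficulty so each bucket contains genuinely similar predictions.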

Expand full comment

22. United States rejoins JCPOA and Iran resumes compliance (80%)

Resume compliance? They didn't comply from the start. German intelligence said after the deal was signed that Iran was secretly seeking nuclear and missile material from German companies. And in 2018 Israel released a cache of Iranian files that showed they had a clandestine nuclear program.

Expand full comment

You should have been working for the Trump Administration. Rumor would have it that Trump was rather upset that nobody could find a suitable pretext to withdraw from the JCPOA. He should have just asked you.

Meanwhile, we heard similar fairy tales when the United States was seeking a pretext to attack Iraq. Let me know when you find those Iraqi WMDs.

Expand full comment

Ha! "A suitable pretext"? Since when did trump ever demand his pretexts be suitable or sensible? "A lot of people are talking about it!" was always plenty good for him.

Expand full comment

Okay, how about you show us your source(s) stating that Iranian activities constituted a significant breach, and then offer a reason-based argument for why your source should be more valued than the majority sources.

Statistically speaking (from my unscientific sampling) you were enamored of the Bolton crowd who was (irrationally) predicting the agreement would "hand" Iran a nuke. So your confirmation biases when analyzing current sources are hopelessly hardwired. And statistically speaking you are similarly nursing old biases against, say, climate change, and have a ready "expert" to cite all the compelling reasons we still should not be fighting it.

Expand full comment

"In my ideal world, it's silly for random psychiatrists to be speculating on psychiatry papers. There would already be good prediction markets in which ones will or won't pan out. There would be a few teams, people, and companies who are known for being great at trading in them, and who have expertise in knowing which people are real experts who should be consulted."

I don't know what the psychiatry world is like, but from my understanding of academia, adding a metric like this would cause a lot of moral hazard/insider trading/hostile short selling and other related problems. Maybe it's better in a field where you can expect empirical validation on an irrefutable scale in a few years, if a drug gets approved for widespread use.

Expand full comment

I would say there's a fairly good rational argument that an accurate human pundit in the sense of this essay cannot exist. It goes like this:

1. Assume arguendo a pundit exists who could make predictions about the future that were noticeably more accurate than those anyone else can make.

2. Either he does it in a way that depends on some unique quality he has (e.g. he's Jesus Christ, or speaks with Him) or he does it in a way that anyone, or at least some moderate number of people, can duplicate with appropriate intelligence, training, meditation, medication, et cetera.

3. If the latter, then given the high value of accurate prediction and anything approaching an efficient market, he cannot now be unique -- many people will have already learned to do this, long ago, and so he will not stand out. (After all, we can all make accurate predictions of which way an apple falls if we let it go, so that kind of prediction does not stand out.) But a pundit that is no more accurate than any intelligent informed random person contradicts our definitions and need not be considered further. We also have no need to search out such a person, because he won't have any secrets worth the learning.

4. If the former, however, his methods will be indistinguishable from luck or divine intervention to the outsider, because by definition he's doing it in a way nobody else can learn. Which is to say, there is no way of evaluating the *method* to see if it makes sense, is plausible and believable -- all we can do is evaluate the outcomes.

5. He will either have a perfect record or not on the predictions of interest to us, which are "high value" predictions where the payoff is large if we follow the prediction instead of our prior inclination. These are by definition Black Swan or near-Black Swan events. (Predictions that the Sun will rise tomorrow are generally useless, as are most predictions that merely consist of extrapolation of current trends.) If he does *not* have a perfect record, since we cannot evaluate his methods (see 4), we will not be able to distinguish what he does from lucky guesses. On any *given* prediction we'll have no way of knowing whether this is where he's right or one of those times where he's wrong. If his predictions were trivial, that wouldn't matter much, but nobody will make decisions on Black Swan events (heavy payoff if you win, heavy loss if you lose) if the predictions aren't close to foolproof. So he would be more of a Cassandra than an Oracle, someone on whom people were afraid to bet, but who could say "I told you so" a lot, presuming anyone listened (nobody liked Cassandra after all).

6. If he has a perfect record on Black Swan events, we can rely on his predictions. But this is a very strange human being indeed, one who can make perfect predictions of future Black Swan events by means completely incomprehensible to the rest of us. I would need to have the existence of such a super-being demonstrated before I was willing to entertain the notion that such a being could exist and be a member of our species.

Expand full comment

I've lived long enough to be extremely skeptical of my ability to predict the future. There are just too many factors.

Instead, I try to first notice and then to mention trends that are already happening. For example, since June 2020, I've been hollering about how the media-declared Racial Reckoning is driving up the murder rate.

As Orwell said, "To see what is in front of one’s nose needs a constant struggle."

Expand full comment

Predicting the present is very hard indeed.

Expand full comment

How do you distinguish between:

(1) covid/lockdown effects, including on mental health

(2) effects of police demoralization, specifically those not directly related to racial emphasis

(3) effects of overall US political polarization (if someone believes Dems/Reps are truly evil, why not preemptively kill a few? especially if that person is already a brick or two short of a load, in relevant ways)

Expand full comment

[Can't edit; accidentally posted when trying to insert a line break]

(4) Actual effects of this "racial reckoning"

(5) Other random racial effects - e.g. the random wacko that an Indian friend of mine called the cops on 2 or 3 years ago, after the wacko persisted in insisting that her half-white child couldn't possibly be hers, made a giant scene, and (I think) tried to "rescue" the toddler from my friend and her parents (the child's maternal grandparents).

Expand full comment

Noting also that most/all of these potential causes also have racially correlated effects, or are likely to do so.

Expand full comment

(6) Budget cuts to either police or social services, some of them secondary to either covid or "defund the police", but some not

Expand full comment

A pretty good guess here is that police across the US have shifted to a less confrontational, less aggressive stance towards criminals, based on the belief that the community, the department, their union, the courts, etc., were going to hammer them for being too confrontational/aggressive. A knock-on effect of this is that crime becomes easier to get away with.

It's like having huge life-wrecking malpractice suits become a thing--doctors change their behavior in ways that can have a really big impact. Stuff like ordering a lot of marginal tests, or sending patients off to a specialist for marginal concerns. Because they want to cover their asses.

A highly public thing like the 2020 racial reckoning is a great way to coordinate thousands of individuals' decisions in the same direction.

Expand full comment

Just want to say that Bob Cringely is a journalist who has been scoring his predictions for over 20 years at this point.

Expand full comment

Business pundits like Bob Cringely, who I used to read in InfoWorld in the 1980s (although I'm not sure that's the same Bob Cringely), have good reason to track their predictions, since most of them are relevant to short-term decisions that their readers might make. The implicit slogan for business columnists is: Read me to make more money.

That's an honorable goal.

Lots of other pundits have other goals, however.

Expand full comment

It would be informative to go through Philip Tetlock's annual lists of questions for his famous forecasting contest and list the really big events that happened during the year in question that were so unanticipated that no questions refer to them. For example, as I noted in my 2016 review of Tetlock's airport book, nobody was asked to predict whether a European leader in 2015 would suddenly invite a million Muslims in, the way Angela Merkel did more or less on a whim in the late summer of 2015.

Jean Raspail more or less predicted it in his 1973 dystopian novel "The Camp of the Saints," but even he would have been wrong for each of the first 41 years.

Similarly, did Tetlock ask anybody to predict the Great Awokening or the Transgender Boom or Trumpism or the Racial Reckoning?

Conversely, one of these years, the biggest event of the year in the US will be the Great California Earthquake of 20XX. How much credit should a pundit get for accurately predicting that to the year?

Expand full comment

Re: As far as I know, the first official journalists to do something like this...

There was an attempt by Tetlock et al. to do something like this to pundits, rather than with their explicit consent. You can see the original proposal here: https://www.openphilanthropy.org/files/Grants/Tetlock/Revolutionizing_the_interviewing_of_alpha-pundits_nov_10_2015.pdf, and a related OpenPhil grant here: https://www.openphilanthropy.org/giving/grants/university-pennsylvania-philip-tetlock-forecasting#Goals_and_expectations_for_this_grant. I also got confirmation that nothing came of it, but people come up with similar proposals every now and again.

Expand full comment

Have you considered partnering with a prediction market firm so that all your predictions get automatically added as markets? You'd give them a lot of free advertising and customers, we'd get a functioning prediction market on your stuff.

Expand full comment

Also, why can't we like comments in this thread?

Expand full comment

Because we/Scott explicitly asked Substack to remove the hearts feature for ACX.

Expand full comment

To clarify a bit further: There was a lot of concern that having an upvote system would result in fewer unusual/unpopular comments and more comments just saying things people want to hear (or that the former would be buried beneath the latter if the sorting system took the number of upvotes into account).

Expand full comment

Is there gonna be any attempt to test this? Because I bet (60%) it's wrong. Generally the top reddit comments are the best, and if you want to read them all you can just scroll down.

Hope you're well.

Expand full comment

> Is there gonna be any attempt to test this

Not that I know of.

I think I'm also around 60-70% that in the long term some kind of up/downvote system would be beneficial or at least non-harmful, but a 40% chance is still pretty high, and the downside of getting this wrong is much higher in one direction than it is in the other.

If we don't have an upvote/downvote system you don't get the ability to sort by popular comments (plus some satisfaction of being able to up/downvote comments you like/dislike), which would be mildly inconvenient. If we do use such a system you run the risk that it breaks that special weird magic that made the SSC comments a place where you can have civil and intelligent discussion with people from wildly different ideologies and convictions without things devolving into flamewars (and it's not clear that you'd be able to get it back by disabling the upvote system again), which would be a disaster.

For what it's worth, I feel the /r/slatestarcodex subreddit and even the sister subreddit /r/themotte tend to have fewer (intelligent) unusual/non-mainstream comments than the comment section here.

Expand full comment

Well said, re your risk/reward analysis.

While I'm not in a position to opine on the "special weird magic" you describe, I do appreciate the devaluation of populism here in the form of ratings. I support it both in principle (for assorted reasons), and also as a mechanism for mitigating some of the social pitfalls that seem to plague typical online communities.

It also occurs to me that Scott may one day find himself questioning whether he's paying a monetary tax for his stance, if evidence mounts that it discourages raw subscriber numbers. (But, as you suggest with your "magic" comment, he'd also want to weigh quality against quantity.)

Expand full comment

Maybe they could copy the Slashdot system, where you can choose from a set of reasons for upvoting a comment?

<a href="https://www.explainxkcd.com/wiki/index.php/301:_Limerick>I used to find Slashdot delightful,

But my feelings of late are more spiteful;

My comments sarcastic

The iconoclastic

Keep modding to +5 (Insighful).</a>

Expand full comment

@Nathan Young

Good comments being upvoted in the short term doesn't rebut the claim that voting causes a deterioration in the comments in the long term.

Expand full comment

It's a debate that you can follow in each of the open threads, including the latest.

If Substack were to stop *emailing* people about hearts, then the minority[1] who wants hearts could just use the old JavaScript or a custom UI to interact with the heart system, leaving the rest of the community happily oblivious to them.

At this point, though, Scott might feel like he was annoying Substack to ask for that, when they already did so much work[2] to get them out of the UI. "We already solved the problem once, why are you asking us to solve it again?"

[1] I suspect it's about 10-20% of people who want them. Just a gut instinct. But this is actually good news for the hearters, because the non-hearters' fear of what gets posted and upvoted kind of goes away if most people don't interact with it.

[2] Maybe it shouldn't be so much work, but looking at the code, they seem to have taken a more difficult path than they needed to.

Expand full comment

In response to having meaningful Brier scores...

A Brier score needs a benchmark to judge skill, as others have pointed out. How to achieve this?

ACX could post forecasts onto an app like Maby.app (it is free) and then readers could forecast as well. This provides (a) a baseline against which to judge forecast accuracy, (b) feedback for everyone in the form of a calibration curve, and (c) forecasts that are hidden from other readers until the question closes, so it is harder for anyone to piggy-back on the work of others.

Keep the questions open for a couple days then close them - so everyone effectively forecasts at the same time.
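Once that baseline exists, the scoring itself is only a few lines; a sketch of the standard Brier skill score, with the reader crowd (or a constant 50%) as the reference:

def brier(predictions):
    # predictions: list of (probability, outcome) pairs, with outcome 0 or 1;
    # lower is better, and always answering 50% scores 0.25
    return sum((p - o) ** 2 for p, o in predictions) / len(predictions)

def brier_skill_score(forecaster, reference):
    # both lists must score the same questions;
    # positive means the forecaster beat the reference baseline
    return 1.0 - brier(forecaster) / brier(reference)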

Expand full comment

“The process of globalization, powerful as it is, could be substantially slowed or even stopped. Short of a major global conflict, which we regard as improbable, another large-scale development that we believe could stop globalization would be a pandemic…”

That is probably the most chillingly prescient passage from Mapping the Global Future, a report written 16 years ago by experts working for the U.S. National Intelligence Council, describing coming developments in geopolitics, culture, technology, and the economy out to 2020. With the year in question having arrived, I thought it was worthwhile to review the accuracy of its predictions, and overall, I was impressed. Mapping the Global Future correctly identified most of the megatrends that shaped the world from 2004-20 (though it was somewhat less accurate in forecasting the degrees to which those factors would change things):

https://www.militantfuturist.com/mapping-the-global-future-2020/

Expand full comment

It's not just that the Iraq War pundits were wrong - they were catastrophically wrong, wrong in ways that caused men (Julius Streicher, and, to a lesser extent, Alfred Rosenberg) to hang at Nuremberg, and they suffered not the slightest, not personally or professionally.

Meanwhile, the naysayers were cast into Outer Darkness and have remained there ever since, even though the War On Iraq went worse than the most pessimistic predictions would have it.

Expand full comment

Care to explain that comparison with Julius Streicher ("the founder and publisher of the virulently antisemitic newspaper Der Stürmer, which became a central element of the Nazi propaganda machine")? The war killed a lot of civilians, but there was no hate propaganda against an ethnic group?

Expand full comment

If I understand your question correctly, the Nuremberg Court found Streicher guilty of crimes against humanity.

The term "crime against humanity" could justly be applied to the War on Iraq and those who argued for it. That our wars of aggression weren't strictly racially motivated doesn't mean that few people suffered or that the victims suffered any less as a result.

Expand full comment

Couldn't this "prediction difficulty calibration problem" be solved with an Item Response Theory approach? (The statistical method used in computer adaptive testing). Use the mass of data from metaculus to calibrate item difficulty levels via IRT, then compute skill-level of pundits based on the items they answer.

Or maybe adapt something like the Elo system for chess ratings? (Which is closely related mathematically to IRT). That way you can progressively be estimating the skill of people across time.

The challenge with these approaches is that people can only answer each item one time, and then the item is obsolete, which means you need tons of people answering each item to establish good item difficulties... but don't you have that with platforms like Metaculus?

You could even incorporate the amount people bet to set item difficulty/skill levels, the way that Klinkenberg et al. (2011) do for reaction time in computer adaptive practice (see "High Stakes High Gain Scoring Rule"): https://doi.org/10.1016/j.compedu.2011.02.003
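The core Elo update is small enough to sketch. One possible adaptation treats each question as an "opponent" whose rating drifts upward when forecasters miss it; the K-factor and the win/loss scoring below are arbitrary choices, not a worked-out system:

def expected_score(forecaster_rating, question_rating):
    # standard Elo expectation: the chance the forecaster "beats" the question
    return 1.0 / (1.0 + 10 ** ((question_rating - forecaster_rating) / 400))

def elo_update(forecaster_rating, question_rating, outcome, k=32):
    # outcome: 1.0 if the forecaster called the question right, 0.0 if wrong
    # (or a fractional score derived from, say, a Brier transform)
    delta = k * (outcome - expected_score(forecaster_rating, question_rating))
    # the question's rating moves the opposite way, so hard items float upward
    return forecaster_rating + delta, question_rating - delta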

Expand full comment

I'm confused by what you want out of prediction markets and I think you might be too. As far as I can tell, the argument goes "we want prediction markets, so that we can tell which pundits are good, so that when the good pundits make predictions, then we can trust them." But if we have prediction markets, we don't need the pundits! You don't need to know who the experts actually are if you know you have a market that's already priced in all of their wisdom.

This still leaves open the question of how to assign trust in the absence of prediction markets. I believe the answer is the same as what we do for predictions in science: Kolmogorov complexity. This has two uses against this problem. First, it provides a way to formally(ish) define cheating at making predictions. And second, it provides a rigorous method of discounting the credit of cheaters post facto.

Consider a Pundit who predicts 1000 copies of "The sun will rise in the East." They might publish a book of predictions that looks like:

The sun will rise in the East on 2022-01-01.

The sun will rise in the East on 2022-01-02.

The sun will rise in the East on 2022-01-03.

...

The sun will rise in the East on 2024-09-27.

Then, when all those predictions come true, they claim 1000 points worth of credit on September 27th 2024. The intuition that I wish to formalize is that you could replace this pundit's book with the following program.

from datetime import date, timedelta
for i in range(1000):
    print("The sun will rise in the East on " + str(date(2022, 1, 1) + timedelta(days=i)) + ".")

This program is 3 lines long, so the pundit should only get credit for 3(ish) predictions. The program contains some overhead; it would contain quite a bit more if I weren't allowed to just import how calendars work. In the limit of a large number of predictions this overhead isn't important, so we should give additional credit to pundits who offer a larger number of predictions. Importantly, this discounting of a prediction book can happen post facto: the minimum entropy it takes to produce the book doesn't change when we know the answers.

The problem with this scheme is that you can never know the true Kolmogorov complexity of a thing; you can only know the upper bound set by the cleverest adversary. This isn't particularly good, but it does enable me to look at a pundit's win/loss record and verify to myself that I find it difficult to compress their selected predictions. It would work even better if this opened up a market for "adversarial pundits" who would publish their best efforts at compressing others' prediction sets.
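As a crude stand-in for that adversarial compression, an ordinary general-purpose compressor already gives an upper bound on a book's complexity; a sketch (the sample book here is made up):

import zlib

def compressed_credit(prediction_book):
    # the compressed length upper-bounds the book's complexity, so it caps the credit
    return len(zlib.compress(prediction_book.encode("utf-8"), level=9))

# a thousand near-identical sunrise predictions compress down to very little
book = "\n".join(f"The sun will rise in the East on day {i}." for i in range(1000))
print(len(book), compressed_credit(book))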

Expand full comment

I am British, and I use "brackets" and "parentheses" interchangeably, favouring brackets: () are brackets or round brackets, [] are square brackets, {} are curly brackets or curly braces. I was confused by "Yglesias' numbers are bold and in parentheses. Metaculus' numbers are in brackets (not all questions are on Metaculus)." - had Yglesias' numbers not also been bold, I'd not have been able to follow the listing below at all.

Expand full comment

I am American, and Wikipedia helped me understand the source of confusion here! https://simple.wikipedia.org/wiki/Bracket

That said, and I'm not trying to argue with you, but in your language doesn't "parentheses" exclusively refer to ()? If so, wouldn't process of elimination tell you that "brackets" referred to []?

Expand full comment

I mean, essentially that's what happened (also I'm not unaware that there are other more popular dialects of English that do make the distinction). I don't think it would sound strange to say "square parens" or "curly parens" though.

This isn't the first time I've been confused by American English on this blog - see e.g. https://astralcodexten.substack.com/p/ontology-of-psychiatric-conditions-34e in which the phrase "gas lines" is used to mean "queues for petrol" rather than "pipes for natural gas", which had me confused for a whole paragraph before I picked up the meaning from context. It's not a massive deal, but I think were I a blogger I'd like feedback on that kind of thing. You know, so I could ignore it while feeling smug about how I speak the One True Dialect.

Expand full comment

Fair enough! Haha, I love the gas lines example.

Expand full comment

I really enjoy reading Matt's thoughts, they're often thought provoking and carefully argued, even when I disagree.

Today he wrote contra meritocracy, though, and I was left a bit unsatisfied, mainly for reasons Scott has covered well.

I'd love to weigh in on that article, but cannot comment without paying Matt money. It feels like a perverse incentive relevant to Scott's post, though. Matt is bright. But Cunningham's Law means it could be in his best interest to become a slightly worse pundit, in order to get more subscribers to pay for the opportunity to correct him.

Faced with that dilemma, I subscribed to ACX and referenced his post here. Maybe I'll just discuss it on Reddit or Discord.

In the post, Matt proposes (perhaps accidentally) a really fascinating way to measure how much people value meritocracy over partisanship, and I think it's well worth discussing... but maybe not there, nor here. In the next even numbered open thread maybe.

Expand full comment

I still have yet to see a coherent argument for prediction markets actually leading to useful policy recommendations. Prediction markets put fairly long odds on a COVID vaccine being approved before the end of 2020 (source: https://fortune.com/2020/07/15/coronavirus-vaccine-this-year-prediction-markets-coronavirus/), a prediction which a) turned out to be quite wrong and b) could've led to some fairly substantial policy failures if policymakers had acted on those beliefs.

COVID should've been a golden opportunity for prediction markets to make useful long-term predictions that could've led to useful policy, but I have yet to see any kind of impressive results.

Expand full comment

Possible approach to comparing punditry success:

Pool all questions from all pundits. If a pundit didn't answer one, assume that's because they felt too ignorant to; so pretend they did answer it, just with an ignorance prior (/maximum entropy prior), e.g. 50% if it was a yes-no question. Then, I don't know, take the average score.

This rewards making well-calibrated predictions, and punishes not making them when you could have!

(Drawback: sometimes it's not obvious what the ignorance priors are, e.g. especially for continuous random variables [the prior density functions for x and for x^2 can't *both* be flat].)
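A sketch of the pooled scoring for yes-no questions, with the 50% fill-in (the data layout is made up for illustration):

def pooled_brier(outcomes, answers, ignorance=0.5):
    # outcomes: dict of question_id -> 0 or 1, for the pooled question set
    # answers: dict of question_id -> probability the pundit actually gave
    # unanswered questions are scored as if the pundit had said `ignorance`
    total = sum((answers.get(q, ignorance) - o) ** 2 for q, o in outcomes.items())
    return total / len(outcomes)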

Expand full comment

I think you’d enjoy Forecast, a community for crowdsourced predictions. Download the app here: https://apps.apple.com/us/app/id1509378877

Use referral code: BRANNO5902 to get an additional 1000 forecasting points when you sign up.

I’m sure I’m late to this here but I find it fun to use

Expand full comment

I was under the impression that log scores couldn't be positive.
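(Though I suppose they can be if they're reported relative to a baseline, which I believe some sites do; a quick check under that assumption:)

import math

p = 0.8                       # probability given to the event that happened
raw = math.log(p)             # raw log score: always <= 0 when p <= 1
relative = math.log(p / 0.5)  # relative to a 50% baseline: positive when p > 0.5
print(raw, relative)          # about -0.223 and +0.470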

Expand full comment

"Some of the disagreements might come from Yglesias making his predictions in late December and Metaculus opening theirs in February, which is kind of unfair to Matt." I work for Metaculus and wanted to let this discussion section know that we didn't want to be unfair to Matt Yglesias, and so gave him an opportunity to amend any predictions from his original Substack post before we put them into our question series.

Expand full comment