The funniest thing about those arguments you're rebutting is that the average of a large number of past 0-or-1 events is only an estimate of the probability of drawing a 1. In other words, the probabilities they're saying are the only ones that exist, are unknowable!

That's right. Even in a simple case of drawing balls from a jar where you don't know how many balls of each color there are, the last ball has a 100% 'real' probability of being a specific color, you just don't know which one.

I'd say that all probabilities are about "how to measure a lack of information". If you have a fair coin it's easy to pretend that probability is objective, because everyone present has exactly the same information. But as long as there are information differences, different people will give different probabilities. And it's not the case that some of them are wrong; they just lack different information. But for some reason people expect that if you're giving a number instead of a feeling, then you are claiming objectivity. They're just being silly. There are no objective probabilities outside of quantum mechanics, and maybe not even there. It's all lack of information; it's just that there are some scenarios where it's easy for everyone to have exactly the same lack.

Okay, after years of this I think I have a better handle on what's going on. It's reasonable to pull probabilities because you can obviously perform an operation where something obviously isn't 1% and obviously isn't 99%, so then you're just 'arguing price.' On the other hand, it's reasonable for people to call this out as secretly rolling in reference-class stuff and *not* having done the requisite moves for more complex reasoning about probabilities, namely, defining the set of counterfactuals you are reasoning about and their assumptions, and performing the costly cognitive operations of reasoning through those counterfactuals (what else would be different, what would we expect to see?). When people call BS on those not showing their work, they are being justly suspicious of summary statistics.

The thing is, if they’re wanting to call out those things, they should do that, rather than attacking the concept of probability wholesale.

Like, there are good reasons to not invest in Ponzi schemes, but “but money is just a made-up concept” is one of the least relevant possible objections.

A legend about accuracy versus precision that you may have heard but that I think is applicable:

As the story* goes: When cartographers went to measure the height of Mount Everest, they had a hard time because the top is eternally snow-capped, making it nigh-impossible to get an exact reading. They decided to take several different readings at different times of year and average them out.

The mean of all their measurements turned out to be exactly 29,000 feet.

There was concern that such a number wouldn’t be taken seriously. People would wonder what digit they rounded to. A number like that heavily implies imprecision. The measurers might explain and justify in person, but somewhere down the line that figure ends up in a textbook (not to mention Guinness), stripped of context.

It’s a silly problem to have, but it isn’t a made-up one. Their concerns were arguably justified. We infuse a lot of meaning into a number, and sometimes telling the truth obfuscates reality.

I’ve heard more than one version of exactly how they arrived at 29,028 feet, instead—the official measurement for decades. One account says they took the median instead of the mean. Another says they just tacked on some arbitrary extra.

More recently, in 2020, Chinese and Nepali authorities officially established the height to be 29,031 feet, 8.5 inches. Do you trust them any more than the cartographers? I don’t.

All of which is to say, it makes sense that our skepticism is aroused when we encounter what looks like an imbalance of accuracy and precision. Maybe the percentage-giver owes us a little bit more in some cases.

* apocryphal maybe, but illustrative, and probably truth-adjacent at least

>More recently, in 2020, Chinese and Nepali authorities officially established the height to be 29,031 feet, 8.5 inches. Do you trust them any more than the cartographers? I don’t.

For your amusement: I just typed

mount everest rising

into Google, and got back

>Mt. Everest will continue to get taller along with the other summits in the Himalayas. Approximately 2.5 inches per year because of plate tectonics. Everest currently stands at 29,035 feet.Jul 10, 2023

> you’re just forcing them to to say unclear things like “well, it’s a little likely, but not super likely, but not . . . no! back up! More likely than that!”, and confusing everyone for no possible gain.

There's something more to that than meets the eye. When you see a number like "95.436," you expect the number of digits printed to represent the precision of the measurement or calculation - that the 6 at the end means something. In conflict with that is the fact that even one significant digit is too many. 20%? 30%? Would anyone stake much on the difference?

That's why (in an alternate world where weird new systems were acceptable for Twitter dialogue), writing probabilities in decimal binary makes more sense. 0b0.01 expresses 25% with no false implication that it's not 26%. Now, nobody will learn binary just for this, but if you read it from left to right, it says "not likely (0), but it might happen (1)." 0b0.0101 would be, "Not likely, but it might happen, but it's less likely than that might be taken to imply, but I can see no reason why it could not come to pass." That would be acceptable with a transition to writing normal decimal percentages after three binary digits, when the least significant figure fell below 1/10.
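
If anyone wants to play with this, here's a rough sketch of the conversion (`to_binary_prob` is a made-up name, and the three-digit cutoff follows the rule above):

```python
def to_binary_prob(p, max_bits=3):
    """Render probability p in the 0b0.xxx notation sketched above,
    falling back to a rounded percentage once the number would need
    more than max_bits binary digits (the fourth bit is worth 1/16,
    which falls below 1/10)."""
    bits = []
    frac = p
    for _ in range(max_bits):
        frac *= 2
        bit = int(frac)
        bits.append(str(bit))
        frac -= bit
        if frac == 0:
            break
    if frac != 0:
        return f"{p:.0%}"  # not exactly representable in max_bits bits
    return "0b0." + "".join(bits)
```

So `to_binary_prob(0.25)` gives `"0b0.01"`, while `to_binary_prob(0.3125, max_bits=4)` gives `"0b0.0101"`.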

I like this idea. And you can add more zeroes on the end to convey additional precision! (I was initially going to write here about an alternate way of doing this that is even closer to the original suggestion, but then I realized it didn't have that property, so oh well. Of course one can always manually specify precision, but...)

but isn't this against the theory? if we were mathematically perfect beings we'd have every probability to very high precision regardless of the "amount of evidence". before reading this post I was like hell yea probabilities rock, but now I'm confused by the precision thing.

I guess we might not know until we figure out the laws of resource-constrained optimal reasoning 😔

Unrelated nitpick, but we already know quite a lot about the laws of resource-constrained optimal reasoning, for example AIXI-tl, or logical induction. It's not the be-all end-all solution for human thinking, but I don't think anyone is working on resource-constrained reasoning in the hope of making humans think like that, because humans are hardcodedly dumb about some things.

Is there a tl;dr for what resource-constrained reasoning says about how many digits to transmit to describe a measurement with some roughly known uncertainty? My knee-jerk reaction is to think of the measurement as having a gaussian with sort-of known width centered on a mean value, and that reporting more and more digits for the mean value moves a gaussian with a rounded mean closer and closer to the distribution known to the transmitter.

Is there a nice model for the cost of the rounding error, for rounding errors small compared to the uncertainty? I can imagine a bunch of plausible metrics, but don't know if there are generally accepted ones. I assume that the cost of the rounding error is going to go down exponentially with the number of digits of the mean transmitted, but, for reasonable metrics, is it linear in the rounding error? Quadratic? Something else?
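
For concreteness, here's one of those plausible metrics sketched out (my own pick, not a generally accepted answer): the KL divergence between the gaussian with the true mean and the one with the rounded mean, which for equal widths works out to delta^2 / (2 sigma^2), i.e. quadratic in the rounding error:

```python
def kl_rounded_mean(delta, sigma):
    """KL( N(mu, sigma^2) || N(mu + delta, sigma^2) ): the information
    lost by reporting a mean off by `delta` while keeping the same
    width. For equal variances this is exactly delta^2 / (2 sigma^2),
    quadratic in the rounding error."""
    return delta ** 2 / (2 * sigma ** 2)

sigma = 0.02                              # assumed known uncertainty
coarse = kl_rounded_mean(0.01, sigma)     # rounding error of 0.01
fine = kl_rounded_mean(0.001, sigma)      # one more digit transmitted
# One extra digit shrinks delta 10x, so this cost drops 100x,
# consistent with the exponential decay guessed above.
```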

If you think your belief is going to change up or down by 5% on further reflection, that's your precision estimate. Rounding error can be propagated through the normal techniques for error propagation (see any source on scientific calculation).

There is no rule or formula for the precision of resource-constrained reasoning, because you aren't guaranteed to order the terms in the process of deliberation from greatest to smallest. Instead, I use repeated experiments as my example of a belief you're expecting to change within known bounds in the future, to show why most probabilities have limited precision.

>Instead, I use repeated experiments as my example of a belief you're expecting to change within known bounds in the future, to show why most probabilities have limited precision.

Sure, that makes sense.

>Rounding error can be propagated through the normal techniques for error propagation (see any source on scientific calculation).

True. Basically propagating through the derivatives of whatever downstream calculation consumes the probability distribution estimate... For the case of a bet, I _think_ this comes down to an expected cost per bet (against an opponent who has precisely calibrated probabilities) that is the value of the bet times the difference between the rounded mean and the actual mean. Is that it?

If you are trying to figure out whether a coin is fair, the average number of heads per flip among a large number of experimental trials serves as your best estimate of its bias towards heads. Although you have that estimate to an infinite number of digits of precision, your estimate is guaranteed to change as soon as you flip another coin. That means the "infinite precision of belief," although you technically have it, is kind of pointless.

To put it another way, if you expect the exact probabilistic statement of your beliefs to change as expected but unpredictable-in-the-specifics new information comes in, such as further measurements in the presence of noise, there's no point in printing the estimate past the number of digits that you expect to stay the same.
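
A quick simulation of the point, with illustrative numbers I've assumed (a 60%-heads coin, 1000 flips):

```python
import random

random.seed(0)

def estimate_shift(n_flips, p=0.6):
    """Estimate a coin's heads-rate from n_flips flips, then flip once
    more. The estimate is 'infinitely precise' at every step, yet it is
    guaranteed to move by up to ~1/n on the next flip, so digits beyond
    that scale carry no stable information."""
    heads = sum(random.random() < p for _ in range(n_flips))
    before = heads / n_flips
    extra = random.random() < p
    after = (heads + extra) / (n_flips + 1)
    return before, after

before, after = estimate_shift(1000)
shift = abs(after - before)   # on the order of 1/1000, and never zero here
```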

Here's a way to interpret that "infinite precision of belief": if you bet according to different odds than precisely that estimator, you'll lose money on average. In that sense, the precision is useful however far you compute it, losing any of that precision will lose you some ability to make decisions.

Your conclusion about forgoing the precision that is guaranteed to be inexact is wrong. Consider this edge case: a coin will be flipped that is perfectly biased; it either always comes up heads or always tails; you have no information about which way it is biased. The max-entropy guess in that situation is 1:1 heads or tails, with no precision at all (your next guess is guaranteed to be either 1:0 or 0:1). Nonetheless, this guess still allows you to make bets on the current flip, whereas you'd just refuse any bet if you followed that advice.
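
A toy simulation of that edge case (trial count and seed are arbitrary):

```python
import random

random.seed(1)

def avg_payoff(n_trials=100_000):
    """Each trial: a coin is made perfectly biased, all-heads or
    all-tails with equal chance, and we bet 1 unit on heads at even
    odds. The 1:1 guess has no precision at all, yet betting at those
    odds breaks even on average, while refusing every bet gains
    nothing."""
    total = 0
    for _ in range(n_trials):
        heads_only = random.random() < 0.5
        total += 1 if heads_only else -1
    return total / n_trials

payoff = avg_payoff()   # close to 0
```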

> losing any of that precision will lose you some ability to make decisions.

The amount of money you'd lose through opportunity cost in a betting game like that decreases exponentially with the number of digits of precision you're using. To quote one author whose opinions on the subject agree with mine,

"That means the 'infinite precision of belief,' although you technically have it, is kind of pointless."

;)

Compare this situation with the issue of reporting a length of a rod that you found to be 2.015mm, 2.051mm, and 2.068mm after three consecutive measurements. I personally would not write an average to four digits of precision.
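
Running the arithmetic on those three readings makes the point (a minimal sketch with Python's stdlib):

```python
import statistics

readings_mm = [2.015, 2.051, 2.068]
mean = statistics.mean(readings_mm)       # 2.0446...
spread = statistics.stdev(readings_mm)    # ~0.027 mm sample std dev
# The scatter lives in the second decimal place, so quoting the mean
# to four digits of precision would overstate what the data supports:
reported = round(mean, 2)                 # 2.04 mm
```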

I'm wondering how to interpret a range or distribution for a single future event probability. My P(heads) for a future flip of a fair coin, and for a coin with unknown fairness, would both be 0.5. In both cases I have complete uncertainty of the outcome. Any evidence favoring one outcome or the other would shift my probability towards 0 or 1. Even knowing all parameters of the random process that determines some binary outcome, shouldn't I just pick the expected value to maximize accuracy? In other words, what kind of uncertainty isn't already expressed by the probability?

It's epistemic vs aleatory uncertainty. The way the coin spins in mid air is aleatory i.e. "true random", while the way it's weighted is a fact that you theoretically could know, but you don't. The distribution should represent your epistemic uncertainty (state of knowledge) about the true likelihood of the coin coming up heads. You can improve on the epistemic part by learning more.

Sometimes it gets tough to define a clear line between the two - maybe Laplace's demon could tell you in advance which way the fair coin will go. But in many practical situations you can separate them into "things I, or somebody, might be able to learn" and "things that are so chaotic and unpredictable that they are best modeled as aleatory."

Epistemic vs aleatory is just fancy words for Bayesian vs frequentist, no? Frequentists only measure aleatory uncertainty, Bayesian probability allows for both aleatory and epistemic

Hmmm... frequentists certainly acknowledge epistemic uncertainty. I guess they're sometimes shy about quantifying it. But when you say p < 0.05, that's a statement about your epistemic uncertainty (if not quite as direct as giving a beta distribution).

It's the probability that you will encounter relevant new information.

You could read it as "My probability is 0.5, and if I did a lot of research I predict with 80% likelihood that my probability would still lie in the range 0.499--0.501," whereas for the coin you suspect to be weighted that range might be 0.1--0.9 instead.

Small error bars mean you predict that you've saturated your evidence; large error bars mean you predict that you could very reasonably change your estimate if you put more effort into it. With a coin that I have personally tested extensively, my probability is 0.5 and I would be *shocked* if I ever changed my mind, whereas if a magician says "this is my trick coin" my probability might be 0.5, but I'm pretty sure it won't be five minutes from now.
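
One way to make those error bars concrete, if you'll forgive an assumed Beta-distribution model of the two states of knowledge (the counts are illustrative):

```python
def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution, used
    here as an epistemic state over a coin's true heads-rate."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var ** 0.5

# A coin I've personally flipped 10,000 times: belief tight around 0.5.
tested_mean, tested_sd = beta_mean_sd(5000, 5000)
# A magician's "trick coin": 0.5 only by symmetry of total ignorance.
trick_mean, trick_sd = beta_mean_sd(1, 1)
```

Both means are exactly 0.5; only the widths (about 0.005 vs about 0.29) distinguish "tested extensively" from "pretty sure it won't be 0.5 in five minutes."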

Single future event probabilities, in the context of events that are unrelated to anything you can learn more about during the time before the event, are the cases where the meaning of "uncertain probability" is less clear. That is why rationalists, who prioritize thinking about AI apocalypses and the existence of God, will tell you that "uncertain probability" doesn't mean anything.

However in science, the expectation that your belief will change in the future is the rule, not the exception. You don't know which way it will change, but if you're aware of the precision of your experiments so far you'll be able to estimate by how much it's likely to change. That's what an "uncertain probability" is.

This is the natural-language way to interpret probabilities, and so it's correct. Saying you found half of people to be Democrats means something different from saying you found 50.129% of people to be Democrats.

Yet it's subject to abuse, especially by those with less knowledge of statistics, math, or how to lie. If my study finds 16.67% of people go to a party store on a Sunday, it's not obvious to everyone that my study likely had only six people in it.
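
A toy sketch of that reverse-engineering (the function name and tolerance are my own invention):

```python
def plausible_samples(pct, max_n=40, tol=0.005):
    """Find small sample sizes n and counts k with k/n close to the
    reported percentage -- a quick way to notice that a suspiciously
    specific figure like 16.67% is probably just 1 person out of 6."""
    hits = []
    for n in range(1, max_n + 1):
        for k in range(n + 1):
            if abs(k / n - pct / 100) < tol:
                hits.append((k, n))
    return hits

candidates = plausible_samples(16.67)
smallest = min(candidates, key=lambda kn: kn[1])   # (1, 6)
```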

There are at least three kinds of lies: lies, damned lies, and statistics.

A central issue when discussing significant digits is the sigmoidal behaviour, e.g. the difference between 1% and 2% is comparable to the difference between 98% and 99%, but NOT the same as the difference between 51% and 52%. So arguments about significant digits in [0, 1] probabilities are not well-founded. If you do a log transformation you can discuss significant digits in a sensible way.

What would I search for to get more information on that sigmoidal behavior as it applies to probabilities? I've noticed the issue myself, but don't know what to look for to find discussion of it. The Wikipedia page for 'Significant figures' doesn't (on a very quick read) touch on the topic.

This has been on my mind recently, especially when staring at tables of LLM benchmarks. The difference between 90% and 91% is significantly larger than the difference between 60% and 61%.

I've been mentally transforming probabilities into log(p/(1-p)), and just now noticed from the other comments that this actually has a name, "log odds". Swank.
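
For anyone playing along, a minimal sketch of the transform and of why equal percentage gaps carry unequal weight:

```python
import math

def log_odds(p):
    """Map a probability in (0, 1) to log-odds (logits). Equal steps in
    log-odds carry equal evidential weight, near the tails as well as
    near 0.5, which raw percentage digits do not."""
    return math.log(p / (1 - p))

# The same 1-point gap, very different amounts of evidence:
mid_gap = log_odds(0.52) - log_odds(0.51)    # ~0.04 logits
tail_gap = log_odds(0.99) - log_odds(0.98)   # ~0.70 logits
```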

Why are you adding this prefix “0b0” to the notation? If you want a prefix that indicates it’s not decimal, why not use something more transparent, like “bin” or even “binary” or “b2” (for “base 2”)?

That notation is pretty standard in programming languages. I do object to this being called "decimal binary" though. I'm not sure what exactly to call it, but not that. Maybe "binary fractions".

— Why not use 25% then? That's surely how everyone would actually mentally translate it and (think of/use) it: "ah, he means a 25% chance."

— Hold on, also I realize I'm not sure how 0b0.01 implies precision any less than 25% does: in either case, one could say "roughly" or else be interpreted as meaning "precisely this quantity."

— Per Scott's original post, 23% often *is*, in fact, just precise enough (i.e., is used in a way meaningfully distinct from 22% and 24%, such that either of those would be a less-useful datum).

— [ — Relatedly: Contra an aside in your post, one sigfig is /certainly/ NOT too many: 20% vs 30% — 1/5 vs roughly 1/3 — is a distinction we can all intuitively grasp and all make use of IRL, surely...!]

— And hey, hold on a second time: no one uses "94.284" or whatever, anyway! This is solving a non-existent problem!

-------------------------

— Not that important, and perhaps I just misread you, but the English interpretation you give of 0b0.0101 implies (to my mind) an event /less likely/ than 0b0.01 — (from "not likely but maybe" to "not likely but maybe but more not likely than it seems even but technically possible") — but 0b0.0101 ought actually be read as *more* sure, no? (25% vs 31.25%)

— ...Actually, I'm sorry my friend, it IS a neat idea but the more I think about those "English translations" you gave the more I hate them. I wouldn't know WTF someone was really getting at with either one of those, if not for the converted "oh he means 25%" floatin' around in my head...

> Per Scott's original post, 23% often *is*, in fact, just precise enough

I strongly object to your use of the term "often." I would accept "occasionally" or "in certain circumstances"

(Funnily enough, the difference between "occasionally" and "in certain circumstances" is what they imply about meta-probability. The first indicates true randomness, the second indicates certainty but only once you obtain more information)

I intuitively agree that any real-world probability estimate will have a certain finite level of precision, but I'm having trouble imagining what that actually means formally. Normally to work out what level of precision is appropriate, you estimate the probability distribution of the true value and how much that varies, but with a probability, if you have a probability distribution on probabilities, you just integrate it back to a single probability.

One case where having a probability distribution on probabilities is appropriate is as an answer to "What probability would X assign to outcome Y, if they knew the answer to question Z?" (where the person giving this probability distribution does not already know the answer to Z, and X and the person giving the meta-probabilities may be the same person). If we set Z to something along the lines of "What are the facts about the matter at hand that a typical person (or specifically the person I'm talking to) already knows?" or "What are all the facts currently knowable about this?", then the amount of variation in the meta-probability distribution gives an indication of how much useful information the probability (which is the expectation of the meta-probability distribution) conveys. I'm not sure to what extent this lines up with the intuitive idea of the precision of a probability though.

I was thinking something vaguely along these lines while reading the post. It seems like the intuitive thing that people are trying to extract from the number of digits in a probability is "If I took the time to fully understand your reasoning, how likely is it that I'd change my mind?"

In your notation, I think that would be something like "What is the probability that there is a relevant question Z to which you know the answer and I do not?"

It is really easy to understand what the finite number of digits means if you think about how the probability changes with additional measurements. If you expect the parameter to change by 1% up or down after you learn a new fact, that's the precision of your probability. For example, continually rolling a loaded die to figure out what its average value is involves an estimate that converges to the right answer at a predictable rate. At any point in the experiment, you can calculate how closely you've converged to the rate of rolling a 6, within 95% confidence intervals.
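
A quick sketch of that convergence, with an assumed 30% rate of sixes and a normal-approximation interval:

```python
import random

random.seed(2)

def six_rate_ci(n_rolls, p_six=0.3):
    """Roll a loaded die n times; return the estimated rate of sixes
    and a 95% normal-approximation half-width, which shrinks like
    1/sqrt(n). That half-width, not the digit count of the point
    estimate, is the real precision of the probability."""
    sixes = sum(random.random() < p_six for _ in range(n_rolls))
    rate = sixes / n_rolls
    half_width = 1.96 * (rate * (1 - rate) / n_rolls) ** 0.5
    return rate, half_width

rate_100, hw_100 = six_rate_ci(100)       # interval roughly +-0.09
rate_10k, hw_10k = six_rate_ci(10_000)    # roughly 10x narrower
```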

It's only difficult to see this when you're thinking about questions that have no streams of new information to help you answer them - like the existence of God, or the number of aliens in the galaxy.

I like Scott's wording of "lightly held" probabilities. I think this matches what you are describing about the sensitivity of a probability estimate to the answer of an as-yet unanswered question Z.

Okay, hear me out: only write probabilities as odds ratios (or fractions if you prefer), and the number of digits is the number of Hartleys/Bans of information; you have to choose the best approximation available with the number of digits you're willing to venture.

Less goofy answer: Name the payout ratios at which you'd be willing to take each particular side of a small bet on the event. The further apart they are, the less information you're claiming to have.

>When you see a number like "95.436," you're expecting that the number of digits printed to represent the precision of the measurement or calculation - that the 6 at the end, means something. In conflict with that is the fact that one significant digit is too many.

Ok, though isn't this question orthogonal to whether the number represents probabilities?

This sounds more like a general question of whether to represent uncertainty in some vanilla measurement (say of the weight of an object) with the number of digits of precision or guard digits plus an explicit statement of the uncertainty. E.g. if someone has measured the weight of an object 1000 times on a scale on a vibrating table, and got a nice gaussian distribution, and reported the mean of the distribution as 3.791 +- 0.023 pounds (1 sigma (after using 1/sqrt(N))), it might be marginally more useful than reporting 3.79 +- 0.02 if the cost of the error from using the rounded distribution exceeds the cost of reporting the extra digits.

Yes, this is exactly the same. In your example you are measuring a mass, in my examples you're measuring the parameter of a Bernoulli distribution. For practical reasons, there's always going to be a limited number of digits that it's worth telling someone when communicating your belief about the most likely value of a hidden parameter.

This is one of those societal problems where the root is miscommunication. And frankly it's less a problem than a fact of life. I remember Trevor Noah grilling Nate Silver about how Trump could have won the presidency when Silver had predicted that Trump had only a 1/3 chance of winning. It was hilarious in some sense. Now, this situation is the reverse of what Scott is describing (the person using the probability is using it accurately) but the dilemma is the same: lack of clear communication.

Yes, but most people think of probability like that. They think that a probability below 50% equates to an event being virtually impossible. It's like how many scientists make stupid comments on economics without understanding its terms.

Most people ... . Most people - outside Korea and Singapore - cannot do basic algebra (TIMSS). Most people are just no good with stochastics in new contexts. Most journalists suck at statistics. Many non-economists do not get most concepts of macro/micro - at least not without working on it. That does not make the communication of economists or mathematicians or Nate Silver less clear. 37+73 is 110. Clear. Even if my cat does not understand. - Granted, Nate on TV could have said: "Likely Hillary, but Trump has a real chance." - more adapted to IQs below 100 (not his general audience!). Clearer? Nope.

"more adapted to IQs below 100 (not his general audience!)"

Huh, have you seen the comments section on his substack? It's an absolute cesspool. I don't think I've read another substack with such a high proportion of morons and/or trolls in the comments (though I haven't read many).

I did and agree. He is not writing those comments, is he? - Writing: "who will win: Trump or Biden" will attract morons/trolls. Honey attracts flies just as horseshit does. - MRU comments are mostly too bad to read either.

I agree that the comments are a cesspool, but as far as I can tell it's idiocy born of misdirected intellect rather than an actual lack of mental horsepower.

If I see a mistake, oftentimes it's something like "Scott says to take X, Y and Z into account. I am going to ignore that he has addressed Y and Z and claim that his comments in X are incorrect!" I would expect someone dumb to not even comprehend that X was mentioned, much less be able to give a coherent (but extremely terrible) argument for this.

I think it's also partially confounded by the existence of... substack grabbers? I don't know what a good term for this type of person is. But when I see a low-quality comment, without the background that an ACX reader """should""" have, I'll scroll up and see it's a non-regular writing a substack. Which I would guess means they're sampling from the general substack or internet population.

I have encountered people who use the term "50/50" as an expression of likelihood with no intent of actual numerical content but merely as a rote phrase meaning "unpredictable." On one occasion I asked for a likelihood estimate and was told "50/50," but when I had them count up past occurrences it turned out the ratio was more like 95/5.

I still intuitively feel this way. I know that 40% chance things will happen almost half the time, but I can't help but intuitively feel wronged when my Battle Brothers 40% chance attack doesn't hit.

Charles Babbage said, "On two occasions I have been asked [by members of Parliament], 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."

Hey sometimes this works now with AI - an LLM figures out what you meant to ask and answers that instead of your dumb typo. Those guys were just 200 years ahead of their time.

Babbage apparently thought the questions [from two MPs, presumably in 1833] were asked in earnest. If so, I think the likeliest explanation is just Clarke's 3rd Law: "Any sufficiently advanced technology is indistinguishable from magic."

I think this is mainly conflict theory hiding behind miscommunication. Because there's no agreed upon standard of using numbers to represent degrees of belief (even the very concept of "degrees of belief" is pretty fraught!), people feel that they have license to dismiss contrary opinions outright and carry on as they were before.

Yes and I suspect this problem gets worse when you start getting to things with 10% probability happening, or things with 90% probability not happening, and so on.

I think the answer to that is to try to raise the sanity waterline.

Also, I am not sure that the issue here is just probability illiteracy.

If Nate had predicted that there is a 1/3 chance of a fair die landing on 1 or 2, nobody would have batted an eye if a 2 came up. Point out that the chance of dying in a round of Russian roulette is just 1/6 or so, and almost nobody will be convinced to play, because most people can clearly distinguish a 17% risk from a 0% risk.

Part of the problem is that politics is the mind killer. There was a clear tendency of the left to parse "p(Trump)=0.3" (along with the lower odds given by other polls) as "Trump won't win", which was a very calming thought on the left. I guess if you told a patient that there was a 30% chance that they had cancer, they would also be upset if you revised this to 100% after a biopsy. (I guess physicians know better than to give odds beforehand for that very reason.)

Sooner or later, everyone wants to interpret probability statements using a frequentist approach. So, sure you can say that the probability of reaching Mars is 5% to indicate that you think it's very difficult to do, and you're skeptical that this will happen. But sooner or later that 5% will become the basis for a frequentist calculation.

If you read through this article you'll see that probability statements drift between statements of degree of belief and actual frequentist interpretations. It's just inevitable.

It's also far from obvious how to assign numerical probabilities to degrees of belief. For instance, suppose we all agree that there is a low probability that we will travel to Mars by 2050. What's the probability value for that? Is it 5%, 0.1%, or 0.000000001%? How do we adjudicate between those values? And how do I know that your 5% degree of belief represents the same thing as my 5% degree of belief?

Assuming you have no religious or similar objections to gambling, the standard mutually intelligible definition of a 5% degree of belief is a willingness to bet a small fraction of what's in your wallet (for example, one cent if you have $20), at better than 20:1 odds.

It has to be a small fraction because the total value you place on your money is only approximately linear for small changes around your current wealth; over larger ranges it tends to be logarithmic for many people.
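
A quick sketch of why the stake has to be small, assuming log utility of wealth (the $20 wallet from above):

```python
import math

def utility_change(wealth, delta):
    """Change in log-utility from a gain or loss of `delta` dollars."""
    return math.log(wealth + delta) - math.log(wealth)

wealth = 20.00
# A one-cent stake: gain and loss are nearly symmetric, so maximizing
# expected log-utility reduces to maximizing expected dollars.
small_gain = utility_change(wealth, +0.01)
small_loss = -utility_change(wealth, -0.01)
# A ten-dollar stake: losing half the wallet hurts much more than
# winning the same amount helps, so stated odds stop tracking belief.
big_gain = utility_change(wealth, +10.0)
big_loss = -utility_change(wealth, -10.0)
```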

> the standard mutually intelligible definition of a 5% degree of belief is a willingness to bet a small fraction of what's in your wallet (for example, one cent if you have $20), at better than 20:1 odds.

> It has to be a small fraction because the total value you place on your money is only linear for small changes around a given number of dollars, otherwise it tends to be logarithmic for many.

This isn't right; in practice, people are willing to place fraudulent bets that only amount to "a small fraction of what's in their wallet". They do so all the time to make their positions appear more certain than they really are; your operationalization is confounded by the fact that support has a monetary value independent of whether you win or lose. Placing a bet buys you a chance of winning, and it also buys you valuable support.

It still won't work; consider the belief "I will win the lottery."

Fundamentally, it is correct to say that a 5% degree of belief indicates willingness to bet at 20:1 odds in some window over which the bet size is not too large to be survivable and not too small to be worthwhile, but it is not correct to say that willingness to bet indicates a degree of belief (which is what you're saying when you define degree of belief as willingness to bet), and that is particularly the case when you specify that the amount of the bet is trivial.

Sure. Also, I do think that it's reasonable to invest trivial amounts of money in fair lottery tickets given certain utility functions. For example, if a loss is negligible but you value extremely highly the possibility of imminent comfortable retirement. I don't do this myself because I believe that in my part of the world lotteries are rigged and the chance to win really big is actually zero.

> Also, I do think that it's reasonable to invest trivial amounts of money in fair lottery tickets given certain utility functions.

I agree. But this is not compatible with the definition of "degree of belief" offered above; that definition requires that lottery ticket purchasers do not believe the advertised odds are correct.

"I will win the lottery" has a large fantasy component, where people spend time thinking about all the things they could buy with that money.

An anonymous bet with a small fraction of your wealth does not have that fantasy component. "Look at all the things I could buy with this 20 cents I won!" just doesn't do the same thing.

Because it's anonymous you're not pre-committing anything. I suppose some people might brag about making that bet?

Is it actually a problem for prediction markets, though? People betting at inaccurate odds for emotional (or other) reasons just provides a subsidy for the people who focus on winning, resulting in that market more accurately estimating the true likelihood. Certainly markets can be inaccurate briefly, or markets with very few participants can be inaccurate for longer, but it's pretty easy to look past short-term fluctuations.

Or maybe you're thinking of it being a problem in some different way?

Of course, this assumes that people will be willing to bet on an outcome given some odds ratio.

It might well be that some people would not rationally bet on something.

For example, given no further information, I might not be willing to bet on "Alice and Bob will be a couple a month from now" if I have no idea who Alice and Bob are and anticipate that other people in the market have more information. Without knowing more (are Alice and Bob just two randomly selected humans? Do they know each other? Are they dating?) the Knightian uncertainty in that question is just too high.

Thank you! You are making **exactly** my point -- that although people start out by talking about subjective "degrees of belief", sooner or later they will fall into some sort of frequentist interpretation. There can be no purer expression of this than making an argument about betting, because ultimately you are going to have to appeal to some sort of expected value over a long run of trials, which is by definition a frequentist interpretation.

Not necessarily. I think the probability that P=NP is about 5%, but I don’t think that if we “ran” the universe that many times it would be true in 5% of them - it would either be true in all or none.

Instead it means that, if I had to do something right now whose outcome depended on whether P equals NP, I would do it if the amount by which it makes things better if they’re equal is more than 20 times the amount by which it makes things worse if they’re unequal. I need some number or other to coordinate my behavior here and guarantee that I don’t create a book of bets that I am collectively guaranteed to lose (like if I paid 50 cents for a bet that gives me 90 cents if horse A wins and also paid 50 cents for a bet that gives me 90 cents if horse A loses). But the number doesn’t have to express anything about frequency - I can use it even for a thing that I think is logically certain one way or the other, if I don’t know which direction that is.
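
The horse example can be checked directly; a tiny sketch using the numbers from the comment above:

```python
# Two bets at 50 cents each, paying 90 cents on "A wins" and on "A loses".
cost = 0.50 + 0.50
for a_wins in (True, False):
    payout = 0.90          # exactly one of the two bets pays, either way
    net = payout - cost    # ~ -0.10 in both outcomes: a guaranteed loss
```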

I think the P=NP example (and the Mars example, if you believe in a deterministic universe) can still be approached this way if we define 'rerunning the timeline' as picking one of the set of possible universes that would produce the evidence we have today.

I'm not sure I follow. If the evidence we have today is insufficient to show that P != NP, how is it logically impossible for universes to exist where we have the same evidence and P does or does not equal NP?

Most people suspect that the mathematical tools we have *are* sufficient to show one of the directions of the P and NP conjecture, it's just that no human has yet figured out how to do it.

Even if it turns out not to be provable, it still either is or isn't, right? Universes where the starting conditions and the laws of physics are different are easy to imagine. Universes where *math* is different just bend my mind in circles. If you changed it to switch whether P=NP, how many other theorems would be affected? Would numbers even work anymore?

Interesting that you should say that. I was thinking about the timelines of possible universes going *forward from the present* — and when it comes to the Mars example, what is the set of possible universes that will *prevent* us from getting to Mars in 2050 vs the set that will *allow* us to get to Mars in 2050? I think we can agree (or maybe not?) that there is an infinity of universes for the former, but a smaller infinity of universes for the latter. After all, the set of universes where the sequence of events happens in the proper order to get us to Mars would be smaller than the set of universes where the sequence of events didn't happen (or couldn't happen). If these were sets of numbers we could create a bijection (a one-to-one correspondence) between their elements. But no such comparison is possible between these two sets, and the only thing we can say with surety is that they don't have the same cardinality. Trying to calculate the proportionality of the two sets would be impossible, so determining the probability of universes where we get to Mars in 2050 and universes that don't get us to Mars in 2050 would be a nonsensical question. I'm not going to die on this hill, though. Feel free to shoot my arguments down. ;-)

I've never been any good at reasoning about infinities in this way (Am I so bad at math? No, it's the mathematicians who are wrong!), but I've spotted a better out so excuse me while I take it:

I do disagree that these are infinite sets; I think they're just unfathomably large. If there are 2^(10^80) possible states of the universe one Planck second after any given state, the number of possible histories at time t quickly becomes unrepresentable within our universe. It's a pseudoinfinite number that is not measurably different from infinity in a practical sense, but saves us all of the baggage of an infinite number in the theoretical sense.

If you accept that premise (and I don't blame you if you don't), I believe we're allowed to do the standard statistics stuff like taking a random sample to get an estimate without Cantor rolling in his grave.

I like your counter-argument. But I'll counter your counter with this — if the number of potential histories of our universe going forward is impossible to represent within our universe, then it's also impossible to represent the chances of getting to Mars in 2050 vs not getting to Mars in 2050. Ouch! My brain hurts!

> Not necessarily. I think the probability that P=NP is about 5%, but I don’t think that if we “ran” the universe that many times it would be true in 5% of them - it would either be true in all or none.

I think this is a semantic/conceptual disagreement. I think there are two points where we can tease it apart:

* You're thinking of the world as deterministic, and I'm thinking of it as predicated on randomness to a significant degree. If the future depends on randomness, then it makes no sense to claim it would be true in all or none. Whereas if the future is determined by initial conditions and laws of nature, then yes, it will be. In which case:

* You can adapt my conceptualisation so that it survives a deterministic world. A determinist would believe that the future is determined by known and unknown determinants (the laws of nature and initial conditions, both known and unknown) but is nonetheless fixed. To give x a 5% probability, then, is to say that if, for each "run" of the universe, you filled in the unknown determinants randomly, or according to the probability distribution you believe in, you would get event x occurring in 5% of runs (50 out of 1000).

Correct me if I'm mistaken in my assumptions about your belief, but I don't know how else to make sense of your comment.

I think that the P vs NP claim is most likely a logical truth one way or the other, and that no matter how you modify the known or unknown determinants, it will come out the same way in all universes.

If you have some worries about that, just consider the claim that the Goldbach conjecture is true (or false), or the claim that the 1415926th digit of pi is a 5.

Isn’t your 5% estimate essentially meaningless here, though, since it is a proposition about a fundamental law and you know it is actually either 100% or 0%? And more importantly, no other prediction you make will bear sufficient similarity to this one that grouping them yields any helpful knowledge about future predictions.

Your first point shows that P=NP doesn’t have a *chance* of 5% - either all physically possible paths give P=NP or none of them do.

Your second point shows that a certain kind of frequentist calibrationism isn’t going to make sense of this either.

But Bayesian probability isn’t about either of those (regardless of what Scott says about calibration). Bayesian probability is just a way of governing your actions in light of uncertainty. I won’t make risky choices that come out very badly if P=NP is false, and I won’t make risky choices that come out 20 times as badly if it’s true compared to how well the choice turns out if it’s false. That’s what it means that my credence (Bayesian probability) is 5%. There is nothing that makes one credence “correct” and another “incorrect” - but there are people who have credence-forming policies that generally lead them well and others that generally lead them badly. And the only policy that avoids guaranteed losses is for your credences to satisfy the probability axioms and to update by Bayesian conditionalization on evidence.

The thing I don’t get is that if it is not a frequentist probability, how can it make sense to apply Bayes’ theorem to an update for P=NP? Say a highly respected mathematician claims he has a proof of it, then promptly dies. This is supposed to be Bayesian evidence in favour of P=NP. But does it make sense to apply a mathematical update to the chance of P=NP? Surely it is not an event that conforms to a modelable distribution, since as you say it is either wrong or right in all universes.

What Bayes' Theorem says is just that P(A|B)P(B)=P(A&B). (Well, people often write it in another form, as P(A|B)=P(B|A)P(A)/P(B), but that's just a trivial consequence of the previous equation holding for all A and B.)

To a Bayesian, P(A) (or P(B), or P(A&B)) just represents the price you'd be willing to pay for a bet that pays $1 if A (or B, or A&B) is true and nothing otherwise (and they assume you'd also be willing to pay scaled up or down amounts for things with scaled up or down goodness of outcome).

Let's say that P(B|A) is the amount you plan to be willing to pay for a bet that pays $1 if B is true and nothing otherwise, in a hypothetical future where you've learned A and nothing else.

The first argument states that this price should be the same amount you'd be willing to pay right now for a bet that pays $1 if A&B are both true, nothing if A is true and B is false, and pays back your initial payment if A is false (i.e., the bet is "called off" if A is false). After all, if the price you'd be willing to pay for this called off bet is higher, then someone could ask you to buy this called off bet right now, as well as a tiny bet that A is false, and then if A turns out to be true they sell you back a bet on B at the lower price you'd be willing to accept after learning A is true. This plan of betting is one you'd be willing to commit to, but it would guarantee that you lose money. There's a converse plan of betting you'd be willing to commit to that would guarantee that you lose money if the price you'd be willing to pay for the called off bet is lower than the price you'd be willing to pay after learning that A is true. The only way to avoid the guaranteed loss is if the price you're willing to pay for the called off bet is precisely equal to the price you'd be willing to pay for a bet on B after learning A.

But a bet on B that is called off if A is false is precisely equal to a sum of three bets that you're willing to make right now - and we can check that if your prices don't precisely satisfy the equation P(B|A)P(A)=P(A&B), then there's a set of bets you're willing to make that collectively guarantee that you'll lose money.
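
A minimal numeric sketch of the replication step (hypothetical prices; `replicate` is an illustrative helper, not standard terminology): a conditional bet on B given A at price q pays $1 on A&B, $0 on A&~B, and refunds q on ~A, and that payoff profile can be assembled from unconditional bets.

```python
def replicate(p_a, p_ab):
    """Replicate a called-off bet on B-given-A from unconditional bets."""
    q = p_ab / p_a                  # the only coherent conditional price
    cost = p_ab + q * (1 - p_a)     # buy 1 unit of "A&B", q units of "not A"
    payoffs = {
        "A&B":  1.0,                # the A&B bet pays $1
        "A&~B": 0.0,                # nothing pays
        "~A":   q,                  # the not-A bets return the refund
    }
    return q, cost, payoffs

q, cost, _ = replicate(p_a=0.5, p_ab=0.2)
# The replication costs exactly q; any other conditional price opens a
# gap a bookie can exploit. That is P(B|A) * P(A) = P(A&B) in bet form.
```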

There's nothing objectively right about the posterior P(B|A); it's just that if you are currently committed to betting on A at P(A), and you're currently committed to betting on A&B at P(A&B), then you had better be committed to updating your price for bets on B to P(B|A) if you learn A (and nothing else), or else your commitments are self-undermining.

All that there is for the Bayesian is consistency of commitment. (I think Scott and some others want to say that some commitments are objectively better than others, such as the commitments of Samotsvety, but I say that we can only understand this by saying that Samotsvety is very skilled at coming up with commitments that work out, and not that they are getting the objectively right commitments.)

I do not think that 5% to reach Mars could ever be interpreted as a frequentist probability.

If you are one of the mice which run the simulation that is Earth, and decide to copy it 1000 times and count in how many of the instances the humans reach Mars by 2050, then you can determine a frequentist probability.

If you are living on Earth, you would have to be very confused about frequentism to think that such a prediction could ever be a frequentist probability.

> If you read through this article you'll see that probability statements drift between statements of degree of belief and actual frequentist interpretations. *It's just inevitable*.

I wonder if there's a bell-curve relationship between "how much you care about a thing" and "how accurately you can make predictions about that thing". E.g. do a football team's biggest fans predict its outcomes more accurately or less accurately than non-supporters? I would guess that the superfans would be less accurate.

If that's the case, "Person X has spent loads of time thinking about this question" may be a reason to weigh their opinion less than that of a generally well-calibrated person who has considered the question more briefly.

I think you're conflating bias with "spent loads of time thinking about this question" as the cause of bad predictions. The latter group includes all the experts and scientists studying the question, and probably most of the best predictors. It also includes the most biased people, who apply no rational reasoning to the question and make the worst predictions. You're better off considering bias and expertise separately than just grouping them as people who spent a lot of time thinking about something.

Me too. But you still have to consider the two factors separately. You can't just reason that since physicists have spent a lot of time thinking about physics, they're probably biased and shouldn't be trusted about physics.

Sure you can. Greybeards with pet theories they won't give up because they're so invested in them are a thing. It's why Planck wrote that "science advances one funeral at a time" - eminent physicists really are a hindrance to accurate physics.

Compared to some platonic ideal of Science!, perhaps. Compared to any real-world alternative that *isn't* eminent physicists, the eminent physicists will give you the most accurate understanding of the physical universe.

On the one hand, bookies do find that the home team gets bet on a lot more than their actual odds of winning.

On the other hand, superfan could mean either "I go to all the games" or "I spend a lot of time thinking about the sport". The latter do tend to be able to put aside their affiliation when there's money on the line.

I think the bookies' experience has more to do with a lot of naïve fans putting money on a game once in a while. It does tend to happen in the playoffs more than the regular season, after all.

In answer to your first question, I find good value in betting on things where there is emotional investment, ie elections and sporting events with local heroes involved.

If Eliezer thought that p(AI doom) was 10^-10, then he would not have spent years working on that question.

On the one hand, I would think it is wrong to privilege the opinion on the existence of God of someone with a degree in theology based on the fact that they have deep domain knowledge, because few atheists study theology.

On the other hand, if a volcanologist told me to evacuate, I would privilege his opinion over that of a random member of the public. (It might still be overly alarmist because of bad incentives.)

Most other professionals fall somewhere on this spectrum. You are much more likely to study astronomy / climate change / race relations / parapsychology / homeopathy if you believe that stars (as "luminous objects powered by fusion") / anthropogenic climate change / systemic racism / ESP / memory of water exist. Of course, once you are in a field, you are disproportionately subject to the mainstream opinions in that field, publication bias, and so on. Trying to publish an article arguing that stars are actually immortalized Greek heroes in an astronomy journal will probably not work.

So the question "is X for real, or are all the practitioners deluding themselves?" is not easy to answer.

A great project indeed! A book waiting to be written (if it has not already been written. I'd put the probability of that at 35 percent at least, though :)

The Emergence of Probability by Ian Hacking covers this topic. Prior to the modern generalized concept of probability, the word meant something very different. Quoting from https://en.wikipedia.org/wiki/Probability:

> According to Richard Jeffrey, "Before the middle of the seventeenth century, the term 'probable' (Latin probabilis) meant approvable, and was applied in that sense, univocally, to opinion and to action. A probable action or opinion was one such as sensible people would undertake or hold, in the circumstances.

Thanks. In addition to this and Hacking, I've found James Franklin's The Science of Conjecture: "The words “reasonable” and “probably” appear in modern English with a frequency and range of application very much higher than their cognates in other European languages, as do words indicating hedges relating to degrees of evidence such as “presumably,” “apparently,” “clearly.” These phenomena indicate an Anglophone mindset that particularly values attention to the uncertainties of evidence."

Your suggestion to call different concepts of probability different names ("shmrobability") for metaphysical reasons actually makes complete sense. Maybe call frequentist probability "frequency", and Bayesian probability "chance" or "belief", with "probability" as an umbrella term. The different concepts are different enough that this would be useful. "The frequency of picking white balls from this urn is 40%." Sounds good. "The frequency of AI destroying mankind by 2050 is 1%" Makes no sense, as it should; it happens or not. "The chance of AI destroying mankind by 2050 is 1%." OK, now it makes sense. There we go!

It would be hard to think of easy English terms for all six main interpretations they list, but hardly impossible. We can even create new words if necessary!

On reflection, I've come around to thinking that any expression that brings up an ensemble of possible worlds would be apt to cause more philosophical sidetracking than "probability" itself!

Now, would that be the quantum mechanical many-worlds-interpretation ensemble, the cosmological multiple-independent-inflationary-bubbles ensemble, or the string theory multiple possible compactifications ensemble? :-)

Why assume that there is a finite number of possible worlds to begin with?

If the number of possible worlds is infinite, which to me seems intuitively likely, then any subset of possible worlds will either be ~0% of the whole (if finite) or an undefined fraction (if the subset is itself infinite).

I'm not sure they're meaningfully different, at least not in a way that separates flipping a coin from questions like "will we get to Mars before 2050". If I have a fair coin that I've never flipped before, does it make sense to say the frequency of getting heads is 50%? If I've only ever flipped it twice and it came up heads both times, is the frequency of heads 100%? If yes, then the frequency is bad for making predictions. It's just a count of outcomes in some reference class.

But if we consider the frequency of heads to be 50% even though the coin came up heads both times, then it's because we're using "frequency" to mean the frequency you would hypothetically get if you repeated the experiment many times. But this sounds a lot like chance. If you hypothetically repeated the experiment of "will we get to Mars before 2050" many times, there would also be a frequency of how many times we'd get there. Sure, we can't actually repeat the experiment in real life, but the same is true of a coin that I won't let anyone flip.
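
The point that a short run of flips is a poor stand-in for the underlying chance is easy to simulate (a hedged sketch; the 0.5 bias and the trial counts are arbitrary choices):

```python
import random

random.seed(0)
trials = 100_000
# Flip a fair coin twice per trial; count how often both flips are heads.
both_heads = sum(
    random.random() < 0.5 and random.random() < 0.5
    for _ in range(trials)
)
# Close to 0.25: two flips of a fair coin "show" a 100% heads frequency
# about a quarter of the time.
print(both_heads / trials)
```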

For both the coin flip and the chance of getting to Mars, we come up with a mathematical model and that model gives us a probability. The models exist as pure math and have a well defined probability, but they never perfectly match the real life event we're trying to model. E.g. no real coin is perfectly balanced and has exactly a 50% chance of landing on heads.

"For both the coin flip and the chance of getting to Mars, we come up with a mathematical model and that model gives us a probability. The models exist as pure math and have a well defined probability, but they never perfectly match the real life event we're trying to model."

Sure, but it's important that the models match the real-life events well enough to be useful. There is no way to determine whether that is the case other than defining what "well enough" means, and then comparing the outcomes of experiments to the predictions of the model.

I think it's feasible to conduct enough coin flip experiments to validate the mathematical model to within some acceptable tolerance, and that's why I don't object to talking about the probability of a particular outcome for a coin flip.

I don't think it's feasible to conduct enough "getting to Mars" experiments to validate the mathematical models that people might propose, and that's why I object to trying to assign it a probability.

You can try to come up with a reference class for "getting to Mars" that is large enough that you can validate a mathematical model of the reference class, but I don't think it actually helps, because you still have to validate that getting to Mars actually belongs in that reference class, and that brings you back to the need for experiments.

You can also try to break "getting to Mars" down into a series of steps that are themselves feasible to model and validate, and then combine them. But I still don't think it helps you, for two reasons. The first goes back to what I already said -- you still have to experimentally verify that you've captured all of the relevant steps, and that still requires experimentation. The second, deeper objection is that I would draw a distinction between reducible and irreducible complexity. I think "getting to Mars" is irreducibly complex, i.e. it can't be broken down into discrete components that can each individually be well modeled. At least not with our current understanding of economics, politics, and sociology.

The same is true of the coin though. You're using the reference class of other coins that have been flipped and experimentally verified to match the model that p(Heads) = 0.5. But my penny is a different coin. No two pennies are manufactured exactly alike. How do you know that other coins are a good reference class for my coin? You'd have to do more experiments with my coin to ensure it belongs in that reference class.

At the end of the day, you're making the judgement call that me flipping the penny in my pocket is similar to other people flipping other coins. That's a good judgement call. But it's not categorically different than judging that our model for rocket launches will apply to a rocket to Mars. The difference is only one of degree. We're highly confident in applying our coin flipping model to new coins, and somewhat less confident in applying our rocket model to new destinations.

>But my penny is a different coin. No two pennies are manufactured exactly alike. How do you know that other coins are a good reference class for my coin? You'd have to do more experiments with my coin to ensure it belongs in that reference class.

Agreed. One can (almost) never eliminate some reference class tennis. ( Perhaps with elementary particles or identical nuclei one can. Chemistry would look different if electrons were distinguishable... )

I agree that no two pennies, or anything really, are exactly alike. There will always be some tolerance for differences when defining your reference class, and it's always possible to set the tolerances so strictly that the reference class is N=1. But I still think there are qualitative differences between coin flipping and getting to Mars (which I interpreted as "land a group of humans on Mars and bring them back safely").

First, even though no two coins are exactly the same, it's feasible to experimentally measure how similar a large number of coins minted in US mints are when it comes to the distribution of heads and tails. We could even go more granular and look at coins from a particular mint, or coins from a particular production run, and we can still collect enough coins and flip them enough times to model the probability distributions pretty well. You can't do that for getting to Mars.

Of course, there is always a chance that your coin was somehow defective, but we can quantify the fraction of defective coins pretty well and incorporate that into the model too. Things change if you are being sneaky and telling me your coin is a standard quarter when in fact it's fraudulent; I'll admit that it's probably impossible to fully incorporate something like dishonesty into a model. If you think that makes coin flipping and getting to Mars qualitatively similar, then we can agree to disagree.

But even if the reference class really is N=1, then you can at least verify your assumptions by flipping that particular coin a bunch of times. Now we've moved from prediction to postdiction, but I still think this is a meaningful distinction when compared to something like sending people to Mars, that we will struggle to even do once, at least at first.

If "getting to Mars" were merely rocket science, then I think you could argue that it's qualitatively similar to flipping a coin, and it's just a difference of degree. After all, we have successfully (and unsuccessfully) sent rockets to Mars. The sample size may not be as large as the number of coin flips we could do in a day, but it's large enough that there is useful information to be gleaned. And rocket science, while very complex, is closer to what I called reducible complexity than irreducible complexity (I think...I'm not a rocket scientist).

But if by "getting to Mars" we mean "land humans on Mars and bring them back" then I think there is a qualitative difference with flipping a coin. A manned mission to Mars has not been done even once -- it hasn't even been attempted once. And it involves a lot more than just physics and biology. There are economic and political and social factors that I don't think can be modeled and verified in the same way as flipping a coin or even launching a rocket.

Finally, concepts like reducible vs irreducible complexity, how similar two things have to be before you can group them in the same class, and how closely a model has to match observations to be considered "good enough" involve subjectivity, and I think of them as existing on a continuous spectrum. That means you can't set a clear dividing line and say everything to the right is one kind of thing and everything to the left is another. The boundaries are fuzzy. But I still think two things that exist on a spectrum can have qualitative differences when they are far enough apart.

I'm sympathetic to this approach, but there's an issue where declaring literally anything to *really* be a valid instance of "frequency" invites an argument about interpretations of quantum mechanics. No easy escape from metaphysical arguments in a deterministic universe!

Most philosophers use “chance” for something different from either frequency or credence (our term for Bayesian probability). “Chance” is used for an objective physical probability that may or may not ever be repeatable and that people may or may not ever wonder about to form a credence. Standard interpretations of quantum mechanics suggest that there are chances of this sort (though my own preferred interpretation, the many worlds interpretation, gives those numbers a slightly different meaning than “chance”).

> Probabilities Don’t Describe Your Level Of Information, And Don’t Have To

This seems literally wrong to me. Probability and information both measure (in a very technical sense) how surprising various outcomes are. I think they may literally be isomorphic measures, with the only difference being that information is measured in bits rather than percentages.

Your examples are also off-base here. The probability of a fair coin coming up heads when I'm in California is 1/2, and the probability of a fair coin coming up heads when I'm in New York is 1/2, and we wouldn't say that probability is not the same thing as information just because 1/2 does not capture which state I'm flipping the coin in. Similarly, the difference between the first two scenarios is not E[# heads / # flips] but E[(# heads / # flips)^2] - (E[# heads / # flips])^2, i.e. the variance of the distribution is different. This is because (1) is well modelled by *independent* samples from a known distribution, while in (2) the samples are correlated (i.e. you need a distribution over hyperparameters if you want to treat the events as independent).
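
That variance difference is easy to exhibit numerically (a sketch with made-up parameters; scenario 2 uses the extreme case of an unknown bias that is either 0 or 1):

```python
import random

random.seed(1)
n_flips, trials = 100, 20_000

# Scenario 1: independent flips of a known fair coin.
fair = [sum(random.random() < 0.5 for _ in range(n_flips)) / n_flips
        for _ in range(trials)]

# Scenario 2: one coin per run whose unknown bias is 0 or 1 (a draw over
# hyperparameters), so every flip in a run agrees with the others.
rigged = [float(random.random() < 0.5) for _ in range(trials)]

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Both scenarios have mean ~0.5, but very different variances:
print(var(fair))    # roughly 0.25 / n_flips = 0.0025
print(var(rigged))  # roughly 0.25
```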

I also noticed you didn't touch logical / computationally-limited probability claims here, like P(the 10^100 digit of pi is 1) = 1/10.

I think he meant "level of information" as in pages of text you could write about it, not the information-theoretic entropy sense of the word that you're rightly saying doesn't work in that sentence.

Right. “This is because 50% isn’t a description of how much knowledge you have, it’s a description of the balance between different outcomes.” He presents scenarios where you have a lot of useless knowledge (the knowledge about the process is made useless by arbitrarily tagging one outcome as heads and the other as tails). Probability is related to the amount of knowledge you have that can be leveraged to cast a prediction. Learning Moby Dick by heart won’t help you predict whether Trump or the other guy will win the election.

Parts of this are unobjectionable, and other parts are very clearly wrong.

It is perfectly fine to use probabilities to represent beliefs. It is unreasonable to pretend the probabilities are something about the world, instead of something about your state of knowledge. Probabilities are part of epistemology, NOT part of the true state of the world.

You say "there's something special about 17%". No! It's just a belief! Maybe the belief is better than mine, but please don't conflate "belief" with "true fact about the world".

If Samotsvety predicts that aliens exist with probability 11.2%, that means they *believe* aliens to exist to that extent. It does not make the aliens "truly there 11.2% of the time" in some metaphysical sense. I can feel free to disagree with Samotsvety, so long as I take into account their history of good predictions.

(Side note: that history of good predictions may be more about politics and global events than it is about aliens; predicting the former well does not mean you predict the latter well.)

----------

Also, a correction: you say

"It’s well-calibrated. Things that they assign 17% probability to will happen about 17% of the time. If you randomly change this number (eg round it to 20%, or invert it to 83%) you will be less well-calibrated."

This is false. A simple rounding algorithm is guaranteed to preserve calibration if the input is calibrated (sometimes rounding can even *increase* calibration). If I round 17% up to 20% and also round a different 23% prediction down to 20%, then it is a mathematical guarantee that if the predictions were calibrated before, they are still calibrated.

Calibration is just a very very bad way to measure accuracy, and you should never use it for that purpose. You should internalize the fact that two people who predict very different probabilities on the same event (e.g. 10% and 90%) can both be perfectly calibrated at the same time.
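To make the rounding point concrete, here's a quick simulation (the 17%/23% numbers are from my example above; the code itself is just a sketch):

```python
import random

random.seed(42)

# A calibrated forecaster: events tagged 17% really happen 17% of the
# time, and events tagged 23% really happen 23% of the time.
predictions = [0.17] * 50_000 + [0.23] * 50_000
outcomes = [random.random() < p for p in predictions]

# Round every prediction to 20% and check calibration of that bucket:
# the events in it should happen about 20% of the time, and they do.
hit_rate = sum(outcomes) / len(outcomes)
print(round(hit_rate, 3))  # close to 0.20
```

Rounding 17% and 23% into a shared 20% bucket keeps the bucket calibrated precisely because the bucket's average true rate is 20%.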

I think the part about calibration answers the first half of your comment: these numbers have real-world validity in that you can bet on them and make more money on expectation than someone using different numbers.

For the second half of the comment: if I understand your argument correctly, it only holds if there are equal numbers of predictions on either side. If Samotsvety says 17% for impeachment and 23% for Mars (or 10% and 90%), and you round those both to 20% (or 50%) and bet accordingly, then, yes, you'll make as much money as someone who used the unrounded numbers.

But if they predicted 17% (or 10%) for *both* events, and you rounded *both* to 20% (or 50%) and bet accordingly, then you'd lose money on expectation compared to someone using the unrounded numbers.

And this applies even more strongly if there's a whole slate of unrelated events, like in a prediction contest. If you threw away all the answers (from Samotsvety or the wisdom of crowds or whoever) and rounded everything to 50%, or even to one significant digit, then you would be losing information, and would be losing money if you bet based on those rounded numbers.
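One way to make "losing money on expectation" precise without setting up actual bets is a proper scoring rule like the Brier score. This toy calculation (my numbers) shows that if an event's true probability is 17%, predicting 17% beats any rounded value in expectation:

```python
def expected_brier(p, q):
    """Expected Brier score for predicting q on an event whose true
    probability is p; lower is better, and it is minimized at q == p."""
    return p * (1 - q) ** 2 + (1 - p) * q ** 2

true_p = 0.17
print(expected_brier(true_p, 0.17))  # ~0.1411, the best you can do
print(expected_brier(true_p, 0.20))  # ~0.1420, rounding costs a little
print(expected_brier(true_p, 0.50))  # 0.25, rounding to 50% costs a lot
```

The penalty for rounding one prediction is small, but across a whole slate of predictions it compounds, which is the point about prediction contests above.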

If two people both have belief functions that satisfy the probability axioms, then each one makes more money on expectation than the other, where each computes the expectation using their own probability function. This is just a property of probability functions in general; it doesn't pick out a single special one.

If Samotsvety puts 17% for impeachment, and I round it off to 10% and you round it off to 30%, and we all make a bet, then the person who does best will be either you or I. Neither of us could systematically do better than Samotsvety. But there absolutely could be someone who is perfectly equally good to Samotsvety in general, despite giving slightly different numbers to every event than Samotsvety does.
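That first point can be sketched with a toy trade (the names and numbers are mine): two coherent forecasters who disagree can both expect, by their own lights, to profit from the same bet.

```python
def expected_profit(belief, price, position):
    """EV of a position (+1 long, -1 short) in a contract paying $1 if
    the event happens, traded at `price`, computed under P = `belief`."""
    return position * (belief - price)

price = 0.5            # they split the difference and trade here
alice, bob = 0.3, 0.7  # both assignments can be perfectly coherent

print(expected_profit(alice, price, -1))  # ~0.2: Alice (short) expects to win
print(expected_profit(bob, price, +1))    # ~0.2: Bob (long) expects to win
```

Nothing in the probability axioms alone says which of them is the "special" one; only the resolved outcomes can separate them, and only over a batch.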

"these numbers have real-world validity in that you can bet on them and make more money on expectation than someone using different numbers."

There's no "in expectation" when considering a single event.

Also, you're conflating "make money in expectation" with "be calibrated". They are not quite the same. I do agree that the 17% can be interpreted in a way that is meaningful, and that I won't be able to round. Calibration is the wrong way to do this.

That I can't round 17% still does not make 17% "objectively true", since another equally skilled predictor can predict 93% on the same question and they can both be right (in the sense that they both get the same calibration and the same Brier score on their entire batch of predictions). The issue is that you can never judge accuracy or calibration on a single prediction, but only on a batch of them.

It's really much better to think of 17% as a measure of belief, not as an objective fact about the world. If I have insider info, maybe I already know the event will happen 100% of the time, so then what does the 17% mean? It just CAN'T be a fact about the world; it doesn't make sense!

If we suppose that quantum events tend not to influence politics overly much, then 17% is likely not the objectively true probability of impeachment; that would be something close to 0% or 100%.

If God decided to open Omniscient Forecasts Ltd, he might state the impeachment odds as 99% or 1% instead. If he was correct, he would beat Samotsvety in his prediction score, and everyone would use his predictions.

Samotsvety's 17% is just our best guess, because we do not have a Laplace's demon or the best possible psychological model of every American.

It's implied in several places, including in other posts by Scott. In this particular post, we have, for example:

"I think the best way to describe this kind of special number is “it’s the chance of the thing happening”, for example, the chance that Joe Biden will be impeached."

So? Do you also object to people saying that the chance of a fair coin landing on heads on the next throw is 50%? As far as I can tell, these statements are equivalent.

First you claim Scott never said it, and now you claim he's right to say it?

Anyway, a fair coin comes from an implied set of repeatable events (other fair coin flips). That's not true for most things you predict.

What's the probability that the 100th digit of pi is 5? You can say "10%", but you can also just look it up and see that the digit is 9, so the probability is actually "0%" (unless the source you looked up was lying, I guess, so maybe 1%?). Which one is right? They both are: the probabilities represent *your state of belief*, not an objective fact about the world.

Maybe it would be clearer if I asked you "what's the probability the 1st digit of pi is 5"? That should make it more obvious that the answer is not just "10%" in some objective sense. The answer depends on how many digits of pi you know! It's a property of your state of knowledge, not a property of the world.
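For what it's worth, you don't even have to trust a lookup source: the 100th decimal of pi is computable from scratch, e.g. with Machin's formula and Python's stdlib `decimal` module (a sketch; the precision padding is my choice):

```python
from decimal import Decimal, getcontext

def arctan_inv(x, digits):
    """arctan(1/x) for integer x via its Taylor series, to ~digits places."""
    getcontext().prec = digits + 10
    power = total = Decimal(1) / x
    eps = Decimal(10) ** -(digits + 5)
    x2, n, sign = x * x, 1, -1
    while power > eps:
        power /= x2
        n += 2
        total += sign * power / n
        sign = -sign
    return total

def pi_string(digits):
    # Machin's formula: pi = 16*arctan(1/5) - 4*arctan(1/239)
    return str(16 * arctan_inv(5, digits) - 4 * arctan_inv(239, digits))

pi = pi_string(110)  # "3.1415926535..."
print(pi[2 + 99])    # the 100th decimal place: prints 9
```

Once you've run this, your probability for "the 100th digit is 5" is 0% and mine might still be 10%; the number tracks the knower, not the digit.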

>First you claim Scott never said it, and now you claim he's right to say it?

Yes, insofar as it's ordinary use of language, even if it elides certain metaphysical subtleties. You do agree of course, that there's no true fact about a fair coin that is represented by the number 50%?

Hmm... Maybe one way of thinking about it is that there are estimates of probability that are difficult to impossible to _improve_? In a Newtonian world, if one knew everything about a coin toss of a fair coin to infinite precision and had infinite computing capacity, then the odds of heads or tails on a given toss would be 0%/100% or 100%/0%, but, in practice, no one can have that information. And in many situations, one can often say "The information doesn't exist" e.g. about the results of an experiment that has not yet been run. And, in some of those cases, the best existing estimate of the probability looks less like a subjective description of one's state of mind and more like a publicly known value.

With a fair coin, you can view it in two metaphysical frames: you could either (a) say the probability it lands heads is "truly" 50%, by viewing the coinflip as an instance out of the larger class of "all fair coinflips", which one can show together approach half heads and half tails; or (b) you can say this given coin, like the rest of the universe, is deterministic and the probability is about our state of knowledge. I allow both frames.

For more practical matters, though, frame (a) does not work. That's because (1) there is no natural class like "all fair coinflips" from which you drew your sample, and (2) not everyone has the same state of belief about the event. There is no longer any way to pretend that 17% is a fact of the universe instead of a property of your belief state. Some individual congressmen might have perfect knowledge of whether Biden will be impeached; how then could 17% be a fact of the universe?

You've tried to gotcha me, and I've answered your question carefully. Now answer mine: is the statement "10% chance the 100th digit of pi is 5" really a factual claim about the world instead of a claim about your state of knowledge? How could that be, when I just looked it up and know the 100th digit?

I don't think that's necessary for the objection to hold. If I understand correctly, Scott says that if you are well calibrated, then 17% represents a true fact about your state of knowledge about the world. But the objection is that two people can be well calibrated, share the same knowledge, and assign totally different probabilities to events.

Yes, Scott is sloppy with his treatment of calibration, but I think that the main thrust of his post is obviously true, and people eager to nitpick him should first acknowledge that, if they prioritize discourse improvements over being excessively pedantic.

>You should internalize the fact that two people who predict very different probabilities on the same event (e.g. 10% and 90%) can both be perfectly calibrated at the same time.

This changed my mind more than any other comment. However, now I wish I could see some empirical data as to how often this happens. Are superforecasters of similar calibration really assigning totally different probabilities to events? I could believe this happening if people just have different knowledge and expertise. But if it's based on publicly available knowledge, then I would expect high quality predictions to cluster.

It's a good question, and I don't know the answer. My guess is that the superforecasters would generally assign similar probabilities to normal questions they see a lot and have kind of developed a model for (e.g. elections) but would probably wildly diverge on some questions, including some that you might care more about (e.g. AI risk).

There's also a clear selection bias for calibration based on time-to-event. Say we're both superforecasters working on predicting the development of display technologies. You and I were both well calibrated on predicting the rise of flat screens and LED, the overhyped 3D movement, and HD/4K. We both predicted which technologies would take hold, and how long it would take them to mature to the point of becoming standard (or not).

You predict with 90% certainty that intra-cranial image projection will happen by 2075, but I predict this is only 10% likely. How can this be true if we're both well calibrated?

All our well-calibrated predictions are on technologies <25 years old, and were likely made with time horizons of less than 10 years. Yet this is the basis for appeals to our accuracy on much longer time horizons.

If someone predicted in 1980 that high definition flat screen TVs with a wider color gamut would be the norm in 2024, that would be interesting. If they accurately predicted date ranges within which these technologies would be adopted and mature, that would give me more confidence that this same person making predictions today had useful insights for 40+ years into the future.

When people say, "sure you predicted events within a 5 year time horizon, but I'm not convinced you're able to predict with accuracy 50 years out" that's not them irrationally ignoring the calibration curves. It's accurately discerning the limits of the data.

Assume there is a one-shot event with two possible outcomes, A and B. A generous, trustworthy person offers you a choice between two free deals. With Deal 1, you get $X if A occurs, but $0 if B occurs. With Deal 2, you get $0 if A occurs, but $X if B occurs. By adjusting X, and under some mild(ish) assumptions, the threshold value of X behaves a helluva lot like a probability.

Or at least, it better. If it doesn’t, then I can either come up with a set of bets you will take that guarantee you lose, or come up with a set of bets that you are collectively guaranteed to win that you won’t take any of.
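That threat is the classic Dutch book argument. A minimal instance, with made-up prices: if your prices for A and not-A sum to more than 1, I sell you both tickets and you lose no matter what.

```python
price_a, price_not_a = 0.6, 0.6  # incoherent: the two prices sum to 1.2

def buyer_net(a_occurs):
    """Net result of paying both prices for two tickets, '$1 if A' and
    '$1 if not-A'. Exactly one ticket pays out, whatever happens."""
    cost = price_a + price_not_a
    payout = 1.0  # one of the two tickets wins in either case
    return payout - cost

print(buyer_net(True), buyer_net(False))  # ~ -0.2 both ways: a sure loss
```

The converse also holds: prices that do satisfy the probability axioms can never be Dutch-booked, which is why the indifference price behaves like a probability.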

You're absolutely right! Thanks for the catch. Deal 2 should pay a fixed $1. In that case, p(A) = 1/X when X is set such that Deals 1 and 2 are equally attractive. It falls apart at the ends because people are terrible at small probabilities and large numbers, but it does suggest that there's something that acts an awful lot like a probability, even in the absence of a frequentist interpretation.

I feel that it's in a sense a continuation of the argument about whether it's OK to say that there's a 50% chance that bloxors are greeblic (i.e. to share raw priors like that). The section "Probabilities Don’t Describe Your Level Of Information, And Don’t Have To" specifically leans into that, and I disagree with it.

Suppose I ask you what are the chances that a biased coin flips heads. You tell me, 33%. It flips heads and I ask you again. In one world you say "50%", in another you say "34%", because in the first world most of your probability estimate came from your prior, while in the second you actually have a lot of empirical data.

That's two very different worlds. It is usually very important for me to know which one I'm in, because sure if you put a gun to my head and tell me to bet immediately, I should go with your estimate either way, but in the real world "collect more information before making a costly decision" is almost always an option.

There's nothing abstruse or philosophical about this situation. You can convey this extra information explicitly, as in "33%, but will update to 50% if a thing happens", with mathematical rigor. Though of course it would be nice if Bayesians recognized that it's important and useful, and tried to find a better way of conveying the ratio of prior to updates that went into an estimate, instead of insisting that a single number should be enough for everyone.

And so, I mean, sure, it's not anyone's job to also provide the prior/updates ratio, however that might look, to go along with their estimates (unless they are specifically paid to do that, of course), and people can ask them for it specifically if they are interested. But then you shouldn't be surprised that even people who have never heard of Bayes' theorem might still intuitively understand that a number like "50%" could come entirely from your prior and should be treated as such, and treat you with suspicion for not disclosing it.
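For what it's worth, the "33% then 50%" versus "33% then 34%" split falls straight out of conjugate Beta updating; the parameters below are mine, chosen to reproduce the example:

```python
from fractions import Fraction

def beta_mean(a, b):
    """Mean of a Beta(a, b) belief about P(heads): a / (a + b)."""
    return Fraction(a, a + b)

# Two forecasters both report 1/3, but for different reasons.
mostly_prior = (1, 2)     # Beta(1, 2): almost nothing behind it
mostly_data = (100, 200)  # the same 1/3, backed by hundreds of flips

# Observing one heads adds 1 to the first parameter (conjugate update).
print(float(beta_mean(1 + 1, 2)))      # 0.5: jumps to "50%"
print(float(beta_mean(100 + 1, 200)))  # ~0.336: barely moves, "34%"
```

The single reported number 1/3 is identical in both cases; only the full distribution tells you how the forecaster will move.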

Yes, it would be better to use a distribution to represent our beliefs instead of a single probability. When we have a fair coin, our distribution is over the options: 1) the coin will come up tails, 2) the coin will come up heads.

When we have an unknown coin, the distribution is over the kinds of bias the coin could have, and without further knowledge, we have a uniform distribution over [0,1] representing all the ways in which the coin could be biased. When someone asks us what the probability of heads on the next flip will be, we use this distribution to compute it.

Exactly. The distribution for p(heads) of a fair coin is a Dirac delta at 0.5.

The prior distribution for some unknown process which returns either "heads" or "tails" should be that p follows a uniform distribution over the unit interval. (Depending on what you assume about the process, you might want to add small Dirac peaks at 0 and 1.)

The expected value of p is 0.5 in either case, but the amount of knowledge you have is very different.

Of course, if you state your prior distribution, that will also tell people how much you would update.

In frequentist terms, if you've flipped a coin 100 times, you know more than if you flipped it 10 times, because you have more data and you can estimate the bias more precisely. The variance tells you this, or if you plot the probability distribution, you get this too.

If we switch to Bayesian reasoning, there is jargon about this, like having a "flat prior". Maybe we should use that more? Predictions about AI seem like the sort of thing where the probability distribution should be pretty flat?

Is there better terminology for talking about this informally?
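One informal handle is how fast the posterior tightens: starting from a flat Beta(1,1) prior, the posterior variance shrinks with data even while the mean stays at 0.5 (a sketch in Python):

```python
def flat_prior_variance(heads, tails):
    """Variance of the Beta(heads+1, tails+1) posterior obtained from a
    flat Beta(1, 1) prior after observing the given flips."""
    a, b = heads + 1, tails + 1
    return a * b / ((a + b) ** 2 * (a + b + 1))

print(flat_prior_variance(5, 5))    # ~0.0192 after 10 flips
print(flat_prior_variance(50, 50))  # ~0.0024 after 100 flips: much tighter
```

So "flat prior, wide posterior" versus "sharp posterior" captures the 10-flips-versus-100-flips difference that a bare "50%" hides.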

I am glad there is no new example of: ' "Do bronkfels shadwimp?" is binary, no one knows, thus: 50% chance.' Unlike the "coin which you suspect is biased but you're not sure to which side" - which IS 50%. If A asks about bronkfels and knows (or pretends to know): maybe 50%. If no one around knows: it's the chance of a specific verb applying to a specific noun, which is less than 1%. "Are the balls in this bag all red?": around 4% - no surprise if they are, even if you did not know. "Are 20% purple with Swissair-blue dots?": I'd be surprised, and would not believe you did not know beforehand. "Are they showing pictures of your first student?": 50%, really?

And in particular, it's certainly inconsistent to give a 50% chance to *all* of a) all the balls in this bag have a photo on them, b) all the balls in this bag have a photo of me on them, c) all the balls in this bag have a photo of Scott on them, d) all the balls in this bag have a photo of your first student on them.

> Whenever something happens that makes Joe Biden’s impeachment more likely, this number will go up, and vice versa for things that make his impeachment less likely, and most people will agree that the size of the update seems to track how much more or less likely impeachment is.

There is a close parallel here to the same issue in polling, where the general sense is that the absolute level determined by any given poll is basically meaningless - it's very easy to run parallel polls with very similar questions that give you wildly different numbers - but such polls move in tandem, so the change in polled levels over time is meaningful.

For some types of polls and some types of comparisons the opposite can happen. "Which nation is happiest", examined by asking "Are you happy or unhappy" on a 1-10 scale, where "happy" is in different languages with somewhat different meanings...can get messy... To some extent, similar questions in a single nation, spread across periods long enough that meanings shift, run into the same problem.

Something's been bothering me for a while, related to an online dispute between Bret Devereaux and Matthew Yglesias. Yglesias took the position that, if history is supposed to be informative about the present, then that information should come with quantified probabilities attached. Devereaux took the position that Yglesias' position was stupid.

I think Devereaux is right. I want to draw an analogy to the Lorenz butterfly:

It is famous for the fact that its state cannot be predicted far in advance. I was very underwhelmed when I first found a presentation of the effect - it's very easy to predict what will happen, as long as you're vague about it. The point will move around whichever pole it is close to, until it gets close to the other pole, at which point it will flip. Over time, it broadly follows a figure 8.

You can make a lot of very informative comments this way. At any given time, the point is going to lie somewhere within a well-defined constant rectangle. That's already a huge amount of information when we're working with an infinite plane. And at any given time, the point is engaged in thoroughly characteristic orbital behavior. The things that are hard to predict are the details:

1. At time t, will the point be on the left or on the right?

2. How close will it be, within the range of possibilities, to the pole that currently dominates its movement?

3. How many degrees around that pole will it have covered? (In other words, what is the angle from the far pole, through the near pole, to the point?)

4. When the point next approaches the transition zone, will it repeat another orbit around its current pole, or will it switch to the opposite pole?

If you only have a finite amount of information about the point's position, these questions are unanswerable, even though you also have perfect information about the movement of the point. But that information does let us make near-term predictions. And just watching the simulation for a bit will also let you make near-term predictions.

This seems to me like an appropriate model for how the lessons of history apply to the present. There are many possible paths. You can't know which one you're on. But you can know what historical paths are similar to your current situation, and where they went. The demand for probabilities is ill-founded, because the system is not stable enough, 𝘢𝘴 𝘵𝘰 𝘵𝘩𝘦 𝘲𝘶𝘦𝘴𝘵𝘪𝘰𝘯𝘴 𝘺𝘰𝘶’𝘳𝘦 𝘢𝘴𝘬𝘪𝘯𝘨, for probabilities to be assessable given realistic limitations on how much information it's possible to have.
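The finite-information point is easy to demonstrate numerically; a crude Euler integration of the Lorenz system (standard parameters sigma=10, rho=28, beta=8/3; the step size and horizon are my choices) shows two states differing by one part in a billion ending up nowhere near each other:

```python
import math

def lorenz_step(state, dt=0.001):
    """One (crude) Euler step of the Lorenz system, classic parameters."""
    x, y, z = state
    return (x + dt * 10.0 * (y - x),
            y + dt * (x * (28.0 - z) - y),
            z + dt * (x * y - (8 / 3) * z))

a = (1.0, 1.0, 1.0)
b = (1.0, 1.0, 1.0 + 1e-9)  # near-perfect knowledge of the initial state
max_sep = 0.0
for _ in range(40_000):     # integrate out to t = 40
    a, b = lorenz_step(a), lorenz_step(b)
    max_sep = max(max_sep, math.dist(a, b))

print(max_sep)  # on the order of the attractor's size, not 1e-9
```

Over the first few time units the two runs agree closely; it's the longer horizon where they decorrelate completely, which is the asymmetry I'm pointing at.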

Just because not everything can in principle be long-term predicted, it doesn't mean that nothing can be. Of course, there's the problem that we don't have a universal easy way to separate questions into those categories, but it's not like the only epistemological alternative of "throwing hands up in the air" is particularly enticing.

This is just chaos theory, yes? The sensitivity of deterministic systems to initial conditions is an important consideration and an awful lot of ink can be spilled on it, but it's well-trod ground and I don't think you're getting any metaphysical ramifications out of it.

A point I have tried to make before wrt Scott's enthusiasm for prediction markets is the difference between information (a) that exists, but is not yet known to you; or (b) that does not exist.

The market is a good way to attract existing information to a central location where it can be easily harvested. But it is not such a good way to learn information that doesn't exist. You can ask whatever question you want, and you'll learn 𝘴𝘰𝘮𝘦𝘵𝘩𝘪𝘯𝘨, but you won't necessarily learn anything 𝘢𝘣𝘰𝘶𝘵 𝘺𝘰𝘶𝘳 𝘲𝘶𝘦𝘴𝘵𝘪𝘰𝘯.

In cases where information about your question doesn't already exist and is difficult to produce, you should not expect to derive much if any benefit from a prediction market.

This framework is applicable here; I suppose I'm arguing that the Lorenz butterfly, and human history, are processes where, for many questions, probabilistic information doesn't really exist, can't exist, and looking for it - and even more so claiming to have it - is a mistake.

Throwing your hands up in the air may not be enticing or prestigious, but that won't make it incorrect.

Quantum mechanics tells us that some outcomes are inherently undetermined, and information about which outcome will happen does not exist in advance. But probability applies to such processes quite well. We can use probability to quantify our ignorance.

Mmm... I'm not sure what's going on there is that we're using probability to quantify our ignorance. I think we quantified our ignorance by counting the outcomes of a large number of experiments, and used probability to describe the quantified ignorance.

But that approach will only work when we can make a large number of similar (hopefully, identical) experiments.

> I suppose I'm arguing that the Lorenz butterfly, and human history, are processes where, for many questions, probabilistic information doesn't really exist, can't exist

"Probabilistic information doesn't really exist" doesn't make much sense as a claim. Let's simplify from a Lorenz system and take the double pendulum: if I tell you that the weight starts on the right, you can confidently say that after a tiny amount of time the weight will still be on the right, that after a smallish amount of time it will be on the left, and that maybe after that it exponentially flattens toward a 50/50 chance. But a uniform distribution is most definitely probabilistic information! It's really interesting that the uncertainty in position approaches the maximum so quickly, and throughout the evolution of the system we can absolutely put bounds on the velocity of the weight, how far it is from the pivot, and so on.

And of course, it's really really hard to make claims on what *can't* be known. One can sometimes get away with that in certain QM interpretations, but I see you're taking a different track on that downthread.

> In cases where information about your question doesn't already exist and is difficult to produce, you should not expect to derive much if any benefit from a prediction market.

When Scott says "Probabilities Don’t Describe Your Level Of Information, And Don’t Have To" he's straightforwardly correct. But we *do* still care about level of information for its own sake e.g. the explore/exploit dichotomy, and in the specific case of prediction markets it's extremely important to know the trade volume as well.

But I think probably the greatest lesson that superforecasting as a learnable discipline has to offer is that what we might think are unique one-off events to which no prior probability can be attributed... might just be a tweaked instantiation of the same patterns.

I'm failing to see why you can't assign probabilities to those questions about the point around the Lorenz butterfly. Maybe for a particular Lorenz butterfly, you find the point spends 54% of its time on the left wing. That's already a basis for giving a probability of which side the point will be on at time t. With more information and a better model, you could make even better predictions.

> Maybe for a particular Lorenz butterfly, you find the point spends 54% of its time on the left wing. That's already a basis for giving a probability of which side the point will be on at time t.

Yes, that's true.

> With more information and a better model, you could make even better predictions.

No, the fact that you can't do this is the point. Or to be more precise, with more information, you can make predictions about times in the near future, and there are very sharply diminishing returns on how far you can extend the reach of "the near future" by gaining more information. If you're halfway around one wing you can make an excellent prediction of how long it will take you to get to the transition zone. Everything immediately after that looks much foggier.

Applying these refinements to history gets us points like:

- We should be able to notice, and become alarmed, when a major event is almost upon us. But maybe not before then.

- The fact that a major event is almost upon us will not be good evidence that that event will actually happen.

- We should be unsurprised† by day-to-day developments almost all of the time.

- We should have no idea, this year, what next year will be like. Maybe the winter solstice will accelerate and occur in August. If that turns out to be what the future holds, we won't know until maybe late July. (Astronomy turns out to be better behaved than history.)

The general idea here, in terms of your analogy, is that we can know that the point spends 54% of its time on the left wing, but that the probabilities we can assign to "will the point be on the left wing at time t" will almost always be restricted to (a) 54%, (b) almost 0%, or (c) almost 100%. Notably, none of those possibilities allows for a prediction market to be helpful; they correspond to "we're quite sure what's about to happen" and "we have no idea what will happen farther out", but in both of those cases, that information is already well-known to everyone.

In the case of history, we can't know (the analogue of the fact) that the point spends 54% of its time on the left wing; the possibility space is too large. Imagine a butterfly that explores 17,000 dimensions instead of 2, with a "cycle length" much longer than all of human history. Without the ability to observe one cycle, how would we justify a figure of 54%? (This question has direct application to "futurist" predictions. What was the empirical probability, in 320 BC, that Iran would have a nuclear bomb in AD 2241? Is it different today than it was 2340 years ago? Is the _theoretical_ probability different today than it was 2340 years ago?)

We can do near-term prediction, identifying a point in the past close to present location, and seeing where it went over a short period. If we're presenting a strong analogy, we probably have a close match on a number of dimensions somewhere in the dozens. Are the other 16,000+ dimensions going to make a difference this time? Yes! Are they going to make a difference on the two dimensions we want to predict? Maybe!

† Why is Firefox flagging "unsurprised" for being misspelled? This is a common word without any alternative spelling!

I think that even in a chaotic system, I can make statistical predictions. The pinnacle of that is thermodynamics.

Climatologists cannot predict with certainty whether a hurricane will happen two years from now, because Earth's atmosphere is a chaotic system. They can still make a reasonable guess at the number of hurricanes we will have in 2027.

Likewise, 538 cannot divine who will win the next presidential election. However, there are statistical models based on historic data which can give you some probabilities for the outcomes.

The existence of questions that do have well-defined probabilistic answers is not even theoretical evidence that other questions have equally well-defined probabilistic answers.

I listed many predictions that are easy to make. But when you're thinking about a question, you want to know whether the tools you're using apply to it. Assuming that they do is not a good approach.

I followed this argument a little at the time, and found Devereaux's position unconvincing. He wanted to say that historical knowledge was beneficial in informing decision-making, while never being pinned down on what conclusions were being justified by what knowledge. If you can't say anything about what is more likely in response to a given question, whether or not you express it numerically, then you are ignorant about that question however much erudition about other facts garlands your ignorance on the specific point. If it doesn't allow you to do that, how is the historical information improving your understanding of the point at issue? To my mind Bret had no answer to this question.

I don't know anything about the Lorenz butterfly, but as best I can read your post it seems that you are saying that there are some aspects of its movement that are reasonably predictable based on the past data about how it has moved and others that are not. In that case, presumably percentages or other such indications of how likely something is could be given at least for those things that can be effectively predicted. I didn't understand Bret to be saying that historical study allowed for probabilistic predictions on the answers to some questions but not to others.

If history is a good guide to at least some questions about the future, it should be possible to give indications of likelihood about those questions. Bret was arguing that this was never possible based on the study of history, yet that the study of history nonetheless informed those predictions in some valuable but unpindownable way. As someone who studied history at university I would like to believe that it has this kind of value, but if you're not willing to defend the claim that it actually improves your ability to predict events - and be put to the test accordingly - I am not convinced that is sustainable.

Quoting: “My evidence on the vaccines question is dozens of well-conducted studies, conducted so effectively that we’re as sure about this as we are about anything in biology”

Sorry if this is a naive question but if there are RCTs comparing vaccines to placebos (and not using other vaccines as placebos) with a long enough follow up to diagnose autism I would be keen to see a reference. Just asking because I thought while all the people claiming there is a link are frauds, we didn’t actually have evidence at the level of ‘sure about this as we are about anything in biology’.

Are there RCTs for "drinking monkey blood" relationship to autism? What about RCTs for breakfasts causing car crashes - a lot of people end up in a crash after eating breakfast, after all.

You can't expect to have RCTs for every made-up "connection"; there has to be at least a plausible mechanism, or no useful work will ever get done. Wakefield literally made up the numbers for his publication, to the everlasting shame of whoever peer-reviewed his garbage and approved it for publishing.

> Are there RCTs for "drinking monkey blood" relationship to autism? What about RCTs for breakfasts causing car crashes - a lot of people end up in a crash after eating breakfast, after all.

Yeah, but the point is that Scott specifically distinguishes vaccines from these sorts of cases, saying:

> My evidence on the vaccines question is dozens of well-conducted studies, conducted so effectively that we’re as sure about this as we are about anything in biology

Sure, and if you said 'not every well-conducted study has to be that specific kind', that would be a valid response.

But responding to a request for RCTs by comparing it to the monkey blood example ignores that Scott had specifically contrasted the level of evidence between the two, and said that there was so much evidence that "we're as sure about this as we are about anything in biology".

The claim that vaccines don't cause autism is particularly *un*like the claim that monkey blood doesn't cause autism, specifically in that it *does* have positive evidence against it. So saying, in response to a request for that evidence, 'well, monkey blood causing autism doesn't have that kind of evidence either' doesn't seem like an appropriate response.

I was just under the impression that RCTs are the gold standard and since Scott said that we are about as sure of this as anything in biology, I assumed that meant multiple large, long term RCTs.

I just asked because he said dozens of studies exist, so I was curious if anyone could expand on that a bit.

There are few medical questions about which we are as sure as we are about this one. I remember an interview about recommendations for funding agencies. The recommendation explicitly mentioned that no more money should be spent on the vaccine-autism question, because it was one of the few questions that had been answered beyond any reasonable doubt.

The thing is that at some point it seemed like a realistic claim that vaccines caused autism (because some fraud paper made up data), so this was studied really intensively for several decades. It is one of the most researched questions in medicine, and the results were super-clear after the fraud papers had been retracted.

I would absolutely not say “as sure as we are about anything in biology”. Nothing based on a few dozen studies of tens of thousands of individuals could make us as confident about an effect size being zero as we are about the general truth of evolution, or that humans have eyes or whatever.

Reading that meta-analysis, they report the “odds ratio” of autism among vaccinated and unvaccinated children (that is, the odds of autism with vaccination (number with autism divided by number without) divided by the odds without vaccination (number with autism divided by number without)) as “OR”. Almost all the studies had a point estimate of OR that was less than 1, and all of them had confidence intervals that included 1, except for a couple studies where the entire confidence interval was below 1.
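For concreteness, the OR computation described above can be sketched in a few lines of Python. The counts here are hypothetical, purely for illustration, not taken from the meta-analysis:

```python
# Odds ratio (OR) as defined in the comment above, with hypothetical counts.
def odds_ratio(aut_vax, no_aut_vax, aut_unvax, no_aut_unvax):
    """(odds of autism among vaccinated) / (odds among unvaccinated)."""
    return (aut_vax / no_aut_vax) / (aut_unvax / no_aut_unvax)

# The same 1% autism rate in both hypothetical groups gives OR = 1 (no effect).
print(odds_ratio(100, 9900, 50, 4950))  # → 1.0
```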

I don't have any objections to phrasing things as "x% likely", and I do this colloquially sometimes, and I know lots of people who take this very seriously and get all the quoted benefits from it, and my constant thought when asked to actually do it myself or to take any suggested "low probability" numbers at face value is "oh God", because I'm normed on genetic disorders.

Any given disorder is an event so low-probability that people use their at-birth prevalences as meme numbers when mentioning things they think are vanishingly unlikely ("Oh, there's no *way* that could happen, it's, like, a 0.1% chance"). It turns out "1 in 1000" chances happen! When I look at probability estimates as percentages, I think of them in terms of "number of universes diverging from this point for this thing to likely happen/not happen". Say a 5% chance, so p = 1 - (0.95)^n. The 50%-likelihood-this-happens-in-one-universe marker is about p = 1 - (0.95)^14, which is actually a little above 50%. So 14 paths from this point, and "just above half the time" it'll happen in one of those paths? ~64% likelihood in 20 paths, ~90% in 45 paths (often more than once, in 45 paths). The probabilities people intuitively ascribe to "5%" feel noticeably lower than this.
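The "diverging universes" arithmetic here checks out; a quick Python sketch:

```python
# Chance an event with per-trial probability q happens at least once
# in n independent trials: p = 1 - (1 - q)^n, as in the comment above.
def at_least_once(q, n):
    return 1 - (1 - q) ** n

print(round(at_least_once(0.05, 14), 3))  # → 0.512 (a bit above half)
print(round(at_least_once(0.05, 20), 3))  # → 0.642 (~64%)
print(round(at_least_once(0.05, 45), 3))  # → 0.901 (~90%)
```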

"There's a 1 in 100,000 chance vaccines cause autism"? I've known pretty well, going about my daily life, absolutely unselected context, not any sort of reason to overrepresent medically unusual people, someone with a disorder *way* rarer than 1/100k! Probability estimates go much lower than people tend to intuit them. We think about "very unlikely" and say numbers that are more likely than people's "very unlikely" tends to be, when you push them on it, or when you watch how they react to those numbers (people estimating single-digit-percentage chances of a nuclear exchange this year don't seem to think of it as that likely).

I would never get on a plane that had a 1 in 1,000 chance of crashing, and wouldn’t like a 1 in 100,000 chance either! But I think for a general claim like “vaccines cause autism”, it’s very hard to reasonably come to probabilities lower than those, while it’s much easier for single cases like “this plane will crash”, when we can repeat the trial hundreds of thousands of times a day around the world.

The funniest thing about those arguments you're rebutting is that the average of a large number of past 0-or-1 events is only an estimate of the probability of drawing a 1. In other words, the probabilities they're saying are the only ones that exist, are unknowable!

That's right. Even in a simple case of drawing balls from a jar where you don't know how many balls of each color there are, the last ball has a 100% 'real' probability of being a specific color, you just don't know which one.

I'd say that all probabilities are about "how to measure a lack of information". If you have a fair coin it's easy to pretend that probability is objective because any people present have exactly the same information. But as long as there are information differences then different people will give different probabilities. And it's not the case that some of them are wrong. They just lack different information. But for some reason people expect that if you're giving a number instead of a feeling then you are claiming objectivity. They're just being silly. There are no objective probabilities outside of quantum mechanics and maybe not even there. It's all lack of information, it's just that there are some scenarios where it's easy for everyone to have exactly the same lack.

Okay, after years of this I think I have a better handle on what's going on. It's reasonable to pull probabilities because you can obviously perform an operation where something obviously isn't 1% and obviously isn't 99%, so then you're just 'arguing price.' On the other hand, it's reasonable for people to call this out as secretly rolling in reference class stuff and *not* having done the requisite moves to do more complex reasoning about probabilities, namely, defining the set of counterfactuals you are reasoning about and their assumptions and performing the costly cognitive operations of reasoning through those counterfactuals (what else would be different, what would we expect to see?). When people call BS on those not showing their work, they are being justly suspicious of summary statistics.

The thing is, if they’re wanting to call out those things, they should do that, rather than attacking the concept of probability wholesale.

Like, there are good reasons to not invest in Ponzi schemes, but “but money is just a made-up concept” is one of the least relevant possible objections.

👏🏼 👏🏼 👏🏼

A legend about accuracy versus precision that you may have heard but that I think is applicable:

As the story* goes: When cartographers went to measure the height of Mount Everest, they had a hard time because the top is eternally snow-capped, making it nigh-impossible to get an exact reading. They decided to take several different readings at different times of year and average them out.

The mean of all their measurements turned out to be exactly 29,000 feet.

There was concern that such a number wouldn’t be taken seriously. People would wonder what digit they rounded to. A number like that heavily implies imprecision. The measurers might explain and justify it in person, but somewhere down the line that figure ends up in a textbook (not to mention Guinness), stripped of context.

It’s a silly problem to have, but it isn’t a made-up one. Their concerns were arguably justified. We infuse a lot of meaning into a number, and sometimes telling the truth obfuscates reality.

I’ve heard more than one version of exactly how they arrived at 29,028 feet, instead—the official measurement for decades. One account says they took the median instead of the mean. Another says they just tacked on some arbitrary extra.

More recently, in 2020, Chinese and Nepali authorities officially established the height to be 29,031 feet, 8.5 inches. Do you trust them any more than the cartographers? I don’t.

All of which is to say, it makes sense that our skepticism is aroused when we encounter what looks like an imbalance of accuracy and precision. Maybe the percentage-giver owes us a little bit more in some cases.

* apocryphal maybe, but illustrative, and probably truth-adjacent at least

>More recently, in 2020, Chinese and Nepali authorities officially established the height to be 29,031 feet, 8.5 inches. Do you trust them any more than the cartographers? I don’t.

For your amusement: I just typed

mount everest rising

into Google, and got back

>Mt. Everest will continue to get taller along with the other summits in the Himalayas. Approximately 2.5 inches per year because of plate tectonics. Everest currently stands at 29,035 feet. (Jul 10, 2023)

They should have just measured in meters, problem solved.

Mentioning Yoshua Bengio attracts typos:

"Yoshua Bengio said the probability of AI causing a global catastrophe everybody is 20%"

"Yosuha Bengio thinks there’s 20% chance of AI catastrophe"

So it seems, indeed.

> you’re just forcing them to say unclear things like “well, it’s a little likely, but not super likely, but not . . . no! back up! More likely than that!”, and confusing everyone for no possible gain.

There's something more to that than meets the eye. When you see a number like "95.436," you expect the number of digits printed to represent the precision of the measurement or calculation - that the 6 at the end means something. In conflict with that is the fact that one significant digit is too many. 20%? 30%? Would anyone stake much on the difference?

That's why (in an alternate world where weird new systems were acceptable for twitter dialogue), writing probabilities in decimal binary makes more sense. 0b0.01 expresses 25% with no false implication that it's not 26%. Now, nobody will learn binary just for this, but if you read it from left to right, it says "not likely (0), but it might happen (1)." 0b0.0101 would be, "Not likely, but it might happen, but it's less likely than that might be taken to imply, but I can see no reason why it could not come to pass." That would be acceptable with a transition to writing normal decimal percentages after three binary digits, when the least significant figure fell below 1/10.
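A rough sketch of how such a converter might look in Python. The function name and the greedy digit-taking approach are my own illustration of the proposal, not anything specified above:

```python
# Hypothetical converter: render a probability as a "0b0.xxx" binary
# fraction, taking digits greedily from the most significant place down.
def to_binary_fraction(p, digits=3):
    out = "0b0."
    for i in range(1, digits + 1):
        bit = int(p >= 2 ** -i)  # does this place value fit into p?
        if bit:
            p -= 2 ** -i
        out += str(bit)
    return out

print(to_binary_fraction(0.25))              # → 0b0.010  (1/4)
print(to_binary_fraction(0.3125, digits=4))  # → 0b0.0101 (1/4 + 1/16)
```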

I like this idea. And you can add more zeroes on the end to convey additional precision! (I was going to initially write here about an alternate way of doing this that is even closer to the original suggestion, but then I realized it didn't have that property, so oh well. Of course one can always manually specify precision, but...)

but isn't this against the theory? if we were mathematically perfect beings we'd have every probability to very high precision regardless of the "amount of evidence". before reading this post I was like hell yea probabilities rock, but now I'm confused by the precision thing.

I guess we might not know until we figure out the laws of resource-constrained optimal reasoning 😔

Unrelated nitpick, but we already know quite a lot about the laws of resource-constrained optimal reasoning, for example AIXI-tl, or logical induction. It's not the end-all solution for human thinking, but I don't think anyone is working on resource-constrained reasoning on the hope of making humans think like that, because humans are hardcodedly dumb about some things.

Is there a tl;dr for what resource-constrained reasoning says about how many digits to transmit to describe a measurement with some roughly known uncertainty? My knee-jerk reaction is to think of the measurement as having maybe a gaussian with sort-of known width centered on a mean value, and that reporting more and more digits for the mean value is moving a gaussian with a rounded mean closer and closer to the distribution known to the transmitter.

Is there a nice model for the cost of the error of the rounding, for rounding errors small compared to the uncertainty? I can imagine a bunch of plausible metrics, but don't know if there are generally accepted ones. I assume that the cost of the rounding error is going to go down exponentially with the number of digits of the mean transmitted, but, for reasonable metrics, is it linear in the rounding error? Quadratic? Something else?

If you think your belief is going to change up or down by 5% on further reflection, that's your precision estimate. Rounding error can be propagated through the normal techniques for error propagation (see any source on scientific calculation).

There is no rule or formula for the precision of resource-constrained reasoning, because you aren't guaranteed to order the terms in the process of deliberation from greatest to smallest. Instead, I use repeated experiments as my example of a belief you're expecting to change within known bounds in the future, to show why most probabilities have limited precision.

Many Thanks!

>Instead, I use repeated experiments as my example of a belief you're expecting to change within known bounds in the future, to show why most probabilities have limited precision.

Sure, that makes sense.

>Rounding error can be propagated through the normal techniques for error propagation (see any source on scientific calculation).

True. Basically propagating through the derivatives of whatever downstream calculation consumes the probability distribution estimate... For the case of a bet, I _think_ this comes down to an expected cost per bet (against an opponent who has precisely calibrated probabilities) that is the value of the bet times the difference between the rounded mean and the actual mean. Is that it?

If you are trying to figure out whether a coin is fair, the average number of heads per flip among a large number of experimental trials serves as your best estimate of its bias towards heads. Although you have that estimate to an infinite number of digits of precision, your estimate is guaranteed to change as soon as you flip the coin again. That means the "infinite precision of belief," although you technically have it, is kind of pointless.

To put it another way, if you expect the exact probabilistic statement of your beliefs to change as expected but unpredictable-in-the-specifics new information comes in, such as further measurements in the presence of noise, there's no point in printing the estimate past the number of digits that you expect to stay the same.
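A small simulation illustrates the point: the running estimate keeps churning in its trailing digits with every new flip. (A Python sketch; the seed is arbitrary, chosen only for reproducibility.)

```python
import random

random.seed(0)  # arbitrary seed, for reproducibility

# Running estimate of a fair coin's heads-probability: the trailing
# digits of the estimate change with every additional flip, which is
# the sense in which the point estimate's "infinite precision" is moot.
heads = 0
for n in range(1, 10001):
    heads += random.random() < 0.5  # one flip; True counts as 1
    if n in (100, 1000, 10000):
        print(n, heads / n)  # estimate keeps shifting in its lower digits
```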

Here's a way to interpret that "infinite precision of belief": if you bet according to different odds than precisely that estimator, you'll lose money on average. In that sense, the precision is useful however far you compute it, losing any of that precision will lose you some ability to make decisions.

Your conclusion about forgoing the precision that is guaranteed to be inexact is wrong. Consider this edge case: a coin will be flipped that is perfectly biased; it either always comes up heads or always tails; you have no information about which way it is biased. The max-entropy guess in that situation is 1:1 heads or tails, with no precision at all (your next guess is guaranteed to be either 1:0 or 0:1). Nonetheless, this guess still allows you to make bets on the current flip, whereas you'd just refuse any bet if you followed your own advice.

> losing any of that precision will lose you some ability to make decisions.

The amount of money you'd lose through opportunity cost in a betting game like that decreases exponentially with the number of digits of precision you're using. To quote one author whose opinions on the subject agree with mine,

"That means the 'infinite precision of belief,' although you technically have it, is kind of pointless."

;)

Compare this situation with the issue of reporting a length of a rod that you found to be 2.015mm, 2.051mm, and 2.068mm after three consecutive measurements. I personally would not write an average to four digits of precision.
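For reference, here is that computation in Python (mean plus standard error, sigma/sqrt(N)); it comes out to roughly 2.045 ± 0.016 mm, which supports not reporting four digits:

```python
from math import sqrt

# Mean and standard error (sample sigma / sqrt(N)) of the three rod
# measurements quoted above, in mm.
def mean_and_sem(xs):
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / (n - 1)  # sample variance
    return m, sqrt(var / n)

m, sem = mean_and_sem([2.015, 2.051, 2.068])
print(round(m, 3), round(sem, 3))  # → 2.045 0.016
```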

Wouldn't it be best to assign a range to represent uncertainty? Or give error bars?

So, for a generic risk you could say something like 6(-2/+5)% chance of X outcome occurring.

Yes, and the next step is to give a probability distribution.

I'm wondering how to interpret a range or distribution for a single future event probability. My P(heads) for a future flip of a fair coin, and for a coin with unknown fairness, would both be 0.5. In both cases I have complete uncertainty of the outcome. Any evidence favoring one outcome or the other would shift my probability towards 0 or 1. Even knowing all parameters of the random process that determines some binary outcome, shouldn't I just pick the expected value to maximize accuracy? In other words, what kind of uncertainty isn't already expressed by the probability?

It's epistemic vs aleatory uncertainty. The way the coin spins in mid air is aleatory i.e. "true random", while the way it's weighted is a fact that you theoretically could know, but you don't. The distribution should represent your epistemic uncertainty (state of knowledge) about the true likelihood of the coin coming up heads. You can improve on the epistemic part by learning more.

Sometimes it gets tough to define a clear line between the two - maybe Laplace's demon could tell you in advance which way the fair coin will go. But in many practical situations you can separate them into "things I, or somebody, might be able to learn" and "things that are so chaotic and unpredictable that they are best modeled as aleatory."

Epistemic vs aleatory is just fancy words for Bayesian vs frequentist, no? Frequentists only measure aleatory uncertainty, Bayesian probability allows for both aleatory and epistemic

Hmmm... frequentists certainly acknowledge epistemic uncertainty. I guess they're sometimes shy about quantifying it. But when you say p < 0.05, that's a statement about your epistemic uncertainty (if not quite as direct as giving a beta distribution).

It's the probability that you will encounter relevant new information.

You could read it as "My probability is 0.5, and if I did a lot of research I predict with 80% likelihood that my probability would still lie in the range 0.499--0.501," whereas for the coin you suspect to be weighted that range might be 0.1--0.9 instead.

Small error bars mean you predict that you've saturated your evidence, large error bars mean you predict that you could very reasonably change your estimate if you put more effort into it. With a coin that I have personally tested extensively, my probability is 0.5 and I would be *shocked* if I ever changed my mind, whereas if a magician says "this is my trick coin" my probability might be 0.5 but I'm pretty sure it won't be five minutes from now.

Single future event probabilities, in the context of events that are unrelated to anything you can learn more about during the time before the event, are the cases where the meaning of "uncertain probability" is less clear. That is why rationalists, who prioritize thinking about AI apocalypses and the existence of God, will tell you that "uncertain probability" doesn't mean anything.

However in science, the expectation that your belief will change in the future is the rule, not the exception. You don't know which way it will change, but if you're aware of the precision of your experiments so far you'll be able to estimate by how much it's likely to change. That's what an "uncertain probability" is.

This is the natural-language way to interpret probabilities, and so it's correct. Saying you found half of people to be Democrats means something different from saying you found 50.129% of people to be Democrats.

Yet it's subject to abuse, especially to those with less knowledge of statistics, math, or how to lie. If my study finds 16.67% of people to go to a party store on a Sunday, it's not obvious to everyone that my study likely had only six people in it.

There are at least three kinds of lies: lies, damned lies, and statistics.

Why not “1/4”?

Because it's hard to pronounce 1/4 with your tongue in your cheek. ;-)

A central issue when discussing significant digits is the sigmoidal behaviour, eg the difference between 1% and 2% is comparable to the difference between 98% and 99%, but NOT the same as the difference between 51% and 52%. So arguments about significant digits in [0, 1] probabilities are not well-founded. If you do a log transformation you can discuss significant digits in a sensible way.

What would I search for to get more information on that sigmoidal behavior as it applies to probabilities? I've noticed the issue myself, but don't know what to look for to find discussion of it. The Wikipedia page for 'Significant figures' doesn't (on a very quick read) touch on the topic.

Try looking up the deciban, the unit for that measure: https://en.m.wikipedia.org/wiki/Hartley_(unit)

Ah, yeah, that does seem like a good starting point, thanks. For anyone else who's interested, this short article is good:

http://rationalnumbers.james-kay.com/?p=306

Many Thanks!

This has been on my mind recently, especially when staring at tables of LLM benchmarks. The difference between 90% and 91% is significantly larger than between 60% and 61%.

I've been mentally transforming probabilities into log(p/(1-p)), and just now noticed from the other comments that this actually has a name, "log odds". Swank.
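A quick sketch of that transform, showing that the 1%-to-2% and 98%-to-99% steps are the same size in log-odds while the 51%-to-52% step is tiny:

```python
from math import log10

# Log-odds in base 10 ("bans"): equal steps here treat 1% -> 2% and
# 98% -> 99% as moves of the same size, unlike raw percentage points.
def log_odds(p):
    return log10(p / (1 - p))

print(round(log_odds(0.02) - log_odds(0.01), 3))  # → 0.305
print(round(log_odds(0.99) - log_odds(0.98), 3))  # → 0.305
print(round(log_odds(0.52) - log_odds(0.51), 3))  # → 0.017
```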

Why are you adding this prefix “0b0” to the notation? If you want a prefix that indicates it’s not decimal, why not use something more transparent, like “bin” or even “binary” or “b2” (for “base 2”)?

That notation is pretty standard in programming languages. I do object to this being called "decimal binary" though. I'm not sure what exactly to call it, but not that. Maybe "binary fractions".

I think "binary floating-point" is probably the least confusing term.

It's not actually floating point though. That'd be binary scientific notation, like 0b1.011e-1100.

I kind of like it in principle, but...

— Why not use 25% then? That's surely how everyone would actually mentally translate it and (think of/use) it: "ah, he means a 25% chance."

— Hold on, also I realize I'm not sure how 0b0.01 implies precision any less than 25%: in either case, one could say "roughly" or else be interpreted as meaning "precisely this quantity."

— Per Scott's original post, 23% often *is*, in fact, just precise enough (i.e., is used in a way meaningfully distinct from 22% and 24%, such that either of those would be a less-useful datum).

— [ — Relatedly: Contra an aside in your post, one sigfig is /certainly/ NOT too many: 20% vs 30% — 1/3 vs 1/5 — is a distinction we can all intuitively grasp and all make use of IRL, surely...!]

— And hey, hold on a second time: no one uses "94.284" or whatever, anyway! This is solving a non-existent problem!

-------------------------

— Not that important, and perhaps I just misread you, but the English interpretation you give of 0b0.0101 implies (to my mind) an event /less likely/ than 0b0.01 — (from "not likely but maybe" to "not likely but maybe but more not likely than it seems even but technically possible") — but 0b0.0101 ought actually be read as *more* sure, no? (25% vs 31.25%)

— ...Actually, I'm sorry my friend, it IS a neat idea but the more I think about those "English translations" you gave the more I hate them. I wouldn't know WTF someone was really getting at with either one of those, if not for the converted "oh he means 25%" floatin' around in my head...

> Per Scott's original post, 23% often *is*, in fact, just precise enough

I strongly object to your use of the term "often." I would accept "occasionally" or "in certain circumstances"

(Funnily enough, the difference between "occasionally" and "in certain circumstances" is what they imply about meta-probability. The first indicates true randomness, the second indicates certainty but only once you obtain more information)

I intuitively agree that any real-world probability estimate will have a certain finite level of precision, but I'm having trouble imagining what that actually means formally. Normally to work out what level of precision is appropriate, you estimate the probability distribution of the true value and how much that varies, but with a probability, if you have a probability distribution on probabilities, you just integrate it back to a single probability.

One case where having a probability distribution on probabilities is appropriate is as an answer to "What probability would X assign to outcome Y, if they knew the answer to question Z?" (where the person giving this probability distribution does not already know the answer to Z, and X and the person giving the meta-probabilities may be the same person). If we set Z to something along the lines of "What are the facts about the matter at hand that a typical person (or specifically the person I'm talking to) already knows?" or "What are all the facts currently knowable about this?", then the amount of variation in the meta-probability distribution gives an indication of how much useful information the probability (which is the expectation of the meta-probability distribution) conveys. I'm not sure to what extent this lines up with the intuitive idea of the precision of a probability though.

I was thinking something vaguely along these lines while reading the post. It seems like the intuitive thing that people are trying to extract from the number of digits in a probability is "If I took the time to fully understand your reasoning, how likely is it that I'd change my mind?"

In your notation, I think that would be something like "What is the probability that there is a relevant question Z to which you know the answer and I do not?"

It is really easy to understand what the finite number of digits means if you think about how the probability changes with additional measurements. If you expect the parameter to change by 1% up or down after you learn a new fact, that's the precision of your probability. For example, continually rolling a loaded die to figure out what its average value is involves an estimate that converges to the right answer at a predictable rate. At any point in the experiment, you can calculate how closely you've converged to the rate of rolling a 6, within 95% confidence intervals.

It's only difficult to see this when you're thinking about questions that have no streams of new information to help you answer them - like the existence of God, or the number of aliens in the galaxy.
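The predictable convergence rate can be made concrete with the usual normal-approximation confidence interval; a sketch, assuming a die with a roughly fair six-rate (p ≈ 1/6):

```python
from math import sqrt

# Approximate 95% confidence half-width for an estimated event rate p
# after n trials (normal approximation): it shrinks like 1/sqrt(n).
def ci_half_width(p, n):
    return 1.96 * sqrt(p * (1 - p) / n)

# Each 100x more rolls buys roughly one more reliable decimal digit.
for n in (100, 10000, 1000000):
    print(n, round(ci_half_width(1 / 6, n), 4))
```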

I like Scott's wording of "lightly held" probabilities. I think this matches what you are describing about the sensitivity of a probability estimate to the answer of an as-yet unanswered question Z.

Okay, hear me out: only write probabilities as odds ratios (or fractions if you prefer), and the number of digits is the number of Hartleys/Bans of information; you have to choose the best approximation available with the number of digits you're willing to venture.

Less goofy answer: Name the payout ratios at which you'd be willing to take each particular side of a small bet on the event. The further apart they are, the less information you're claiming to have.

The idea of separate bid and ask prices is a very good way to communicate this concept to finance people, thanks for that.

>When you see a number like "95.436," you're expecting that the number of digits printed to represent the precision of the measurement or calculation - that the 6 at the end, means something. In conflict with that is the fact that one significant digit is too many.

Ok, though isn't this question orthogonal to whether the number represents probabilities?

This sounds more like a general question of whether to represent uncertainty in some vanilla measurement (say of the weight of an object) with the number of digits of precision or guard digits plus an explicit statement of the uncertainty. E.g. if someone has measured the weight of an object 1000 times on a scale on a vibrating table, and got a nice gaussian distribution, and reported the mean of the distribution as 3.791 +- 0.023 pounds (1 sigma (after using 1/sqrt(N))), it might be marginally more useful than reporting 3.79 +- 0.02 if the cost of the error from using the rounded distribution exceeds the cost of reporting the extra digits.

Yes, this is exactly the same. In your example you are measuring a mass; in my examples you're measuring the parameter of a Bernoulli distribution. For practical reasons, there's always going to be a limited number of digits that it's worth telling someone when communicating your belief about the most likely value of a hidden parameter.

Many Thanks!

This is one of those societal problems where the root is miscommunication. And frankly it's less a problem than just a fact of life. I remember Trevor Noah grilling Nate Silver over how Trump could win the presidency when Silver had predicted Trump had only a 1/3 chance of winning. It was hilarious in some sense. Now this situation is the reverse of what Scott is describing, in that the person using the probability is using it accurately, but the dilemma is the same: lack of clear communication.

Lack of clear/logical thinking on Trevor Noah's side.

Yes, but most people think of probability like that. They think that a probability below 50% equates to an event being virtually impossible. It's like how many scientists make stupid comments on economics without understanding its terms.

Most people ... . Most people - outside Korea and Singapore - cannot do basic algebra (TIMSS). Most people are just no good with stochastics in new contexts. Most journalists suck at statistics. Many non-economists do not get most concepts of macro/micro - at least not without working on it. That does not make the communication of economists or mathematicians or Nate Silver less clear. 37+73 is 110. Clear. Even if my cat does not understand. - Granted, Nate on TV could have said: "Likely Hillary, but Trump has a real chance." - more adapted to IQs below 100 (not his general audience!). Clearer? Nope.

"more adapted to IQs below 100 (not his general audience!)"

Huh, have you seen the comments section on his substack? It's an absolute cesspool. I don't think I've read another substack with such a high proportion of morons and/or trolls in the comments (though I haven't read many).

I did and agree. He is not writing those comments, is he? - Writing: "who will win: Trump or Biden" will attract morons/trolls. Honey attracts flies just as horseshit does. - MRU comments are mostly too bad to read either.

I didn't blame him for the comments (though I assume he's publicly decided not to moderate them?), I was responding to "his general audience".

I agree that the comments are a cesspool, but as far as I know it's idiocy born of misdirected intellect rather than an actual lack of mental horsepower.

If I see a mistake, oftentimes it's something like "Scott says to take X, Y and Z into account. I am going to ignore that he has addressed Y and Z and claim that his comments in X are incorrect!" I would expect someone dumb to not even comprehend that X was mentioned, much less be able to give a coherent (but extremely terrible) argument for this.

I think it's also partially confounded by the existence of... substack grabbers? Don't know what a good term for this type of person is. But when I see a low quality comment, without the background that an ACX reader """should""" have, I'll scroll up and see it's a non regular writing a substack. Which I would guess means that they're sampling from the general substack or internet population.

I have encountered people who use the term "50/50" as an expression of likelihood with no intent of actual numerical content but merely as a rote phrase meaning "unpredictable." On one occasion I asked for a likelihood estimate and was told "50/50," but when I had them count up past occurrences it turned out the ratio was more like 95/5.

I still intuitively feel this way. I know that 40% chance things will happen almost half the time, but I can't help but intuitively feel wronged when my Battle Brothers 40% chance attack doesn't hit.

Charles Babbage said, "On two occasions I have been asked [by members of Parliament], 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."

Hey sometimes this works now with AI - an LLM figures out what you meant to ask and answers that instead of your dumb typo. Those guys were just 200 years ahead of their time.

Babbage apparently thought the questions [from two MPs, presumably in 1833] were asked in earnest. If so, I think the likeliest explanation is just Clarke's 3rd Law: "Any sufficiently advanced technology is indistinguishable from magic."

I think this is mainly conflict theory hiding behind miscommunication. Because there's no agreed upon standard of using numbers to represent degrees of belief (even the very concept of "degrees of belief" is pretty fraught!), people feel that they have license to dismiss contrary opinions outright and carry on as they were before.

There is an argument to be made that we should stop using probabilities in public communications if people don't understand how they work.

Sometimes you might want to transmit useful information to the non-retarded part of the public even at the cost of riling up the others.

What's the probability of that argument working?

Yes and I suspect this problem gets worse when you start getting to things with 10% probability happening, or things with 90% probability not happening, and so on.

I hesitate to imagine what fraction of the public can deal with independent random variables...

I think the answer to that is to try to raise the sanity waterline.

Also, I am not sure that the issue here is just probability illiteracy.

If Nate had predicted that there is a 1/3 chance of a fair die landing on 1 or 2, nobody would have batted an eye if a 2 came up. Point out that the chances of dying in a round of Russian roulette are just 1/6 or so, and almost nobody will be convinced to play, because most people can clearly distinguish a roughly 17% risk from a 0% risk.

Part of the problem is that politics is the mind killer. There was a clear tendency of the left to parse "p(Trump)=0.3" (along with the lower odds given by other polls) as "Trump won't win", which was a very calming thought on the left. I guess if you told a patient that there was a 30% chance that they had cancer, they would also be upset if you revised this to 100% after a biopsy. (I guess physicians know better than to give odds beforehand for that very reason.)

Sooner or later, everyone wants to interpret probability statements using a frequentist approach. So, sure you can say that the probability of reaching Mars is 5% to indicate that you think it's very difficult to do, and you're skeptical that this will happen. But sooner or later that 5% will become the basis for a frequentist calculation.

If you read through this article you'll see that probability statements drift between statements of degree of belief and actual frequentist interpretations. It's just inevitable.

It's also very obscure how to assign numerical probabilities to degrees of belief. For instance, suppose we all agree that there is a low probability that we will travel to Mars by 2050. What's the probability value for that? Is it 5%, 0.1%, or 0.000000001%? How do we adjudicate between those values? And how do I know that your 5% degree of belief represents the same thing as my 5% degree of belief?

Assuming you have no religious or similar objections to gambling, the standard mutually intelligible definition of a 5% degree of belief is a willingness to bet a small fraction of what's in your wallet (for example, one cent if you have $20), at better than 20:1 odds.

It has to be a small fraction because the total value you place on your money is only linear for small changes around a given number of dollars, otherwise it tends to be logarithmic for many.
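A small sketch of the arithmetic behind this definition (not from the thread; note that, strictly, the break-even odds for a 5% credence are 19:1, so "better than 20:1" is safely on the profitable side):

```python
# Hypothetical sketch: expected value of a small bet as a function of
# credence and the odds offered. "odds" of 20 means risking the stake
# to win 20x the stake.

def expected_value(credence, stake, odds):
    # Win stake * odds with probability `credence`; lose the stake otherwise.
    return credence * stake * odds - (1 - credence) * stake

# Break-even odds for a credence p are (1 - p) / p, i.e. exactly 19:1
# for p = 0.05, so betting at 20:1 or better has positive expected value
# for someone with a 5% degree of belief.
print(expected_value(0.05, 0.01, 19))   # ~0.0 (break even)
print(expected_value(0.05, 0.01, 20))   # positive EV
```

This also shows why the stake must be small: the calculation treats a cent won and a cent lost as equally valuable, which only holds near your current wealth.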

Exactly

> the standard mutually intelligible definition of a 5% degree of belief is a willingness to bet a small fraction of what's in your wallet (for example, one cent if you have $20), at better than 20:1 odds.

> It has to be a small fraction because the total value you place on your money is only linear for small changes around a given number of dollars, otherwise it tends to be logarithmic for many.

This isn't right; by definition, people are willing to place fraudulent bets that only amount to "a small fraction of what's in their wallet". They do so all the time to make their positions appear more certain than they really are; your operationalization is confounded by the fact that support has a monetary value independently of whether you win or lose. Placing a bet buys you a chance of winning, and it also buys you valuable support.

You can amend this to an anonymous bet.

It still won't work; consider the belief "I will win the lottery."

Fundamentally, it is correct to say that a 5% degree of belief indicates willingness to bet at 20:1 odds in some window over which the bet size is not too large to be survivable and not too small to be worthwhile, but it is not correct to say that willingness to bet indicates a degree of belief (which is what you're saying when you define degree of belief as willingness to bet), and that is particularly the case when you specify that the amount of the bet is trivial.

Sure. Also, I do think that it's reasonable to invest trivial amounts of money in fair lottery tickets given certain utility functions. For example, if a loss is negligible but you value extremely highly the possibility of imminent comfortable retirement. I don't do this myself because I believe that in my part of the world lotteries are rigged and the chance to win really big is actually zero.

> Also, I do think that it's reasonable to invest trivial amounts of money in fair lottery tickets given certain utility functions.

I agree. But this is not compatible with the definition of "degree of belief" offered above; that definition requires that lottery ticket purchasers do not believe the advertised odds are correct.

"I will win the lottery" has a large fantasy component, where people spend time thinking about all the things they could buy with that money.

An anonymous bet with a small fraction of your wealth does not have that fantasy component. "Look at all the things I could buy with this 20 cents I won!" just doesn't do the same thing.

Because it's anonymous you're not pre-committing anything. I suppose some people might brag about making that bet?

This is spot on, and a good illustration of why I believe prediction markets are going to have problems as they scale up in size and importance.

Is it actually a problem for prediction markets, though? People betting at inaccurate odds for emotional (or other) reasons just provides a subsidy for the people who focus on winning, resulting in that market more accurately estimating the true likelihood. Certainly markets can be inaccurate briefly, or markets with very few participants can be inaccurate for longer, but it's pretty easy to look past short-term fluctuations.

Or maybe you're thinking of it being a problem in some different way?

> standard mutually intelligible definition

FWIW this is just the de Finetti perspective, and you could have others, more based around subjective expectations.

Like, I think the reduction works a bunch of the time, but I don't think you can reduce subjective belief to willingness to bet.

Unless the enemy has studied his Agrippa, which I have.

Of course, this assumes that people will be willing to bet on an outcome given some odds ratio.

It might well be that some people would not rationally bet on something.

For example, given no further information, I might not be willing to bet on "Alice and Bob will be a couple a month from now" if I have no idea who Alice and Bob are and anticipate that other people in the market have more information. Without knowing more (are Alice and Bob just two randomly selected humans? Do they know each other? Are they dating?) the Knightian uncertainty in that question is just too high.

Thank you! You are making **exactly** my point -- that although people start out by talking about subjective "degrees of belief", sooner or later they will fall into some sort of frequentist interpretation. There can be no purer expression of this than making an argument about betting, because ultimately you are going to have to appeal to some sort of expected value over a long run of trials, which is by definition a frequentist interpretation.

Doesn't it just mean if we "ran" our timeline 1000 times, I predict we reach Mars in 50 of those timelines?

In other words, it is still frequentist in a sense, but over hypothetical modalities.

Not necessarily. I think the probability that P=NP is about 5%, but I don’t think that if we “ran” the universe that many times it would be true in 5% of them - it would either be true in all or none.

Instead it means that, if I had to do something right now whose outcome depended on P being equal NP, I would do it if the amount by which it made things better if they were = is more than 20 times better than the amount by which it made things worse if they were unequal. I need some number or other to coordinate my behavior here and guarantee that I don’t create a book of bets that I am collectively guaranteed to lose (like if I paid 50 cents for a bet that gives me 90 cents if horse A wins and also paid 50 cents for a bet that gives me 90 cents if horse A loses). But the number doesn’t have to express anything about frequency - I can use it even for a thing that I think is logically certain one way or the other, if I don’t know which direction that is.
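The horse-racing book above can be made concrete in a few lines (a sketch using the comment's own numbers, in integer cents to avoid float noise):

```python
# The incoherent book from the comment: 50 cents for a bet paying 90
# cents if horse A wins, plus 50 cents for a bet paying 90 cents if A
# loses. The outcome of the race turns out not to matter.

def net_cents(a_wins):
    cost = 50 + 50                     # both bets purchased up front
    win_bet = 90 if a_wins else 0      # payout of the "A wins" ticket
    lose_bet = 0 if a_wins else 90     # payout of the "A loses" ticket
    return win_bet + lose_bet - cost

# Either way the race goes, the bettor is down 10 cents:
print(net_cents(True), net_cents(False))   # -10 -10
```

The implied probabilities (50/90 for each outcome) sum to more than 1, which is exactly the kind of incoherence the probability axioms rule out.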

I think the P=NP example (and the Mars example, if you believe in a deterministic universe) can still be approached this way if we define 'rerunning the timeline' as picking one of the set of possible universes that would produce the evidence we have today.

It can be if you accept logical impossibilities as “one of the set of possible universes that would produce the evidence we have today”.

I'm not sure I follow. If the evidence we have today is insufficient to show that P != NP, how is it logically impossible for universes to exist where we have the same evidence and P does or does not equal NP?

Most people suspect that the mathematical tools we have *are* sufficient to settle P vs NP one way or the other; it's just that no human has yet figured out how to do it.

Even if it turns out not to be provable, it still either is or isn't, right? Universes where the starting conditions and the laws of physics are different are easy to imagine. Universes where *math* is different just bend my mind in circles. If you changed it to switch whether P=NP, how many other theorems would be affected? Would numbers even work anymore?

Interesting that you should say that. I was thinking about the timelines of possible universes going *forward from the present* — and when it comes to the Mars example, what is the set of possible universes that will *prevent* us from getting to Mars in 2050 vs the set that will *allow* us to get to Mars in 2050? I think we can agree (or maybe not?) that there is an infinity of universes for the former, but a smaller infinity of universes for the latter. After all, the set of universes where the sequence of events happen in the proper order to get us to Mars would be smaller than the set of universes where the sequence of events didn't happen (or couldn't happen). If these were sets of numbers we could create a bijection (a one-to-one correspondence) between their elements. But no such comparison is possible between these two sets, and the only thing we can say with surety is that they don't have the same cardinality. Trying to calculate the proportionality of the two sets would be impossible, so determining the probability of universes where we get to Mars in 2050 and universes that don't get us to Mars in 2050 would be a nonsensical question. I'm not going to die on this hill, though. Feel free to shoot my arguments down. ;-)

I think this is a fascinating objection, thanks.

I've never been any good at reasoning about infinities in this way (Am I so bad at math? No, it's the mathematicians who are wrong!), but I've spotted a better out so excuse me while I take it:

I do disagree that these are infinite sets; I think they're just unfathomably large. If there are 2^(10^80) possible states of the universe one Planck time after any given state, the number of possible histories at time t quickly becomes unrepresentable within our universe. It's a pseudoinfinite number that is not measurably different from infinity in a practical sense, but saves us all of the baggage of an infinite number in the theoretical sense.

If you accept that premise (and I don't blame you if you don't), I believe we're allowed to do the standard statistics stuff like taking a random sample to get an estimate without Cantor rolling in his grave.

I like your counter-argument. But I'll counter your counter with this — if the number of potential histories of our universe going forward is impossible to represent within our universe, then it's also impossible to represent the chances of getting to Mars in 2050 vs not getting to Mars in 2050. Ouch! My brain hurts!

> Not necessarily. I think the probability that P=NP is about 5%, but I don’t think that if we “ran” the universe that many times it would be true in 5% of them - it would either be true in all or none.

I think this is a semantic/conceptual disagreement. I think there are two points where we can tease it apart:

* You're thinking of the world as deterministic, and I think of it as predicated on randomness to a significant degree. If the future depends on randomness, then it makes no sense to claim it would be true in all or none. Whereas if the future is determined by initial conditions and laws of nature, then yes, it will be. In which case:

* You can adapt my conceptualisation so that it survives a deterministic world. A determinist would believe that the future is determined by known and unknown determinants, but nonetheless fixed, i.e. the laws of nature and initial conditions both known and unknown. To give x a 5% probability, then, is to say that if, for each "run" of the universe, you filled in the unknown determinants randomly, or according to the probability distribution you believe in, you would get event x occurring 50 times out of 1000.

Correct me if I'm mistaken in my assumptions about your belief, but I don't know how else to make sense of your comment.

I think that the P vs NP claim is most likely a logical truth one way or the other, and that no matter how you modify the known or unknown determinants, it will come out the same way in all universes.

If you have some worries about that, just consider the claim that the Goldbach conjecture is true (or false), or the claim that the 1415926th digit of pi is a 5.

I was with you until the last bit Kenny (per Wolfram Alpha, the 1415926th digit of pi is known to be a 7 :-))

Isn’t your 5% estimate essentially meaningless here, though, since it is a proposition about a fundamental law and you know it is actually either 100% or 0%? And more importantly, no other prediction you make will bear sufficient similarity to this one that grouping them yields any helpful knowledge about future predictions.

Your first point shows that P=NP doesn’t have a *chance* of 5% - either all physically possible paths give P=NP or none of them do.

Your second point shows that a certain kind of frequentist calibrationism isn’t going to make sense of this either.

But Bayesian probability isn’t about either of those (regardless of what Scott says about calibration). Bayesian probability is just a way of governing your actions in light of uncertainty. I won’t make risky choices that come out very badly if P=NP is false, and I won’t make risky choices that come out 20 times as badly if it’s true compared to how well the choice turns out if it’s false. That’s what it means that my credence (Bayesian probability) is 5%. There is nothing that makes one credence “correct” and another “incorrect” - but there are people who have credence-forming policies that generally lead them well and others that generally lead them badly. And the only policy that avoids guaranteed losses is for your credences to satisfy the probability axioms and to update by Bayesian conditionalization on evidence.

The thing I don’t get is that if it is not a frequentist probability, how can it make sense to apply Bayes' theorem to an update for P=NP? Say a highly respected mathematician claims he has a proof of it, then promptly dies. This is supposed to be Bayesian evidence in favour of P=NP. But does it make sense to apply a mathematical update to the chance of P=NP? Surely it is not an event that conforms to a modelable distribution, since as you say it is either wrong or right in all universes.

What Bayes' Theorem says is just that P(A|B)P(B)=P(A&B). (Well, people often write it in another form, as P(A|B)=P(B|A)P(A)/P(B), but that's just a trivial consequence of the previous equation holding for all A and B.)

To a Bayesian, P(A) (or P(B), or P(A&B)) just represents the price you'd be willing to pay for a bet that pays $1 if A (or B, or A&B) is true and nothing otherwise (and they assume you'd also be willing to pay scaled up or down amounts for things with scaled up or down goodness of outcome).

Let's say that P(B|A) is the amount you would be willing to pay for a bet that pays $1 if B is true and nothing otherwise, in a hypothetical future where you've learned A and nothing else.

The first argument states that this price should be the same amount you'd be willing to pay right now for a bet that pays $1 if A&B are both true, nothing if A is true and B is false, and pays back your initial payment if A is false (i.e., the bet is "called off" if A is false). After all, if the price you'd be willing to pay for this called off bet is higher, then someone could ask you to buy this called off bet right now, as well as a tiny bet that A is false, and then if A turns out to be true they sell you back a bet on B at the lower price you'd be willing to accept after learning A is true. This plan of betting is one you'd be willing to commit to, but it would guarantee that you lose money. There's a converse plan of betting you'd be willing to commit to that would guarantee that you lose money if the price you'd be willing to pay for the called off bet is lower than the price you'd be willing to pay after learning that A is true. The only way to avoid the guaranteed loss is if the price you're willing to pay for the called off bet is precisely equal to the price you'd be willing to pay for a bet on B after learning A.

But a bet on B that is called off if A is false is precisely equal to a sum of three bets that you're willing to make right now - and we can check that if your prices don't precisely satisfy the equation P(B|A)P(A)=P(A&B), then there's a set of bets you're willing to make that collectively guarantee that you'll lose money.

https://plato.stanford.edu/entries/dutch-book/#DiacDutcBookArgu
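As a toy illustration of the constraint (with illustrative numbers, not from the thread):

```python
# Sketch: check whether a set of betting prices satisfies the coherence
# equation P(B|A) * P(A) = P(A & B). The Dutch book argument above shows
# that any prices violating it can be packaged into a set of bets with
# a guaranteed loss.

def is_coherent(p_a, p_a_and_b, p_b_given_a, tol=1e-9):
    return abs(p_b_given_a * p_a - p_a_and_b) < tol

print(is_coherent(p_a=0.5, p_a_and_b=0.2, p_b_given_a=0.4))   # True
print(is_coherent(p_a=0.5, p_a_and_b=0.2, p_b_given_a=0.6))   # False
```

Nothing here says 0.4 is the "objectively right" conditional price; it is simply the only one consistent with the other two commitments.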

There's nothing objectively right about the posterior P(B|A), just that if you are currently committed to betting on B at P(B), and you're currently committed to betting on A&B at P(A&B), then you better be committed to updating your price for bets on B to P(B|A) if you learn A (and nothing else) or else your commitments are self-undermining.

All that there is for the Bayesian is consistency of commitment. (I think Scott and some others want to say that some commitments are objectively better than others, such as the commitments of Samotsvety, but I say that we can only understand this by saying that Samotsvety is very skilled at coming up with commitments that work out, and not that they are getting the objectively right commitments.)

And this is my point -- that statements about "degrees of belief" will inevitably be translated into statements about long-run frequencies.

I do not think that 5% to reach Mars could ever be interpreted as a frequentist probability.

If you are one of the mice which run the simulation that is Earth, and decide to copy it 1000 times and count in how many of the instances the humans reach Mars by 2050, then you can determine a frequentist probability.

If you are living on Earth, you would have to be very confused about frequentism to think that such a prediction could ever be a frequentist probability.

<I just couldn't resist>

>If you read through this article you'll see that probability statements drift between statements of degree of belief and actual frequentist interpretations. _It's_ _just_ _inevitable_.

groan [emphasis added]

</I just couldn't resist>

I wonder if there's a bell curve relationship of "how much you care about a thing" versus "how accurately you can make predictions about that thing". E.g. do a football team's biggest fans predict their outcomes more accurately or less accurately than non-supporters? I would guess that the superfans would be less accurate.

If that's the case, "Person X has spent loads of time thinking about this question" may be a reason to weigh their opinion less than that of a generally well-calibrated person who has considered the question more briefly.

I think you're conflating bias with "spent loads of time thinking about this question" as the cause of bad predictions. The latter group includes all experts and scientists studying the question, and probably most of the best predictions. It also includes the most biased people who apply no rational reasoning to the question and have the worst predictions. You're better off considering bias and expertise separately than just grouping them as people who spent a lot of time thinking about something.

I do think spending lots of time thinking about a subject can contribute to developing a bias about it, though

Me too. But you still have to consider the two factors separately. You can't just reason that since physicists have spent a lot of time thinking about physics, they're probably biased and shouldn't be trusted about physics.

Sure you can. Greybeards with pet theories they won't give up because they're so invested in them are a thing. It's why Planck wrote that "science advances one funeral at a time" - eminent physicists really are a hindrance to accurate physics.

Compared to some platonic ideal of Science!, perhaps. Compared to any real-world alternative that *isn't* eminent physicists, the eminent physicists will give you the most accurate understanding of the physical universe.

On the one hand, bookies do find that the home team gets bet on a lot more than their actual odds of winning.

On the other hand, superfan could mean either "I go to all the games" or "I spend a lot of time thinking about the sport". The latter do tend to be able to put aside their affiliation when there's money on the line.

I think the bookies' experience has more to do with a lot of naïve fans flinging money at a game once in a while. It does tend to happen in the playoffs more than the regular season after all.

In answer to your first question, I find good value in betting on things where there is emotional investment, ie elections and sporting events with local heroes involved.

If Eliezer thought that p(AI doom) was 10^-10, then he would not have spent years working on that question.

On the one hand, I would think it is wrong to privilege the opinion on the existence of God of someone with a degree in theology based on the fact that they have deep domain knowledge, because few atheists study theology.

On the other hand, if a volcanologist told me to evacuate, I would privilege his opinion over that of a random member of the public. (It might still be overly alarmist because of bad incentives.)

Most other professionals fall somewhere in this spectrum. You are much more likely to study astronomy / climate change / race relations / parapsychology / homeopathy if you believe that stars (as "luminous objects powered by fusion") / anthropogenic climate change / systemic racism / ESP / memory of water exists. Of course, once you are in a field, you are disproportionately subject to the mainstream opinions in that field, publication bias and so on. Trying to publish an article arguing that stars are actually immortalized Greek heroes in an astronomy journal will probably not work.

So "is X for real, or are all the practitioners deluding themselves?" is not an easy question to answer.

<mild snark>

It is not uncommon, for _many_ fields, for practitioners in that field to answer "Is your field important?" affirmatively. :-)

</mild snark>

What comments triggered this? I saw Yann LeCun has made the Popperian argument that no probability calculus is possible.

A great project would be to trace how language has evolved to talk about probability, from before Pascal to now.

A great project indeed! A book waiting to be written (if it has not already been written; I'd put the probability of that at 35 percent at least, though :)

The Emergence of Probability by Ian Hacking covers this topic. Prior to the modern generalized concept of probability, the word meant something very different. Quoting from https://en.wikipedia.org/wiki/Probability:

> According to Richard Jeffrey, "Before the middle of the seventeenth century, the term 'probable' (Latin probabilis) meant approvable, and was applied in that sense, univocally, to opinion and to action. A probable action or opinion was one such as sensible people would undertake or hold, in the circumstances."

https://www.amazon.co.uk/Everything-Predictable-Remarkable-Theorem-Explains-ebook/dp/B0BXP3B299

(it's not really the book you're talking about, but it's not COMPLETELY dissimilar and since I wrote it I'm keen to push it)

Thanks. In addition to this and Hacking I've found James Franklin's Science of Conjecture: "The words “reasonable” and “probably” appear in modern English with a frequency and range of application very much higher than their cognates in other European languages, as do words indicating hedges relating to degrees of evidence such as “presumably,” “apparently,” “clearly.” These phenomena indicate an Anglophone mindset that particularly values attention to the uncertainties of evidence."

Your suggestion to call different concepts of probability different names ("shmrobability") for metaphysical reasons actually makes complete sense. Maybe call frequentist probability "frequency", and Bayesian probability "chance" or "belief", with "probability" as an umbrella term. The different concepts are different enough that this would be useful. "The frequency of picking white balls from this urn is 40%." Sounds good. "The frequency of AI destroying mankind by 2050 is 1%" Makes no sense, as it should; it happens or not. "The chance of AI destroying mankind by 2050 is 1%." OK, now it makes sense. There we go!

I don't think the fraction of the ensemble of possible worlds has a word in the English language that means it.

Hmmm, how about "proportion" or "ratio"? e.g. "I think there's a 1% proportion of AI causing human extinction."

While we are talking about all the different interpretations of "probability", I might as well plug the Stanford Encyclopedia of Philosophy article on it: https://plato.stanford.edu/entries/probability-interpret/#ClaPro

It would be hard to think of easy English terms for all six main interpretations they list, but hardly impossible. We can even create new words if necessary!

On reflection, I've come around to thinking that any expression that brings up an ensemble of possible worlds would be apt to cause more philosophical sidetracking than "probability" itself!

Now, would that be the quantum mechanical many-worlds-interpretation ensemble, the cosmological multiple-independent-inflationary-bubbles ensemble, or the string theory multiple possible compactifications ensemble? :-)

Why assume that there is a finite number of possible worlds to begin with?

If the number of possible worlds is infinite, which to me seems intuitively likely, then any subset of possible worlds will either be ~0% of the whole (if finite) or an undefined fraction (if the subset is itself infinite).

I'm not sure they're meaningfully different, at least not in a way that separates flipping a coin from questions like "will we get to Mars before 2050". If I have a fair coin that I've never flipped before, does it make sense to say the frequency of getting heads is 50%? If I've only ever flipped it twice and it came up heads both times, is the frequency of heads 100%? If yes, then the frequency is bad for making predictions. It's just a count of outcomes in some reference class.

But if we consider the frequency of heads to be 50% even though the coin came up heads both times, then it's because we're using "frequency" to mean the frequency you would hypothetically get if you repeated the experiment many times. But this sounds a lot like chance. If you hypothetically repeated the experiment of "will we get to Mars before 2050" many times, there would also be a frequency of how many times we'd get there. Sure, we can't actually repeat the experiment in real life, but the same is true of a coin that I won't let anyone flip.

For both the coin flip and the chance of getting to Mars, we come up with a mathematical model and that model gives us a probability. The models exist as pure math and have a well defined probability, but they never perfectly match the real life event we're trying to model. E.g. no real coin is perfectly balanced and has exactly a 50% chance of landing on heads.

"For both the coin flip and the chance of getting to Mars, we come up with a mathematical model and that model gives us a probability. The models exist as pure math and have a well defined probability, but they never perfectly match the real life event we're trying to model."

Sure, but it's important that the models match the real life events well enough to be useful. There is no way to determine whether that is the case or not other than defining what "well enough" means, and then comparing the outcomes of experiments to the predictions of the model.

I think it's feasible to conduct enough coin flip experiments to validate the mathematical model to within some acceptable tolerance, and that's why I don't object to talking about the probability of a particular outcome for a coin flip.

I don't think it's feasible to conduct enough "getting to Mars" experiments to validate the mathematical models that people might propose, and that's why I object to trying to assign it a probability.

You can try to come up with a reference class for "getting to Mars" that is large enough that you can validate a mathematical model of the reference class, but I don't think it actually helps, because you still have to validate that getting Mars actually belongs in that reference class, and that brings you back to the need for experiments.

You can also try to break "getting to Mars" down into a series of steps that are themselves feasible to model and validate, and then combine them. But I still don't think it helps you, for two reasons. The first goes back to what I already said -- you still have to experimentally verify that you've captured all of the relevant steps, and that still requires experimentation. The second, deeper objection is that I would draw a distinction between reducible and irreducible complexity. I think "getting to Mars" is irreducibly complex, i.e. it can't be broken down into discrete components that can each individually be well modeled. At least not with our current understanding of economics, politics, and sociology.

The same is true of the coin though. You're using the reference class of other coins that have been flipped and experimentally verified to match the model that p(Heads) = 0.5. But my penny is a different coin. No two pennies are manufactured exactly alike. How do you know that other coins are a good reference class for my coin? You'd have to do more experiments with my coin to ensure it belongs in that reference class.

At the end of the day, you're making the judgement call that me flipping the penny in my pocket is similar to other people flipping other coins. That's a good judgement call. But it's not categorically different from judging that our model for rocket launches will apply to a rocket to Mars. The difference is only one of degree. We're highly confident in applying our coin flipping model to new coins, and somewhat less confident in applying our rocket model to new destinations.

>But my penny is a different coin. No two pennies are manufactured exactly alike. How do you know that other coins are a good reference class for my coin? You'd have to do more experiments with my coin to ensure it belongs in that reference class.

Agreed. One can (almost) never eliminate some reference class tennis. (Perhaps with elementary particles or identical nuclei one can. Chemistry would look different if electrons were distinguishable...)

I agree that no two pennies, or anything really, are exactly alike. There will always be some tolerance for differences when defining your reference class, and it's always possible to set the tolerances so strictly that the reference class is N=1. But I still think there are qualitative differences between coin flipping and getting to Mars (which I interpreted as "land a group of humans on Mars and bring them back safely").

First, even though no two coins are exactly the same, it's feasible to experimentally measure how similar a large number of coins minted in US mints are when it comes to the distribution of heads and tails. We could even go more granular and look at coins from a particular mint, or coins from a particular production run, and we can still collect enough coins and flip them enough times to model the probability distributions pretty well. You can't do that for getting to Mars.

Of course, there is always a chance that your coin was somehow defective, but we can even quantify the fraction of defective coins pretty well and incorporate that into the model. It's different if you are being sneaky and telling me your coin is a standard quarter when in fact it's fraudulent; I'll admit it's probably impossible to fully incorporate something like dishonesty into a model. If you think that makes coin flipping and getting to Mars qualitatively similar, then we can agree to disagree.

But even if the reference class really is N=1, then you can at least verify your assumptions by flipping that particular coin a bunch of times. Now we've moved from prediction to postdiction, but I still think this is a meaningful distinction when compared to something like sending people to Mars, that we will struggle to even do once, at least at first.

If "getting to Mars" were merely rocket science, then I think you could argue that it's qualitatively similar to flipping a coin, and it's just a difference of degree. After all, we have successfully (and unsuccessfully) sent rockets to Mars. The sample size may not be as large as the number of coin flips we could do in a day, but it's large enough that there is useful information to be gleaned. And rocket science, while very complex, is closer to what I called reducible complexity than irreducible complexity (I think...I'm not a rocket scientist).

But if by "getting to Mars" we mean "land humans on Mars and bring them back" then I think there is a qualitative difference with flipping a coin. A manned mission to Mars has not been done even once -- it hasn't even been attempted once. And it involves a lot more than just physics and biology. There are economic and political and social factors that I don't think can be modeled and verified in the same way as flipping a coin or even launching a rocket.

Finally, concepts like reducible vs irreducible complexity, how similar two things have to be before you can group them in the same class, and how closely a model has to match observations to be considered "good enough" involve subjectivity, and I think of them as existing on a continuous spectrum. That means you can't set a clear dividing line and say everything to the right is one kind of thing, and everything to the left is another kind of thing. The boundaries are fuzzy. But I still think two things that exist on a spectrum can have qualitative differences when they are far enough apart.

I'm sympathetic to this approach, but there's an issue where declaring literally anything to *really* be a valid instance of "frequency" invites an argument about interpretations of quantum mechanics. No easy escape from metaphysical arguments in a deterministic universe!

Most philosophers use “chance” for something different from either frequency or credence (our term for Bayesian probability). “Chance” is used for an objective physical probability that may or may not ever be repeatable and that people may or may not ever wonder about to form a credence. Standard interpretations of quantum mechanics suggest that there are chances of this sort (though my own preferred interpretation, the many worlds interpretation, gives those numbers a slightly different meaning than “chance”).

> Probabilities Don’t Describe Your Level Of Information, And Don’t Have To

This seems literally wrong to me. Probability and information both measure (in a very technical sense) how surprising various outcomes are. I think they may literally be isomorphic measures, with the only difference being that information is measured in bits rather than in percentages.

Your examples are also off-base here. The probability of a fair coin coming up heads when I'm in California is 1/2, and the probability of a fair coin coming up heads when I'm in New York is 1/2, and we wouldn't say that probability is not the same thing as information because 1/2 does not capture what state I'm flipping the coin in. Similarly, the difference between the first two scenarios is not E[# heads / # flips] but E[(# heads / # flips)^2] - (E[# heads / # flips])^2, i.e. the variance of the distribution is different. This is because (1) is well-modelled by *independent* samples from a known distribution, while in (2) the samples are correlated (i.e. you need a distribution over hyperparameters if you want to treat the events as independent).
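The variance point can be sketched numerically (my own illustration; the trial and flip counts are arbitrary). A coin with a known fixed bias and a coin with an unknown bias drawn from a uniform distribution both yield heads half the time on average, but the fraction of heads per batch spreads out very differently:

```python
import random
import statistics

random.seed(0)
N_TRIALS, N_FLIPS = 20000, 100

def fraction_heads(p):
    """Fraction of heads in N_FLIPS flips of a coin with bias p."""
    return sum(random.random() < p for _ in range(N_FLIPS)) / N_FLIPS

# Scenario 1: a known fair coin -- independent flips, p fixed at 0.5.
known = [fraction_heads(0.5) for _ in range(N_TRIALS)]

# Scenario 2: an unknown coin -- bias drawn uniformly per trial, then flipped.
# Flips are exchangeable but correlated through the shared unknown bias.
unknown = [fraction_heads(random.random()) for _ in range(N_TRIALS)]

# Both scenarios have the same expected fraction of heads (about 0.5)...
print(statistics.mean(known), statistics.mean(unknown))
# ...but very different variances (roughly 0.0025 vs 0.085 analytically).
print(statistics.pvariance(known), statistics.pvariance(unknown))
```

The analytic values come from p(1-p)/n = 0.0025 for the fixed coin versus Var(p) + E[p(1-p)]/n = 1/12 + 1/600 for the mixed one.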

I also noticed you didn't touch logical / computationally-limited probability claims here, like P(the 10^100 digit of pi is 1) = 1/10.

I think he meant "level of information" as in pages of text you could write about it, not the information-theoretic entropy sense of the word that you're rightly saying doesn't work in that sentence.

Right. “This is because 50% isn’t a description of how much knowledge you have, it’s a description of the balance between different outcomes.” He presents scenarios where you have a lot of useless knowledge (the knowledge about the process is made useless by arbitrarily tagging one outcome as heads and the other as tails). Probability is related to the amount of knowledge you have that can be leveraged to cast a prediction. Learning Moby Dick by heart won’t help you predict whether Trump or the other guy will win the election.

Parts of this are unobjectionable, and other parts are very clearly wrong.

It is perfectly fine to use probabilities to represent beliefs. It is unreasonable to pretend the probabilities are something about the world, instead of something about your state of knowledge. Probabilities are part of epistemology, NOT part of the true state of the world.

You say "there's something special about 17%". No! It's just a belief! Maybe the belief is better than mine, but please don't conflate "belief" with "true fact about the world".

If Samotsvety predicts that aliens exist with probability 11.2%, that means they *believe* aliens to exist to that extent. It does not make the aliens "truly there 11.2% of the time" in some metaphysical sense. I can feel free to disagree with Samotsvety, so long as I take into account their history of good predictions.

(Side note: that history of good predictions may be more about politics and global events than it is about aliens; predicting the former well does not mean you predict the latter well.)

----------

Also, a correction: you say

"It’s well-calibrated. Things that they assign 17% probability to will happen about 17% of the time. If you randomly change this number (eg round it to 20%, or invert it to 83%) you will be less well-calibrated."

This is false. It is easy to use a simple rounding algorithm that guarantees the output is calibrated if the input is calibrated (sometimes you can even *increase* calibration by rounding). If I round 17% to 20% but also round a different 23% prediction to 20%, then it is a mathematical guarantee that if the predictions were calibrated before, they are still calibrated.

Calibration is just a very very bad way to measure accuracy, and you should never use it for that purpose. You should internalize the fact that two people who predict very different probabilities on the same event (e.g. 10% and 90%) can both be perfectly calibrated at the same time.
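A quick simulation (my own sketch, not from the comment) of the 17%/23% rounding example: two calibrated prediction streams, both rounded to a common 20%, remain calibrated as a group because the bucket's average true rate is still 20%.

```python
import random

random.seed(0)

# Two calibrated forecast streams: events that really happen 17% and
# 23% of the time, with matching original predictions of 0.17 and 0.23.
events = [(0.17, random.random() < 0.17) for _ in range(100_000)]
events += [(0.23, random.random() < 0.23) for _ in range(100_000)]

# Round both predictions to 0.20 and check the empirical rate in
# the resulting single 0.20 bucket.
hits = [happened for _, happened in events]
freq = sum(hits) / len(hits)
print(freq)  # close to 0.20: the rounded forecasts are still calibrated
```

The rounded forecaster is exactly as calibrated as the original, while obviously carrying less information, which is the commenter's point about calibration being a weak accuracy measure.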

I think the part about calibration answers the first half of your comment: these numbers have real-world validity in that you can bet on them and make more money on expectation than someone using different numbers.

For the second half of the comment: if I understand your argument correctly, it only holds if there are equal numbers of predictions on either side. If Samotsvety says 17% for impeachment and 23% for Mars (or 10% and 90%), and you round those both to 20% (or 50%) and bet accordingly, then, yes, you'll make as much money as someone who used the unrounded numbers.

But if they predicted 17% (or 10%) for *both* events, and you rounded *both* to 20% (or 50%) and bet accordingly, then you'd lose money on expectation compared to someone using the unrounded numbers.

And this applies even more strongly if there's a whole slate of unrelated events, like in a prediction contest. If you threw away all the answers (from Samotsvety or the wisdom of crowds or whoever) and rounded everything to 50%, or even to one significant digit, then you would be losing information, and would be losing money if you bet based on those rounded numbers.

If two people both have functions that satisfy the probability axioms, then each one makes more money on expectation than the other, calculating expectations by means of their own probability function. This is just a property of probability functions generally, that doesn’t pick out a single special one.

If Samotsvety puts 17% for impeachment, and I round it off to 10% and you round it off to 30%, and we all make a bet, then the person who does best will be either you or me. Neither of us could systematically do better than Samotsvety. But there absolutely could be someone who is perfectly equally good to Samotsvety in general, despite giving slightly different numbers to every event than Samotsvety does.

"these numbers have real-world validity in that you can bet on them and make more money on expectation than someone using different numbers."

There's no "in expectation" when considering a single event.

Also, you're conflating "make money in expectation" with "be calibrated". They are not quite the same. I do agree that the 17% can be interpreted in a way that is meaningful, and that I won't be able to round. Calibration is the wrong way to do this.

That I can't round 17% still does not make 17% "objectively true", since another equally skilled predictor can predict 93% on the same question and they can both be right (in the sense that they both get the same calibration and the same Brier score on their entire batch of predictions). The issue is that you can never judge accuracy or calibration on a single prediction, but only on a batch of them.

It's really much better to think of 17% as a measure of belief, not as an objective fact about the world. If I have insider info, maybe I already know the event will happen 100% of the time, so then what does the 17% mean? It just CAN'T be a fact about the world; it doesn't make sense!

If we suppose that quantum events tend not to influence politics overly much, then 17% for impeachment is likely not true.

If God decided to open Omniscient Forecasts Ltd, he might state the impeachment odds as 99% or 1% instead. If he was correct, he would beat Samotsvety in his prediction score, and everyone would use his predictions.

Samotsvety's 17% is just our best guess, because we do not have a Laplacian demon or the best possible psychological model of every American.

Where in the post did he claim that 17% represented a true fact about the world?

It's implied in several places, including in other posts by Scott. In this particular post, we have, for example:

"I think the best way to describe this kind of special number is “it’s the chance of the thing happening”, for example, the chance that Joe Biden will be impeached."

So? Do you also object to people saying that the chance of a fair coin landing on heads on the next throw is 50%? As far as I can tell, these statements are equivalent.

First you claim Scott never said it, and now you claim he's right to say it?

Anyway, a fair coin comes from an implied set of repeatable events (other fair coin flips). That's not true for most things you predict.

What's the probability that the 100th digit of pi is 5? You can say "10%", but you can also just look it up and see that the digit is 9, so the probability is actually "0%" (unless the source you looked up was lying, I guess, so maybe 1%?). Which one is right? They both are: the probabilities represent *your state of belief*, not an objective fact about the world.

Maybe it would be clearer if I asked you "what's the probability the 1st digit of pi is 5"? That should make it more obvious that the answer is not just "10%" in some objective sense. The answer depends on how many digits of pi you know! It's a property of your state of knowledge, not a property of the world.

>First you claim Scott never said it, and now you claim he's right to say it?

Yes, insofar as it's ordinary use of language, even if it elides certain metaphysical subtleties. You do agree of course, that there's no true fact about a fair coin that is represented by the number 50%?

Hmm... Maybe one way of thinking about it is that there are estimates of probability that are difficult to impossible to _improve_? In a Newtonian world, if one knew everything about a coin toss of a fair coin to infinite precision and had infinite computing capacity, then the odds of heads or tails on a given toss would be 0%/100% or 100%/0%, but, in practice, no one can have that information. And in many situations, one can often say "The information doesn't exist" e.g. about the results of an experiment that has not yet been run. And, in some of those cases, the best existing estimate of the probability looks less like a subjective description of one's state of mind and more like a publicly known value.

With a fair coin, you can view it in two metaphysical frames: you could either (a) say the probability it lands heads is "truly" 50%, by viewing the coinflip as an instance out of the larger class of "all fair coinflips", which one can show together approach half heads and half tails; or (b) you can say this given coin, like the rest of the universe, is deterministic and the probability is about our state of knowledge. I allow both frames.

For more practical matters, though, frame (a) does not work. That's because (1) there is no natural class like "all fair coinflips" from which you drew your sample, and (2) not everyone has the same state of belief about the event. There is no longer any way to pretend that 17% is a fact of the universe instead of a property of your belief state. Some individual congressmen might have perfect knowledge of whether Biden will be impeached; how then could 17% be a fact of the universe?

You've tried to gotcha me, and I've answered your question carefully. Now answer mine: is the statement "10% chance the 100th digit of pi is 5" really a factual claim about the world instead of a claim about your state of knowledge? How could that be, when I just looked it up and know the 100th digit?

I don't think that's necessary for the objection to hold. If I understand correctly, Scott says that if you are well calibrated, then 17% represents a true fact about your state of knowledge about the world. But the objection is that two people can be well calibrated, share the same knowledge, and assign totally different probabilities to events.

Yes, Scott is sloppy with his treatment of calibration, but I think that the main thrust of his post is obviously true, and people eager to nitpick him should first acknowledge that, if they prioritize discourse improvements over being excessively pedantic.

>You should internalize the fact that two people who predict very different probabilities on the same event (e.g. 10% and 90%) can both be perfectly calibrated at the same time.

This changed my mind more than any other comment. However, now I wish I could see some empirical data as to how often this happens. Are superforecasters of similar calibration really assigning totally different probabilities to events? I could believe this happening if people just have different knowledge and expertise. But if it's based on publicly available knowledge, then I would expect high quality predictions to cluster.

It's a good question, and I don't know the answer. My guess is that the superforecasters would generally assign similar probabilities to normal questions they see a lot and have kind of developed a model for (e.g. elections) but would probably wildly diverge on some questions, including some that you might care more about (e.g. AI risk).

There's also a clear selection bias for calibration based on time-to-event. Say we're both superforecasters working on predicting the development of display technologies. You and I were both well calibrated on predicting the rise of flat screens and LED, the overhyped 3D movement, and HD/4K. We both predicted which technologies would take hold, and how long it would take them to mature to the point of becoming standard (or not).

You predict with 90% certainty that intra-cranial image projection will happen by 2075, but I predict this is only 10% likely. How can this be true if we're both well calibrated?

All our well-calibrated predictions are on technologies <25 years old, and were likely made with time horizons of less than 10 years. Yet this is the basis for appeals to our accuracy on much longer time horizons.

If someone predicted in 1980 that high definition flat screen TVs with a wider color gamut would be the norm in 2024, that would be interesting. If they accurately predicted date ranges within which these technologies would be adopted and mature, that would give me more confidence that this same person making predictions today had useful insights for 40+ years into the future.

When people say, "sure you predicted events within a 5 year time horizon, but I'm not convinced you're able to predict with accuracy 50 years out" that's not them irrationally ignoring the calibration curves. It's accurately discerning the limits of the data.

I'm curious about the demand that probabilities come with meta-probabilities. Would it not anyway be satisfied by Jaynes's A_p distribution?

Assume there is a one-shot event with two possible outcomes, A and B. A generous, trustworthy person offers you a choice between two free deals. With Deal 1, you get $X if A occurs, but $0 if B occurs. With Deal 2, you get $0 if A occurs, but $X if B occurs. By adjusting X, and under some mild(ish) assumptions, the threshold value of X behaves a helluva lot like a probability.

Or at least, it better. If it doesn’t, then I can either come up with a set of bets you will take that guarantee you lose, or come up with a set of bets that you are collectively guaranteed to win that you won’t take any of.

Am I being dense, or is there a typo somewhere?

>With Deal 1, you get $X if A occurs, but $0 if B occurs. With Deal 2, you get $0 if A occurs, but $X if B occurs.

if we scale this down by X we get

With Deal 1, you get $1 if A occurs, but $0 if B occurs. With Deal 2, you get $0 if A occurs, but $1 if B occurs.

Which looks like the threshold probability for switching deals is always 50%, independent of X.

Was the payoff for B in Deal 2 supposed to be $(1-X)? Or maybe $1? That would make X most like a probability (I think).

You're absolutely right! Thanks for the catch. Deal 2 should pay $1. In that case, p(A) = 1/(1+X) when X is set such that Deals 1 and 2 are equally attractive. Falls apart at the ends because people are terrible at small probabilities and large numbers, but it does suggest that there's something that acts an awful lot like a probability, even in the absence of a frequentist interpretation.
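The corrected setup can be made concrete (a sketch of the indifference argument, with Deal 1 paying $X on outcome A and Deal 2 paying $1 on outcome B). At the threshold X* where you are indifferent, the expected values match, which pins down an implied belief:

```python
def implied_prob_A(x_threshold):
    """Belief in A implied by indifference between the two deals.

    Deal 1 pays $X if A occurs; Deal 2 pays $1 if B occurs.  At the
    threshold X* where both deals look equally attractive, expected
    values match: p(A) * X* = (1 - p(A)) * 1, so p(A) = 1 / (1 + X*).
    """
    return 1 / (1 + x_threshold)

print(implied_prob_A(1))  # indifferent at $1 vs $1 -> p(A) = 0.5
print(implied_prob_A(4))  # needs $4 on A before you switch -> p(A) = 0.2
```

X* is just the odds against A, which is why the elicited number obeys the probability axioms under the mild assumptions mentioned above.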

Many Thanks!

I feel that it's in a sense a continuation of the argument about whether it's OK to say that there's a 50% chance that bloxors are greeblic (i.e. to share raw priors like that). The section "Probabilities Don’t Describe Your Level Of Information, And Don’t Have To" specifically leans into that, and I disagree with it.

Suppose I ask you what are the chances that a biased coin flips heads. You tell me, 33%. It flips heads and I ask you again. In one world you say "50%", in another you say "34%", because in the first world most of your probability estimate came from your prior, while in the second you actually have a lot of empirical data.

That's two very different worlds. It is usually very important for me to know which one I'm in, because sure if you put a gun to my head and tell me to bet immediately, I should go with your estimate either way, but in the real world "collect more information before making a costly decision" is almost always an option.

There's nothing abstruse or philosophical about this situation. You can convey this extra information directly, as "33%, but will update to 50% if a thing happens", with mathematical rigor. Though of course it would be nice if Bayesians recognized that it's important and useful and tried to find a better way of conveying the ratio of prior to updates that went into the estimate, instead of insisting that a single number should be enough for everyone.
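The two worlds can be sketched with Beta-Bernoulli updating (my own illustration; the pseudo-counts 1/2 and 33/67 are hypothetical choices that happen to reproduce the numbers above):

```python
from fractions import Fraction

def posterior_mean(heads, tails):
    """Mean of a Beta(heads, tails) belief about p(heads)."""
    return Fraction(heads, heads + tails)

# World 1: the 33% is almost all prior -- tiny pseudo-counts.
w1_prior = posterior_mean(1, 2)        # 1/3, about 33%
w1_after = posterior_mean(1 + 1, 2)    # one heads observed -> 1/2

# World 2: the 33% rests on lots of data -- large pseudo-counts.
w2_prior = posterior_mean(33, 67)      # 33/100 = 33%
w2_after = posterior_mean(33 + 1, 67)  # one heads observed -> 34/101

print(float(w1_after))  # the estimate jumps to 50%
print(float(w2_after))  # the estimate barely moves, about 0.337
```

Both forecasters start at roughly 33%, but a single observation moves them very differently, which is exactly the information the bare point estimate hides.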

And so, I mean, sure, it's not anyone's job to also provide the prior/updates ratio, however it might look, to go along with their estimates (unless they are specifically paid to do that of course), and people can ask them for that specifically if they are interested. But then you shouldn't be surprised that even people who have never heard of Bayes' theorem still might intuitively understand that a number like "50%" could come entirely from your prior and should be treated as such, and treat you with suspicion for not disclosing it.

Best comment in the thread so far.

Yes, it would be better to use a distribution to represent our beliefs instead of a single probability. When we have a fair coin, our distribution is over the options: 1) the coin will come up tails, 2) the coin will come up heads.

When we have an unknown coin, the distribution is over the kind of bias the coin could have, and without further knowledge, we have a uniform distribution over [0,1] representing all the ways in which the coin could be biased. When someone asks us what the probability of heads on the next flip will be, we use this distribution to compute this probability.

Exactly. The distribution for p(heads) of a fair coin is a Dirac delta function at 0.5.

The prior distribution for some process which returns either "heads" or "tails" should be that p follows a uniform distribution over the unit interval. (Depending on what you assume about the process, you might want to add small Dirac peaks at 0 and 1.)

The expected value of p is 0.5 in either case, but the amount of knowledge you have is very different.

Of course, if you state your prior distribution, that will also tell people how much you would update.

In frequentist terms, if you've flipped a coin 100 times, you know more than if you flipped it 10 times, because you have more data and you can estimate the bias more precisely. The variance tells you this, or if you plot the probability distribution, you get this too.

If we switch to Bayesian reasoning, there is jargon about this, like having a "flat prior". Maybe we should use that more? Predictions about AI seem like the sort of thing where the probability distribution should be pretty flat?

Is there better terminology for talking about this informally?
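One standard piece of terminology here is Laplace's rule of succession, which falls out of the flat Beta(1,1) prior the thread describes. A sketch showing how the posterior mean stays put while the posterior variance shrinks with more data:

```python
def beta_posterior(heads, tails, prior_a=1, prior_b=1):
    """Posterior over the coin's bias, starting from a flat Beta(1,1) prior."""
    a, b = prior_a + heads, prior_b + tails
    mean = a / (a + b)  # rule of succession: (h + 1) / (n + 2)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

# Same observed frequency of heads, different amounts of data.
mean_10, var_10 = beta_posterior(5, 5)      # 10 flips
mean_100, var_100 = beta_posterior(50, 50)  # 100 flips

print(mean_10, mean_100)  # both 0.5
print(var_10, var_100)    # the 100-flip posterior is much tighter
```

The width of the posterior, not the point estimate, is what distinguishes "flat prior" guesses (like many AI predictions) from well-measured frequencies.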

I am glad there is no new example of: ' "Do bronkfels shadwimp?" is binary, no one knows, thus: 50% chance. ' As in the "coin which you suspect is biased but you're not sure to which side" - which IS 50%. - If A asks about bronkfels and knows/pretends to know: maybe 50%. If no one around knows: it's the chance of a specific verb working with a specific noun, which is less than 1%. - "Are the balls in this bag all red?": around 4%. No surprise if they are, even if you did not know. - "Are 20% purple with swissair-blue dots?": I'd be surprised, and would not believe you did not know before. - "Are they showing pictures of your first student?": 50%, really?

And in particular, it's certainly inconsistent to give a 50% chance to *all* of a) all the balls in this bag have a photo on them, b) all the balls in this bag have a photo of me on them, c) all the balls in this bag have a photo of Scott on them, d) all the balls in this bag have a photo of your first student on them.

Maybe the bag is 50% to be empty?

> Whenever something happens that makes Joe Biden’s impeachment more likely, this number will go up, and vice versa for things that make his impeachment less likely, and most people will agree that the size of the update seems to track how much more or less likely impeachment is.

There is a close parallel here to the same issue in polling, where the general sense is that the absolute level determined by any given poll is basically meaningless - it's very easy to run parallel polls with very similar questions that give you wildly different numbers - but such polls move in tandem, so the change in polled levels over time is meaningful.

That sounds reasonable.

For some types of polls and some types of comparisons the opposite can happen. "Which nation is happiest", examined by asking "Are you happy or unhappy" on a 1-10 scale, where "happy" is in different languages with somewhat different meanings...can get messy... To some extent, similar questions in a single nation, spread across periods long enough that meanings shift, run into the same problem.

Something's been bothering me for a while, related to an online dispute between Bret Devereaux and Matthew Yglesias. Yglesias took the position that, if history is supposed to be informative about the present, then that information should come with quantified probabilities attached. Devereaux took the position that Yglesias' position was stupid.

I think Devereaux is right. I want to draw an analogy to the Lorenz butterfly:

The butterfly is a system that resembles a planet orbiting around and between two stars. There is a visualization here: https://marksmath.org/visualization/LorenzExperiment/

It is famous for the fact that its state cannot be predicted far in advance. I was very underwhelmed when I first found a presentation of the effect - it's very easy to predict what will happen, as long as you're vague about it. The point will move around whichever pole it is close to, until it gets close to the other pole, at which point it will flip. Over time, it broadly follows a figure 8.

You can make a lot of very informative comments this way. At any given time, the point is going to lie somewhere within a well-defined constant rectangle. That's already a huge amount of information when we're working with an infinite plane. And at any given time, the point is engaged in thoroughly characteristic orbital behavior. The things that are hard to predict are the details:

1. At time t, will the point be on the left or on the right?

2. How close will it be, within the range of possibilities, to the pole that currently dominates its movement?

3. How many degrees around that pole will it have covered? (In other words, what is the angle from the far pole, through the near pole, to the point?)

4. When the point next approaches the transition zone, will it repeat another orbit around its current pole, or will it switch to the opposite pole?

If you only have a finite amount of information about the point's position, these questions are unanswerable, even though you also have perfect information about the movement of the point. But that information does let us make near-term predictions. And just watching the simulation for a bit will also let you make near-term predictions.

This seems to me like an appropriate model for how the lessons of history apply to the present. There are many possible paths. You can't know which one you're on. But you can know what historical paths are similar to your current situation, and where they went. The demand for probabilities is ill-founded, because the system is not stable enough, *as to the questions you're asking*, for probabilities to be assessable given realistic limitations on how much information it's possible to have.
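The divergence described above can be seen numerically. A sketch, using the standard Lorenz parameters (sigma=10, rho=28, beta=8/3) and naive Euler integration: two trajectories starting one part in a billion apart are still indistinguishable early on, then end up on completely different parts of the attractor.

```python
import math

SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0
DT = 0.001

def step(state):
    """One Euler step of the Lorenz system."""
    x, y, z = state
    return (x + DT * SIGMA * (y - x),
            y + DT * (x * (RHO - z) - y),
            z + DT * (x * y - BETA * z))

# Two initial conditions differing by one part in a billion.
a = (1.0, 1.0, 20.0)
b = (1.0 + 1e-9, 1.0, 20.0)

early = None
for i in range(40_000):  # integrate to t = 40
    a, b = step(a), step(b)
    if i == 1_000:  # at t ~ 1 the trajectories are still essentially identical
        early = math.dist(a, b)

print(early)         # still microscopic
print(math.dist(a, b))  # grown to the scale of the attractor itself
```

Coarse statements (the point stays in a bounded box, it orbits one pole or the other) remain predictable throughout; it is exactly the fine-grained questions listed above that the exponential error growth destroys.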

Just because not everything can in principle be long-term predicted, it doesn't mean that nothing can be. Of course, there's the problem that we don't have a universal easy way to separate questions into those categories, but it's not like the only epistemological alternative of "throwing hands up in the air" is particularly enticing.

This is just chaos theory, yes? The sensitivity of deterministic systems to initial conditions is an important consideration and an awful lot of ink can be spilled on it, but it's well-trod ground and I don't think you're getting any metaphysical ramifications out of it.

A point I have tried to make before wrt Scott's enthusiasm for prediction markets is the difference between information (a) that exists, but is not yet known to you; or (b) that does not exist.

The market is a good way to attract existing information to a central location where it can be easily harvested. But it is not such a good way to learn information that doesn't exist. You can ask whatever question you want, and you'll learn *something*, but you won't necessarily learn anything *about your question*.

In cases where information about your question doesn't already exist and is difficult to produce, you should not expect to derive much if any benefit from a prediction market.

This framework is applicable here; I suppose I'm arguing that the Lorenz butterfly, and human history, are processes where, for many questions, probabilistic information doesn't really exist, can't exist, and looking for it - and even more so claiming to have it - is a mistake.

Throwing your hands up in the air may not be enticing or prestigious, but that won't make it incorrect.

Quantum mechanics tells us that some outcomes are inherently undetermined, and information about which outcome will happen does not exist in advance. But probability applies to such processes quite well. We can use probability to quantify our ignorance.

Mmm... I'm not sure that what's going on there is using probability to quantify our ignorance. I think we quantified our ignorance by counting the outcomes of a large number of experiments, and used probability to describe the quantified ignorance.

But that approach will only work when we can make a large number of similar (hopefully, identical) experiments.

> I suppose I'm arguing that the Lorenz butterfly, and human history, are processes where, for many questions, probabilistic information doesn't really exist, can't exist

"Probabilistic information doesn't really exist" doesn't make too much sense as a claim. Let's simplify from a Lorenz system and take the double pendulum: if I tell you that the weight starts on the right, you can confidently say that after a tiny amount of time the weight will still be on the right, that after a smallish amount of time it will be on the left, and that maybe after that the distribution exponentially flattens to a 50/50 chance. But a uniform distribution is most definitely probabilistic information! It's really interesting that the uncertainty in position approaches the maximum so quickly, and throughout the evolution of the system we can absolutely put bounds on the velocity of the weight, how far it is from the pivot, etc.
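The pattern described above can be sketched with a simpler chaotic system than the double pendulum (my substitution, for the sake of a short runnable example): the logistic map at r = 4. Two trajectories starting a hair apart stay together for a handful of steps, then decorrelate completely, yet the long-run fraction of time spent on either half of the interval remains a stable, knowable statistic.

```python
# Sketch with the logistic map at r = 4 (a standard chaotic system), standing
# in for the double pendulum: short-term prediction works, long-term position
# is unpredictable, but long-run statistics are still probabilistic information.

def logistic(x, r=4.0):
    return r * x * (1.0 - x)

def trajectory(x0, steps):
    xs = [x0]
    for _ in range(steps):
        xs.append(logistic(xs[-1]))
    return xs

a = trajectory(0.2, 100)
b = trajectory(0.2 + 1e-10, 100)  # differs by 1e-10 at the start

# Short-term: the trajectories still agree closely after a few steps...
print(abs(a[5] - b[5]))    # still tiny
# ...but the gap grows exponentially and saturates within a few dozen steps.
print(abs(a[60] - b[60]))  # order 1: position is unpredictable by now

# Long-run statistics survive: fraction of time spent in the right half.
long_run = trajectory(0.2, 200_000)
frac_right = sum(x > 0.5 for x in long_run) / len(long_run)
print(round(frac_right, 3))  # close to 0.5
```

The analogue of the commenter's point: by step 60 you have no idea where the trajectory is, but "roughly 50/50 between the halves" is real, usable probabilistic information.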

And of course, it's really really hard to make claims on what *can't* be known. One can sometimes get away with that in certain QM interpretations, but I see you're taking a different track on that downthread.

> In cases where information about your question doesn't already exist and is difficult to produce, you should not expect to derive much if any benefit from a prediction market.

When Scott says "Probabilities Don’t Describe Your Level Of Information, And Don’t Have To" he's straightforwardly correct. But we *do* still care about level of information for its own sake e.g. the explore/exploit dichotomy, and in the specific case of prediction markets it's extremely important to know the trade volume as well.

But I think probably the greatest lesson that superforecasting as a learnable discipline has to offer is that what we might think are unique one-off events to which no prior probability can be attributed... might just be a tweaked instantiation of the same patterns.

Is there evidence that historians have any edge in making predictions at all? https://westhunt.wordpress.com/2014/10/20/the-experts/

I'm failing to see why you can't assign probabilities to those questions about the point moving around the Lorenz butterfly. Maybe for a particular Lorenz butterfly, you find the point spends 54% of its time on the left wing. That's already a basis for giving a probability of which side the point will be on at time t. With more information and a better model, you could make even better predictions.

> Maybe for a particular Lorenz butterfly, you find the point spends 54% of its time on the left wing. That's already a basis for giving a probability of which side the point will be on at time t.

Yes, that's true.

> With more information and a better model, you could make even better predictions.

No, the fact that you can't do this is the point. Or to be more precise, with more information, you can make predictions about times in the near future, and there are very sharply diminishing returns on how far you can extend the reach of "the near future" by gaining more information. If you're halfway around one wing you can make an excellent prediction of how long it will take you to get to the transition zone. Everything immediately after that looks much foggier.
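The "which wing" statistic both sides are discussing is easy to measure by simulation. This is an illustrative sketch of my own, not either commenter's computation: integrate the standard Lorenz system (sigma=10, rho=28, beta=8/3) with RK4 and count time spent at x < 0. The 54% above is hypothetical; for these symmetric parameters the long-run split comes out near 50/50, even though the wing occupied at any distant time t is unpredictable.

```python
# Long-run fraction of time the Lorenz trajectory spends on the "left wing"
# (x < 0), for the standard chaotic parameters. Plain RK4, no dependencies.

def lorenz_deriv(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(state, dt):
    def add(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))
    k1 = lorenz_deriv(state)
    k2 = lorenz_deriv(add(state, k1, dt / 2))
    k3 = lorenz_deriv(add(state, k2, dt / 2))
    k4 = lorenz_deriv(add(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

state, dt = (1.0, 1.0, 1.0), 0.01
burn_in, total = 5_000, 50_000
left_steps = 0
for i in range(total):
    state = rk4_step(state, dt)
    if i >= burn_in:               # discard the transient before the attractor
        left_steps += state[0] < 0
frac_left = left_steps / (total - burn_in)
print(round(frac_left, 2))  # near 0.5 for these symmetric parameters
```

Note this only yields the long-run occupancy number; it says nothing about which wing the point will be on at a specific distant time, which is exactly the distinction being argued.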

Applying these refinements to history gets us points like:

- We should be able to notice, and become alarmed, when a major event is almost upon us. But maybe not before then.

- The fact that a major event is almost upon us will not be good evidence that that event will actually happen.

- We should be unsurprised† by day-to-day developments almost all of the time.

- We should have no idea, this year, what next year will be like. Maybe the winter solstice will accelerate and occur in August. If that turns out to be what the future holds, we won't know until maybe late July. (Astronomy turns out to be better behaved than history.)

The general idea here, in terms of your analogy, is that we can know that the point spends 54% of its time on the left wing, but that the probabilities we can assign to "will the point be on the left wing at time t" will almost always be restricted to (a) 54%, (b) almost 0%, or (c) almost 100%. Notably, none of those possibilities allows for a prediction market to be helpful; they correspond to "we're quite sure what's about to happen" and "we have no idea what will happen farther out", but in both of those cases, that information is already well-known to everyone.

In the case of history, we can't know (the analogue of the fact) that the point spends 54% of its time on the left wing; the possibility space is too large. Imagine a butterfly that explores 17,000 dimensions instead of 2, with a "cycle length" much longer than all of human history. Without the ability to observe one cycle, how would we justify a figure of 54%? (This question has direct application to "futurist" predictions. What was the empirical probability, in 320 BC, that Iran would have a nuclear bomb in AD 2241? Is it different today than it was 2340 years ago? Is the _theoretical_ probability different today than it was 2340 years ago?)

We can do near-term prediction, identifying a point in the past close to our present location, and seeing where it went over a short period. If we're presenting a strong analogy, we probably have a close match on a number of dimensions somewhere in the dozens. Are the other 16,000+ dimensions going to make a difference this time? Yes! Are they going to make a difference on the two dimensions we want to predict? Maybe!

† Why is Firefox flagging "unsurprised" for being misspelled? This is a common word without any alternative spelling!

I think that even in a chaotic system, I can make statistical predictions. The pinnacle of that is thermodynamics.

Climatologists cannot predict with certainty whether a hurricane will happen two years from now, because Earth's atmosphere is a chaotic system. They can still make a reasonable guess at the number of hurricanes we will have in 2027.

Likewise, 538 cannot divine who will win the next presidential election. However, there are statistical models based on historic data which can give you some probabilities for outcomes.

Bluntly, so what?

The existence of questions that do have well-defined probabilistic answers is not even theoretical evidence that other questions have equally well-defined probabilistic answers.

I listed many predictions that are easy to make. But when you're thinking about a question, you want to know whether the tools you're using apply to it. Assuming that they do is not a good approach.

I followed this argument a little at the time, and found Devereaux's position unconvincing. He wanted to say that historical knowledge was beneficial in informing decision-making, while never being pinned down on what conclusions were being justified by what knowledge. If you can't say anything about what is more likely in response to a given question, whether or not you express it numerically, then you are ignorant about that question however much erudition about other facts garlands your ignorance on the specific point. If it doesn't allow you to do that, how is the historical information improving your understanding of the point at issue? To my mind Bret had no answer to this question.

I don't know anything about the Lorenz butterfly, but as best I can read your post it seems that you are saying that there are some aspects of its movement that are reasonably predictable based on the past data about how it has moved and others that are not. In that case, presumably percentages or other such indications of how likely something is could be given at least for those things that can be effectively predicted. I didn't understand Bret to be saying that historical study allowed for probabilistic predictions on the answers to some questions but not to others.

If history is a good guide to at least some questions about the future, it should be possible to give indications of likelihood about those questions. Bret was arguing that this was never possible based on the study of history, yet that the study of history nonetheless informed those predictions in some valuable but unpindownable way. As someone who studied history at university I would like to believe that it has this kind of value, but if you're not willing to defend the claim that it actually improves your ability to predict events - and be put to the test accordingly - I am not convinced that is sustainable.

Just buy a hardcover copy of ET Jaynes' book and throw it on those people's heads, duh

Quoting: “My evidence on the vaccines question is dozens of well-conducted studies, conducted so effectively that we’re as sure about this as we are about anything in biology”

Sorry if this is a naive question, but if there are RCTs comparing vaccines to placebos (and not using other vaccines as placebos) with a long enough follow-up to diagnose autism, I would be keen to see a reference. Just asking because I thought that while all the people claiming there is a link are frauds, we didn't actually have evidence at the level of 'sure about this as we are about anything in biology'.

Are there RCTs for "drinking monkey blood" relationship to autism? What about RCTs for breakfasts causing car crashes - a lot of people end up in a crash after eating breakfast, after all.

You can't expect to have RCTs for every made-up "connection"; there has to be at least a plausible mechanism, or no useful work will ever be done. Wakefield literally made up the numbers for his publication, to the everlasting shame of whoever peer-reviewed his garbage and approved it for publishing.

> Are there RCTs for "drinking monkey blood" relationship to autism? What about RCTs for breakfasts causing car crashes - a lot of people end up in a crash after eating breakfast, after all.

Yeah, but the point is that Scott specifically distinguishes vaccines from these sorts of cases, saying:

> My evidence on the vaccines question is dozens of well-conducted studies, conducted so effectively that we’re as sure about this as we are about anything in biology

So Michael is asking for references to those.

I'm not sure that's what he's asking for? Hope he'll chime in!

He's specifically asking for placebo-controlled RCTs. Not every well-conducted study has to be that specific kind.

Sure, and if you said 'not every well-conducted study has to be that specific kind', that would be a valid response.

But responding to a request for RCTs by comparing it to the monkey blood example ignores that Scott had specifically contrasted the level of evidence between the two, and said that there was so much evidence that "we're as sure about this as we are about anything in biology".

The claim that vaccines don't cause autism is particularly *un*like the claim that monkey blood doesn't cause autism, specifically in that it *does* have positive evidence against it. So saying, to a request for that evidence, 'well monkey blood causing autism doesn't have that kind of evidence either' doesn't seem like an appropriate response.

I was just under the impression that RCTs are the gold standard and since Scott said that we are about as sure of this as anything in biology, I assumed that meant multiple large, long term RCTs.

But I just asked because he said dozens of studies exist, so I was curious if anyone could expand on that a bit.

Kenny Easwaran posted a link to a pretty good meta-analysis.

There are few medical questions about which we are as sure as about this one. I remember an interview about recommendations for funding agencies. The recommendation explicitly mentioned that no more money should be spent on the vaccine-autism question, because this was one of the few questions that had been answered beyond any reasonable doubt.

The thing is that at some point it seemed like a realistic claim that vaccines caused autism (because a fraudulent paper made up data), so this was studied really intensively for several decades. It is one of the most researched questions in medicine, and the results were super-clear after the fraudulent papers had been retracted.

I would absolutely not say “as sure as we are about anything in biology”. Nothing based on a few dozen studies of tens of thousands of individuals could make us as confident about an effect size being zero as we are about the general truth of evolution, or that humans have eyes or whatever.

But there are good studies: https://kettlemag.co.uk/wp-content/uploads/2016/01/meta-analysis_vaccin_autism_2014.pdf

Reading that meta-analysis, they report the "odds ratio" of autism among vaccinated and unvaccinated children (that is, the odds of autism with vaccination (number with autism divided by number without) divided by the odds without vaccination (number with autism divided by number without)) as "OR". Almost all the studies had a point estimate of OR that was less than 1, and all of them had confidence intervals that included 1, except for a couple studies where the entire confidence interval was below 1.
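For anyone unfamiliar with the "OR" notation above, here is the standard computation with made-up counts (the numbers below are hypothetical, purely for illustration, not drawn from any study in the meta-analysis), including the usual Woolf log-odds-ratio confidence interval:

```python
import math

# How an odds ratio and its 95% CI are computed from a 2x2 table.
# These counts are invented for illustration only.
vacc_autism, vacc_no = 30, 9970        # vaccinated: with / without autism
unvacc_autism, unvacc_no = 8, 1992     # unvaccinated: with / without autism

odds_vacc = vacc_autism / vacc_no
odds_unvacc = unvacc_autism / unvacc_no
or_point = odds_vacc / odds_unvacc     # point estimate of the odds ratio

# Woolf method: standard error of log(OR) from the four cell counts.
se = math.sqrt(1 / vacc_autism + 1 / vacc_no
               + 1 / unvacc_autism + 1 / unvacc_no)
lo = math.exp(math.log(or_point) - 1.96 * se)
hi = math.exp(math.log(or_point) + 1.96 * se)

# Point estimate below 1 with a CI straddling 1 - the pattern the
# comment above describes for most studies in the meta-analysis.
print(round(or_point, 2), (round(lo, 2), round(hi, 2)))
```

With these invented counts the point estimate comes out below 1 while the interval includes 1, i.e. "consistent with no effect, leaning slightly protective", which is exactly the shape of result being described.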

I don't have any objections to phrasing things as "x% likely", and I do this colloquially sometimes, and I know lots of people who take this very seriously and get all the quoted benefits from it, and my constant thought when asked to actually do it myself or to take any suggested "low probability" numbers at face value is "oh God", because I'm normed on genetic disorders.

Any given disorder is an event so low-probability that people use their at-birth prevalences as meme numbers when mentioning things they think are vanishingly unlikely ("Oh, there's no *way* that could happen, it's, like, a 0.1% chance"). It turns out "1 in 1000" chances happen! When I look at probability estimates as percentages, I think of them in terms of "number of universes diverging from this point for this thing to likely happen/not happen". Say a 5% chance, so p=1–(0.95)^n. The 50%-likelihood-this-happens-in-one-universe marker is about p=1–(0.95)^14, which is actually a little above 50%. So 14 paths from this, it'll "just above half the time" happen in one of those paths? ~64% likelihood in 20 paths, ~90% in 45 paths (often more than once, in 45 paths). The probabilities people intuitively ascribe to "5%" feel noticeably lower than this.
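The "diverging universes" arithmetic in the paragraph above is just the complement rule for independent trials, and it checks out:

```python
# Chance an event with per-path probability p happens in at least one
# of n independent paths: 1 - (1 - p)^n.

def at_least_once(p, n):
    return 1 - (1 - p) ** n

print(round(at_least_once(0.05, 14), 3))  # ~0.512: just above half
print(round(at_least_once(0.05, 20), 3))  # ~0.642
print(round(at_least_once(0.05, 45), 3))  # ~0.901
```

So a "mere 5%" event happens more often than once, on average, across 45 such paths, which matches the commenter's point that people's intuitive "very unlikely" sits well below an actual 5%.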

"There's a 1 in 100,000 chance vaccines cause autism"? I've known pretty well, going about my daily life, absolutely unselected context, not any sort of reason to overrepresent medically unusual people, someone with a disorder *way* rarer than 1/100k! Probability estimates go much lower than people tend to intuit them. We think about "very unlikely" and say numbers that are more likely than people's "very unlikely" tends to be, when you push them on it, or when you watch how they react to those numbers (people estimating single-digit-percentage chances of a nuclear exchange this year don't seem to think of it as that likely).

I would never get on a plane that had a 1 in 1,000 chance of crashing, and wouldn’t like a 1 in 100,000 chance either! But I think for a general claim like “vaccines cause autism”, it’s very hard to reasonably come to probabilities lower than those, while it’s much easier for single cases like “this plane will crash”, when we can repeat the trial hundreds of thousands of times a day around the world.