505 Comments

The funniest thing about those arguments you're rebutting is that the average of a large number of past 0-or-1 events is only an estimate of the probability of drawing a 1. In other words, the probabilities they say are the only ones that exist are themselves unknowable!


That's right. Even in a simple case of drawing balls from a jar where you don't know how many balls of each color there are, the last ball has a 100% 'real' probability of being a specific color; you just don't know which one.

I'd say that all probabilities are about "how to measure a lack of information". If you have a fair coin it's easy to pretend that probability is objective because everyone present has exactly the same information. But as long as there are information differences, different people will give different probabilities. And it's not the case that some of them are wrong. They just lack different information. But for some reason people expect that if you're giving a number instead of a feeling then you are claiming objectivity. They're just being silly. There are no objective probabilities outside of quantum mechanics, and maybe not even there. It's all lack of information; it's just that there are some scenarios where it's easy for everyone to have exactly the same lack.


Okay, after years of this I think I have a better handle on what's going on. It's reasonable to pull probabilities because you can obviously perform an operation where something clearly isn't 1% and clearly isn't 99%, so then you're just 'arguing price.' On the other hand, it's reasonable for people to call this out as secretly rolling in reference class stuff and *not* having done the requisite moves to do more complex reasoning about probabilities, namely, defining the set of counterfactuals you are reasoning about and their assumptions and performing the costly cognitive operations of reasoning through those counterfactuals (what else would be different, what would we expect to see?). When people call BS on those not showing their work, they are being justly suspicious of summary statistics.


The thing is, if they’re wanting to call out those things, they should do that, rather than attacking the concept of probability wholesale.

Like, there are good reasons to not invest in Ponzi schemes, but “but money is just a made-up concept” is one of the least relevant possible objections.


👏🏼 👏🏼 👏🏼


A legend about accuracy versus precision that you may have heard but that I think is applicable:

As the story* goes: When cartographers went to measure the height of Mount Everest, they had a hard time because the top is eternally snow-capped, making it nigh-impossible to get an exact reading. They decided to take several different readings at different times of year and average them out.

The mean of all their measurements turned out to be exactly 29,000 feet.

There was concern that such a number wouldn’t be taken seriously. People would wonder what digit they rounded to. A number like that heavily implies imprecision. The measurers might explain and justify it in person, but somewhere down the line that figure ends up in a textbook (not to mention Guinness), stripped of context.

It’s a silly problem to have, but it isn’t a made-up one. Their concerns were arguably justified. We infuse a lot of meaning into a number, and sometimes telling the truth obfuscates reality.

I’ve heard more than one version of exactly how they arrived at 29,028 feet, instead—the official measurement for decades. One account says they took the median instead of the mean. Another says they just tacked on some arbitrary extra.

More recently, in 2020, Chinese and Nepali authorities officially established the height to be 29,031 feet, 8.5 inches. Do you trust them any more than the cartographers? I don’t.

All of which is to say, it makes sense that our skepticism is aroused when we encounter what looks like an imbalance of accuracy and precision. Maybe the percentage-giver owes us a little bit more in some cases.

* apocryphal maybe, but illustrative, and probably truth-adjacent at least


>More recently, in 2020, Chinese and Nepali authorities officially established the height to be 29,031 feet, 8.5 inches. Do you trust them any more than the cartographers? I don’t.

For your amusement: I just typed

mount everest rising

into Google, and got back

>Mt. Everest will continue to get taller along with the other summits in the Himalayas. Approximately 2.5 inches per year because of plate tectonics. Everest currently stands at 29,035 feet. (Jul 10, 2023)


They should have just measured in meters, problem solved.


Mentioning Yoshua Bengio attracts typos:

"Yoshua Bengio said the probability of AI causing a global catastrophe everybody is 20%"

"Yosuha Bengio thinks there’s 20% chance of AI catastrophe"


So it seems, indeed.


> you’re just forcing them to to say unclear things like “well, it’s a little likely, but not super likely, but not . . . no! back up! More likely than that!”, and confusing everyone for no possible gain.

There's something more to that than meets the eye. When you see a number like "95.436," you expect the number of digits printed to represent the precision of the measurement or calculation - that the 6 at the end means something. In conflict with that is the fact that one significant digit is too many. 20%? 30%? Would anyone stake much on the difference?

That's why (in an alternate world where weird new systems were acceptable for Twitter dialogue), writing probabilities in decimal binary makes more sense. 0b0.01 expresses 25% with no false implication that it's not 26%. Now, nobody will learn binary just for this, but if you read it from right to left, it says "not likely (0), but it might happen (1)." 0b0.0101 would be, "Not likely, but it might happen, but it's less likely than that might be taken to imply, but I can see no reason why it could not come to pass." That would be acceptable with a transition to writing normal decimal percentages after three binary digits, when the least significant figure fell below 1/10.
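
For anyone who wants to play with this, a minimal sketch (Python; the helper name is mine, for illustration) of how such binary-fraction strings map to ordinary probabilities:

```python
def binary_fraction_to_probability(s: str) -> float:
    """Convert a binary-fraction string like '0b0.0101' to a probability.

    Illustrative helper: each digit after the binary point contributes
    1/2, 1/4, 1/8, ... whenever it is a 1.
    """
    digits = s.removeprefix("0b0.")
    return sum(int(d) / 2 ** (i + 1) for i, d in enumerate(digits))

print(binary_fraction_to_probability("0b0.01"))    # 0.25
print(binary_fraction_to_probability("0b0.0101"))  # 0.3125
```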


I like this idea. And you can add more zeroes on the end to convey additional precision! (I was going to initially write here about an alternate way of doing this that is even closer to the original suggestion, but then I realized it didn't have that property, so oh well. Of course one can always manually specify precision, but...)


But isn't this against the theory? If we were mathematically perfect beings we'd have every probability to very high precision regardless of the "amount of evidence". Before reading this post I was like hell yea probabilities rock, but now I'm confused by the precision thing.

I guess we might not know until we figure out the laws of resource-constrained optimal reasoning 😔


Unrelated nitpick, but we already know quite a lot about the laws of resource-constrained optimal reasoning, for example AIXI-tl, or logical induction. It's not the end-all solution for human thinking, but I don't think anyone is working on resource-constrained reasoning on the hope of making humans think like that, because humans are hardcodedly dumb about some things.


Is there a tl;dr for what resource-constrained reasoning says about how many digits to transmit to describe a measurement with some roughly known uncertainty? My knee-jerk reaction is to think of the measurement as having maybe a gaussian with sort-of known width centered on a mean value, and that reporting more and more digits for the mean value is moving a gaussian with a rounded mean closer and closer to the distribution known to the transmitter.

Is there a nice model for the cost of the error of the rounding, for rounding errors small compared to the uncertainty? I can imagine a bunch of plausible metrics, but don't know if there are generally accepted ones. I assume that the cost of the rounding error is going to go down exponentially with the number of digits of the mean transmitted, but, for reasonable metrics, is it linear in the rounding error? Quadratic? Something else?


If you think your belief is going to change up or down by 5% on further reflection, that's your precision estimate. Rounding error can be propagated through the normal techniques for error propagation (see any source on scientific calculation).

There is no rule or formula for the precision of resource-constrained reasoning, because you aren't guaranteed to order the terms in the process of deliberation from greatest to smallest. Instead, I use repeated experiments as my example of a belief you're expecting to change within known bounds in the future, to show why most probabilities have limited precision.


Many Thanks!

>Instead, I use repeated experiments as my example of a belief you're expecting to change within known bounds in the future, to show why most probabilities have limited precision.

Sure, that makes sense.

>Rounding error can be propagated through the normal techniques for error propagation (see any source on scientific calculation).

True. Basically propagating through the derivatives of whatever downstream calculation consumes the probability distribution estimate... For the case of a bet, I _think_ this comes down to an expected cost per bet (against an opponent who has precisely calibrated probabilities) that is the value of the bet times the difference between the rounded mean and the actual mean. Is that it?


If you are trying to figure out whether a coin is fair, the average number of heads per flip among a large number of experimental trials serves as your best estimate of its bias towards heads. Although you have that estimate to an infinite number of digits of precision, your estimate is guaranteed to change as soon as you flip the coin again. That means the "infinite precision of belief," although you technically have it, is kind of pointless.

To put it another way, if you expect the exact probabilistic statement of your beliefs to change as expected but unpredictable-in-the-specifics new information comes in, such as further measurements in the presence of noise, there's no point in printing the estimate past the number of digits that you expect to stay the same.
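
A minimal simulation sketch of that point (Python; the coin's true bias of 0.6 is an assumption chosen for illustration): the running estimate always carries arbitrarily many digits, but one more flip moves it by roughly 1/n, so most of those digits are not stable:

```python
import random

random.seed(0)
true_bias = 0.6  # assumed for illustration; the experimenter doesn't know this
heads = 0

for n in range(1, 10_001):
    heads += random.random() < true_bias
    if n in (10, 100, 1_000, 10_000):
        estimate = heads / n
        # The estimate prints to many digits, but one additional flip shifts it
        # by roughly 1/n, so digits smaller than that are not stable.
        print(f"n={n:>6}  estimate={estimate:.6f}  shift from one more flip ~ {1 / n:.6f}")
```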


Here's a way to interpret that "infinite precision of belief": if you bet according to different odds than precisely that estimator, you'll lose money on average. In that sense, the precision is useful however far you compute it; losing any of that precision will lose you some ability to make decisions.

Your conclusion about forgoing the precision that is guaranteed to be inexact is wrong. Consider this edge case: a coin will be flipped that is perfectly biased; it either always comes up heads or always tails; you have no information about which one it is biased towards. The max-entropy guess in that situation is 1:1 heads or tails, with no precision at all (your next guess is guaranteed to be either 1:0 or 0:1). Nonetheless, this guess still allows you to make bets on the current flip, whereas you'd just refuse any bet if you followed your own advice.


> losing any of that precision will lose you some ability to make decisions.

The amount of money you'd lose through opportunity cost in a betting game like that decreases exponentially with the number of digits of precision you're using. To quote one author whose opinions on the subject agree with mine,

"That means the 'infinite precision of belief,' although you technically have it, is kind of pointless."

;)

Compare this situation with the issue of reporting a length of a rod that you found to be 2.015mm, 2.051mm, and 2.068mm after three consecutive measurements. I personally would not write an average to four digits of precision.
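
For concreteness, a quick sketch of the arithmetic behind that choice (Python; the three measurements are the ones quoted above):

```python
from statistics import mean, stdev

measurements = [2.015, 2.051, 2.068]  # mm, the three readings above

avg = mean(measurements)                  # ~2.0447 mm
spread = stdev(measurements)              # sample standard deviation, ~0.027 mm
sem = spread / len(measurements) ** 0.5   # standard error of the mean, ~0.016 mm

# With an uncertainty around 0.02 mm, digits beyond roughly 2.04 carry no information.
print(f"mean = {avg:.4f} mm, std dev = {spread:.4f} mm, std error = {sem:.4f} mm")
```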


Wouldn't it be best to assign a range to represent uncertainty? Or give error bars?

So, for a generic risk you could say something like 6(-2/+5)% chance of X outcome occurring.


Yes, and the next step is to give a probability distribution.


I'm wondering how to interpret a range or distribution for a single future event probability. My P(heads) for a future flip of a fair coin, and for a coin with unknown fairness, would both be 0.5. In both cases I have complete uncertainty of the outcome. Any evidence favoring one outcome or the other would shift my probability towards 0 or 1. Even knowing all parameters of the random process that determines some binary outcome, shouldn't I just pick the expected value to maximize accuracy? In other words, what kind of uncertainty isn't already expressed by the probability?


It's epistemic vs aleatory uncertainty. The way the coin spins in mid air is aleatory i.e. "true random", while the way it's weighted is a fact that you theoretically could know, but you don't. The distribution should represent your epistemic uncertainty (state of knowledge) about the true likelihood of the coin coming up heads. You can improve on the epistemic part by learning more.

Sometimes it gets tough to define a clear line between the two - maybe Laplace's demon could tell you in advance which way the fair coin will go. But in many practical situations you can separate them into "things I, or somebody, might be able to learn" and "things that are so chaotic and unpredictable that they are best modeled as aleatory."
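
One common way to make that separation concrete is to put a distribution over the coin's bias. A sketch (Python with scipy; the particular Beta parameters are assumptions for illustration): a heavily tested fair coin and a completely unknown coin both give P(heads) = 0.5, but their epistemic uncertainty differs enormously:

```python
from scipy.stats import beta

# Assumed states of knowledge, for illustration only:
tested_coin = beta(500, 500)   # roughly 1000 observed flips, half of them heads
unknown_coin = beta(1, 1)      # uniform prior: no information about the weighting

for name, dist in [("tested", tested_coin), ("unknown", unknown_coin)]:
    lo, hi = dist.interval(0.9)  # 90% credible interval for the bias
    print(f"{name}: mean = {dist.mean():.2f}, 90% interval = ({lo:.2f}, {hi:.2f})")
# Both means are 0.50, but the unknown coin's interval is roughly (0.05, 0.95)
# while the tested coin's is about (0.47, 0.53).
```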


Epistemic vs aleatory is just fancy words for Bayesian vs frequentist, no? Frequentists only measure aleatory uncertainty, Bayesian probability allows for both aleatory and epistemic


Hmmm... frequentists certainly acknowledge epistemic uncertainty. I guess they're sometimes shy about quantifying it. But when you say p < 0.05, that's a statement about your epistemic uncertainty (if not quite as direct as giving a beta distribution).


It's the probability that you will encounter relevant new information.

You could read it as "My probability is 0.5, and if I did a lot of research I predict with 80% likelihood that my probability would still lie in the range 0.499--0.501," whereas for the coin you suspect to be weighted that range might be 0.1--0.9 instead.

Small error bars mean you predict that you've saturated your evidence; large error bars mean you predict that you could very reasonably change your estimate if you put more effort into it. With a coin that I have personally tested extensively, my probability is 0.5 and I would be *shocked* if I ever changed my mind, whereas if a magician says "this is my trick coin" my probability might be 0.5 but I'm pretty sure it won't be five minutes from now.


Single future event probabilities, in the context of events that are unrelated to anything you can learn more about during the time before the event, are the cases where the meaning of "uncertain probability" is less clear. That is why rationalists, who prioritize thinking about AI apocalypses and the existence of God, will tell you that "uncertain probability" doesn't mean anything.

However in science, the expectation that your belief will change in the future is the rule, not the exception. You don't know which way it will change, but if you're aware of the precision of your experiments so far you'll be able to estimate by how much it's likely to change. That's what an "uncertain probability" is.


This is the natural-language way to interpret probabilities, and so is correct. If you say you found half of people to be Democrats, it means something different than saying you found 50.129% of people to be Democrats.

Yet it's subject to abuse, especially against those with less knowledge of statistics, math, or how to lie. If my study finds 16.67% of people go to a party store on a Sunday, it's not obvious to everyone that my study likely had only six people in it.

There are at least three kinds of lies: lies, damned lies, and statistics.


Why not “1/4”?


Because it's hard to pronounce 1/4 with your tongue in your cheek. ;-)


A central issue when discussing significant digits is the sigmoidal behaviour, eg the difference between 1% and 2% is comparable to the difference between 98% and 99%, but NOT the same as the difference between 51% and 52%. So arguments about significant digits in [0, 1] probabilities are not well-founded. If you do a log transformation you can discuss significant digits in a sensible way.
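
A small sketch of that point (Python): on the log-odds scale, the step from 1% to 2% is as large as the step from 98% to 99%, while the step from 51% to 52% is tiny, which is why counting significant digits on the raw [0, 1] scale is misleading:

```python
import math

def log_odds(p: float) -> float:
    """Natural-log odds, log(p / (1 - p))."""
    return math.log(p / (1 - p))

for a, b in [(0.01, 0.02), (0.51, 0.52), (0.98, 0.99)]:
    print(f"{a:.0%} -> {b:.0%}: log-odds change = {log_odds(b) - log_odds(a):.3f}")
# 1% -> 2% and 98% -> 99% both shift the log-odds by ~0.70,
# while 51% -> 52% shifts them by only ~0.04.
```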


What would I search for to get more information on that sigmoidal behavior as it applies to probabilities? I've noticed the issue myself, but don't know what to look for to find discussion of it. The Wikipedia page for 'Significant figures' doesn't (on a very quick read) touch on the topic.


Try looking up the deciban, the unit for that measure: https://en.m.wikipedia.org/wiki/Hartley_(unit)


Ah, yeah, that does seem like a good starting point, thanks. For anyone else who's interested, this short article is good:

http://rationalnumbers.james-kay.com/?p=306


Many Thanks!


This has been on my mind recently, especially when staring at tables of LLM benchmarks. The difference between 90% and 91% is significantly larger than the difference between 60% and 61%.

I've been mentally transforming probabilities into log(p/(1-p)), and just now noticed from the other comments that this actually has a name, "log odds". Swank.


Why are you adding this prefix “0b0” to the notation? If you want a prefix that indicates it’s not decimal, why not use something more transparent, like “bin” or even “binary” or “b2” (for “base 2”)?


That notation is pretty standard in programming languages. I do object to this being called "decimal binary" though. I'm not sure what exactly to call it, but not that. Maybe "binary fractions".


I think "binary floating-point" is probably the least confusing term.


It's not actually floating point though. That'd be binary scientific notation, like 0b1.011e-1100.


I kind of like it in principle, but...

— Why not use 25% then? That's surely how everyone would actually mentally translate it and (think of/use) it: "ah, he means a 25% chance."

— Hold on, also I realize I'm not sure how ².01 implies precision any less than 25%: in either case, one could say "roughly" or else be interpreted as meaning "precisely this quantity."

— Per Scott's original post, 23% often *is*, in fact, just precise enough (i.e., is used in a way meaningfully distinct from 22% and 24%, such that either of those would be a less-useful datum).

— [ — Relatedly: Contra an aside in your post, one sigfig is /certainly/ NOT too many: 20% vs 30% — 1/5 vs 1/3 — is a distinction we can all intuitively grasp and all make use of IRL, surely...!]

— And hey, hold on a second time: no one uses "94.284" or whatever, anyway! This is solving a non-existent problem!

-------------------------

— Not that important, and perhaps I just misread you, but the English interpretation you give of ².0101 implies (to my mind) an event /less likely/ than ².01 — (from "not likely but maybe" to "not likely but maybe but more not likely than it seems even but technically possible") — but ².0101 ought actually be read as *more* sure, no? (25% vs 31.25%)

— ...Actually, I'm sorry my friend, it IS a neat idea but the more I think about those "English translations" you gave the more I hate them. I wouldn't know WTF someone was really getting at with either one of those, if not for the converted "oh he means 25%" floatin' around in my head...


> Per Scott's original post, 23% often *is*, in fact, just precise enough

I strongly object to your use of the term "often." I would accept "occasionally" or "in certain circumstances"

(Funnily enough, the difference between "occasionally" and "in certain circumstances" is what they imply about meta-probability. The first indicates true randomness, the second indicates certainty but only once you obtain more information)


I intuitively agree that any real-world probability estimate will have a certain finite level of precision, but I'm having trouble imagining what that actually means formally. Normally to work out what level of precision is appropriate, you estimate the probability distribution of the true value and how much that varies, but with a probability, if you have a probability distribution on probabilities, you just integrate it back to a single probability.

One case where having a probability distribution on probabilities is appropriate is as an answer to "What probability would X assign to outcome Y, if they knew the answer to question Z?" (where the person giving this probability distribution does not already know the answer to Z, and X and the person giving the meta-probabilities may be the same person). If we set Z to something along the lines of "What are the facts about the matter at hand that a typical person (or specifically the person I'm talking to) already knows?" or "What are all the facts currently knowable about this?", then the amount of variation in the meta-probability distribution gives an indication of how much useful information the probability (which is the expectation of the meta-probability distribution) conveys. I'm not sure to what extent this lines up with the intuitive idea of the precision of a probability though.


I was thinking something vaguely along these lines while reading the post. It seems like the intuitive thing that people are trying to extract from the number of digits in a probability is "If I took the time to fully understand your reasoning, how likely is it that I'd change my mind?"

In your notation, I think that would be something like "What is the probability that there is a relevant question Z to which you know the answer and I do not?"


It is really easy to understand what the finite number of digits means if you think about how the probability changes with additional measurements. If you expect the parameter to change by 1% up or down after you learn a new fact, that's the precision of your probability. For example, continually rolling a loaded die to figure out what its average value is involves an estimate that converges to the right answer at a predictable rate. At any point in the experiment, you can calculate how closely you've converged to the rate of rolling a 6, within 95% confidence intervals.
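
As a rough sketch of that calculation (using the standard normal approximation, which is an assumption rather than the only option), the 95% half-width for the estimated rate of rolling a 6 after $n$ rolls is about

$$ 1.96 \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}} $$

so with $\hat{p} \approx 1/6$ and $n = 1000$ rolls that is roughly $\pm 0.023$: only about two digits of the estimate are worth reporting.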

It's only difficult to see this when you're thinking about questions that have no streams of new information to help you answer them - like the existence of God, or the number of aliens in the galaxy.


I like Scott's wording of "lightly held" probabilities. I think this matches what you are describing about the sensitivity of a probability estimate to the answer of an as-yet unanswered question Z.


Okay, hear me out: only write probabilities as odds ratios (or fractions if you prefer), and the number of digits is the number of Hartleys/Bans of information; you have to choose the best approximation available with the number of digits you're willing to venture.

Less goofy answer: Name the payout ratios at which you'd be willing to take each particular side of a small bet on the event. The further apart they are, the less information you're claiming to have.
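
A quick sketch of that "less goofy" version (Python; the particular odds are made up for illustration): the two payout ratios you'd accept pin down an implied probability interval, and a wide interval signals that you're not claiming much information:

```python
def implied_probability_interval(back_odds: float, lay_odds: float) -> tuple[float, float]:
    """Probability interval implied by the small bets you'd accept.

    back_odds: you'd risk 1 to win back_odds on the event happening,
               which is only favorable if P(event) >= 1 / (back_odds + 1).
    lay_odds:  you'd risk 1 to win lay_odds on the event NOT happening,
               which is only favorable if P(event) <= lay_odds / (lay_odds + 1).
    Both odds here are hypothetical numbers chosen for illustration.
    """
    return 1 / (back_odds + 1), lay_odds / (lay_odds + 1)

lo, hi = implied_probability_interval(back_odds=4, lay_odds=2)
print(f"claimed probability lies somewhere in [{lo:.2f}, {hi:.2f}]")  # [0.20, 0.67]
```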


The idea of separate bid and ask prices is a very good way to communicate this concept to finance people, thanks for that.


>When you see a number like "95.436," you expect the number of digits printed to represent the precision of the measurement or calculation - that the 6 at the end means something. In conflict with that is the fact that one significant digit is too many.

Ok, though isn't this question orthogonal to whether the number represents probabilities?

This sounds more like a general question of whether to represent uncertainty in some vanilla measurement (say of the weight of an object) with the number of digits of precision or guard digits plus an explicit statement of the uncertainty. E.g. if someone has measured the weight of an object 1000 times on a scale on a vibrating table, and got a nice gaussian distribution, and reported the mean of the distribution as 3.791 +- 0.023 pounds (1 sigma (after using 1/sqrt(N))), it might be marginally more useful than reporting 3.79 +- 0.02 if the cost of the error from using the rounded distribution exceeds the cost of reporting the extra digits.


Yes, this is exactly the same. In your example you are measuring a mass, in my examples you're measuring the parameter of a Bernoulli distribution. For practical reasons, there's always going to be a limited number of digits that it's worth telling someone when communicating your belief about the most likely value of a hidden parameter.


Many Thanks!


This is one of those societal problems where the root is miscommunication. And frankly it's less a problem than a fact of life. I remember Trevor Noah grilling Nate Silver about how Trump could win the presidency when he had predicted that Trump had only a 1/3 chance of winning. It was hilarious in some sense. Now this situation is the reverse of what Scott is describing, where the person using the probability is using it accurately, but the dilemma is the same: lack of clear communication.


Lack of clear/logical thinking on Trevor Noah's side.


Yes, but most people think of probability like that. They think that a probability below 50% equates to an event being virtually impossible. It's like how many scientists make stupid comments on economics without understanding its terms.


Most people ... . Most people - outside Korea and Singapore - cannot do basic algebra (TIMSS). Most people are just no good with stochastics in new contexts. Most journalists suck at statistics. Many non-economists do not get most concepts of macro/micro - at least not without working on it. Does not make the communication of economists or mathematicians or Nate Silver less clear. 37+73 is 110. Clear. Even if my cat does not understand. - Granted, Nate on TV could have said: "Likely Hillary, but Trump has a real chance." - more adapted to IQs below 100 (not his general audience!). Clearer? Nope.


"more adapted to IQs below 100 (not his general audience!)"

Huh, have you seen the comments section on his substack? It's an absolute cesspool. I don't think I've read another substack with such a high proportion of morons and/or trolls in the comments (though I haven't read many).


I did and agree. He is not writing those comments, is he? - Writing: "who will win: Trump or Biden" will attract morons/trolls. Honey attracts flies just as horseshit does. - MRU comments are mostly too bad to read either.


I didn't blame him for the comments (though I assume he's publicly decided not to moderate them?), I was responding to "his general audience".


I agree that the comments are a cesspool, but as far as I know it's idiocy born of misdirected intellect rather than an actual lack of mental horsepower.

If I see a mistake, oftentimes it's something like "Scott says to take X, Y and Z into account. I am going to ignore that he has addressed Y and Z and claim that his comments in X are incorrect!" I would expect someone dumb to not even comprehend that X was mentioned, much less be able to give a coherent (but extremely terrible) argument for this.

I think it's also partially confounded by the existence of... substack grabbers? Don't know what a good term for this type of person is. But when I see a low quality comment, without the background that an ACX reader """should""" have, I'll scroll up and see it's a non-regular who writes their own substack. Which I would guess means that they're sampling from the general substack or internet population.


I have encountered people who use the term "50/50" as an expression of likelihood with no intent of actual numerical content but merely as a rote phrase meaning "unpredictable." On one occasion I asked for a likelihood estimate and was told "50/50," but when I had them count up past occurrences it turned out the ratio was more like 95/5.


I still intuitively feel this way. I know that 40% chance things will happen almost half the time, but I can't help but intuitively feel wronged when my Battle Brothers 40% chance attack doesn't hit.


Charles Babbage said, "On two occasions I have been asked [by members of Parliament], 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."


Hey sometimes this works now with AI - an LLM figures out what you meant to ask and answers that instead of your dumb typo. Those guys were just 200 years ahead of their time.


Babbage apparently thought the questions [from two MPs, presumably in 1833] were asked in earnest. If so, I think the likeliest explanation is just Clarke's 3rd Law: "Any sufficiently advanced technology is indistinguishable from magic."


I think this is mainly conflict theory hiding behind miscommunication. Because there's no agreed upon standard of using numbers to represent degrees of belief (even the very concept of "degrees of belief" is pretty fraught!), people feel that they have license to dismiss contrary opinions outright and carry on as they were before.


There is an argument to be made that we should stop using probabilities in public communications if people don't understand how they work.


Sometimes you might want to transmit useful information to the non-retarded part of the public even at the cost of riling up the others.


What's the probability of that argument working?


Yes and I suspect this problem gets worse when you start getting to things with 10% probability happening, or things with 90% probability not happening, and so on.


I hesitate to imagine what fraction of the public can deal with independent random variables...


I think the answer to that is to try to raise the sanity waterline.

Also, I am not sure that the issue here is just probability illiteracy.

If Nate had predicted that there is a 1/3 chance of a fair die landing on 1 or 2, nobody would have batted an eye if a two came up. Point out that the chances of dying in a round of Russian roulette are just 1/5 or so, and almost nobody will be convinced to play, because most people can clearly distinguish a 20% risk from a 0% risk.

Part of the problem is that politics is the mind killer. There was a clear tendency of the left to parse "p(Trump)=0.3" (along with the lower odds given by other polls) as "Trump won't win", which was a very calming thought on the left. I guess if you told a patient that there was a 30% chance that they had cancer, they would also be upset if you revised this to 100% after a biopsy. (I guess physicians know better than to give odds beforehand for that very reason.)


Sooner or later, everyone wants to interpret probability statements using a frequentist approach. So, sure you can say that the probability of reaching Mars is 5% to indicate that you think it's very difficult to do, and you're skeptical that this will happen. But sooner or later that 5% will become the basis for a frequentist calculation.

If you read through this article you'll see that probability statements drift between statements of degree of belief and actual frequentist interpretations. It's just inevitable.

It's also very obscure how to assign numerical probabilities to degrees of belief. For instance, suppose we all agree that there is a low probability that we will travel to Mars by 2050. What's the probability value for that? Is it 5%, 0.1%, or 0.000000001%? How do we adjudicate between those values? And how do I know that your 5% degree of belief represents the same thing as my 5% degree of belief?


Assuming you have no religious or similar objections to gambling, the standard mutually intelligible definition of a 5% degree of belief is a willingness to bet a small fraction of what's in your wallet (for example, one cent if you have $20), at better than 20:1 odds.

It has to be a small fraction because the total value you place on your money is only linear for small changes around a given number of dollars, otherwise it tends to be logarithmic for many.
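
For instance, the arithmetic behind that definition: with a 5% degree of belief, taking a payout of 20 cents against a 1-cent stake has expected value

$$ 0.05 \times 20 - 0.95 \times 1 = +0.05 \text{ cents}, $$

slightly positive, while at the "fair" odds of 19:1 implied by a 5% belief the expected value is exactly zero.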


> the standard mutually intelligible definition of a 5% degree of belief is a willingness to bet a small fraction of what's in your wallet (for example, one cent if you have $20), at better than 20:1 odds.

> It has to be a small fraction because the total value you place on your money is only linear for small changes around a given number of dollars, otherwise it tends to be logarithmic for many.

This isn't right; by definition, people are willing to place fraudulent bets that only amount to "a small fraction of what's in their wallet". They do so all the time to make their positions appear more certain than they really are; your operationalization is confounded by the fact that support has a monetary value independently of whether you win or lose. Placing a bet buys you a chance of winning, and it also buys you valuable support.


You can amend this to an anonymous bet.


It still won't work; consider the belief "I will win the lottery."

Fundamentally, it is correct to say that a 5% degree of belief indicates willingness to bet at 20:1 odds in some window over which the bet size is not too large to be survivable and not too small to be worthwhile, but it is not correct to say that willingness to bet indicates a degree of belief (which is what you're saying when you define degree of belief as willingness to bet), and that is particularly the case when you specify that the amount of the bet is trivial.


Sure. Also, I do think that it's reasonable to invest trivial amounts of money in fair lottery tickets given certain utility functions. For example, if a loss is negligible but you value extremely highly the possibility of imminent comfortable retirement. I don't do this myself because I believe that in my part of the world lotteries are rigged and the chance to win really big is actually zero.


> Also, I do think that it's reasonable to invest trivial amounts of money in fair lottery tickets given certain utility functions.

I agree. But this is not compatible with the definition of "degree of belief" offered above; that definition requires that lottery ticket purchasers do not believe the advertised odds are correct.


"I will win the lottery" has a large fantasy component, where people spend time thinking about all the things they could buy with that money.

An anonymous bet with a small fraction of your wealth does not have that fantasy component. "Look at all the things I could buy with this 20 cents I won!" just doesn't do the same thing.

Because it's anonymous you're not pre-committing anything. I suppose some people might brag about making that bet?


This is spot on, and a good illustration of why I believe prediction markets are going to have problems as they scale up in size and importance.


Is it actually a problem for prediction markets, though? People betting at inaccurate odds for emotional (or other) reasons just provides a subsidy for the people who focus on winning, resulting in that market more accurately estimating the true likelihood. Certainly markets can be inaccurate briefly, or markets with very few participants can be inaccurate for longer, but it's pretty easy to look past short-term fluctuations.

Or maybe you're thinking of it being a problem in some different way?


> standard mutually intelligible definition

FWIW this is just the de Finetti perspective, and you could have others, more based around subjective expectations.

Like, I think the reduction works a bunch of the time, but I don't think you can reduce subjective belief to willingness to bet


Unless the enemy has studied his Agrippa, which I have.


Of course, this assumes that people will be willing to bet on an outcome given some odds ratio.

It might well be that some people would not rationally bet on something.

For example, given no further information, I might not be willing to bet on "Alice and Bob will be a couple a month from now" if I have no idea who Alice and Bob are and anticipate that other people in the market have more information. Without knowing more (are Alice and Bob just two randomly selected humans? Do they know each other? Are they dating?) the Knightian uncertainty in that question is just too high.


Thank you! You are making **exactly** my point -- that although people start out by talking about subjective "degrees of belief", sooner or later they will fall into some sort of frequentist interpretation. There can be no purer expression of this than making an argument about betting, because ultimately you are going to have to appeal to some sort of expected value over a long run of trials, which is by definition a frequentist interpretation.


Doesn't it just mean that if we "ran" our timeline 1000 times, I predict we reach Mars in 50 of those timelines?

In other words, it is still frequentist in a sense, but over hypothetical modalities.


Not necessarily. I think the probability that P=NP is about 5%, but I don’t think that if we “ran” the universe that many times it would be true in 5% of them - it would either be true in all or none.

Instead it means that, if I had to do something right now whose outcome depended on P being equal NP, I would do it if the amount by which it made things better if they were = is more than 20 times better than the amount by which it made things worse if they were unequal. I need some number or other to coordinate my behavior here and guarantee that I don’t create a book of bets that I am collectively guaranteed to lose (like if I paid 50 cents for a bet that gives me 90 cents if horse A wins and also paid 50 cents for a bet that gives me 90 cents if horse A loses). But the number doesn’t have to express anything about frequency - I can use it even for a thing that I think is logically certain one way or the other, if I don’t know which direction that is.


I think the P=NP example (and the Mars example, if you believe in a deterministic universe) can still be approached this way if we define 'rerunning the timeline' as picking one of the set of possible universes that would produce the evidence we have today.


It can be if you accept logical impossibilities as “one of the set of possible universes that would produce the evidence we have today”.


I'm not sure I follow. If the evidence we have today is insufficient to show that P != NP, how is it logically impossible for universes to exist where we have the same evidence and P does or does not equal NP?


Most people suspect that the mathematical tools we have *are* sufficient to show one of the directions of the P and NP conjecture, it's just that no human has yet figured out how to do it.


Even if it turns out not to be provable, it still either is or isn't, right? Universes where the starting conditions and the laws of physics are different are easy to imagine. Universes where *math* is different just bend my mind in circles. If you changed it to switch whether P=NP, how many other theorems would be affected? Would numbers even work anymore?


Interesting that you should say that. I was thinking about the timelines of possible universes going *forward from the present* — and when it comes to the Mars example, what is the set of possible universes that will *prevent* us from getting to Mars in 2050 vs the set that will *allow* us to get to Mars in 2050? I think we can agree (or maybe not?) that there is an infinity of universes for the former, but a smaller infinity of universes for the latter. After all, the set of universes where the sequence of events happens in the proper order to get us to Mars would be smaller than the set of universes where the sequence of events didn't happen (or couldn't happen). If these were sets of numbers we could create a bijection (a one-to-one correspondence) between their elements. But no such comparison is possible between these two sets, and the only thing we can say with surety is that they don't have the same cardinality. Trying to calculate the proportionality of the two sets would be impossible, so determining the probability of universes where we get to Mars in 2050 and universes that don't get us to Mars in 2050 would be a nonsensical question. I'm not going to die on this hill, though. Feel free to shoot my arguments down. ;-)


I think this is a fascinating objection, thanks.

I've never been any good at reasoning about infinities in this way (Am I so bad at math? No, it's the mathematicians who are wrong!), but I've spotted a better out so excuse me while I take it:

I do disagree that these are infinite sets; I think they're just unfathomably large. If there are 2^(10^80) possible states of the universe one Planck second after any given state, the number of possible histories at time t quickly becomes unrepresentable within our universe. It's a pseudoinfinite number that is not measurably different from infinity in a practical sense, but saves us all of the baggage of an infinite number in the theoretical sense.

If you accept that premise (and I don't blame you if you don't), I believe we're allowed to do the standard statistics stuff like taking a random sample to get an estimate without Cantor rolling in his grave.


I like your counter-argument. But I'll counter your counter with this — if the number of potential histories of our universe going forward is impossible to represent within our universe, then it's also impossible to represent the chances of getting to Mars in 2050 vs not getting to Mars in 2050. Ouch! My brain hurts!


> Not necessarily. I think the probability that P=NP is about 5%, but I don’t think that if we “ran” the universe that many times it would be true in 5% of them - it would either be true in all or none.

I think this is a semantic/conceptual disagreement. I think there are two points where we can tease it apart:

* You're thinking of the world as deterministic, and I'm thinking of it as predicated on randomness to a significant degree. If the future depends on randomness, then it makes no sense to claim it would be true in all or none. Whereas if the future is determined by initial conditions and laws of nature, then yes, it will be. In which case:

* You can adapt my conceptualisation such that it survives a deterministic world. A determinist would believe that the future is determined by known and unknown determinants, but nonetheless fixed, ie the laws of nature and initial conditions both known and unknown. To give x a 5% probability, then, is to say that if, for each "run" of the universe you filled in the unknown determinants randomly, or according to the probability distribution you believe in, you would get event x occurring 50/1000 times.

Correct me if I'm mistaken in my assumptions about your belief, but I don't know how else to make sense of your comment.


I think that the P vs NP claim is most likely a logical truth one way or the other, and that no matter how you modify the known or unknown determinants, it will come out the same way in all universes.

If you have some worries about that, just consider the claim that the Goldbach conjecture is true (or false), or the claim that the 1415926th digit of pi is a 5.


I was with you until the last bit Kenny (per Wolfram Alpha, the 1415926th digit of pi is known to be a 7:-))


Isn’t your 5% estimate essentially meaningless here though? Since it is a proposition about a fundamental law and you know it is actually either 100% or 0%. And more importantly no other prediction you make will bear sufficient similarity to this one that grouping them yields any helpful knowledge about future predictions.


Your first point shows that P=NP doesn’t have a *chance* of 5% - either all physically possible paths give P=NP or none of them do.

Your second point shows that a certain kind of frequentist calibrationism isn’t going to make sense of this either.

But Bayesian probability isn’t about either of those (regardless of what Scott says about calibration). Bayesian probability is just a way of governing your actions in light of uncertainty. I won’t make risky choices that come out very badly if P=NP is false, and I won’t make risky choices that come out 20 times as badly if it’s true compared to how well the choice turns out if it’s false. That’s what it means that my credence (Bayesian probability) is 5%. There is nothing that makes one credence “correct” and another “incorrect” - but there are people who have credence-forming policies that generally lead them well and others that generally lead them badly. And the only policy that avoids guaranteed losses is for your credences to satisfy the probability axioms and to update by Bayesian conditionalization on evidence.


The thing I don’t get is that if it is not a frequentist probability, how can it make sense to applies Bayes theorem to an update for P=NP? Say a highly respected mathematician claims he has a proof of it, then promptly dies. This is supposed to be Bayesian evidence in favour of P=NP. But does it make sense to apply a mathematical update to the chance of P=NP? Surely it is not an event that conforms to a modelable distribution, since as you say it is either wrong or right in all universes.


What Bayes' Theorem says is just that P(A|B)P(B)=P(A&B). (Well, people often write it in another form, as P(A|B)=P(B|A)P(A)/P(B), but that's just a trivial consequence of the previous equation holding for all A and B.)

To a Bayesian, P(A) (or P(B), or P(A&B)) just represents the price you'd be willing to pay for a bet that pays $1 if A (or B, or A&B) is true and nothing otherwise (and they assume you'd also be willing to pay scaled up or down amounts for things with scaled up or down goodness of outcome).

Let's say that P(B|A) is the amount you plan to be willing to pay for a bet that pays $1 if B is true and nothing otherwise, in a hypothetical future where you've learned A and nothing else.

The first argument states that this price should be the same amount you'd be willing to pay right now for a bet that pays $1 if A&B are both true, nothing if A is true and B is false, and pays back your initial payment if A is false (i.e., the bet is "called off" if A is false). After all, if the price you'd be willing to pay for this called off bet is higher, then someone could ask you to buy this called off bet right now, as well as a tiny bet that A is false, and then if A turns out to be true they sell you back a bet on B at the lower price you'd be willing to accept after learning A is true. This plan of betting is one you'd be willing to commit to, but it would guarantee that you lose money. There's a converse plan of betting you'd be willing to commit to that would guarantee that you lose money if the price you'd be willing to pay for the called off bet is lower than the price you'd be willing to pay after learning that A is true. The only way to avoid the guaranteed loss is if the price you're willing to pay for the called off bet is precisely equal to the price you'd be willing to pay for a bet on B after learning A.

But a bet on B that is called off if A is false is precisely equal to a sum of three bets that you're willing to make right now - and we can check that if your prices don't precisely satisfy the equation P(B|A)P(A)=P(A&B), then there's a set of bets you're willing to make that collectively guarantee that you'll lose money.

https://plato.stanford.edu/entries/dutch-book/#DiacDutcBookArgu

There's nothing objectively right about the posterior P(B|A), just that if you are currently committed to betting on B at P(B), and you're currently committed to betting on A&B and P(A&B), then you better be committed to updating your price for bets on B to P(B|A) if you learn A (and nothing else) or else your commitments are self-undermining.

All that there is for the Bayesian is consistency of commitment. (I think Scott and some others want to say that some commitments are objectively better than others, such as the commitments of Samotsvety, but I say that we can only understand this by saying that Samotsvety is very skilled at coming up with commitments that work out, and not that they are getting the objectively right commitments.)
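
A minimal sketch of that consistency requirement (Python; the function name and the example prices are mine, for illustration): given the prices you'd quote for bets on A, on A&B, and on B conditional on A, it simply checks whether they satisfy P(B|A)·P(A) = P(A&B), the condition the Dutch book argument enforces:

```python
def dutch_bookable(p_a: float, p_a_and_b: float, p_b_given_a: float,
                   tol: float = 1e-9) -> bool:
    """True if the quoted prices violate P(B|A) * P(A) = P(A&B), i.e. if a
    clever bettor could lock in a guaranteed profit against you.
    (Illustrative consistency check, not a full Dutch book constructor.)"""
    return abs(p_b_given_a * p_a - p_a_and_b) > tol

# Consistent quotes: P(A) = 0.5, P(A&B) = 0.2, P(B|A) = 0.4
print(dutch_bookable(0.5, 0.2, 0.4))   # False -> no guaranteed loss
# Inconsistent quotes: same P(A) and P(A&B), but P(B|A) = 0.5
print(dutch_bookable(0.5, 0.2, 0.5))   # True  -> Dutch-bookable
```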


And this is my point -- that statements about "degrees of belief" will inevitably be translated into statements about long-run frequencies.


I do not think that 5% to reach Mars could ever be interpreted as a frequentist probability.

If you are one of the mice which run the simulation that is Earth, and decide to copy it 1000 times and count in how many of the instances the humans reach Mars by 2050, then you can determine a frequentist probability.

If you are living on Earth, you would have to be very confused about frequentism to think that such a prediction could ever be a frequentist probability.


<I just couldn't resist>

>If you read through this article you'll see that probability statements drift between statements of degree of belief and actual frequentist interpretations. _It's_ _just_ _inevitable_.

groan [emphasis added]

</I just couldn't resist>


I wonder if there's a bell curve relationship of "how much you care about a thing" versus "how accurately you can make predictions about that thing". E.g. do a football teams' biggest fans predict their outcomes more accurately or less accurately than non-supporters? I would guess that the superfans would be less accurate.

If that's the case, "Person X has spent loads of time thinking about this question" may be a reason to weigh their opinion less than that of a generally well-calibrated person who has considered the question more briefly.


I think you're conflating bias vs "spent loads of time thinking about this question" as the cause of bad predictions. The latter group includes all experts and scientists studying the question, and probably most of the best predictions. It also includes the most biased people who apply no rational reasoning to the question and have the worst predictions. You're better off considering bias and expertise separately than just grouping them as people who spent a lot of time thinking about something.


I do think spending lots of time thinking about a subject can contribute to developing a bias about it, though


Me too. But you still have to consider the two factors separately. You can't just reason that since physicists have spent a lot of time thinking about physics, they're probably biased and shouldn't be trusted about physics.


Sure you can. Greybeards with pet theories they won't give up because they're so invested in them are a thing. It's why Planck wrote that "science advances one funeral at a time" - eminent physicists really are a hindrance to accurate physics.


Compared to some platonic ideal of Science!, perhaps. Compared to any real-world alternative that *isn't* eminent physicists, the eminent physicists will give you the most accurate understanding of the physical universe.


On the one hand, bookies do find that the home team gets bet on a lot more than their actual odds of winning.

On the other hand, superfan could mean either "I go to all the games" or "I spend a lot of time thinking about the sport". The latter do tend to be able to put aside their affiliation when there's money on the line.

I think the bookies' experience has more to do with a lot of naïve fans flinging out on a game once in a while. It does tend to happen in the playoffs more than the regular season after all.


In answer to your first question, I find good value in betting on things where there is emotional investment, ie elections and sporting events with local heroes involved.


If Eliezer thought that p(AI doom) was 10^-10, then he would not have spent years working on that question.

On the one hand, I would think it is wrong to privilege the opinion on the existence of God of someone with a degree in theology based on the fact that they have deep domain knowledge, because few atheists study theology.

On the other hand, if a volcanologist told me to evacuate, I would privilege his opinion over that of a random member of the public. (It might still be overly alarmist because of bad incentives.)

Most other professionals fall somewhere on this spectrum. You are much more likely to study astronomy / climate change / race relations / parapsychology / homeopathy if you believe that stars (as "luminous objects powered by fusion") / anthropogenic climate change / systemic racism / ESP / memory of water exists. Of course, once you are in a field, you are disproportionately subject to the mainstream opinions in that field, publication bias and so on. Trying to publish an article arguing that stars are actually immortalized Greek heroes in an astronomy journal will probably not work.

So the question "is X for real or are all the practitioners deluding themselves?" is not easy to answer.


<mild snark>

It is not uncommon, for _many_ fields, for practitioners in that field to answer "Is your field important?" affirmatively. :-)

</mild snark>


What comments triggered this? I saw Yann LeCun has made the Popperian argument that no probability calculus is possible.

A great project would be to trace how language has evolved to talk about probability, from before Pascal to now.


A great project indeed! A book waiting to be written (if it has not already been written. I'd put the probability of that at 35 percent at least, though :)


The Emergence of Probability by Ian Hacking covers this topic. Prior to the modern generalized concept of probability, the word meant something very different. Quoting from https://en.wikipedia.org/wiki/Probability:

> According to Richard Jeffrey, "Before the middle of the seventeenth century, the term 'probable' (Latin probabilis) meant approvable, and was applied in that sense, univocally, to opinion and to action. A probable action or opinion was one such as sensible people would undertake or hold, in the circumstances.


https://www.amazon.co.uk/Everything-Predictable-Remarkable-Theorem-Explains-ebook/dp/B0BXP3B299

(it's not really the book you're talking about, but it's not COMPLETELY dissimilar and since I wrote it I'm keen to push it)


Thanks. In addition to this and Hacking I've found James Franklin's Science of Conjecture: "The words “reasonable” and “probably” appear in modern English with a frequency and range of application very much higher than their cognates in other European languages, as do words indicating hedges relating to degrees of evidence such as “presumably,” “apparently,” “clearly.” These phenomena indicate an Anglophone mindset that particularly values attention to the uncertainties of evidence."

Expand full comment
Mar 21·edited Mar 21

Your suggestion to call different concepts of probability different names ("shmrobability") for metaphysical reasons actually makes complete sense. Maybe call frequentist probability "frequency", and Bayesian probability "chance" or "belief", with "probability" as an umbrella term. The different concepts are different enough that this would be useful. "The frequency of picking white balls from this urn is 40%." Sounds good. "The frequency of AI destroying mankind by 2050 is 1%." Makes no sense, as it should; it either happens or it doesn't. "The chance of AI destroying mankind by 2050 is 1%." OK, now it makes sense. There we go!

Expand full comment

I don't think there's a word in the English language for the fraction of the ensemble of possible worlds.

Expand full comment
Mar 21·edited Mar 21

Hmmm, how about "proportion" or "ratio"? e.g. "I think there's a 1% proportion of AI causing human extinction."

While we are talking about all the different interpretations of "probability", I might as well plug the Stanford Encyclopedia of Philosophy article on it: https://plato.stanford.edu/entries/probability-interpret/#ClaPro

It would be hard to think of easy English terms for all six main interpretations they list, but hardly impossible. We can even create new words if necessary!

Expand full comment

On reflection, I've come around to thinking that any expression that brings up an ensemble of possible worlds would be apt to cause more philosophical sidetracking than "probability" itself!

Expand full comment

Now, would that be the quantum mechanical many-worlds-interpretation ensemble, the cosmological multiple-independent-inflationary-bubbles ensemble, or the string theory multiple possible compactifications ensemble? :-)

Expand full comment

Why assume that there is a finite number of possible worlds to begin with?

If the number of possible worlds is infinite, which to me seems intuitively likely, then any subset of possible worlds will either be ~0% of the whole (if finite) or an undefined fraction (if the subset is itself infinite).

Expand full comment
Mar 21·edited Mar 21

I'm not sure they're meaningfully different, at least not in a way that separates flipping a coin from questions like "will we get to Mars before 2050". If I have a fair coin that I've never flipped before, does it make sense to say the frequency of getting heads is 50%? If I've only ever flipped it twice and it came up heads both times, is the frequency of heads 100%? If yes, then the frequency is bad for making predictions. It's just a count of outcomes in some reference class.

But if we consider the frequency of heads to be 50% even though the coin came up heads both times, then it's because we're using "frequency" to mean the frequency you would hypothetically get if you repeated the experiment many times. But this sounds a lot like chance. If you hypothetically repeated the experiment of "will we get to Mars before 2050" many times, there would also be a frequency of how many times we'd get there. Sure, we can't actually repeat the experiment in real life, but the same is true of a coin that I won't let anyone flip.

For both the coin flip and the chance of getting to Mars, we come up with a mathematical model and that model gives us a probability. The models exist as pure math and have a well defined probability, but they never perfectly match the real life event we're trying to model. E.g. no real coin is perfectly balanced and has exactly a 50% chance of landing on heads.

Expand full comment

"For both the coin flip and the chance of getting to Mars, we come up with a mathematical model and that model gives us a probability. The models exist as pure math and have a well defined probability, but they never perfectly match the real life event we're trying to model."

Sure, but it's important that the models match the real life events well enough to be useful. There is no way to determine whether that is the case or not other than defining what "well enough" means, and then comparing the outcomes of experiments to the predictions of the model.

I think it's feasible to conduct enough coin flip experiments to validate the mathematical model to within some acceptable tolerance, and that's why I don't object to talking about the probability of a particular outcome for a coin flip.

I don't think it's feasible to conduct enough "getting to Mars" experiments to validate the mathematical models that people might propose, and that's why I object to trying to assign it a probability.

You can try to come up with a reference class for "getting to Mars" that is large enough that you can validate a mathematical model of the reference class, but I don't think it actually helps, because you still have to validate that getting to Mars actually belongs in that reference class, and that brings you back to the need for experiments.

You can also try to break "getting to Mars" down into a series of steps that are themselves feasible to model and validate, and then combine them. But I still don't think it helps you, for two reasons. The first goes back to what I already said -- you still have to experimentally verify that you've captured all of the relevant steps, and that still requires experimentation. The second, deeper objection is that I would draw a distinction between reducible and irreducible complexity. I think "getting to Mars" is irreducibly complex, i.e. it can't be broken down into discrete components that can each individually be well modeled. At least not with our current understanding of economics, politics, and sociology.

Expand full comment
Mar 21·edited Mar 21

The same is true of the coin though. You're using the reference class of other coins that have been flipped and experimentally verified to match the model that p(Heads) = 0.5. But my penny is a different coin. No two pennies are manufactured exactly alike. How do you know that other coins are a good reference class for my coin? You'd have to do more experiments with my coin to ensure it belongs in that reference class.

At the end of the day, you're making the judgement call that me flipping the penny in my pocket is similar to other people flipping other coins. That's a good judgement call. But it's not categorically different than judging that our model for rocket launches will apply to a rocket to Mars. The difference is only one of degree. We're highly confident in applying our coin flipping model to new coins, and somewhat less confident in applying our rocket model to new destinations.

Expand full comment

>But my penny is a different coin. No two pennies are manufactured exactly alike. How do you know that other coins are a good reference class for my coin? You'd have to do more experiments with my coin to ensure it belongs in that reference class.

Agreed. One can (almost) never eliminate some reference class tennis. ( Perhaps with elementary particles or identical nuclei one can. Chemistry would look different if electrons were distinguishable... )

Expand full comment

I agree that no two pennies, or anything really, are exactly alike. There will always be some tolerance for differences when defining your reference class, and it's always possible to set the tolerances so strictly that the reference class is N=1. But I still think there are qualitative differences between coin flipping and getting to Mars (which I interpreted as "land a group of humans on Mars and bring them back safely").

First, even though no two coins are exactly the same, it's feasible to experimentally measure how similar a large number of coins minted in US mints are when it comes to the distribution of heads and tails. We could even go more granular and look at coins from a particular mint, or coins from a particular production run, and we can still collect enough coins and flip them enough times to model the probability distributions pretty well. You can't do that for getting to Mars.

Of course, there is always a chance that your coin was somehow defective, but we can even quantify the fraction of defective coins pretty well and incorporate that into the model. If you are being sneaky and telling me your coin is a standard quarter when in fact it's fraudulent, I'll admit it's probably impossible to fully incorporate something like dishonesty into a model. If you think that makes coin flipping and getting to Mars qualitatively similar, then we can agree to disagree.

But even if the reference class really is N=1, then you can at least verify your assumptions by flipping that particular coin a bunch of times. Now we've moved from prediction to postdiction, but I still think this is a meaningful distinction when compared to something like sending people to Mars, that we will struggle to even do once, at least at first.

If "getting to Mars" were merely rocket science, then I think you could argue that it's qualitatively similar to flipping a coin, and it's just a difference of degree. After all, we have successfully (and unsuccessfully) sent rockets to Mars. The sample size may not be as large as the number of coin flips we could do in a day, but it's large enough that there is useful information to be gleaned. And rocket science, while very complex, is closer to what I called reducible complexity than irreducible complexity (I think...I'm not a rocket scientist).

But if by "getting to Mars" we mean "land humans on Mars and bring them back" then I think there is a qualitative difference with flipping a coin. A manned mission to Mars has not been done even once -- it hasn't even been attempted once. And it involves a lot more than just physics and biology. There are economic and political and social factors that I don't think can be modeled and verified in the same way as flipping a coin or even launching a rocket.

Finally, concepts like reducible vs irreducible complexity, how similar two things have to be before you can group them in the same class, and how closely a model has to match observations to be considered "good enough" involve subjectivity, and I think of them as existing on a continuous spectrum. That means you can't set a clear dividing line and say everything to the right is one kind of thing, and everything to the left is another kind of thing. The boundaries are fuzzy. But I still think two things that exist on a spectrum can have qualitative differences when they are far enough apart.

Expand full comment

I'm sympathetic to this approach, but there's an issue where declaring literally anything to *really* be a valid instance of "frequency" invites an argument about interpretations of quantum mechanics. No easy escape from metaphysical arguments in a deterministic universe!

Expand full comment

Most philosophers use “chance” for something different from either frequency or credence (our term for Bayesian probability). “Chance” is used for an objective physical probability that may or may not ever be repeatable and that people may or may not ever wonder about to form a credence. Standard interpretations of quantum mechanics suggest that there are chances of this sort (though my own preferred interpretation, the many worlds interpretation, gives those numbers a slightly different meaning than “chance”).

Expand full comment

> Probabilities Don’t Describe Your Level Of Information, And Don’t Have To

This seems literally wrong to me. Probability and information both measure (in a very technical sense) how surprising various outcomes are. I think they may literally be isomorphic measures, with the only difference being that information is measured in bits rather than percentages.

Your examples are also off-base here. The probability of a fair coin coming up heads when I'm in California is 1/2, and the probability of a fair coin coming up heads when I'm in New York is 1/2, and we wouldn't say that probability is not the same thing as information because 1/2 does not capture what state I'm flipping the coin in. Similarly, the difference between the first two scenarios is not in E[# heads / # flips] but in E[(# heads / # flips)^2] - (E[# heads / # flips])^2, i.e. the variance of the distribution is different. This is because (1) is well modelled by *independent* samples from a known distribution, while in (2) the samples are correlated (i.e. you need a distribution over hyperparameters if you want to treat the events as conditionally independent).
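To make the variance point concrete, here's a rough simulation (my own sketch; I'm assuming the two scenarios amount to something like "a coin known to be fair" versus "a coin whose bias is completely unknown"):

```python
import numpy as np

rng = np.random.default_rng(0)
n_flips, n_trials = 20, 200_000

# Scenario 1: a known fair coin; flips are independent with p = 0.5.
frac_1 = rng.binomial(n_flips, 0.5, size=n_trials) / n_flips

# Scenario 2: a coin of unknown bias (uniform over [0, 1]); given the bias
# the flips are independent, but marginally they are correlated through it.
bias = rng.uniform(0, 1, size=n_trials)
frac_2 = rng.binomial(n_flips, bias) / n_flips

print(frac_1.mean(), frac_2.mean())  # both ~0.5: same expected fraction of heads
print(frac_1.var(), frac_2.var())    # ~0.0125 vs ~0.09: very different variance
```

Same expected value, very different second moment, which is exactly the distinction being pointed at.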

I also noticed you didn't touch logical / computationally-limited probability claims here, like P(the 10^100 digit of pi is 1) = 1/10.

Expand full comment

I think he meant "level of information" as in pages of text you could write about it, not the information-theoretic entropy sense of the word that you're rightly saying doesn't work in that sentence.

Expand full comment

Right. “This is because 50% isn’t a description of how much knowledge you have, it’s a description of the balance between different outcomes.” He presents scenarios where you have a lot of useless knowledge (the knowledge about the process is made useless by arbitrarily tagging one outcome as heads and the other as tails). Probability is related to the amount of knowledge you have that can be leveraged to cast a prediction. Learning Moby Dick by heart won’t help you predict whether Trump or the other guy will win the election.

Expand full comment
Mar 21·edited Mar 21

Parts of this are unobjectionable, and other parts are very clearly wrong.

It is perfectly fine to use probabilities to represent beliefs. It is unreasonable to pretend the probabilities are something about the world, instead of something about your state of knowledge. Probabilities are part of epistemology, NOT part of the true state of the world.

You say "there's something special about 17%". No! It's just a belief! Maybe the belief is better than mine, but please don't conflate "belief" with "true fact about the world".

If Samotsvety predicts that aliens exist with probability 11.2%, that means they *believe* aliens to exist to that extent. It does not make the aliens "truly there 11.2% of the time" in some metaphysical sense. I can feel free to disagree with Samotsvety, so long as I take into account their history of good predictions.

(Side note: that history of good predictions may be more about politics and global events than it is about aliens; predicting the former well does not mean you predict the latter well.)

----------

Also, a correction: you say

"It’s well-calibrated. Things that they assign 17% probability to will happen about 17% of the time. If you randomly change this number (eg round it to 20%, or invert it to 83%) you will be less well-calibrated."

This is false. It is easy to use a simple rounding algorithm that guarantees the output is calibrated if the input is calibrated (sometimes you can even *increase* calibration by rounding). If I round 17% to 20% but also round a different 23% prediction to 20%, then it is a mathematical guarantee that if the predictions were calibrated before, they are still calibrated.

Calibration is just a very very bad way to measure accuracy, and you should never use it for that purpose. You should internalize the fact that two people who predict very different probabilities on the same event (e.g. 10% and 90%) can both be perfectly calibrated at the same time.
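To make the rounding point concrete, here's a toy simulation (a sketch only; it assumes the events really occur at the stated rates and, importantly, that the two batches of predictions are the same size, which is what makes the pooled bucket land on 20%):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Two batches of calibrated predictions: 17% and 23%, equal counts.
hits_17 = rng.random(n) < 0.17
hits_23 = rng.random(n) < 0.23
print(hits_17.mean(), hits_23.mean())  # ~0.17 and ~0.23: calibrated before rounding

# Round both predictions to 20% and pool them into a single bucket:
pooled = np.concatenate([hits_17, hits_23])
print(pooled.mean())  # ~0.20: still calibrated after rounding
```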

Expand full comment

I think the part about calibration answers the first half of your comment: these numbers have real-world validity in that you can bet on them and make more money on expectation than someone using different numbers.

For the second half of the comment: if I understand your argument correctly, it only holds if there are equal numbers of predictions on either side. If Samotsvety says 17% for impeachment and 23% for Mars (or 10% and 90%), and you round those both to 20% (or 50%) and bet accordingly, then, yes, you'll make as much money as someone who used the unrounded numbers.

But if they predicted 17% (or 10%) for *both* events, and you rounded *both* to 20% (or 50%) and bet accordingly, then you'd lose money on expectation compared to someone using the unrounded numbers.

And this applies even more strongly if there's a whole slate of unrelated events, like in a prediction contest. If you threw away all the answers (from Samotsvety or the wisdom of crowds or whoever) and rounded everything to 50%, or even to one significant digit, then you would be losing information, and would be losing money if you bet based on those rounded numbers.

Expand full comment

If two people both have functions that satisfy the probability axioms, then each one makes more money on expectation than the other, calculating expectations by means of their own probability function. This is just a property of probability functions generally, that doesn’t pick out a single special one.

If Samotsvety puts 17% for impeachment, and I round it off to 10% and you round it off to 30%, and we all make a bet, then the person who does best will be either you or I. Neither of us could systematically do better than Samotsvety. But there absolutely could be someone who is perfectly equally good to Samotsvety in general, despite giving slightly different numbers to every event than Samotsvety does.

Expand full comment

"these numbers have real-world validity in that you can bet on them and make more money on expectation than someone using different numbers."

There's no "in expectation" when considering a single event.

Also, you're conflating "make money in expectation" with "be calibrated". They are not quite the same. I do agree that the 17% can be interpreted in a way that is meaningful, and that I won't be able to round. Calibration is the wrong way to do this.

That I can't round 17% still does not make 17% "objectively true", since another equally skilled predictor can predict 93% on the same question and they can both be right (in the sense that they both get the same calibration and the same Brier score on their entire batch of predictions). The issue is that you can never judge accuracy or calibration on a single prediction, but only on a batch of them.

It's really much better to think of 17% as a measure of belief, not as an objective fact about the world. If I have insider info, maybe I already know the event will happen 100% of the time, so then what does the 17% mean? It just CAN'T be a fact about the world; it doesn't make sense!

Expand full comment

If we suppose that quantum events tend not to influence politics overly much, then 17% for impeachment is likely not true.

If God decided to open Omniscient Forecasts Ltd, he might state the impeachment odds as 99% or 1% instead. If he was correct, he would beat Samotsvety in his prediction score, and everyone would use his predictions.

Samotsvety's 17% is just our best guess, because we do not have Laplace's demon or the best possible psychological model of every American.

Expand full comment

Where in the post did he claim that 17% represented a true fact about the world?

Expand full comment

It's implied in several places, including in other posts by Scott. In this particular post, we have, for example:

"I think the best way to describe this kind of special number is “it’s the chance of the thing happening”, for example, the chance that Joe Biden will be impeached."

Expand full comment

So? Do you also object to people saying that the chance of a fair coin landing on heads on the next throw is 50%? As far as I can tell, these statements are equivalent.

Expand full comment

First you claim Scott never said it, and now you claim he's right to say it?

Anyway, a fair coin comes from an implied set of repeatable events (other fair coin flips). That's not true for most things you predict.

What's the probability that the 100th digit of pi is 5? You can say "10%", but you can also just look it up and see that the digit is 9, so the probability is actually "0%" (unless the source you looked up was lying, I guess, so maybe 1%?). Which one is right? They both are: the probabilities represent *your state of belief*, not an objective fact about the world.

Maybe it would be clearer if I asked you "what's the probability the 1st digit of pi is 5"? That should make it more obvious that the answer is not just "10%" in some objective sense. The answer depends on how many digits of pi you know! It's a property of your state of knowledge, not a property of the world.

Expand full comment

>First you claim Scott never said it, and now you claim he's right to say it?

Yes, insofar as it's ordinary use of language, even if it elides certain metaphysical subtleties. You do agree of course, that there's no true fact about a fair coin that is represented by the number 50%?

Expand full comment
Mar 22·edited Mar 22

Hmm... Maybe one way of thinking about it is that there are estimates of probability that are difficult to impossible to _improve_? In a Newtonian world, if one knew everything about a coin toss of a fair coin to infinite precision and had infinite computing capacity, then the odds of heads or tails on a given toss would be 0%/100% or 100%/0%, but, in practice, no one can have that information. And in many situations, one can often say "The information doesn't exist" e.g. about the results of an experiment that has not yet been run. And, in some of those cases, the best existing estimate of the probability looks less like a subjective description of one's state of mind and more like a publicly known value.

Expand full comment
Mar 22·edited Mar 22

With a fair coin, you can view it in two metaphysical frames: you could either (a) say the probability it lands heads is "truly" 50%, by viewing the coinflip as an instance out of the larger class of "all fair coinflips", which one can show together approach half heads and half tails; or (b) you can say this given coin, like the rest of the universe, is deterministic and the probability is about our state of knowledge. I allow both frames.

For more practical matters, though, frame (a) does not work. That's because (1) there is no natural class like "all fair coinflips" from which you drew your sample, and (2) not everyone has the same state of belief about the event. There is no longer any way to pretend that 17% is a fact of the universe instead of a property of your belief state. Some individual congressmen might have perfect knowledge of whether Biden will be impeached; how then could 17% be a fact of the universe?

You've tried to gotcha me, and I've answered your question carefully. Now answer mine: is the statement "10% chance the 100th digit of pi is 5" really a factual claim about the world instead of a claim about your state of knowledge? How could that be, when I just looked it up and know the 100th digit?

Expand full comment

I don't think that's necessary for the objection to hold. If I understand correctly, Scott says that if you are well calibrated, then 17% represents a true fact about your state of knowledge about the world. But the objection is that two people can be well calibrated, share the same knowledge, and assign totally different probabilities to events.

Expand full comment

Yes, Scott is sloppy with his treatment of calibration, but I think that the main thrust of his post is obviously true, and people eager to nitpick him should first acknowledge that, if they prioritize discourse improvements over being excessively pedantic.

Expand full comment

>You should internalize the fact that two people who predict very different probabilities on the same event (e.g. 10% and 90%) can both be perfectly calibrated at the same time.

This changed my mind more than any other comment. However, now I wish I could see some empirical data as to how often this happens. Are superforecasters of similar calibration really assigning totally different probabilities to events? I could believe this happening if people just have different knowledge and expertise. But if it's based on publicly available knowledge, then I would expect high quality predictions to cluster.

Expand full comment

It's a good question, and I don't know the answer. My guess is that the superforecasters would generally assign similar probabilities to normal questions they see a lot and have kind of developed a model for (e.g. elections) but would probably wildly diverge on some questions, including some that you might care more about (e.g. AI risk).

Expand full comment

There's also a clear selection bias for calibration based on time-to-event. Say we're both superforecasters working on predicting the development of display technologies. You and I were both well calibrated on predicting the rise of flat screens and LED, the overhyped 3D movement, and HD/4K. We both predicted which technologies would take hold, and how long it would take them to mature to the point of becoming standard (or not).

You predict with 90% certainty that intra-cranial image projection will happen by 2075, but I predict this is only 10% likely. How can this be true if we're both well calibrated?

All our well-calibrated predictions are on technologies <25 years old, and were likely made with time horizons of less than 10 years. Yet this is the basis for appeals to our accuracy on much longer time horizons.

If someone predicted in 1980 that high definition flat screen TVs with a wider color gamut would be the norm in 2024, that would be interesting. If they accurately predicted date ranges within which these technologies would be adopted and mature, that would give me more confidence that this same person making predictions today had useful insights for 40+ years into the future.

When people say, "sure you predicted events within a 5 year time horizon, but I'm not convinced you're able to predict with accuracy 50 years out" that's not them irrationally ignoring the calibration curves. It's accurately discerning the limits of the data.

Expand full comment

I'm curious about the demand that probabilities come with meta-probabilities. Wouldn't it be satisfied anyway by Jaynes's A_p distribution?

Expand full comment

Assume there is a one-shot event with two possible outcomes, A and B. A generous, trustworthy person offers you a choice between two free deals. With Deal 1, you get $X if A occurs, but $0 if B occurs. With Deal 2, you get $0 if A occurs, but $X if B occurs. By adjusting X, and under some mild(ish) assumptions, the threshold value of X behaves a helluva lot like a probability.

Expand full comment

Or at least, it better. If it doesn’t, then I can either come up with a set of bets you will take that guarantee you lose, or come up with a set of bets that you are collectively guaranteed to win that you won’t take any of.

Expand full comment
Mar 22·edited Mar 22

Am I being dense, or is there a typo somewhere?

>With Deal 1, you get $X if A occurs, but $0 if B occurs. With Deal 2, you get $0 if A occurs, but $X if B occurs.

if we scale this down by X we get

With Deal 1, you get $1 if A occurs, but $0 if B occurs. With Deal 2, you get $0 if A occurs, but $1 if B occurs.

Which looks like the threshold probability for switching deals is always 50%, independent of X.

Was the payoff for B in Deal 2 supposed to be $(1-X)? Or maybe $1? That would make X most like a probability (I think).

Expand full comment

You're absolutely right! Thanks for the catch. The payoff for B in Deal 2 should be $1. In that case, setting X so that the two deals are equally attractive gives X*p(A) = 1 - p(A), i.e. the odds of A are the inverse of X, and p(A) = 1/(1 + X). It falls apart at the extremes because people are terrible with small probabilities and large numbers, but it does suggest that there's something that acts an awful lot like a probability, even in the absence of a frequentist interpretation.

Expand full comment

Many Thanks!

Expand full comment

I feel that it's in a sense a continuation of the argument about whether it's OK to say that there's a 50% chance that bloxors are greeblic (i.e. to share raw priors like that). The section "Probabilities Don’t Describe Your Level Of Information, And Don’t Have To" specifically leans into that, and I disagree with it.

Suppose I ask you what are the chances that a biased coin flips heads. You tell me, 33%. It flips heads and I ask you again. In one world you say "50%", in another you say "34%", because in the first world most of your probability estimate came from your prior, while in the second you actually have a lot of empirical data.

That's two very different worlds. It is usually very important for me to know which one I'm in, because sure if you put a gun to my head and tell me to bet immediately, I should go with your estimate either way, but in the real world "collect more information before making a costly decision" is almost always an option.

There's nothing abstruse or philosophical about this situation. You can convey this extra information directly, as "33%, but will update to 50% if a thing happens", with mathematical rigor. Though of course it would be nice if Bayesians recognized that it's important and useful and tried to find a better way of conveying the ratio of prior/updates that went into the estimate, instead of insisting that a single number should be enough for everyone.
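One way to make that prior/updates ratio explicit (just an illustration using Beta distributions; the original estimate needn't have come from one) is to note that two people can both report "33%" while holding beliefs that would update very differently on a single flip:

```python
def predictive_heads(a, b):
    # Probability of heads on the next flip if your belief about the coin's
    # bias is a Beta(a, b) distribution.
    return a / (a + b)

for a, b, label in [(1, 2, "mostly prior"), (33, 66, "mostly data")]:
    before = predictive_heads(a, b)      # the headline number you report: ~33%
    after = predictive_heads(a + 1, b)   # what you'd report after seeing one head
    print(f"{label}: {before:.0%} -> {after:.0%}")
# mostly prior: 33% -> 50%
# mostly data:  33% -> 34%
```

So reporting something like an equivalent sample size alongside the percentage would carry exactly the information you're asking for.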

And so, I mean, sure, it's not anyone's job to also provide the prior/updates ratio, whatever it might look like, to go along with their estimates (unless they are specifically paid to do that, of course), and people can ask them for that specifically if they are interested. But then you shouldn't be surprised that even people who have never heard of Bayes' theorem might still intuitively understand that a number like "50%" could come entirely from your prior and should be treated as such, and treat you with suspicion for not disclosing it.

Expand full comment

Best comment in the thread so far.

Expand full comment

Yes, it would be better to use a distribution to represent our beliefs instead of a single probability. When we have a fair coin, our distribution is over the options: 1) the coin will come up tails, 2) the coin will come up heads.

When we have an unknown coin, the distribution is over the kind of bias the coin could have, and without further knowledge, we have a uniform distribution over [0,1] representing all the ways in which the coin could be biased. When someone asks us what the probability of heads on the next flip will be, we use this distribution to compute that probability.
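As a rough numerical sketch of that last step (my own illustration): "the probability of heads on the next flip" is just the average of the bias over your belief distribution, and after data the same average gives Laplace's rule of succession.

```python
import numpy as np

p = np.linspace(0, 1, 100_001)          # grid over possible biases

prior = np.ones_like(p)                 # uniform: total ignorance about the bias
print((p * prior).sum() / prior.sum())  # 0.5 = P(heads on the next flip)

# After h heads in n flips the belief is proportional to p^h * (1-p)^(n-h),
# and the same weighted average works out to (h + 1) / (n + 2).
h, n = 7, 10
posterior = p**h * (1 - p) ** (n - h)
print((p * posterior).sum() / posterior.sum())  # ~0.667 = 8/12
```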

Expand full comment

Exactly. The distribution for p(heads) of a fair coin is a Dirac delta function at 0.5.

The prior distribution for some process which returns either "heads" or "tails" should be that p follows a uniform distribution over the unit interval. (Depending on what you assume about the process, you might want to add small Dirac peaks at 0 and 1.)

The expected value of p is 0.5 in either case, but the amount of knowledge you have is very different.

Of course, if you state your prior distribution, that will also tell people how much you would update.

Expand full comment

In frequentist terms, if you've flipped a coin 100 times, you know more than if you flipped it 10 times, because you have more data and you can estimate the bias more precisely. The variance tells you this, or if you plot the probability distribution, you get this too.

If we switch to Bayesian reasoning, there is jargon about this, like having a "flat prior". Maybe we should use that more? Predictions about AI seem like the sort of thing where the probability distribution should be pretty flat?

Is there better terminology for talking about this informally?
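One informal handle I find useful (a sketch, assuming a flat prior over the bias): report the spread of the distribution alongside the point estimate, since the spread is what actually shrinks as data come in.

```python
from math import sqrt

def bias_estimate(heads, flips):
    # Posterior mean and standard deviation of the coin's bias,
    # starting from a flat Beta(1, 1) prior.
    a, b = heads + 1, (flips - heads) + 1
    mean = a / (a + b)
    sd = sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd

print(bias_estimate(5, 10))    # (0.5, ~0.14): same point estimate...
print(bias_estimate(50, 100))  # (0.5, ~0.05): ...but a much tighter distribution
```

Something like "50%, give or take 14 points" versus "50%, give or take 5 points" seems like a workable informal vocabulary, and "pretty flat" for the AI case fits the same scheme.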

Expand full comment

I am glad there is no new example of: ' "Do bronkfels shadwimp?" is binary, no one knows, thus: 50% chance. ' As opposed to the "coin which you suspect is biased but you're not sure to which side" - which IS 50%. - If A asks about bronkfels and knows/pretends to know: maybe 50%. If no one around knows: it's the chance of a specific verb working with a specific noun, which is less than 1%. - "Are the balls in this bag all red?": around 4%. No surprise if they are, even if you did not know. - "Are 20% purple with Swissair-blue dots?": I'd be surprised, and would not believe you did not know beforehand. - "Are they showing pictures of your first student?": 50%, really?

Expand full comment

And in particular, it's certainly inconsistent to give a 50% chance to *all* of a) all the balls in this bag have a photo on them, b) all the balls in this bag have a photo of me on them, c) all the balls in this bag have a photo of Scott on them, d) all the balls in this bag have a photo of your first student on them.

Expand full comment

Maybe the bag is 50% to be empty?

Expand full comment

> Whenever something happens that makes Joe Biden’s impeachment more likely, this number will go up, and vice versa for things that make his impeachment less likely, and most people will agree that the size of the update seems to track how much more or less likely impeachment is.

There is a close parallel here to the same issue in polling, where the general sense is that the absolute level determined by any given poll is basically meaningless - it's very easy to run parallel polls with very similar questions that give you wildly different numbers - but such polls move in tandem, so the change in polled levels over time is meaningful.

Expand full comment

That sounds reasonable.

For some types of polls and some types of comparisons the opposite can happen. "Which nation is happiest", examined by asking "Are you happy or unhappy" on a 1-10 scale, where "happy" is in different languages with somewhat different meanings...can get messy... To some extent, similar questions in a single nation, spread across periods long enough that meanings shift, run into the same problem.

Expand full comment
Mar 21·edited Mar 21

Something's been bothering me for a while, related to an online dispute between Bret Devereaux and Matthew Yglesias. Yglesias took the position that, if history is supposed to be informative about the present, then that information should come with quantified probabilities attached. Devereaux took the position that Yglesias' position was stupid.

I think Devereaux is right. I want to draw an analogy to the Lorenz butterfly:

The butterfly is a system that resembles a planet orbiting around and between two stars. There is a visualization here: https://marksmath.org/visualization/LorenzExperiment/

It is famous for the fact that its state cannot be predicted far in advance. I was very underwhelmed when I first found a presentation of the effect - it's very easy to predict what will happen, as long as you're vague about it. The point will move around whichever pole it is close to, until it gets close to the other pole, at which point it will flip. Over time, it broadly follows a figure 8.

You can make a lot of very informative comments this way. At any given time, the point is going to lie somewhere within a well-defined constant rectangle. That's already a huge amount of information when we're working with an infinite plane. And at any given time, the point is engaged in thoroughly characteristic orbital behavior. The things that are hard to predict are the details:

1. At time t, will the point be on the left or on the right?

2. How close will it be, within the range of possibilities, to the pole that currently dominates its movement?

3. How many degrees around that pole will it have covered? (In other words, what is the angle from the far pole, through the near pole, to the point?)

4. When the point next approaches the transition zone, will it repeat another orbit around its current pole, or will it switch to the opposite pole?

If you only have a finite amount of information about the point's position, these questions are unanswerable, even though you also have perfect information about the movement of the point. But that information does let us make near-term predictions. And just watching the simulation for a bit will also let you make near-term predictions.

This seems to me like an appropriate model for how the lessons of history apply to the present. There are many possible paths. You can't know which one you're on. But you can know what historical paths are similar to your current situation, and where they went. The demand for probabilities is ill-founded, because the system is not stable enough, *as to the questions you're asking*, for probabilities to be assessable given realistic limitations on how much information it's possible to have.
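For anyone who wants to see this directly, here's a crude simulation (my own sketch, using the standard Lorenz parameters and treating the sign of x as which wing the point is on). With almost-identical starting information, the near-term path is trivially predictable, the far future is a coin flip, and only the aggregate statistic survives:

```python
import numpy as np

def lorenz_x(start, steps=200_000, dt=0.001, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    # Crude Euler integration of the Lorenz system; returns the x coordinate,
    # whose sign tells you which wing the point is on.
    x, y, z = start
    xs = np.empty(steps)
    for i in range(steps):
        dx = sigma * (y - x)
        dy = x * (rho - z) - y
        dz = x * y - beta * z
        x, y, z = x + dx * dt, y + dy * dt, z + dz * dt
        xs[i] = x
    return xs

a = lorenz_x((1.0, 1.0, 1.0))
b = lorenz_x((1.0, 1.0, 1.000001))      # almost identical starting information

same_wing = np.sign(a) == np.sign(b)
print(same_wing[:10_000].mean())        # ~1.0: near-term prediction is easy
print(same_wing[-50_000:].mean())       # ~0.5: long-term, the extra precision buys nothing
print((a < 0).mean(), (b < 0).mean())   # long-run fraction of time per wing is about the same
```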

Expand full comment
Mar 21·edited Mar 21

Just because not everything can in principle be predicted long-term, it doesn't mean that nothing can be. Of course, there's the problem that we don't have a universal, easy way to separate questions into those categories, but it's not like the only epistemological alternative, "throwing our hands up in the air", is particularly enticing.

Expand full comment

This is just chaos theory, yes? The sensitivity of deterministic systems to initial conditions is an important consideration and an awful lot of ink can be spilled on it, but it's well-trod ground and I don't think you're getting any metaphysical ramifications out of it.

Expand full comment
Mar 21·edited Mar 21

A point I have tried to make before wrt Scott's enthusiasm for prediction markets is the difference between information (a) that exists, but is not yet known to you, and (b) that does not exist.

The market is a good way to attract existing information to a central location where it can be easily harvested. But it is not such a good way to learn information that doesn't exist. You can ask whatever question you want, and you'll learn *something*, but you won't necessarily learn anything *about your question*.

In cases where information about your question doesn't already exist and is difficult to produce, you should not expect to derive much if any benefit from a prediction market.

This framework is applicable here; I suppose I'm arguing that the Lorenz butterfly, and human history, are processes where, for many questions, probabilistic information doesn't really exist, can't exist, and looking for it - and even more so claiming to have it - is a mistake.

Throwing your hands up in the air may not be enticing or prestigious, but that won't make it incorrect.

Expand full comment

Quantum mechanics tells us that some outcomes are inherently undetermined, and information about which outcome will happen does not exist in advance. But probability applies to such processes quite well. We can use probability to quantify our ignorance.

Expand full comment

Mmm... I'm not sure that what's going on there is that we're using probability to quantify our ignorance. I think we quantified our ignorance by counting the outcomes of a large number of experiments, and used probability to describe the quantified ignorance.

But that approach will only work when we can make a large number of similar (hopefully, identical) experiments.

Expand full comment

> I suppose I'm arguing that the Lorenz butterfly, and human history, are processes where, for many questions, probabilistic information doesn't really exist, can't exist

"Probabilistic information doesn't really exist" doesn't make too much sense as a claim, let's simplify from a Lorenz system and take the double pendulum: If I tell you that the weight starts on the right, you can confidently say that after a tiny amount of time that the weight will still be on the right, and after a smallish amount of time it will be on the left, and maybe after that it exponentially flattens to a 50/50 chance. But a uniform distribution is most definitely probabilistic information! It's really interesting that the uncertainty in position approaches the maximum so quickly, and throughout the evolution of the system we can absolutely put bounds on the velocity of the weight, how far it is from the pivot, etc.

And of course, it's really really hard to make claims on what *can't* be known. One can sometimes get away with that in certain QM interpretations, but I see you're taking a different track on that downthread.

> In cases where information about your question doesn't already exist and is difficult to produce, you should not expect to derive much if any benefit from a prediction market.

When Scott says "Probabilities Don’t Describe Your Level Of Information, And Don’t Have To" he's straightforwardly correct. But we *do* still care about level of information for its own sake e.g. the explore/exploit dichotomy, and in the specific case of prediction markets it's extremely important to know the trade volume as well.

But I think probably the greatest lesson that superforecasting as a learnable discipline has to offer is that what we might think are unique one-off events to which no prior probability can be attributed... might just be a tweaked instantiation of the same patterns.

Expand full comment

Is there evidence that historians have any edge in making predictions at all? https://westhunt.wordpress.com/2014/10/20/the-experts/

Expand full comment

I'm failing to see why you can't assign probabilities to those questions about the point around the Lorenz butterfly. Maybe for a particular Lorenz butterfly, you find the point spends 54% of its time on the left wing. That's already a basis for giving a probability of which side the point will be on at time t. With more information and a better model, you could make even better predictions.

Expand full comment

> Maybe for a particular Lorenz butterfly, you find the point spends 54% of its time on the left wing. That's already a basis for giving a probability of which side the point will be on at time t.

Yes, that's true.

> With more information and a better model, you could make even better predictions.

No, the fact that you can't do this is the point. Or to be more precise, with more information, you can make predictions about times in the near future, and there are very sharply diminishing returns on how far you can extend the reach of "the near future" by gaining more information. If you're halfway around one wing you can make an excellent prediction of how long it will take you to get to the transition zone. Everything immediately after that looks much foggier.

Applying these refinements to history gets us points like:

- We should be able to notice, and become alarmed, when a major event is almost upon us. But maybe not before then.

- The fact that a major event is almost upon us will not be good evidence that that event will actually happen.

- We should be unsurprised† by day-to-day developments almost all of the time.

- We should have no idea, this year, what next year will be like. Maybe the winter solstice will accelerate and occur in August. If that turns out to be what the future holds, we won't know until maybe late July. (Astronomy turns out to be better behaved than history.)

The general idea here, in terms of your analogy, is that we can know that the point spends 54% of its time on the left wing, but that the probabilities we can assign to "will the point be on the left wing at time t" will almost always be restricted to (a) 54%, (b) almost 0%, or (c) almost 100%. Notably, none of those possibilities allows for a prediction market to be helpful; they correspond to "we're quite sure what's about to happen" and "we have no idea what will happen farther out", but in both of those cases, that information is already well-known to everyone.

In the case of history, we can't know (the analogue of the fact) that the point spends 54% of its time on the left wing; the possibility space is too large. Imagine a butterfly that explores 17,000 dimensions instead of 2, with a "cycle length" much longer than all of human history. Without the ability to observe one cycle, how would we justify a figure of 54%? (This question has direct application to "futurist" predictions. What was the empirical probability, in 320 BC, that Iran would have a nuclear bomb in AD 2241? Is it different today than it was 2340 years ago? Is the _theoretical_ probability different today than it was 2340 years ago?)

We can do near-term prediction, identifying a point in the past close to present location, and seeing where it went over a short period. If we're presenting a strong analogy, we probably have a close match on a number of dimensions somewhere in the dozens. Are the other 16,000+ dimensions going to make a difference this time? Yes! Are they going to make a difference on the two dimensions we want to predict? Maybe!

† Why is Firefox flagging "unsurprised" for being misspelled? This is a common word without any alternative spelling!

Expand full comment

I think that even in a chaotic system, I can make statistical predictions. The pinnacle of that is thermodynamics.

Climatologists cannot predict with certainty whether a hurricane will happen two years from now, because Earth's atmosphere is a chaotic system. They can still make a reasonable guess at the number of hurricanes we will have in 2027.

Likewise, 538 cannot divine who will win the next presidential election. However, there are statistical models based on historical data which can give you some probabilities for the outcomes.

Expand full comment
Mar 22·edited Mar 22

Bluntly, so what?

The existence of questions that do have well-defined probabilistic answers is not even theoretical evidence that other questions have equally well-defined probabilistic answers.

I listed many predictions that are easy to make. But when you're thinking about a question, you want to know whether the tools you're using apply to it. Assuming that they do is not a good approach.

Expand full comment
Mar 28·edited Mar 28

I followed this argument a little at the time, and found Devereaux's position unconvincing. He wanted to say that historical knowledge was beneficial in informing decision-making, while never being pinned down on what conclusions were being justified by what knowledge. If you can't say anything about what is more likely in response to a given question, whether or not you express it numerically, then you are ignorant about that question however much erudition about other facts garlands your ignorance on the specific point. If it doesn't allow you to do that, how is the historical information improving your understanding of the point at issue? To my mind Bret had no answer to this question.

I don't know anything about the Lorenz butterfly, but as best I can read your post it seems that you are saying that there are some aspects of its movement that are reasonably predictable based on the past data about how it has moved and others that are not. In that case, presumably percentages or other such indications of how likely something is could be given at least for those things that can be effectively predicted. I didn't understand Bret to be saying that historical study allowed for probabilistic predictions on the answers to some questions but not to others.

If history is a good guide to at least some questions about the future, it should be possible to give indications of likelihood about those questions. Bret was arguing that this was never possible based on the study of history, yet that the study of history nonetheless informed those predictions in some valuable but unpindownable way. As someone who studied history at university I would like to believe that it has this kind of value, but if you're not willing to defend the claim that it actually improves your ability to predict events - and be put to the test accordingly - I am not convinced that is sustainable.

Expand full comment

Just buy a hardcover copy of ET Jaynes' book and throw it on those people's heads, duh

Expand full comment

Quoting: “My evidence on the vaccines question is dozens of well-conducted studies, conducted so effectively that we’re as sure about this as we are about anything in biology”

Sorry if this is a naive question, but if there are RCTs comparing vaccines to placebos (and not using other vaccines as placebos) with a long enough follow-up to diagnose autism, I would be keen to see a reference. Just asking because I thought that, while all the people claiming there is a link are frauds, we didn't actually have evidence at the level of 'sure about this as we are about anything in biology'.

Expand full comment

Are there RCTs for "drinking monkey blood" relationship to autism? What about RCTs for breakfasts causing car crashes - a lot of people end up in a crash after eating breakfast, after all.

You can't expect to have RCTs for every made-up "connection"; there has to be at least a plausible mechanism, or no useful work will ever be done. Wakefield literally made up the numbers for his publication, to the everlasting shame of whoever peer-reviewed his garbage and approved it for publishing.

Expand full comment

> Are there RCTs for "drinking monkey blood" relationship to autism? What about RCTs for breakfasts causing car crashes - a lot of people end up in a crash after eating breakfast, after all.

Yeah, but the point is that Scott specifically distinguishes vaccines from these sorts of cases, saying:

> My evidence on the vaccines question is dozens of well-conducted studies, conducted so effectively that we’re as sure about this as we are about anything in biology

So Michael is asking for references to those.

Expand full comment

I'm not sure that's what he's asking for? Hope he'll chime in!

He's specifically asking for placebo-controlled RCTs. Not every well-conducted study has to be that specific kind.

Expand full comment

Sure, and if you said 'not every well-conducted study has to be that specific kind', that would be a valid response.

But responding to a request for RCTs by comparing it to the monkey blood example ignores that Scott had specifically contrasted the level of evidence between the two, and said that there was so much evidence that "we're as sure about this as we are about anything in biology".

The claim that vaccines don't cause autism is particularly *un*like the claim that monkey blood doesn't cause autism, specifically in that the vaccine-autism link *does* have positive evidence against it. So saying, to a request for that evidence, 'well, monkey blood causing autism doesn't have that kind of evidence either' doesn't seem like an appropriate response.

Expand full comment

I was just under the impression that RCTs are the gold standard and since Scott said that we are about as sure of this as anything in biology, I assumed that meant multiple large, long term RCTs.

But I just asked because he said dozens of studies exist, so I was curious if anyone could expand on that a bit.

Expand full comment

Kenny Easwaran posted a link to a pretty good meta-analysis.

Expand full comment

There are few medical questions about which we are so sure as about this. I remember an interview about recommendation for funding agencies. The recommendation explicitly mentioned that no more money should be spent on the vaccine-autism question because this was one of the few questions that were answered beyond any reasonable doubts.

The thing is that at some point it seemed like a realistic claim that vaccines caused autism (because some fraud paper made up data), so this was studied really intensively for several decades. It is one of the most researched questions in medicine, and the results were super-clear after the fraud papers had been retracted.

Expand full comment

I would absolutely not say “as sure as we are about anything in biology”. Nothing based on a few dozen studies of tens of thousands of individuals could make us as confident about an effect size being zero as we are about the general truth of evolution, or that humans have eyes or whatever.

But there are good studies: https://kettlemag.co.uk/wp-content/uploads/2016/01/meta-analysis_vaccin_autism_2014.pdf

Reading that meta-analysis, they report the "odds ratio" of autism among vaccinated and unvaccinated children (that is, the odds of autism with vaccination (number with autism divided by number without) divided by the odds without vaccination (number with autism divided by number without)) as "OR". Almost all the studies had a point estimate of OR that was less than 1, and all of them had confidence intervals that included 1, except for a couple of studies where the entire confidence interval was below 1.
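In case the nested parentheses are hard to parse, here is the same calculation spelled out, with made-up illustrative counts (emphatically not figures from the meta-analysis):

```python
# Hypothetical counts, purely to show how an odds ratio is read;
# these are NOT numbers from the meta-analysis.
vacc_autism, vacc_none = 300, 99_700      # vaccinated children with / without autism
unvacc_autism, unvacc_none = 35, 9_965    # unvaccinated children with / without autism

OR = (vacc_autism / vacc_none) / (unvacc_autism / unvacc_none)
print(round(OR, 2))  # 0.86: an OR below 1 means autism was, if anything, rarer in the vaccinated group
```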

Expand full comment
Mar 21·edited Mar 21

I don't have any objections to phrasing things as "x% likely", and I do this colloquially sometimes, and I know lots of people who take this very seriously and get all the quoted benefits from it, and my constant thought when asked to actually do it myself or to take any suggested "low probability" numbers at face value is "oh God", because I'm normed on genetic disorders.

Any given disorder is an event so low-probability that people use their at-birth prevalences as meme numbers when mentioning things they think are vanishingly unlikely ("Oh, there's no *way* that could happen, it's, like, a 0.1% chance"). It turns out "1 in 1000" chances happen! When I look at probability estimates as percentages, I think of them in terms of "number of universes diverging from this point for this thing to likely happen/not happen". Say a 5% chance, so the chance it happens in at least one of n diverging paths is p = 1 - (0.95)^n. The 50%-likelihood marker is at about n = 14, where p = 1 - (0.95)^14 is actually a little above 50%. So, 14 paths out from this point, it happens in at least one of those paths just over half the time; ~64% likelihood in 20 paths, ~90% in 45 paths (and often more than once, in 45 paths). The probabilities people intuitively ascribe to "5%" feel noticeably lower than this.
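(For the curious, the arithmetic behind those figures, assuming independent 5% chances per path:)

```python
print(min(n for n in range(1, 200) if 1 - 0.95**n >= 0.5))  # 14 paths for a >50% hit rate
print(1 - 0.95**14)  # ~0.51, "a little above 50%"
print(1 - 0.95**20)  # ~0.64
print(1 - 0.95**45)  # ~0.90, with an expected count of 45 * 0.05 = 2.25 hits
```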

"There's a 1 in 100,000 chance vaccines cause autism"? I've known pretty well, going about my daily life, absolutely unselected context, not any sort of reason to overrepresent medically unusual people, someone with a disorder *way* rarer than 1/100k! Probability estimates go much lower than people tend to intuit them. We think about "very unlikely" and say numbers that are more likely than people's "very unlikely" tends to be, when you push them on it, or when you watch how they react to those numbers (people estimating single-digit-percentage chances of a nuclear exchange this year don't seem to think of it as that likely).

Expand full comment

I would never get on a plane that had a 1 in 1,000 chance of crashing, and wouldn’t like a 1 in 100,000 chance either! But I think for a general claim like “vaccines cause autism”, it’s very hard to reasonably come to probabilities lower than those, while it’s much easier for single cases like “this plane will crash”, when we can repeat the trial hundreds of thousands of times a day around the world.

Expand full comment
Mar 21·edited Mar 21

1/100k is definitely in the sphere where "how many universes does it take?" is "...a lot", and has been not-very-distinguishably "a lot" for some time. I pick at the particular vaccines-autism (or monkey blood-autism, either work, monkey blood might actually be a better example) because I'm familiar with the background of the particular claim/its context in the sphere of what-causes-X-disorder discussions, and agree with Scott on every point he makes regarding how we've demonstrated how unlikely it is, and instantly think "what, you think it's *that* likely?!" when it's phrased as "1 in 100k chance". I know the number is rhetorical, and that the real intuitive number is probably way lower than "the chance you have two different (non-identical-twin) consecutive kids with sex chromosome aneuploidies", or "the chance you have a kid with Sanfilippo syndrome", or even the way, way lower "the chance you have a kid with Lesch-Nyhan syndrome" (condition these all on the necessary background to be having a kid with them at all, and being able to split out the number of universes). That's precisely what makes it stand out, because it demonstrates a break between "this is not something that could realistically be true" and the numbers people give for that.

Expand full comment

Are you really more confident than 99,999/100,000 that vaccination doesn’t cause a change in frequency of autism on the order of 1 in a billion?

I’m quite confident that vaccines don’t increase the frequency of autism by 1 in 1,000 but no study ever done has been sensitive enough to detect 1 in a billion shifts.

Expand full comment
Mar 21·edited Mar 21

I don't see any realistic way we live in the universe where vaccines are a cause of autism. I don't put a probability estimate on it because *probability estimates in that range aren't reliable*. It might be out in the Lesch-Nyhan sphere (too rare to estimate prevalence at birth or otherwise; there is one person in New Zealand, population 5 million, who has Lesch-Nyhan syndrome). It might be further than that, Andromeda-by-2050. These aren't ranges in which people can think well.

Definitions of "vaccines", "cause", and "autism" that don't involve vaccines causing autism are much more likely. That is to say: some kids have outlier-bad vaccine reactions, in the same way some kids have outlier-bad reactions to any medical procedure. (Anything that can cause fever *could* cause seizures; anything that can cause seizures *could* cause severe brain damage.) Current diagnostic practice dramatically overdiagnoses autism in severely disabled people. My best probability-estimate for "do we live in a world where someone, of every child born in the DSM-IV or DSM-5 era, has had an outlier-bad vaccine reaction causing severe brain damage, and the resulting severe disabilities have been misdiagnosed as autism?" is "it's reasonably likely -- moreso than not -- that we live in a universe where this series of events has occurred at least once, and not-shocking if we live in one where it's happened several times, but it's inherently limited because those reactions are so unlikely as is, so it'd be much more surprising if it makes up any noticeable percentage of the diagnosed population, and we have signs of not living in this universe (like how replications of Wakefield went)".

I hedge my bets for this more than for other causes of severe brain damage misdiagnosed as autism, because paradoxically the level of attention paid to vaccines-autism has probably made that particular misdiagnosis less likely than "misdiagnosis following a TBI" or "misdiagnosis in Down syndrome" or every other common misdiagnosis. This is a very different thing to "do vaccines cause people to become autistic", which occupies the "no number I can put on this properly encapsulates it, it's Andromeda-by-2050" sphere.

Expand full comment

There really isn't any possibility of opting out. Even if you reject using probabilities to reason about the future, you have to act as if you did. Clearly assuming a 50/50 chance on any outcome would not be particularly effective, and is not the way anyone actually thinks. I guess I can understand why some people might want to keep their probabilities implicit. Personally, I think reasoning explicitly in % terms helps me notice when some part of my world model is wrong.

Expand full comment

Agreed. Nobody likes looking dumb by being wrong. Some of us value accuracy in our world model enough that we're willing to look dumb if it'll help us be less wrong.

Expand full comment

"You don't use science to show that you're right, you use science to become right."

https://xkcd.com/701/

Expand full comment

This is one of the posts that gave me more sympathy for the opposing view than I had coming in. Now I think only superforecasters should be allowed to use percent-scale probabilities, and all the rest of us mooks shouldn't risk implying that we have the combination of information plus calibration necessary to make such statements.

Expand full comment

Sorry if the following has already been said multiple times (it is an obvious observation, so it has probably been mentioned before):

Regarding item (3.) of the text: It is nonsensical, inconsistent and arbitrary to assign 50% probability in a situation where you have zero information and make up two outcomes.

Assume you have an object of unknown color. Then: 50% it is red, 50% it is not? But also: 50% it is green, 50% it is purple, 50% it is mauve; 50% it is reddish?

Assume you find in an alien spaceship a book in an unknown language that for some reason uses latin letters. What is its first sentence? 50% that it is "Hello earthling"? 50% for "Kjcwkhgq zqwgrzq blorg"?

What is the probability that all blx are blorgs? 50%, of course! What is the probability that *some* blx are blorg? 50%! What is the probability that there is a blx? 50%. That there are more than 10? 50%. Less than 7? 50%. More than a million? 50%. More than 10^1000? 50%.

Expand full comment

The scenarios you describe do not have "zero knowledge."

Expand full comment

I have zero knowledge about blx and blorg

Expand full comment

I intuit that you imply *some* information about blx and blorg, for example that they behave at least vaguely similarly to other things in our experience. But if not, then straightforwardly yes, the answer to each of those yes/no questions is 50%.

To prove this, run the following experiment. Randomly choose a whole bunch of nouns from the dictionary. (This will introduce some bias because, e.g., dictionaries don't list a noun for all possible things that do and do not exist, but I bet this will be close enough.) Then generate every possible pairing between those nouns and answer the yes/no questions you listed for blx and blorg. You'll get "yes" about 50% of the time. (Again, with skews driven by the bias in the source of nouns.)

If this experiment doesn't map to your definition of "probability," then you're making a metaphysical argument as addressed in the post.

Expand full comment

Yup. There's an almost evolutionary argument baked-in behind every real existing word, that it has won a competition to stay in use as part of a living language and therefore serves some purpose in human communication. There's no word for "not a dark cyan golf ball" because that's not a descriptor that usefully divides reality.

If blx and blorg are one-syllable words.... they're probably not *that* esoteric. In the context of alien spaceship libraries, at least.

(Or in metaphysics of language!)

Expand full comment

This is testable! So let's test it.

I have a list of 6801 nouns, which I sample twice at random, 12 times.

I get the following sentences:

repair is a sill

everyone is a cloakroom

humanity is a cheek

beaver is a sailboat

cape is a communicant

insulation is a period

shelter is a bellows

unity is a specialist

engineering is a terrapin

shack is a lesson

suitcase is a carving

larch is a obsidian

Guess you're wrong!
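For anyone who wants to rerun this, a minimal sketch of the sampling (the file name nouns.txt and the seed are my own placeholders; any noun list with one word per line will do):

```python
import random

# Load a noun list: assumed to be a plain-text file with one noun per line.
with open("nouns.txt") as f:
    nouns = [line.strip() for line in f if line.strip()]

random.seed(0)  # fixed only so the draw is reproducible

# Draw 12 random pairs and print them as "X is a Y" claims.
for _ in range(12):
    x, y = random.sample(nouns, 2)
    print(f"{x} is a {y}")
```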

Expand full comment

Agree. I'd give very low odds to grop being grlik, because grlik is a category and I don't know how many categories there are. If I think your question is random, my odds will be near zero. If I think your question is a genuine, interested one and that you think grop MIGHT VERY WELL be grlik, I'd put the odds at maybe 5%. Never at 50%, unless you tell me there are only two categories.

So it ultimately depends on how well I read the questioner, and how do I know how well I have read the questioner? All we really know for sure is that everyone is a cloakroom.

Expand full comment

Yeah, fair enough. The structure of the disagreement is captured in how the questions are phrased. "Is a blx a blorg?" tells you that blx and blorg are different labels, which is relevant information. This makes it a terrible question to illustrate how flat priors are accurate with zero relevant information.

The underlying crux isn't whether flat priors are correct, it's whether you can possibly ask a question that conveys zero relevant information. I agree that you can't ask a question that conveys zero information, but if you're careful you can avoid disclosing relevant information, and I think you'll agree for those questions that flat priors are correct.

Example: does this blx have more blorg than the median blx?

Expand full comment

There is no such thing as “zero knowledge” in the sense Scott wants.

Expand full comment

You have nonzero information about "is an instance of" classes of questions, which is why you knew to construct this particular example in the first place. Same deal with language. If you didn't have that information, you couldn't have written this objection!

50% only applies if you have zero information about a binary proposition (A or B) - but notice that whether or not a proposition is binary is itself information. Colors are an excellent example of this; you know there are more than two colors. You have information!

You used this information to construct these examples. You used your knowledge that there are more than two colors to construct the color examples. You used your knowledge that there are lots of possible sentences to construct the unknown language example. You used your knowledge about "is an instance of" questions to construct the blx/blorg instance-of questions (most instance-of questions have an answer of "no", as demonstrated by Sam Atman below). You used mathematical knowledge to construct the numeric examples (you know there are more than two numbers).

To see whether a probability is genuinely 50%, see how you feel about randomizing the "is" versus "is not" formulation. So, "What is the probability that all blx are blorgs" becomes a 50/50 shot between "What is the probability that all blx are blorgs" and "What is the probability that not all blx are blorgs". This is what zero information -actually looks like-; an indifference to the way the question is framed. If you have a preference for a particular framing, or object to the randomization described above as destroying information (after all, it forces a 50/50 likelihood on the resultant distribution), then you don't actually have zero information about the question, as revealed by your preference for a particular formulation. Zero information means zero information - we are indifferent to the difference between "A" and "Not A" for the given question A.
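To make the randomization concrete, write p for whatever probability you would give the un-randomized statement; then

$$P(\text{answer is ``yes''}) = \tfrac{1}{2}\,p + \tfrac{1}{2}\,(1-p) = \tfrac{1}{2},$$

regardless of p.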

"No information" is actually an incredibly hard standard to meet, and is mostly a mathematical backstop for certain kinds of probabilistic questions.

Expand full comment

My comment was, as stated, an objection to the following part of Scott's text:

"Consider some object or process which might or might not be a coin - perhaps it’s a dice, or a roulette wheel, or a US presidential election. We divide its outcomes into two possible bins - evens vs. odds, reds vs. blacks, Democrats vs. Republicans - one of which I have arbitrarily designated “heads” and the other “tails” (you don’t get to know which side is which). It may or may not be fair. What’s the probability it comes out heads?

The answer [...] is exactly [...] - 50% -"

I think that this is nonsense. Just because you can formulate a binary choice and have "no" information about it (in my examples I certainly did not have more information than in Scott's), it does not follow that it makes sense to assign the numerical value 50%.

Expand full comment

"you don't get to know which side is which" is a key part of that, and is equivalent to randomizing between "is" and "is not".

Let's frame one of your questions in those terms:

You have an object of unknown color, what are the odds that it (is/is not) red?

You don't get to know in advance whether the parenthetical in the question is "is" or "is not".

So what are the odds that the answer to the undefined question is "Yes"?

[Edited for clarity]

Expand full comment

"You have an object of unknown color, what are the odds that it (is/is not) red?"

Is that relevant for anything? If you have a red object, then the odds that it (is/is not) red are also 50% (assuming you select the "is" with 50% probability). This is true, but seems entirely pointless?

Expand full comment

If it wasn't pointless, it would contain informational content, and thus would violate our "no information" rule.

Expand full comment

OK, some commenters point out that I do not have "zero knowledge". Let me stress that I am referring to Scott's item (3.), i.e., "as little knowledge as Scott thinks warrants the assignment of 50%".

If "no knowledge" is defined by "it has 50% probability" in whatever system of probability you use, then sure, it makes sense to assign 50% in this case.

So the claim that I find objectionable is: If you have "very little" information (in some practical everyday sense) about a binary outcome, it makes sense to assign 50% probability to each.

Let us assume (to make my objection clearer) that Scott claims that with four possible outcomes and hardly any information we should assign 25%. So let us ask: What percentage of the world population has tapeworms?

(a1) < 2%

(a2) 2%-4%

(a3) 4%-10%

(a4) >10%

I have no clue whatsoever, so I assign 25% to each outcome. But when I ask:

(b1) < 2%

(b2) > 2%

then I get 50% for the same event (<2%) instead of 25%.

If you only use binary outcomes, the effect is basically the same, but a bit less clear to see (Q1: <2%, Q2: <4%, Q3: 2%-4%, Q4: >4%, etc.).

In this sense it does not seem to be very consistent/useful/... to assign a numerical value using such a general rule.

Of course this example is not deep or interesting, and one obvious possible "solution" is that I could think of a model/distribution (even given the very little information I have) and then read off all answers from this model. But in my understanding that is not what Scott wants; he advocates "model-free probability".

Expand full comment

Given the information you have available, the actual "very little information" answer to the first question is:

a) 2%

b) 2%

c) 6%

d) 90%

And the answer to the second question is:

a) 2%

b) 98%

Because what you're actually asking is: what are the odds that an unknown number falls in some particular part of the range 0-100? You have, for a given level of significant figures, 100 (technically 101, but whatever) possible answers, which the structure you've built into your "answers" is obscuring.
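A minimal sketch of that arithmetic, assuming the "very little information" prior is uniform over the unknown percentage (0 to 100):

```python
# Each bin's probability is just its width as a fraction of the 0-100 range.
bins = {"<2%": (0, 2), "2%-4%": (2, 4), "4%-10%": (4, 10), ">10%": (10, 100)}
for label, (lo, hi) in bins.items():
    print(label, (hi - lo) / 100)  # -> 0.02, 0.02, 0.06, 0.9
```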

However, if you reframe your second question as:

(b1) [Either <2% or >2%, you won't know in advance]

(b2) [Either >2% or <2%, you won't know in advance]

Then, having eliminated a source of information about your answers, you're back to 50% odds.

Expand full comment

Note! "Very little information" can bring us to some crazy and unintuitive places, because it can skew probabilities in crazy ways based on what information is and is not included!

Like, what are the odds that Donald Trump will win the US presidential election, given the following information:

Every citizen of the US is eligible to run for president

Every citizen of the US is eligible to vote for president

There are 300,000,000 citizens in the US

Whoever gets the most votes wins the election.

The naive "very little information" approach might say "50%", but, notice, I didn't give you the information that Donald Trump is one of two people likely to win their party primaries. The actual pool of possibilities, for who is elected president, includes *all three hundred million citizens*. Donald Trump's chances of winning, in this "very little information" scenario, is approximately 0%.

However, once you establish that the effective choice is between Joe Biden and Donald Trump, you've eliminated 299,999,998 possibilities, and you're back to 50%. Assuming I constructed everything correctly above, anyways.

Expand full comment

"If you actually had zero knowledge about whether AI was capable of destroying the world, your probability should be 50%."

As you have anticipated, I object to this for two reasons. Firstly, as a matter of theory, the assignment of priors is arbitrary. The agent need only choose some set of positive numbers which sum to 1. There is nothing to say that one choice is better than another. In particular, there is no reason to think that all outcomes are equally likely and attempting to assign priors based on this rule (the indifference principle) leads to well known paradoxes.

Secondly, as a matter of practice, if our fully uninformed agent assigns a probability of 50% to the world being destroyed by AI, it should also assign the same probability to the world being destroyed by a nuclear war, a pandemic, an asteroid strike, awakening an ancient deity, an accident involving avocado toast, and so on. But the world can only be destroyed once, so these probabilities are inconsistent.
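To spell out the arithmetic behind that last point (treating the n candidate catastrophes as mutually exclusive ways for the world to end):

$$\sum_{i=1}^{n} P(\text{destroyed by cause } i) = \frac{n}{2} > 1 \quad \text{whenever } n > 2,$$

which no assignment satisfying the probability axioms can allow.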

Expand full comment

Your objection either adds knowledge (30% chance), or posits a metaphysical objection (10% chance), or both (60% chance). The article answers these.

Expand full comment

If this is a metaphysical objection, it is a different metaphysical objection to the one addressed in the post. In the discussion, the (hypothetical) 17% probability assigned by Samotsvety is being assigned from a state of knowledge and not of ignorance. Also, I am not saying that probabilities assigned from a state of ignorance are meaningless or somehow not true probabilities, but merely that they are arbitrary (subject to the important constraint of summing to 1 over all possible outcomes).

The use of flat priors in a state of ignorance has a strong intuitive appeal, but it simply does not work, essentially because it gives contradictory answers depending on how you choose to divide the outcome space.

On the other point, I admit that if you simply do not know what any of the words mean in the statement "An AI will destroy the world", then you should assign it 50% probability, because you are being asked whether a random proposition is true (which must be 50% by symmetry). But in that case, you haven't meaningfully made a prediction as to how likely it is that AI will destroy the world. Provided that you do know what the world is and what it means to destroy something, even if you don't know what an AI is, you shouldn't be assigning a probability of 50%, because in effect you've been asked how likely the world is to be destroyed by some unknown thing. But (even without knowing anything about how often the world is destroyed), it is no more likely to be destroyed by that thing than any other random thing.

Expand full comment

Ok, progress.

1. Flat priors adjust to how you divide the outcome space, so there's no contradiction there.

2. If you don't know what an AI is, you still know what "destroy the world" means. Even if you don't know how often the world is destroyed, that's still a lot of information with which to inform priors.

Expand full comment

Doesn't that then render the initial statement invalid? Since any statement about actual concepts renders information, there is no such thing as a statement from ignorance at all.

Expand full comment

I dunno, I can think of plenty of ways to ask a yes/no question about which you have no knowledge on which to build priors. There's the usual approach where you just don't understand the question at all, but you can easily build more substantive examples, e.g. "is this math paper more or less impactful than the median math paper?" I don't think I know enough about math or about the quality of math papers to answer that. So, unmoved by relevant information, my prior of 50% for a yes/no question gets no update.

Expand full comment

But that's not ignorance, is it? You know that math papers have impact, that this impact can be quantified and averaged, and therefore you can make a prediction... and that prediction isn't even 50% (since the number can be AT the median!).

You can craft a question so the answer is 50-50, but that choice comes from knowledge, not ignorance.

Expand full comment

Agreeing with madasario, my understanding is that truly having no prior knowledge about something requires essentially not even understanding anything at all about it. Imagine that someone asked you: "What do you think is the probability of flarnap-majogam?" Since you don't know what flarnap-majogam is, you can't be certain that there even are any alternatives to it. Maybe it means 2+2 = 4, maybe it means the sun rising tomorrow, maybe it means everything spontaneously ceasing to exist in the next five minutes.

Expand full comment

The article really doesn’t give any justification for the existence of objectively correct priors. It just asserts them. It also doesn’t justify the implicit claim that the Samotsvety probabilities are the only ones that do as well as theirs given the information they have. Take the top five performers on whatever superforecasting test you like. They will all have performance that is quite good, and very close to each other. But they won’t report the same numbers (though they will usually be fairly close). There’s no argument given in the post that they couldn’t all be precisely equally best despite giving different numbers. It just says you or I would do better by adopting the numbers of these better forecasters, not that one of these forecasters is right and the others wrong.

Expand full comment

I'm not sure I understand your point. You may be confusing priors with posteriors - priors are what you update when you get new information, posteriors are what you've updated your priors to after integrating that information. Scott asks, "what is Samotsvety better at than you?" Can you answer this?

Expand full comment

Priors and posteriors have exactly the same issue. Posteriors are just what you get when you add evidence to the prior. You can't get uniquely best posteriors unless you have uniquely best priors.

Samotsvety are better than me at practically making certain kinds of predictions in ways that are relatively effective for certain kinds of actions. Other people are as well. The only way someone could be perfect is if they gave all 0's for things that don't happen and 1's for things that do happen - if they do anything else, then it is logically possible for someone to do better. Just because we haven't found anyone who reliably does better than some particular predictor doesn't mean there couldn't be someone who reliably does better - and it is quite possible to find someone who makes somewhat different predictions but is overall reliably just as good.

Expand full comment

Maybe I'm missing something, but there seems to be an easier way to communicate one's level of certainty without inappropriately invoking probabilities. If instead of saying "There is a 50% probability of X happening", you say "I'm 50% sure X will happen," I don't think anyone would object.

Expand full comment

Exactly. Giving a probability of an event like that is obfuscating the fact that it’s a description of your confidence.

Expand full comment

Only if you interpret “probability” in a specific way. “Probability” is best understood as a technical term in mathematics like “geodesic”. There are several real world phenomena that are correctly described by probabilities (including frequencies, reasonable confidences, and objective single-case chances) just as there are several real world phenomena correctly described as geodesics (the path of light in relativistic spacetime, great circles on the surface of the earth, straight lines in Euclidean space).

Expand full comment

I dunno, what else could that possibly mean? "There is 50% chance of X" only communicates my belief about the chance of X the same way that "my hat is blue" only communicates my belief about the color of my hat.

Expand full comment

“There is a 50% chance of X” communicates that I am highly confident that X is chancy, and that its chance is close to 50%. That is quite different from “I am 50% confident that the 200th digit of pi is odd” - I think that outcome isn’t chance at all, but I have 50% confidence on one side and 50% confidence on the other.

Expand full comment

This is the metaphysical objection addressed in the article.

Expand full comment

This is not an objection. This is a distinction between two claims. Both are perfectly meaningful, and sometimes we care about the distinction between them (but not always).

Expand full comment

That's... what a metaphysical objection looks like? Anyway if you insist on distinguishing between frequentist and bayesian probabilities, fine. Do what the article says and call them bayesian shmrobabilities.

Expand full comment

When a person says “X is true”, there is an implicit claim that lots of people hear, namely, “and any disagreement is incorrect.” Two different people will likely have different probabilities for something like “AI will kill us all.” Disagreement is expected here because we all have different priors and can’t measure the outcome.

Saying “I think” before you make a likely-to-be-contested claim is a form of acknowledging that you might be wrong. It’s a kind of intellectual courtesy, like saying please or thank you. If you make a likely-to-be-contested claim without acknowledging its controversial nature, you’re going to antagonize people who disagree, because you’re implying that you will interpret their disagreement as a form of error rather than an invitation to exchange experiences.

Now, perhaps you intend to do that, but I’m taking it as a given that this isn’t your goal.

Expand full comment

When you say "and any disagreement is incorrect," you can only possibly mean that you believe any disagreement is incorrect.

I agree with your point about antagonizing people, this is often an important consideration in how I phrase my beliefs. But again, it seems like you really want to talk about human psychology, while I (and I believe this article) want to talk about math. I'm sure there are plenty of other folks who would love to talk about psychology with you!

Expand full comment

But if it’s not a probability, you would have no basis for applying probability theory to the prediction? Eg it would not make sense to adjust priors using Bayes theorem.

Expand full comment

If I am understanding you correctly, yes, I agree - I don't think it makes sense to adjust priors in that way to what are essentially expressions of the degree of confidence in a given prediction.

You may informally adjust priors based on the credibility/knowledgeability of the person making the prediction, but it would (should) not be rigidly applied as one would apply to a calculation based on frequentist probability determined by data from empirical observation.

Expand full comment

It makes sense to adjust priors using Bayes theorem if “priors” are the things you use to govern your actions and you want to avoid guaranteed losses. Mathematicians will say that makes these numbers probabilities, because they satisfy the axioms. It’s not because these are probabilities that they satisfy the axioms - it’s because they satisfy the axioms that they are probabilities.

Expand full comment

Yep, exactly what I was going to say. There's multiple meanings to the word "probability" and it's easy to forget which one is really being deployed. In most of Scott's examples above, he is using the "confidence" meaning. What tends to confuse people at times is that there is also the more mathematical definition, which, interestingly, Scott starts with.

Expand full comment

The actual mathematical definition is just axiomatic (it’s an assignment to statements of numbers that are non-negative and add up to 1 and take the value 0 for any statement that is impossible). Frequencies satisfy it, chances satisfy it, confidences do better when they satisfy it. Some people in the early 20th century argued that frequencies are the only real probabilities, because chances are metaphysical and confidences are subjective, but that was because they were in the weird grip of a philosophical view that denied that metaphysics or subjective things could be real.
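For reference, a standard Kolmogorov-style rendering of those axioms, for a finite set Ω of mutually exclusive, exhaustive outcomes:

$$P(A) \ge 0 \ \text{for every } A \subseteq \Omega, \qquad P(\Omega) = 1, \qquad P(A \cup B) = P(A) + P(B) \ \text{whenever } A \cap B = \varnothing,$$

from which $P(\varnothing) = 0$ for impossible statements follows.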

Expand full comment

Why.

It might be shocking to learn, but an Anti-Rationalist like myself will point out as evidence for a Frequentist position that saying "I think X is more likely than Y" is merely a rhetorical phrase that could mean any number of things, and one of the things that it is on average (oh look, a rhetorical use of the phrase "on average") very unlikely to mean is "I have calculated the probability of X and Y." Probability as a concept is not ontologically interchangeable with the manners of speech that may signify it. Rhetorically mentioning probability is not evidence that probability was used somewhere, or happened, or that the words are a proxy for a probability.

We see here that Scott, like with most things, doesn't understand what probability is. But there are other interpretations of what's going on with probability, so sure, we could say he just has a Bayesian interpretation. Except Scott also seemingly doesn't understand what the argument is about, because his arguments don't align with classical bayesian probability either. They DO align with Rationalism. As usual, we find that Rationalists simply take it for granted that their extremely bizarre assumptions about things are correct, and we get weird arguments like, "If you give argument X, obviously you need to read what you're criticising, because I can't possibly be fundamentally missing the point of your argument." It is simply a tenet of Rationalism that Bayesian probability is not merely an interpretation of probability, it's a fundamental aspect of all human reasoning. Or as Scott says here, "A probability is the output of a reasoning process."

Sure, in the sense that mathematics is a reasoning process, and a percentage chance is a probability. But the very fact that "Maybe 1%?" means "I don't know, as of the evidence I have off the top of my head it seems unlikely" shows that there are no probabilities happening here, non-Frequentist or otherwise. I could summarise I suppose by saying there's equivocation happening between Bayesian probability, which Scott is pretending to defend, and Bayesian Epistemology, which is what he's actually trying to defend.

Which brings about the bigger problem. This post is a motte and bailey. "Oh, when I say 20% I don't REALLY mean a 20% chance, I just mean it might happen but I think it might happen less than that guy who said 30%." If that were the case, I would never see attempted mathematical exegeses about AI risk. Unfortunately you see this all the time, including with Scott, and of course with things that aren't AI risk. The real criticism here is that the mathematical language is merely a patina meant to mask the fact that the arguments are fundamentally badly reasoned and rest on faulty assumptions. Accusations that the social media-ites need to just read the papers or trust the science assume they didn't or don't. Instead one could assume Scott is missing the point completely.

Expand full comment

Are you a mathematician, or do you have evidence that mathematicians disagree with Scott on probability?

Expand full comment

Is this really the extent of your rebuttal? If I said yes, I am a mathematician, would you back off and say, "woah I guess I was wrong"? What if I said I was an actuary? What if I was a mathematician, but I was a topologist who specialised in Knot Theory?

Taking the topic here, views of probability, seriously, this isn't even a subject for mathematicians. This is a subject for philosophers, the same way Scientists don't know anything about Philosophy of Science. Mathematicians, like physicists, are generally told to shut up and calculate. And the only introspection they'll have is if they are a Nominalist or a Platonist in regards to the existence of numbers. Though I've also seen the "shut up and calculate" position stringently defended by mathematicians as well.

Expand full comment

I find this an incredibly bizarre objection. Of course mathematicians are interested in the philosophy of math, and scientists in the philosophy of science?! Theoretical math research has very little to do with shutting up and calculating, and often runs into questions about the kinds of proof that are possible and the sorts of axioms we take. Every mathematician I know is deeply interested in the philosophy and history of math, at least for how it informs their pedagogy in the classroom if not their direct research. Similarly, most scientists (or the good ones at least) grapple with their own epistemics and their implementation of the scientific method, to consider what their experiments actually tell them about the world.

I don’t understand why we have to retreat to the realm of the philosophical when Scott clearly outlines the material evidence that these numbers have meaning. It can be and has been demonstrated that when a well-calibrated forecaster offers a figure of 20%, it doesn’t mean “some ambiguous chance relatively less than 30%”; it means that, on average, that event will happen 20% of the time. You will make more money if you act as if this is true, you will see more success as a leader, or we can just count on our fingers how often these things happen, and you will do all of these things better if you treat it as a sincere 20% than as, say, 25% because it’s just less than 30% and therefore ambiguous.

Expand full comment

I have literally met more than one Mathematician in person who will stringently tell me they do not care about Philosophy of Math, and I have been yelled at for bringing it up. One of these people, one who did not yell at me but did tell me they did not care about this, I am still friends with.

I'm sorry, Scott clearly outlining "the material evidence" is already retreating "to the realms of the philosophical." The material evidence? For numbers? I am a mathematical Platonist. Numbers aren't material things. But that's clearly too pedantic, and isn't the real argument. The real point is that your assumptions are assumptions. I can't help it that you're blind to them.

I really really don't care what your Seers say or how accurate you think they are. I am confident in my Seers as well. But at least I call them mystics. But as you are unlikely to understand this critique, let me try a different tactic.

I am a Frequentist on probability. Whatever superforecasters are doing, it's not "probability", because "probability" is the long-run frequency of outcomes from a large series of similar trials. A future event is a trial consisting of either 0 or 1. Either way you're not repeating that event again. There is no experiment happening, not even hypothetically, because it is in principle impossible to rewind time and perform, say, an election, or the start of a war, multiple times (uh oh, we delve into the realms of the philosophical again!). And if the question really is a question of probability interpretation, I'm clearly not going to accept a Bayesian account. Pointing to the supposed accuracy of superforecasters and saying "aha, that proves it!" is what is called "begging the question."

Retreating to hypothetical monetary gains or practical effectiveness is not the issue and is not convincing, as I am not an Instrumentalist, nor do I think multiple people betting on a single outcome somehow constitutes multiple trials. Were it completely true that I'd make money taking your Seers at face value, that would not speak to the question of what probability is and how you should use it in your blog posts.

I will also point out that Scott himself tried to argue both sides of this point, on one hand highlighting the rhetorical senses of "probability", and on the other, trying to talk about how effective his Seers are in reality. I read the blog post, after all.

Finally, to try to head off this likely argument: again, I read the blog post. I am aware of Scott's "shmrobability" objection. It's really silly, because again, whether or not these numbers behave as probabilities is the question.

Expand full comment

> it is in principle impossible to rewind time and perform, say, an election, or the start of a war, multiple times

Reproducing a global set of circumstances is impossible. Reproducing a local set -- i.e., initial conditions in a laboratory -- would be good enough, so long as locality holds. In physics, probability is related to determinism, which is related to locality.

Expand full comment

> Of course mathematicians are interested in the philosophy of math, and scientists in the philosophy of science?!

Unfortunately, "interested in" doesn't mean "skilled in". Rationalists have a further problem of trying to get their philosophy-of-maths and philosophy-of-science from scientists and mathematicians who have never specialised in the philosophy-of bit. (They also tend to judge authority by how close academics are to the rat-sphere -- whether they have ever interacted with rationalists. Tyler Cowen is THE economist, Scott Aaronson is THE physicist, etc.).

Expand full comment

It's relevant as it would constitute evidence that Scott is using terms differently from mathematicians, who defined probability theory.

Expand full comment

Did they? Can you name me the mathematicians who defined probability theory? Was it Cardano, Fermat, or Pascal who did this, or somebody else? Perhaps you're thinking of Kolmogorov, a full 200 years after probability existed as a named subject? Do all these mathematicians agree with each other? Of course the answer is no. I find this a really strange semantic avenue to go down. Really, why do you think I'm going to change my mind because some guy somewhere said something different?

My contention is that Scott is not even talking about probability, he's so off the mark. He's talking about Bayesian Epistemology and doesn't realise it, if he's talking about anything at all beyond "Bad Arguments For Why I Am Always Right About My Assumptions, Blog Post 20149." If a student sits down in math class and they learn about probability, the teacher would be a very poor one if he accepted "I don’t know, seems unlikely, maybe 1%," as a top-of-the-head answer, because "Sometimes this reasoning process is very weak."

Expand full comment

Multiple mathematicians defined probability theory. And if they spoke of it differently, then Scott would hardly seem to be out of line with them by having his own way of discussing it.

If an actual probability teacher told Scott he was that off the mark, that would be evidence. But you aren't claiming to be that, and haven't provided evidence that such a person would react that way.

Expand full comment

So we're going with the reddit "source?! Sooource?!?!" meme?

Pass.

Arguments are evidence. Telling me I'm wrong because I refuse to accept your arbitrary demand for very specific yet specious "evidence" is a really pathetic argument. So you know, just don't believe me.

Expand full comment

No, the way I use language, terms like "slightly unlikely," "very unlikely", etc. don't translate well into numbers. Medical personnel are always asking patients to rank pain on a 1-10 scale. I know what they mean, but my mind rebels against doing this. It sounds way too exact. "Not the worst pain I've ever had, but deep and persistent," is how I described one, and I just don't know how to put that as a number.

Expand full comment

I can attest that drinking monkey blood does not cause autism. Don't ask

Expand full comment

N of 1 is insufficient.

Expand full comment

I can only assume N >> 1.

Expand full comment

You don't know that they were the only one doing it.

Expand full comment

<mild snark>

RCT! RCT! RCT! :-)

</mild snark>

Expand full comment

"But with humanity landing on Mars, aren’t you just making this number up?" No Scott, because I calibrate.

Expand full comment

Usually on the Internet, a debate about X isn't really about X. When someone argues that a bayesian probability is invalid, they're usually just lawyering for their position. More "screw the billionaires and their space hobby," less thoughtful attempt to converge on accurate world models.

Source: I have a lot of success changing gears in these conversations by a) switching the topic to their true objection, and b) expressing their opinion as a bayesian probability. So when I hear "there's no way you can know the probability of getting to Mars," I might respond with "I mean, sure, there's like a 95% chance that musk is a jerk who deserves to live paycheck to paycheck like the rest of us."

Well. By "success" I mean watching them turn red when, after they respond, I point out their inconsistent treatment of bayesian probability: immediate recognition and scathing dismissal when it goes against their position, routine acceptance when it supports their position. This allows a "successful" disengagement when they respond and I say "you don't have a strong opinion on Mars, you just don't like musk. That's ok, but look - that person over there might want to talk about musk. I'm going to look for someone who wants to talk about Mars. Cheers!"

One day I'll do this to someone who responds with "see, there you go again, that 95% number doesn't mean anything" and the rest of the conversation will be either wonderful or terrible...

Expand full comment

I am 95% confident this is one of the top 10 comments on the internet today.

I'm joking. I'm actually 100% confident it is the #1 comment on the internet today. This comment makes me sad and LOL simultaneously.

Expand full comment

There’s a value claim being made here: that there is symmetric risk between false confidence and false humility.

I don’t know about you, but my experience is that false confidence is far more destructive than false humility.

Expand full comment

Only if you don't account for the loss of counterfactual gains of which false humility deprives us.

Expand full comment

There are also losses due to false confidence. How can you know which set of losses is greater? This is a value claim, not a fact claim.

I think people being more confident in what they personally can accomplish would be great. But that seems to be in direct contradiction to trying to have well-calibrated probabilities on events beyond your ability to influence or directly experience. You only have so much attention, and most uses of it are total wastes. Caring too much about events outside of your control, in order to see some subset of the world more accurately, is less useful to everyone than getting in there, getting your hands dirty, taking risks, and failing.

Expand full comment

I can't tell if you agree or disagree.

Expand full comment

I think false confidence that nuclear power will kill us has seriously held us back as a civilization. “The probability that AI will kill us” sounds to me suspiciously like “the probability that nuclear power will kill us.”

False confidence that centralized experts can make decisions better than organic processes has seriously held us back as a civilization.

As individuals, I think most people have too much confidence in their own wisdom, which has the consequence of making them have too little confidence in their capacity to act. Humans are really good at self-deception. Until a person really grasps their capacity for self-deception and understands it’s unavoidable, what happens is they confidently stay in the little boxes where they are unhappy but at least not frequently encountering evidence they are wrong. My experience is that the more someone watches the news or cares about politics, the more confident they sound about things they can’t influence, and the less confident they are in their own ability to grow and develop and thrive.

So one kind of confidence (ie people being confident in their ability to act and grow and thrive) is valuable and I want more of it, but I think that’s constrained by too much confidence that “my beliefs are correct and it’s only my rationality that makes me sad and anxious”

Expand full comment

Ah, ok. You're talking about human psychology, I'm talking about math. No wonder I couldn't tell whether we agreed.

I differ from what I think you describe, but only in some ways. In any case it's a conversation for another time.

Expand full comment

On the other hand ...

I often find, when, say, ordering something by phone, or otherwise asking for a service to be performed that will take some time, an absolute refusal to give a time estimate. Apparently they're afraid to give a number for fear it will be taken as a guarantee.

I treat this by asking about absurdly short and long periods. "Will it take ten minutes?" No, not that quick. "Will it take two years?" No, not that long. "OK, then, we have a range. Let's see if we can narrow it down a bit more." This is frequently successful, and I wind up with something like, for instance, "about 3 or 4 weeks." That's helpful.

Expand full comment

I do the same thing, but I find it linguistically easier to do "one of X" where X goes up quickly.

"Will it take 1 hour? 1 day? 1 week? 1 month?..." Usually I pick 3 or 4 timeframes, the first meant to be absurdly short, the last absurdly long. Then I don't have to iterate in the conversation. I find they sort of get the idea there.

As you say, you do need to reassure them you're not looking for a guarantee, just a way to plan your life with more precision than "sometime in the next 10 years, I might get my dinner delivered."

Expand full comment

I've had much the same experience - oddly, with both people and LLMs.

When the nurse attending my late wife had mentioned the possibility that she might lose the ability to swallow, I had a very hard time getting her to tell me the odds. IIRC, what finally worked was asking: Is this a 1-in-10 chance? a 50-50 chance? a 9-in-10 chance? analogous to your "one of X"

With LLMs (most frequently ChatGPT), I've found that they will often (75% of the time?) answer a numerical question with weasel words rather than a numerical answer. Though in that case, forcefully and repeatedly asking for a number usually gives me one (which may be hallucinated, of course).

Expand full comment

Calibration troubles me because probabilities presuppose some sort of regularity to the underlying phenomena, but predictions may have nothing in common at all. As Scott himself once said, you could game for perfect calibration at 17% by nominating 17 sure things and 83 sure misses. Superforecaster skill cannot be evenly distributed over all domains, so the level of trust you place on a superforecaster’s 17% is laden with assumptions. Can anyone resolve this for me? Why should one trust calibration?
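Here is a toy version of that gaming worry (all numbers invented): 100 questions, every one answered "17%", where the forecaster secretly knows 17 are sure things and 83 are sure misses.

```python
preds = [0.17] * 100
outcomes = [1] * 17 + [0] * 83  # arranged so exactly 17 come true

# Calibration looks perfect: among the 17% predictions, 17% happened.
print(sum(outcomes) / len(outcomes))  # 0.17

# A proper scoring rule (here the Brier score: mean squared error) still exposes it,
# compared with the forecaster who simply reported what they knew.
brier = lambda ps: sum((p - o) ** 2 for p, o in zip(ps, outcomes)) / len(outcomes)
print(brier(preds))                # ~0.141
print(brier([1] * 17 + [0] * 83))  # 0.0
```

So calibration alone, without knowing what was predicted, really can be gamed.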

Expand full comment

I don't think you can trust calibration in a vacuum, but it seems to me that if you can see the kinds of predictions they are making, you can trust them within their domain.

Scott himself is doing this when he refuses to significantly (factor of ten) discount his AI risk probability in the face of all the superforecasters who disagree, assuming he has domain-specific knowledge that they don't.

Expand full comment

One point you miss is that superforecasters are not allowed to choose the questions; the questions are chosen for them. In some contests they are allowed to reject some questions, but there is still no obvious strategy to game this.

And first of all, it is possible to define the probabilities formally. If you draw one of the 17% answers of a superforecaster *uniformly* at random, then you expect a probability of 17% that it occurs. This means that you put no knowledge about the superforecaster into the prediction.

But you CAN put some such knowledge into it as soon as you have more than one piece of evidence. Assume you have two superforecasters A and B, where A is an expert with a great track record in area 1 and B in area 2. Then you can combine the two sets of predictions into a better one by taking answers about area 1 from A, and about area 2 from B. (Or combining them in a weighted way or whatever.)
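A minimal sketch of that kind of combination (the question labels and numbers here are invented):

```python
# A's and B's probabilities for the same questions; A is trusted on area 1, B on area 2.
a = {"area1/q1": 0.80, "area1/q2": 0.30, "area2/q1": 0.55}
b = {"area1/q1": 0.60, "area1/q2": 0.45, "area2/q1": 0.90}

# Take area-1 answers from A and area-2 answers from B.
combined = {q: (a[q] if q.startswith("area1/") else b[q]) for q in a}
print(combined)
```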

But if both forecasters agree on the percentage, then you can't extract additional information. Consider a coin flip. Let's assume that John Doe gives heads a 50/50 chance. Let's assume that a physicist has measured the coin so well that she knows position AND velocity of half of the atoms in the coin. (She invented new physics to do that.) And after years of experimentation and modelling, she says that the chance is 50/50. Well, then your only way of combining both predictions is to stay at 50/50. You don't gain any better estimates about the probability of the coin flip.

But you do get better estimates of OTHER events. For example, you are pretty sure that there is no easy way to improve John Doe's prediction, because the physicist already tried that. So, if John Doe reads a lot about coin flipping and starts doing research, you would expect that his accuracy in prediction will not go up with that. This is a question about John Doe, not about the coin, and on that you improve.

Expand full comment

That’s precisely my trouble - you need to know all that detail about the precise thing being predicted. Hearing an SF predict something at 17% doesn’t give much basis for action unless I know the parameters of the contest in which they earned their SF status, the domains in which they made predictions in that contest, and the degree of similarity between the new prediction and those previous domains.

Expand full comment

There are a lot of reasons why calibration isn’t good evidence. You can be perfectly calibrated by just assigning 50% to a lot of things. You can’t be perfectly calibrated if you assign 80% to a number of things that isn’t divisible by 5.

Better to evaluate someone by a scoring rule. Sum up the squares of their distances from the truth (that is, from 0 or 1, once we find out which it is). Or sum up the logs of the probabilities they assigned to the outcomes that actually occurred (for a binary question that didn’t happen, that’s 1 − p). Either of these scoring rules is “proper”, though they give different weight to precision at different numerical values.
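A small check that both rules are “proper” in that sense: if you actually believe the chance is q, your expected penalty under either rule is minimized by reporting q itself (q = 0.3 below is arbitrary).

```python
from math import log

q = 0.3  # the probability you privately believe

def expected_brier(r):
    # expectation of (report - outcome)^2 when the event happens with probability q
    return q * (r - 1) ** 2 + (1 - q) * r ** 2

def expected_neg_log(r):
    # expectation of -log(probability assigned to the outcome that occurred)
    return -(q * log(r) + (1 - q) * log(1 - r))

reports = [i / 100 for i in range(1, 100)]
print(min(reports, key=expected_brier))    # 0.3
print(min(reports, key=expected_neg_log))  # 0.3
```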

Expand full comment

It is the wrong question: Will AI destroy the world? The right question is whether we will use AI to destroy the world. Will people use AI's predictive power to manipulate us to kill one another more readily than we now do? If AI has developed a theory of mind beyond that of any human, it will predict our desires better than we can predict them ourselves. That creates a new lethality for people to use against each other, as dangerous as nuclear bomb lethality. Will we do this to ourselves?

Expand full comment

Hmm...

>Will people use AI's predictive power to manipulate us to kill one another more readily than we now do?

Of the various ways that AI could end humanity, I'd guess that that is one of the least likely. We've had to deal with deceptive shamans for 300,000 years. While our defenses against being talked into crap have holes in them, we _do_ have them.

For warfare-related AI threats, I'd guess higher odds for:

- improved weapons

- improved production of weapons

For more general threats, I'd guess higher odds for

- simply building replacements for human workers, and essentially outcompeting humans, squeezing them out of the economy and eventually out of existence

and then there is Yudkowsky's

- something like a superintelligent paperclip maximizer winds up using all accessible resources, including ones humans need to live, as subgoals of some task

Expand full comment

One counter-argument might be that people can use probabilities as a motte-and-bailey.

The motte is: I say "my probability of X is 71%", you say "you're no more of an expert than me, how can you claim such precise knowledge?", and I reply "I didn't say I could, a probability like that is just a shorthand for 'fairly but not completely confident'".

The bailey is: I say "my probability of X is 71%", you say "okay, well I think X is fairly unlikely", and I reply "since I gave a precise probability, I've *clearly* thought about this much more than you! So it's clear whose opinion should be trusted."

I don't know for sure if people do that, but I bet they do.

Expand full comment

There is something a bit weird about the whole quest-for-accurate-predictions thing. Perhaps it helps to ask: why do we care if someone states that something is very likely/likely/50-50/unlikely/very unlikely to happen?

In one situation, you care because you want to place a bet on an outcome. Then you hope that, after you place the bet, nothing will change that might alter the initial probabilities you assigned to various outcomes.

In another situation, you care because you want to do preventive action, i.e. precisely because you want to motivate yourself – or others – to do something to change the probabilities.

These motives are very different, and so is your motive for offering predictions in the first place. Plus, they change your motive for listening to the probabilities other people assign to events (including how you infer/speculate about the motives they have for offering their probabilities).

Edit: And an added twist: Your own motive for consulting probabilities may be a desire to place bets on various outcomes. But others may interpret the same probabilities as a call to action, leading them to do stuff that changes the initial probabilities. Making it less probable that your initial probabilities are really good estimates to use to place bets. (Unless you can also factor in the probability that others will see the same initial probabilities as a call to action to change those probabilities.)

…all of the above concerns difficulties related to how to factor in the “social” in the biopsychosocial frame of reference when trying to predict (& assign probabilities to) future events where “future human action” has a bearing on what may happen.

Expand full comment

This is really good! Is there terminology for that distinction? Something similar to the distinction between normative vs descriptive statements.

Expand full comment

"Is there terminology for that distinction"

...not that I have seen. Maybe someone should invent some terms to capture this distinction:-)

(I added an edit, to elaborate the point.)

Expand full comment

If I place a bet, I don't want nothing to change. I want evidence to change in favor of the position I bet!

Expand full comment

Yeah well…

The distinction I was/is driving at is the difference between a situation where you do not want to use the probabilities as a starting point for some type of action to influence the probabilities - you just want to get them as accurate as possible. (Alternatively: You do not assume that you or anyone else can do much to change the probabilities.)

Versus a situation where you have an interest in probabilities that are high (or low). Where your motive for making probabilities in the first place is as a starting point for judging if you should do something to change them – or in order to prod others to do something to change them.

“Placing a bet” was meant as an example of the first type of situation. You may be right that it is perhaps not a 100 percent good example. Having said that, I would add that if you think something or someone might act in a way that influences (increases) the probability that what you have placed a bet on actually happens, you should have factored in the probability that this might happen before you placed your bet😊

Expand full comment

>In one situation, you care because you want to place a bet on an outcome. Then you hope that nothing will change after you placed the bet, that may change the initial probabilities you assigned to various outcomes.

>In another situation, you care because you want to do preventive action, i.e. precisely because you want to motivate yourself – or others – to do something to change the probabilities.

Good points, and a good distinction!

I think that there is a third possibility:

Even if one has no ability to change the probabilities, you may want to decide if you should take some sort of protective action. I can do nothing (to a good approximation) to change the odds that Putin will launch a nuclear strike against the USA, but, if I thought the odds were high enough, I could build a fallout shelter in my basement. In a very general sense, one could think of this as analogous to a bet, but I think of it as different enough to have a different category.

Expand full comment

I’m pretty sure a government bureaucrat labeling one side of a discussion misinformation and then shutting off debate has a high probability of making the world much worse than it is now…

Expand full comment

counterpoint: 97.26% of the silliest things "rationalists" do could be avoided if they just resisted the temptation to assign arbitrary numbers to things that are unquantifiable and then do math to those numbers as if they're meaningful

Expand full comment

I used to think it was 65.3% but your post has caused me to update my beliefs to match yours so thank you for that

Expand full comment

now that we have attached the appropriate ritual signifiers to this process, we can pretend it has been rigorously mathematical the whole time!

Expand full comment

If I try to assign a probability to a belief I sense that I will have to think more in depth and rigorously about it

One option for me is to reject this framing of beliefs because I do not want to do that and will feel like those that do will be signifying that on average they have thought about it more than me

Probably that is why a lot of people object. However, I have observed that the averaged probability answers of groups are more accurate than those of almost all individuals, and that some individuals and small groups are reliably better than the group averages.

Therefore my analysis must hang on how much I believe that fact is just a stupendous statistical aberration. It doesn't matter whether I say I'm extremely certain that answers given as probabilities are, on average, reliably providing much more information, or whether I say I think it's 99% likely; it's clear that the general principle is correct without significant evidence to the contrary.
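The group-averaging observation is also easy to reproduce in a toy simulation (everything below is invented: forecasters who share the same underlying information plus independent personal noise), where the averaged forecast reliably scores better than nearly every individual:

```python
import random

random.seed(1)
TRUE_P, N_FORECASTERS, N_QUESTIONS = 0.3, 50, 2000

# Each forecaster reports the true probability plus independent noise, clipped to (0, 1).
clip = lambda p: min(max(p, 0.01), 0.99)
forecasters = [[clip(TRUE_P + random.gauss(0, 0.15)) for _ in range(N_QUESTIONS)]
               for _ in range(N_FORECASTERS)]
outcomes = [1 if random.random() < TRUE_P else 0 for _ in range(N_QUESTIONS)]

brier = lambda ps: sum((p - o) ** 2 for p, o in zip(ps, outcomes)) / N_QUESTIONS
group_avg = [sum(f[q] for f in forecasters) / N_FORECASTERS for q in range(N_QUESTIONS)]
scores = sorted(brier(f) for f in forecasters)

print("group average:", round(brier(group_avg), 4))
print("best individual:", round(scores[0], 4), "median individual:", round(scores[len(scores) // 2], 4))
```

Averaging cancels the forecasters' independent noise; it can't, of course, manufacture information none of them had.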

Expand full comment

i've identified a 94.8% probability that that's a sloppy and failure-prone heuristic

Expand full comment

What are some examples of things you consider "unquantifiable", and do you think they are in some sense "inherently" unquantifiable, or just unquantifiable at our current tech level? (There are plenty of things that are practically quantifiable today but not at various points in the past, like say the probability it will rain tomorrow.)

Expand full comment

I'm not Tom J, but personally I'd consider something unquantifiable if it has a high level of model uncertainty or if it is affected by the act of making predictions.

Incidentally, what do you think the probability is that you've been given drugs/have a brain tumor/etc. and are mentally addled to the point where your understanding of math is incorrect and inconsistent? Good luck quantifying that!

Expand full comment

The post we are both commenting on is full of examples in which the author admits that there is no objective or mathematically verifiable way to assign a number to the chance of a thing happening, but says he would rather see the thing assigned a number anyway, for a variety of convoluted reasons.

Expand full comment

Oh, I thought you had other things in mind, because most if not all of the stuff from this post seemed very reasonable to me. And I don't know which reasons you're calling convoluted, but the ones in the "What Is Samotsvety Better Than You At?" section seemed particularly straightforward, e.g. in a capitalist society, it seems pretty hard to argue that "doing X makes you more money" is a "convoluted" reason for doing X (I'm not trying to get into whether it's moral or something, just that it's quite straightforward). This seems like a testable prediction about the world (though not a conveniently testable one; I'm not going to go start a company and run it on bad epistemics for science). So which part do you disagree with: is it

a) you disagree on the experimental result we would likely get - you think that companies which use shmrobabilities from Samotsvety would not actually make more money than ones which either eschew shmrobabilities entirely or use numbers from other less-well-calibrated sources, or

b) you agree that the experimental result would probably show that shmrobabilities are useful for making money, but just object philosophically to calling them probabilities because they're not "objective or mathematically verifiable", or

c) something else I'm not thinking of?

Expand full comment

a) There are many examples beyond the scope of this article as well--you can go to places like LessWrong if you want (though I wouldn't) and you will find lots of people trying to quantify things like shrimp suffering, or the status of their polycules, or the outcomes of thought experiments they're running entirely in their heads about machine gods...

b) There are obviously some things which are amenable to being sorta quantified in the manner that e.g. markets assign numbers to things--for example, the stock value of a company is a reflection of some things which are objectively "real" (such as its cash flow) and some things which are not directly measurable but can resolve to a real number (like investor sentiment, press coverage of the company, etc.). The mistake is thinking that because some things are objectively quantifiable, and because some things are sorta subjectively-fuzzily quantifiable if we're okay with using proxies, everything important or meaningful must therefore benefit from being assigned a number (as long as we follow the appropriate mathematical rituals in the process).

Expand full comment

I think most people are way too averse to probabilities, but also rationalists are too enthusiastic about them. I used to share the enthusiasm, so here's what convinced me that they don't deserve so much credit as rationalists give them:

You can think of reality as being determined by a bunch of qualitatively distinct dynamics, which in probabilistic terms you might think of as being represented by a large number of long-tailed, highly skewed random variables. The outcomes you ask about for forecasting are affected by all of these variables, and so the probabilistic forecasts represent an attempt to aggregate the qualitative factors by their effects on the variables you forecast for.

This is fine if you directly care about the outcome variables, which you sometimes do - but not as often as you'd think. The two main issues are interventions on the outcomes and the completeness of the outcomes.

For intervention, you want to know what the long-tailed causes are so you can tailor your interventions to them. For instance, if a political party is losing the favor of voters, they're going to need to know why, since they are presumably already doing just about everything generic they can to gain popularity, and so their main opportunity is to stay on top of the unique challenges that pop up.

But even if you just care about prediction, the underlying causes likely matter. Remember, the uncertainty in the outcome is driven by qualitatively different long-tailed factors - the long-tailedness allows them to have huge effects on variables other than the raw outcomes, and the qualitative differences mean that those effects will differ hugely depending on the cause. (There's a difference between "Trump wins because he inflames ethnic tensions" and "Trump wins because black and Hispanic voters get tired of the Democrats".)

Expand full comment

Which is to say, even if you have probabilities that are "accurate" in some sense, they are typically not going to be useful without much more context that is poorly described with probabilities. It's likely that you can set up your probabilities to beat other systems in this "accuracy" measure, but that's not helpful unless your "accuracy" measure assesses what you need, which it typically does not. This is innocent silliness in isolation, but a serious problem if you use probabilities to replace other forecasting systems that perform useful work by other criteria.

Expand full comment

It appears to me that the crux of your argument is that probabilities about events which seem to be one-off events make sense because there's secretly a larger reference class they belong to.

So superforecasters have a natural reference class consisting of all the topics they feel qualified to make predictions on, and you calibrate them across this reference class they have chosen.

So it appears to me that your argument is much simpler than you make it out to be - it revolves around the fact that everyone gets to choose their reference class, and while sometimes there's an obvious reference class (like flipping a coin), this isn't so material.

Conversely, if you don't have many examples of predictions to judge people on, then their probability statements are indeed meaningless. If some stranger tells me that there is a 50% chance that aliens land tomorrow, then I really don't know how to integrate this information into my worldview.

Expand full comment

> It seems like having these terms is strictly worse than using a simple percent scale

Lost you here. The idea that we can and should stack-rank the likelihoods of different events is a good point and seems right, but converting the ranking to percentages seems to be where people are just pulling numbers out of thin air.

You sort of address that in the next paragraph, but I feel like I need more than a few sentences on this to be convinced. I was getting ready to agree based on the racing argument.

Expand full comment

I’ll take a shot—first, why ambiguous phrases like “very unlikely” are harmful. I’m lifting an example from Philip Tetlock’s Superforecasting which really stuck with me. In 1951, American intelligence officials spent months writing an assessment of the USSR’s strategy toward Yugoslavia, concluding that an attack was a “serious possibility”. When the lead analyst discussed the report with State Department officials, he was startled that they thought this indicated a low overall probability, since he had intended to communicate roughly a 65% chance. When he asked his staff, who had all signed on to the report, the universally agreed-on “serious possibility” turned out to have been intended to convey 80/20 odds in favor by some staff members, and 20/80 by others. Communicating with ambiguity only gets in the way of a proper understanding of the likelihood of events.

Second, in defense of assigning concrete probabilities to ambiguous events, I’ll offer a hypothetical.

If I have a friend who says they’ll roll a D6 and asks me to bet on rolling a one, I should use a probability of 1 in 6. If I know my friend has three dice, a D4, a D6, and a D12, and they tell me they’ve randomly chosen a die and to bet on the outcome, you could say the probability of a 1 lies somewhere between 1/12 and 1/4, but it’s more meaningful to say that it’s *still* the average, 1/6, and that will be the best-performing prediction in the long run. The “true” probability might be different—my friend might know they already chose the D4—but I can still optimize my estimate of the probability from the info I have.

The point being: if I have worked out that a probability seems to be between 10% and 30%, but I am truly impartial between every probability in that interval, it is *fundamentally equivalent* to say I evaluate a 20% chance (in the single-shot case). Now if we repeat the event, we would have different expected distributions, but that’s another layer of complication.

If you have uncertainty in your probability estimation of an event, it’s as (and perhaps more) informative to average it out into a probability rather than leaving it as an implied range.
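(A minimal check of the dice hypothetical above, under the stated model that one of the three dice is chosen with probability 1/3 and then rolled once:)

```python
# Check of the dice hypothetical: a die is chosen uniformly from {D4, D6, D12},
# then rolled once. P(roll a 1) under that model:
import random

dice = [4, 6, 12]
p_one = sum((1 / 3) * (1 / d) for d in dice)
print(p_one)          # 0.1666... = 1/6, matching the "just average it" intuition

# Monte Carlo sanity check
random.seed(1)
trials = 200_000
hits = sum(random.randint(1, random.choice(dice)) == 1 for _ in range(trials))
print(hits / trials)  # ~0.167
```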

Expand full comment

>When he asked his staff, who had all signed on to the report, the universally agreed-on “serious possibility” turned out to have been intended to convey 80/20 odds in favor by some staff members, and 20/80 by others.

Yikes!! Thanks very much for the example.

Expand full comment

Wouldn’t the probability of rolling a 1 in your example be 3/22, since three of the 22 faces across the three dice show a 1? The probability of rolling a 12 seems like it would be 1/22 to me.

Expand full comment

Just like probabilities are meaningful in that they compare two outcomes, confidence is only meaningful when it compares two questions. Since we like to think about probabilities as betting odds, certainty tells you how you would distribute your capital between a large number of questions. It’s always smartest to bet your best guess, but you’ll put more weight on the ones with higher confidence. That’s how I think about it anyways.

Expand full comment

Can we reconcile frequentists and non-frequentists by saying that it's all about the frequencies across multiple events for a given forecaster? I use "forecaster" to mean anything that is able to make multiple forecasts, whether it's a single machine learning model, a human being, a group of people (like the wisdom of the crowd, or predictions from random people), or a process for combining any such inputs to make a forecast. All the arguments you make about forecasts on non-repeatable events being useful revolve around the fact that if you look at a collection of such events, they are indeed useful. So why not talk about the collection itself rather than single events in isolation, which inevitably makes some people uneasy? We can just consider that this collection of predictions is a specific forecaster and peacefully discuss it.

If I have a model that I use once, no one has any info on how it was built, and it tells you that there's a 90% probability of rain tomorrow, what would you do with this? The only way to use this information is in relation to previous models submitted by random people.
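(A sketch of what scoring "the collection" could look like, with made-up data: a forecaster is anything that emits many probability/outcome pairs, and the collection can be scored even though no single forecast can.)

```python
# Sketch with invented data: score a forecaster's whole track record via Brier
# score plus a crude calibration table.
from collections import defaultdict

forecasts = [(0.9, 1), (0.9, 1), (0.9, 0), (0.6, 1), (0.6, 0),
             (0.3, 0), (0.3, 0), (0.3, 1), (0.1, 0), (0.1, 0)]

brier = sum((p - o) ** 2 for p, o in forecasts) / len(forecasts)
print(f"Brier score: {brier:.3f}")   # lower is better; always saying 50% scores 0.25

buckets = defaultdict(list)
for p, o in forecasts:
    buckets[round(p, 1)].append(o)
for p in sorted(buckets):
    outcomes = buckets[p]
    print(f"said {p:.0%}: happened {sum(outcomes) / len(outcomes):.0%} "
          f"of the time ({len(outcomes)} forecasts)")
```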

Expand full comment

What you are arguing for is one of three possible philosophical approaches to defining probability. There is the approach that it is mathematical--a probability of 0.5 for heads is embedded in the mathematical definition of a fair coin. There is the approach that is empirical--as you flip many coins, the frequency of heads approaches 50 percent. Finally, there is the notion that probability is subjective. I believe that the probability of heads is 0.5, but you could believe something else.

You are insisting on the subjective interpretation of probability. It is what one person believes, but it need not be verifiable scientifically. Other approaches make "the probability that Biden wins the election" an incoherent statement. The subjective definition has no problem with it.

Expand full comment

I think that's an interesting differentiation. On the one hand I think it's arguably semantic. Eg:

In colloquial English there's a 50% chance a coin lands on heads, and a 50% chance the red team wins. But one of these is "real" probability, and the other is shmrobability.

But I think subjective probability is surprisingly sensible. Ie, a coin, with identical inputs, always lands on either heads or tails. A coin flip, to God, is not probabilistic.

I don't know enough about physics to know if "true" probability exists. But I think this is a red herring anyway. Most of the time when we are arguing about probability we're really arguing about too much complexity to give a precise answer. In this sense it's more legible to say "a coin has a 50% chance of landing on heads" than "the coin is predetermined to land either heads or tails but we're not sure which."

Perhaps a magician can flip a coin and have it always land heads. And even a normal coin flip seems not to be exactly 50/50: https://www.reddit.com/r/math/comments/174nep0/fair_coins_tend_to_land_on_the_same_side_they/

Expand full comment

You might also get a kick out of Cox's theorem, which "proves" that any "reasonable" view of probability is Bayesian. https://en.wikipedia.org/wiki/Cox%27s_theorem . There is a reasonably accessible proof in one of the appendices in *Probability Theory: The Logic of Science*.

You can still poke holes in this, particularly if you have some sort of Knightian uncertainty, but it's still pretty interesting.

Expand full comment

Is it generally held that probabilities of the urn-kind are truly different in kind (rather than degree) from one-off predictions?

In assigning a probability to drawing a certain type of ball, I am taking the ultimately unique, once-in-history event of drawing a particular ball in a particular way from a particular urn, and performing a series of reductions to this until it becomes comparable to a class of similar events over which an explicit computation of probability is tractable. We take the reductions themselves to be implicit, not even worth spelling out (how often is it remarked on that the number of balls must stay constant?).

It seems like this reduce-until-tractable procedure would be enough to assign a probability to most one-off events, and is (roughly) what is actually done to get a probability (allowing here for the reduction to terminate at things like one's gut feeling, in addition to more objective considerations).

Is there something wrong about this account, such that urn-probabilities can really be set apart from one-offs in a fundamental way?

Expand full comment

To go back to your example of fair coin flip vs. biased coin flip vs. unknown process with binary outcome, the reason you end up with 50% for all of them is that the probability is a summary statistic that is fundamentally lossy. It's true that if all you're asked to do is predict a single event from that process, your "50%" estimate is all you need. But the minute you need to do anything else, especially adjust your probability in light of new information, the description of the system starts mattering a lot: the known-fair coin's probability doesn't budge from 50%, the biased coin's probability shifts significantly based on whatever results you see, and the unknown process's probability shifts slowly at first, then more quickly if you notice a correlation between the outcome and the world around you (if the process turns out to be based on something you can observe).
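(A minimal sketch of that contrast, with modeling choices that are mine rather than the commenter's: the known-fair coin is pinned at 50%, while a coin of unknown bias gets a uniform Beta prior updated by counting.)

```python
# Same observed flips, different models (assumptions mine).
flips = [1, 1, 0, 1, 1, 1, 0, 1]   # 1 = heads; made-up data
heads, tails = sum(flips), len(flips) - sum(flips)

p_fair = 0.5                        # known-fair coin: never budges
a, b = 1 + heads, 1 + tails         # unknown bias: Beta(1, 1) prior, updated by counts
p_biased = a / (a + b)              # posterior mean of the heads-rate

print(f"known-fair coin:   P(next flip heads) = {p_fair:.2f}")
print(f"unknown-bias coin: P(next flip heads) = {p_biased:.2f}")
# The "unknown process" case is messier still: you'd also track hypotheses about
# what the outcome correlates with, which no single number can summarize.
```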

Summarizing to a single number loses most of that usefulness. It's less lossy than "probably not", and you're right to defend against people who want to go in that direction, but it's not that much less lossy. And in a world where we can send each other pages of text (even on Twitter now!) there's not much value in extreme brevity.

I tend to take more seriously people who offer at least a few conditional probabilities, or otherwise express how much to adjust their current estimate based on future information.

Expand full comment

>Summarizing to a single number loses most of that usefulness. It's less lossy than "probably not", and you're right to defend against people who want to go in that direction, but it's not that much less lossy. And in a world where we can send each other pages of text (even on Twitter now!) there's not much value in extreme brevity.

I rather like Scott's "lightly held" phrasing. I think of it as being how much he expects a probability estimate to shift, given additional evidence. It seems like a move in the "less lossy" direction you suggest, and would fit well with the coin example you give.

Expand full comment

> Is it bad that one term can mean both perfect information (as in 1) and total lack of information (as in 3)? No. This is no different from how we discuss things when we’re not using probability.

I kind of disagree. I think many people have converged on a protocol where you give a probability when you have a decent amount of evidence, and if you don't have a decent amount of evidence then you say something like "I guess maybe" or "probably not".

Some people (people who are interested in x-risk) are trying to use a different protocol, where you're allowed to give a probability even if you have no actual evidence. Everyone else is pushing back, saying, "no, that's not the convention we established, giving probabilities in this context is an implicit assertion that you have more evidence than you actually do".

Scott is arguing "no, the protocol where you can give a probability with no evidence is correct" but I don't feel like his examples are convincing, *for the specific case where someone has no evidence but gives a probability anyway*, which is the case we seem to be arguing about.

I wish we could fix it by switching to a protocol where you're allowed to give a probability if you want, but you have to also say how much evidence you have.

Expand full comment

The non-frequentist view of probability is useful even if it has no connection to the real world. It's a slider between 0 and 1 that quantifies the internal belief of the one making the statement. There are a myriad of biases that will skew the belief, though. And everyone's probabilities won't be the same, because the "math" people do in their heads to come up with these numbers will be inconsistent: some will curve toward higher probabilities, others will go low, and one person's 40% need not correspond one-to-one to another person's 40%. We should call it something like an "opinion-meter" unless there is a more formal data-aggregation process for coming up with the numbers (thus ensuring consistency).

Expand full comment

Part of the issue is that many people who complain use numbers as substitutes for other ideas:

-50/50 does not mean even odds, it means 'I don't think this is something anyone can predict'

-99.9% does not mean 999/1000, it means 'I think it's really likely'

These linguistic shortcuts are fine for everyday conversation, but when others use numbers to convey probabilistic reasoning, this first crowd often defaults to their own use of numbers and concludes, 'oh, you can't actually know stuff like this, and it's weird that you're using non-standard fake probability numbers'.

On a related note, I've dealt with many people in the medical field who (perhaps for liability reasons, perhaps for other reasons) are strongly opposed to phrasing anything in probabilistic terms. My kid was preparing for a procedure which was recommended, but which I knew didn't always have great outcomes. I asked one doctor how likely it was that the procedure would work: 'oh, 50/50'. I asked another: 'there's a pretty good chance it will work, but no guarantees', but neither would provide anything more informative or any evidence behind their answer. But I learned that, at least with some doctors, if I asked, 'if this procedure were performed, say, 80 times in situations comparable to this one, how many of those 80 procedures would you expect to go well?', most of them could give answers like 50-65 (out of 80), which was a far more satisfying response for me. But I imagine that for the average patient this extra information wouldn't add much to the experience, so they're inclined to simply keep things vague.

Expand full comment

> I don’t understand why you would want to do this. If you do, then fine, let’s call it shmrobability. My shmrobability that Joe Biden will be impeached is 17%...

Scott, maybe a new word is a better idea than you think? Imagine if you and other Bay Area rationalists pledged to adopt the top-voted new words for probability & evidence & rationalist, provided that 500 people with standing in your community vote. I would not be a voter, but I can offer suggestions:

- probability = subjective confidence (this clarifies it is not measurable like a frequency is);

- evidence = reason to believe (this avoids overlap with either frequentist science or the legal system);

- rationalist community = showing-our-work community (sounds less arrogant; leaves open the possibility that there is a better methodology of systematized winning; avoids confusion with either Descartes or Victorian-era freethinkers).

Then you wouldn't have to keep writing posts like this!

Instead, you could focus on spreading the word about superforecasters. (Based on their comments, some critics didn't even notice that section.) But if you seem to be engaging in a philosophical/definitional debate, then that's what you will usually get, perpetually.

Expand full comment

> if you want to find out why Yoshua Bengio thinks there’s a 20% chance of AI catastrophe, you should read his blog, or the papers he’s written, or listen to any of the interviews he’s given on the subject - not just say “Ha ha, some dumb people think probabilities are a substitute for thinking!”

On the other hand, if you want status, that may well be the optimal response. I think people care orders of magnitude more often about status than about why someone else thinks something.

Expand full comment

"If you actually had zero knowledge about whether AI was capable of destroying the world, your probability should be 50%."

Recently, I saw an "expert" put the probability of AI-driven human extinction between 10% and 90%. Now, this would average to a 50% probability, but it means both a lot less and a lot more than a simple statement of a 50% probability. It conveys that it is very unlikely that AI will be quite harmless, but that a bad outcome is also by no means certain. Also, all probabilities between 10% and 90% seem (incredibly) to be equally likely to him. This looks like a pretty strange belief system, but it's surely logically consistent. A straight-up 50% assessment, on the other hand, would have the meaningful frequentist property of being right half of the time, if well calibrated. But then, in the context of human extinction, does that really matter? I guess the 10%-90% statement could mean that, based on the current evidence, the probability with the required long-run property of being right is equally likely to lie anywhere in the range from 0.1 to 0.9. (With the understanding that a long-run probability here requires some sort of multiverse to be meaningful.)

What if I said that my probability lies somewhere between 0% and 100%? By saying this, I would add no information to the debate (as I have zero knowledge on the matter), yet I would apparently still be claiming a 50% average probability of human extinction. I find this hard to believe...

Expand full comment

I think you could remove all of my objections if you replaced "probability" with "confidence" in this post, and also assigned a confidence interval that is informed by your (well, not necessarily you personally, but whoever is providing these numerical values) skill as a forecaster.

Saying "my probability" is, to me, similar to saying "my truth".

Expand full comment

I came here to say more or less this. To be concrete, saying "confidence interval 16%-18%" versus "confidence interval 5%-40%" nicely quantifies some part of what Scott seems to mean by "lightly held".

Expand full comment

Oh, I like that. This would also solve my problem - both the range and using a different term. I don't think it's just semantics either, as "probability" is being used to mean something that implies the more mathematical version while meaning something else.

I think a lot of my problem with predictors using overly specific "probabilities" is that we often mean something far less specific. Using a range would cover this well. Saying 1-2% is very specific in a way that 1% is not, where a lot of people mean <5% or a general "pretty unlikely" instead of a specific 1%. 5-40% correctly carries the information about uncertainty that 25% used in the same place would not.

Expand full comment

You still have to decide under uncertainty between different payoffs. This can be abstracted as bets: if pressed to choose, you would still accept one side at some odds and the other side at other odds. You still end up with a single number as your probability.

Mathematically this is also much cleaner. If you gave a range, you presumably think some values in that range are more likely and some less likely; in other words, you have a probability distribution over that range. Taking the expectation of that distribution gets you back to your actual single-number probability.
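(A minimal sketch of that last step; the particular distribution over a rough 10-30% range is my own arbitrary modeling choice, not anything from the comment thread.)

```python
# Sketch: a "range" usually hides a distribution; the single betting probability
# is just its mean. Here the range is modeled, arbitrarily, as a Beta(8, 32).
import random

random.seed(2)
samples = [random.betavariate(8, 32) for _ in range(200_000)]  # mass concentrated around 10-30%
point_probability = sum(samples) / len(samples)
print(f"{point_probability:.3f}")   # ~0.20: the number you'd actually bet with
```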

Expand full comment

Reasoning about uncertainty, formal or informal, must involve some kind of distribution. But articulating a full distribution is generally cumbersome at best. Summarizing with a single probability has some utility. Summarizing with a confidence interval can help communicate better about highly uncertain events.

If Bengio put his 90% confidence interval for doom at 19-21 I would feel fine accusing him of just being provocative. That can't possibly be a serious estimate of the distribution.

In a sense I'm advocating for taking Scott's argument one step further. Replace the fuzzy "lightly held" with a number.

Expand full comment

Consider the following example: Bengio says there is a 50% chance that p(doom) is 40% and a 50% chance that p(doom) is 20%. In this case, using the standard laws of probability, we can say that Bengio believes p(doom) to be 30% (0.5 × 0.4 + 0.5 × 0.2 = 0.3). I don't see what the former way of saying it means above what the latter means, given that they are mathematically equivalent. Similarly for your proposal of using a range.

Expand full comment

A 2 point distribution for any complex highly uncertain question is silly. I think a serious attempt to estimate such a distribution would result in some kind of blob with tails. An interval helps communicate whether you think that distribution is more tall and skinny or short and wide.

Expand full comment

>In a sense I'm advocating for taking Scott's argument one step further. Replace the fuzzy "lightly held" with a number.

Sounds reasonable. Perhaps take it one step further than that? If there is a known-unknown which dominates the likely change in the probability, then maybe giving the conditional probabilities on the outcomes for it is the best approach?

p(doom) = 20%

p(doom | partialAlignSuccess) = 15%

p(doom | !partialAlignSuccess) = 25%

(assuming p(partialAlignSuccess) = 50%)

Expand full comment

Seems reasonable in some cases. Although I worry it would get cumbersome quickly. Scott seems to mostly be interested in productive conversation in this post; that's what I'm interested in for sure. Even using a single concrete probability number is going to put some people off. Using an interval makes that somewhat worse. Diving into multiple lines of formal notation seems like it will limit the set of people who will get much out of it.

Expand full comment

Question: "is the 38th digit of pi larger than 4?"

If you ask me to give an interval for my probability that this is the case, the interval is [0.5, 0.5], because I know exactly how likely I think the statement is to be true.

Yet this belief of mine is very lightly held - it would completely change if I just googled the answer.

Expand full comment

Suppose that my confidence interval for the probability that Trump wins the election is 45%-55%. Now suppose that somebody puts a gun to my head and forces me to choose to bet 1 dollar on either Trump winning or Trump losing, at even odds. The confidence interval doesn't help - I need to actually decide whether I assign > 0.5 probability to Trump winning.

Expand full comment

Unrelated to the post: great username!

Expand full comment

Giving a range that sits symmetrically on either side of 50% conveys very useful information, even if it doesn't help you place a bet. It says that you don't know which of the two outcomes to favor, but you think it's relatively close. To me, that's much better than saying 52%, which implies a strong reason to think one is more likely than the other, but not by much.

In most cases, a range that centers on 52% (47-57%) makes more sense if that's the correct level of uncertainty.

Saying 50% with no range doesn't actually help with the problem either. "Gun to your head" you still need to pick a side if you say 50%.

Expand full comment

But if my probability is 50% then it doesn't matter which side I pick because my expected payoff is the same either way.

Expand full comment

Sure, but at that point couldn't you treat the range of 45-55% the same? It's got an average of 50% and no reason to select above or below that.

Expand full comment

Why does my point estimate of the probability have to be the midpoint of my interval?

Expand full comment

Gun-to-my-head choice is not the only thing that matters. If Alice gives her interval as 45-55 and Bob gives his as 40-50, they're communicating something different. Alice has more confidence that she's close to the "right" estimate. Maybe she has better data; maybe she has better models; maybe she's just more confident in her guesses.

It's about communication.

Expand full comment

Confidence interval already has a precise definition in null hypothesis significance testing. It would be wise to avoid that term.

Expand full comment

The problem is that there is a massive equivocation here.

You're arguing for using the language of probability for situations in which we have little information or ability to rigorously model. You acknowledge that these situations are not quite the same as those (like balls in urns) where we have high information and ability to model. You even say one might want to call the former something different (e.g., a "shmrobability.")

But the simple fact is that the situations are not all the same. The low information ones really do involve people "pulling numbers out of their ass," and this happens all day long. How many conversations that begin "What's your P(doom)?" involve such shit-covered numbers?

For what it's worth, the following summarizes my position pretty well. Yes, I know you purport to address it in the post, but I don't think your discussion of it really does.

https://www.benlandautaylor.com/p/probability-is-not-a-substitute-for

Expand full comment

The usual argument here, I think, is Cox's theorem: since it turns out any useful notion of shmrobability has to obey the same laws as probability, why not just call it probability?

One could argue about whether all of Cox's assumptions about what constitutes "useful" are reasonable - I'm not entirely convinced the "real number" assumption makes sense, for example - but amount-of-information and model-rigor don't come into it.

Expand full comment

Another way of stating my critique is that the informal uses of "probability" do *not* in fact obey the laws of probability in the way that Cox's theorem says they ought to. Many will say "Cox theorem, therefore we should use probability theory," but for them that functions merely as a rhetorical argument; the numbers they then begin making up do not in fact behave as probabilities.

Expand full comment

Hmm. Are you saying that in informal usage, people are intending to be consistent but actually pretty bad at it, or that they are not actually intending to be consistent? I agree the latter should not be using the language of probability.

Expand full comment

I think it’s a mix of both, often unconscious. People will not only use, but argue for, the language of probability on the basis of its consistency properties, but then not actually do what would be needed to achieve those properties, resulting in an unintentional motte-and-bailey.

Expand full comment

Hmm. I would argue that there should be some allowance made for the "trying to be consistent but not succeeding" case, provided the speaker is willing to update their estimations when they realize an error. Bounded rationality, sometimes bookies lose money, etc. But I do agree that it's more common in the wild to encounter claimed probabilities like "99%" or even "100%" used in a loose, impressionistic manner. THAT I wish people would stop doing!

Expand full comment

Yes, it’s the latter I’m talking about. It’s just that I encounter a lot of it, even among very smart people, enough to make me think of it as an attractor in cognitive space.

Expand full comment

Frequentism doesn’t actually exist, metaphysically.

No two events are exactly the same, so we are always performing some kind of conceptual bundling or analogy, in order to use past events in creating probabilities.

Consider a coin flip. Persi Diaconis showed that a coin is more likely to land on the face that began pointing up. So a good frequentist should only use past instances of coin flips that started in the same orientation as theirs in forming a probability.

But then, the size and shape of the coin also impact this effect, so they should only use flips that match those as well. And each coin has a unique pattern of wear and tear, so it better be this exact coin.

And actually, the angle of the flipper’s hand and the level of force they apply and the breeze in the room are also key…

As it turns out, Diaconis solved this too: he built a machine that precisely controls all elements of the flip, and produces the same result each time.

The probabilistic variance in the flip comes from the aggregation of these disruptive mitigating factors.

Frequentists imagine they are insisting on using only past perfect reference classes, but to actually do this, you’d need to set up the entire universe in the same configuration. And if you did, then the result would be deterministic, making probability irrelevant.

The fact is that every probability is secretly Bayesian. You are always drawing your own conceptual boundaries around distinct and deterministic events in order to create an imperfect reference class which is useful to you.

Frequentists are just arguing for a tighter set of rules for drawing these Bayesian boundaries. But they are also Bayesians, because it’s the only conceptual framework that can support actual forecasting. They’re just especially unimaginative Bayesians.

And this is allowed! You can be a tight Bayesian. But if you want to call it frequentism and insist there is a bright metaphysical line, you need to explain exactly where the line is. And you simply can’t do so, without sacrificing probability altogether.

(Obviously this is weird metaphysical nitpicking, but people in weird metaphysical glass houses should not throw weird metaphysical stones.)

Expand full comment

So the thing is that everything you've said is correct and this is a good article and people who disagree would have to work very hard to convince me it's wrong. The other side of the debate is making a type error and not communicating their objection well.

But the objection could be: "You've imposed language requirements unique to your culture on making arguments and then discounted arguments that don't use that language, ensuring that you'll discount any critique that comes from outside your bubble." If so, this isn't much different than objections to culture war stuff, or academic fields gatekeeping.

Expand full comment

All probabilities are metaphors - to a greater or lesser extent. Perfectly serviceable as such.

Expand full comment

One teensy additional thought: Wharton business professor Phil Rosenzweig stresses the difference between probabilities where you can affect the outcome (e.g. NASA scientists thinking about moon landing) and those where we have no control (e.g. climate change). See a good breakdown in Don Moore and Paul Healy (2008): “The Trouble with Overconfidence”.

Expand full comment

You seem to be defending not just non-frequentism about probabilities, but a sort of *objective* non-frequentism. At least, that’s what’s going on in the section about Samotsvety. At least, in the sentence where you say, “If someone asked you (rather than Samotsvety) for this number, you would give a less good number that didn’t have these special properties.”

I claim that the number .17 doesn’t have any of these special properties. The only numbers that could have objectively special properties for any question are 1 or 0. If you assign the number .17, and someone else assigns 1 and someone else assigns 0, then the person who does the best with regards to this particular claim is either the person who assigned 1 or the person who assigned 0. (They bought all the bets that paid off and none of the bets that didn’t, unlike the person who assigned the other extreme, who did the absolute worst, or the person who assigned .17, who bought some of the bets that paid off and some of the ones that didn’t.)

However, there are *policies* for assigning numbers that are better than others. Samotsvety has a good *policy* and most of us can’t come up with a better policy. No one could have a policy of assigning just 1s and 0s and do as well as Samotsvety unless they are nearly omniscient. (I say “nearly” because a person who is assigning values closer to 1 and 0 and is *perfectly* omniscient actually gets much *more* value than someone who assigns non-extreme probabilities no matter how well the latter does.)

But the goodness of the policy can’t be used to say that each particular number the policy issues is good. Two people who are equally good at forecasting overall may assign different numbers to a specific event. If you had the policy of always deferring to one of them, or the policy of always deferring to the other, you would very likely do better than using whatever other policy you have. But in this particular case, you can’t defer to both because they disagree. Neither of them is “right” or “wrong” because they didn’t assign 1 or 0. But they are both much more skilled than you or I.

This is no weirder than any other case where experts on a subject matter disagree. Experts who are actually good will disagree with each other less than random people do - even computer scientists who disagree about whether or not P=NP agree about lots of more ordinary questions (including many unproven conjectures). But there is nothing logically impossible about experts disagreeing, and thus the *number* can’t have the special properties you want to assign to it.

Expand full comment

I prefer the term "credence" for the sort of number that you get from forecasting, "probability" for the sort of number that you get from math, and "frequency" for the sort of number you get from experiment. "The probability for an ideal coin to land heads is 50%. The observed frequency for this particular coin to land heads is 49%. My credence in the hypothesis that this is an unbiased coin is 99%."

Expand full comment

I like this better than the "confidence" suggestion upthread. "Credence" captures what these numbers mean in a way that "confidence" doesn't (you could have 80% credence in event A happening, with high confidence, because you're an expert on the topic, while also having 80% credence in event B, with low confidence).

Expand full comment

About 23.2% of the things I ever say are quotes of dialogue from a movie. Not "you talking to me" but more "I have a lawyer acquaintance" or "it's my own invention". In the film High Fidelity the protagonist asks his former girlfriend what the chances are that they'll get back together, and she says something like "there is a 57% chance we'll get back together". I use this all the time. I tell my wife "there's a 27% chance I will remember to pick up bananas at the store", allowing her to make whatever adjustments she thinks are necessary.

Expand full comment

People decide what they think are the rational prices for stock futures and options all the time, despite that each one is about a singular future event.

Expand full comment

Frequentist probabilities are just model-based Bayesian probabilities where the subjective element is obscured.

Parable: I have an urn containing 30% red marbles and 70% blue marbles. I ask four people to tell me the odds that the first marble I'll draw out will be red. The first guy says 30% because he knows that 30% of the marbles are red. The second guy says 100% because he is the one who put the marbles in the urn, and he put in blue ones, followed by the red ones, so the red ones completely cover the blue ones, meaning the first one I touch will be red. The third guy says 30%, because he knows everything the other two guys know, but also knows that I always vigorously shake the urn for a solid minute before I draw out a marble. The fourth guy says 0% because he is a mentat who has calculated the exact dynamics of the marbles and of my searching hand given the initial conditions of the Big Bang and knows it will be blue.

All allegedly "Frequentist" probabilities are like this. You smuggle in your knowledge about the process to structure and screen off uncertainties, such that the remaining uncertainty is Bayesian, or, in other words, based entirely in unquantifiables. You then pretend that you have done something different than what Bayesians do.

Expand full comment

what a great parable. is this original?

Expand full comment

Thanks! Yes.

Expand full comment

It's brilliant. Could replace Scott's whole post haha

Expand full comment

> If you actually had zero knowledge about whether AI was capable of destroying the world, your probability should be 50%.

This approach has some problems. Following it, you would assign p(Atheism)=0.5, p(Christianity)=0.5, p(Norse Pantheon)=0.5.

Of course, these possibilities are mutually exclusive, so the probabilities can't add up to more than one. You could simply say "okay, then probabilities are inverse to the number of mutually exclusive options". But should Christianity really count as just one option? Different sects surely differ in mutually exclusive details of their theology! Perhaps you should have at least two options for Christianity. Then someone invents the Flying Spaghetti Monster as a joke. If you work from zero knowledge, it would look just as probable as Atheism.

Someone generates a random number. What is the probability that it is two? What is the probability that it is 1/pi? Is the expected imaginary part of the number zero?

Without additional information ("the number is returned as a signed 32 bit integer"), I think it is very hard to form defensible priors for such questions.

Expand full comment

This is my complaint. The probability of a random statement being true isn’t 50%. Maybe you start there then normalize?

Expand full comment

Why wouldn't it be?

Every statement has a negation, right? If it's equally likely that a "random" statement would be S or the negation of S, and P(S) = 1 - P(not S), then the expected value of P(S) for an unknown S must be 0.5.

If you don't buy that, you have to be imagining that some statement exists that can't be negated, or that however you're picking "random" statements is somehow biased toward some statements over their negations. You could create models in which either is true but I wouldn't expect either based on the words "a random statement".

Expand full comment

I think you actually are making a good case for a 0-knowledge event to start with 50%? None of your counterexamples are 0-knowledge. We know what kind of an object a random number is. We know what Christianity is. Etc.

An example of a 0-knowledge event would be something like "Will faed%i^ojklk&jihl;kjoh/*-ifusk be true?" Since no-one has a clue about what that string signifies, start with 50%.

Expand full comment

I've been trying for a while to explain some of the confusions identified in this article, and even wrote an EA Forum post on it last year: https://forum.effectivealtruism.org/posts/WSqLHsuNGoveGXhgz/disentangling-some-important-forecasting-concepts-terms . I've been struggling to get feedback and might find the distinctions aren't helpful, but every time a read an article like this I keep thinking "Geez I wish we had more standardized language around forecasting/probability."

The main point of my forum post is that people often conflate concepts when talking about the meaning or practicality of forecasts—perhaps most notoriously for EA when people say things like "We can't make a forecast on AI existential risk, since we have no meaningful data on it" or "Your forecasts are just made up; nobody can know the real probability." Instead, I recommend using different terms that try to more cleanly segment reality (compared to distinctions like "Knightian uncertainty vs. risk" or "situations where you don't 'know the probability' vs. do know the probability").

Slightly rephrased, people sometimes demand "definitive" evidence or estimates, but they don't know what they mean by "definitive" and haven't really considered whether that's a reasonable standard when doing risk assessments. I think it's helpful to define "definitiveness" in at least one of two dimensions:

1) how much do you expect future information will change your best estimate of X (e.g., you might now think the probability is 50%, but expect that tomorrow you will either believe it is 80% or 20%);

2) how difficult is it to demonstrate to some external party that your estimate of X was due to good-faith or "proper" analysis (e.g., "we followed the procedure that you told us to follow for assessing X!").

The terms I lay out in my forum post are basically:

• "The actual probability": I don't really explain this in the article because quantum randomness is a big can of worms, but the point of this term is to emphasize that we almost never know ""the probability"" of something. If we decide quantum randomness is merely "indeterminable by physical observers [but still governed by physical laws/causality]" as opposed to "actually having no cause / due to the whims of Lady Luck," the probability of some specific event is either 100% or 0%. For example, a flipped coin is either 100% going to land heads or 0% going to land heads; it's not true that the event's "actual/inherent probability is 50%."

• "Best estimate": This is what people often mean when they say "I think the probability is X." It's the best, expected-error-minimizing estimate that the person can give based on a supposed set of information and computational capabilities. In the case of a normal coin flip, this is ~50%.

• "Forecast resilience": How much do you expect your forecast to change prior to some (specified or implicit) point in time, e.g., the event occurring. For example, if you have a fair coin, your 50% estimate is very resilient, but if you have a coin that you know is biased but don't know the direction of the bias, and are asked to forecast whether the 10th flip will be heads, your initial best estimate should still be 50% but that forecast has low resilience (you might find that it is clearly biased for/against heads). *This seems like a situation where someone might say "you don't know the probability," and that's true, but your best estimate is still 50% until you get more information.* (A toy way to put a number on resilience is sketched after this list.)

• "Forecast legibility/credibility": I don't have a great operationalization, but the current definition I prefer is something like “How much time/effort do I expect a given audience would require to understand the justification behind my estimate (even if they do not necessarily update their own estimates to match), and/or to understand my estimate is made in good-faith as opposed to resulting from laziness, incompetence, or deliberate dishonesty?” Forecasts on coin flips might have very high legibility even if they are only 50%, but many forecasts on AI existential risk will struggle to have high legibility (depending on your estimate and audience).
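A toy way to put a number on "forecast resilience" from the list above, with modeling choices (the Beta prior, the five-flip look-ahead) that are mine rather than the original commenter's: the expected absolute change in your forecast after seeing the next few flips.

```python
# Toy operationalization of "forecast resilience" (assumptions mine).
import itertools

def posterior_mean(a, b):
    return a / (a + b)

def expected_shift(a, b, k):
    """Average |new forecast - old forecast| over the next k flips, weighting each
    possible future sequence by its probability under the current Beta(a, b) belief."""
    old = posterior_mean(a, b)
    total = 0.0
    for seq in itertools.product([0, 1], repeat=k):   # 1 = heads
        aa, bb, prob = a, b, 1.0
        for flip in seq:
            p_heads = posterior_mean(aa, bb)
            prob *= p_heads if flip else 1 - p_heads
            aa, bb = aa + flip, bb + (1 - flip)
        total += prob * abs(posterior_mean(aa, bb) - old)
    return total

# Both forecasts are 50%, but one is far more resilient than the other:
print(expected_shift(1, 1, 5))        # unknown-bias coin: expected shift ~0.21
print(expected_shift(1000, 1000, 5))  # near-certainly-fair coin: expected shift ~0.0005
```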

Expand full comment

I'm missing an information-theoretic answer here. There are two factors: first, the number of "poll options" you can choose from, and second, which poll option you chose (and how that translates to a probability).

For whole percentages, you've got 101 options, 0 to 100. For percentages with three fractional digits, you've got 100,001. For [probably | probably not | don't know] you've got three options, and for just [don't know] you've got one option.

So, estimating the probability of the three scenarios, I'd answer as follows:

1) Fair coin: the 101st option of 201 options (i.e. the one in the middle)

2) Biased coin: the 5th option of 9 options (i.e. the one in the middle)

3) Unspecified process: the 1st option of 1 option (i.e. the one in the middle)

In all the cases, the best answer is "the one in the middle" which best corresponds to 50%, but the answers are not at all alike. The number of options I chose reflects my certainty or lack thereof.
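(A small sketch of one way to formalize that scheme; the mapping from "option k of n" to a probability plus an implied precision is my own, not the commenter's.)

```python
# Sketch: "option k of n" -> midpoint probability plus half a bin width of precision.
def option_to_probability(k, n):
    """k is 1-indexed; n equally spaced options spanning 0..1."""
    if n == 1:
        return 0.5, 0.5                 # a single option says nothing at all
    p = (k - 1) / (n - 1)
    half_bin = 0.5 / (n - 1)
    return p, half_bin

for k, n in [(101, 201), (5, 9), (1, 1)]:
    p, precision = option_to_probability(k, n)
    print(f"option {k} of {n}: p = {p:.3f} +/- {precision:.3f}")
# All three answers are "50%", but with very different implied precision.
```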

Expand full comment

Your state space is probably too small for both the 2nd and 3rd options

Expand full comment

You can argue about the second question, but for the third question there should be only one option, since we have zero information.

This is related to Scott's much discussed habit of listing binary predictions with 50% probability. To me, saying that a (non-repeated) event has a 50% chance of occurring is identical to saying that you have zero information about that event, whereas Scott seems to clearly regard it as meaningful.

There must be a branch of mathematics or information theory that has clear and unambiguous answers, with an introductory textbook that clears up all confusion in the article and the comments, and I was expecting to find it in a comment (with a probability of 66.667%, or [probably not | *probably*]), but I haven't yet.

Expand full comment

You think there's only one possible way this could go? What way?

> Consider some object or process which might or might not be a coin - perhaps it’s a dice, or a roulette wheel, or a US presidential election. We divide its outcomes into two possible bins - evens vs. odds, reds vs. blacks, Democrats vs. Republicans - one of which I have arbitrarily designated “heads” and the other “tails” (you don’t get to know which side is which). It may or may not be fair. What’s the probability it comes out heads?

> To me, saying that a (non-repeated) event has a 50% chance of occurring is identical to saying that you have zero information about that event[...]

He literally says that in the post: https://open.substack.com/pub/astralcodexten/p/in-continued-defense-of-non-frequentist?r=9fl14&selection=30b6cc1a-5e60-458c-aaa1-a41f30465daf&utm_campaign=post-share-selection&utm_medium=web

Expand full comment

You got em Scott Alexander. You went out there and got em.

Expand full comment

Great discussion. We use language... it facilitates quite a lot, but we over-ascribe certain powers and utilities to language. As Wittgenstein concluded at one point, most of philosophy is a 'language game', and likely most other cherished beliefs are language games too. Games are good, but they are not 'truths' of the universe. At best, they are means of organizing, conveying, and working with vague notions and data.

Expand full comment

Scott,

I think you need to watch “Dumb and Dumber” again; what is the semantic content of 1 in a million?

In a similar vein, do people really act in a way that suggests we should take their non-zero predictions of disaster seriously? What would a "rational" actor do if, for example, he or she thought there was a real chance of, say, being in a plane accident in the next month? Would you expect to meet them at a conference overseas somewhere?

Expand full comment

How does sharing or researching a probability affect that probability? Everything we make a "thing" affects the quality of what we've "thinged." The human mind, inquiry, and dialogue seem to me to have a significant observer effect on themselves. I guess this gets at the metaphysical aspect you mention?

Expand full comment

One of the biggest blindspots of Rationalism is the way that it pretends that minds are infinitely powerful computers gazing in at a universe they are not physically connected to. It's a useful abstraction in some domains, much like an assumption of rigid body classical physics, but it breaks down if you try to force it into a domain where those assumptions don't apply.

In the real world, a) people have limited time and memory and b) are physically embodied in the world and thus their thoughts and actions affect the world.

As an extreme case, suppose someone offers to bet against you on a Manifold market they created. There's *no* probability you can assign to the market because no matter which side you pick, they'll just resolve it the other way.

Expand full comment

Ben Goertzel took a stab a while back at turning the idea that you can have uncertainty in your probability estimates into an approach to computation: https://en.wikipedia.org/wiki/Probabilistic_logic_network

The gist is that rather than having a single number represent all the information you have about probabilities, you carry around probability ranges, and propagate them through belief networks through a fairly simple set of rules. The expectations might come out the same when looking at a 50% event whether you have a large or small interval around 50%, but other mechanics change as you add information or connect events together.

At least back then he was planning to make this the core of his approach to AI; I don't know if that has held up. It always seemed to me like an interesting idea, but I never quite dug deep enough to see if it was actually simplifying things enough to make it more useful than just carrying around full distributions and doing the Bayes thing properly (which can be computationally difficult when you're dealing with big networks).
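(Not PLN's actual rules, just a toy illustration of the general idea of propagating intervals rather than point probabilities: Fréchet bounds on a conjunction when the dependence between the two events is unknown.)

```python
# Toy interval propagation: bounds on P(A and B) when P(A) and P(B) are only known
# as intervals and the dependence between A and B is unknown (Frechet bounds).
def and_interval(a_lo, a_hi, b_lo, b_hi):
    lo = max(0.0, a_lo + b_lo - 1.0)   # worst case: maximal anti-correlation
    hi = min(a_hi, b_hi)               # best case: one event implies the other
    return lo, hi

print(and_interval(0.45, 0.55, 0.45, 0.55))   # two "about 50%" events: (0.0, 0.55)
print(and_interval(0.49, 0.51, 0.49, 0.51))   # tighter inputs, still a wide output: (0.0, 0.51)
```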

Expand full comment

This is just Probabilistic Modeling, check out Jax

Expand full comment

One approach would be to recognize that there is a spectrum of kinds of uncertainties.

On the one side you have uncertainties in the territory, of the form "I don't know if this radioactive nucleus will decay within the next half-life". On our current understanding of physics, even if you know all the physics there is, and the wave function of the universe, you will still be stuck making probabilistic predictions regarding quantifiable outcomes.

Then you have quantifiable uncertainties in the map like "we don't know the Higgs mass to arbitrary precision". At least to the degree that the standard model is true (spoiler: it is not), the Higgs mass has an exact value, which we can measure only imperfectly.

Before the Higgs was discovered, there was a different kind of uncertainty, regarding its existence. This is of course much harder to quantify. I guess you would have to have an ensemble of theories which explain particle masses (with the Higgs mechanism being one of them) and penalize them by their complexity. Then guess whether there are any short, elegant mechanisms which the theoreticians have not thought of yet. (Of course, it might well be that an elegant mechanism reveals itself in the theory of everything, which we don't have yet.)

Both "Does the Higgs exist?" and "What is the exact mass of the measured Higgs?" are uncertainties which exist on our map, but I would argue that they are very different. The former is Knightian, the latter is well quantifiable.

For p(doom), there are two questions which are subject to Knightian uncertainty:

* How hard is it to create ASI?

* How hard is it to solve alignment before that?

It might be that we live in a world where ASI is practically impossible and AI will fizzle out. Or in a world where the present path of LLMs inevitably leads to ASI. Or in a world where getting to ASI by 2100 will require some "luck" or effort which mankind might or might not invest. p(ASI) is just the expected value over all these possible worlds. I would argue that the probability density function is a lot more interesting than the expected value.

In particular, I would expect that the regions around zero ("developing ASI is actually as hard as travelling to Andromeda") and one ("we will surely develop ASI unless we are wiped out by some other calamity in the next decade") have a significant fraction of the probability mass, with some more distributed in the middle region.
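(A sketch of that picture with entirely arbitrary numbers: a bimodal belief about p(ASI), modeled as a mixture of Beta distributions, versus the single expected value it collapses to.)

```python
# Sketch (all mixture weights arbitrary): a bimodal density over p(ASI)
# versus the one number you get by taking its expected value.
import random

random.seed(3)

def sample_p():
    r = random.random()
    if r < 0.3:
        return random.betavariate(1, 20)    # "ASI is practically impossible" worlds
    elif r < 0.7:
        return random.betavariate(20, 1)    # "ASI is basically inevitable" worlds
    else:
        return random.betavariate(2, 2)     # murky middle

samples = [sample_p() for _ in range(100_000)]
print(f"expected value: {sum(samples) / len(samples):.2f}")   # roughly 0.55
print(f"mass below 10%: {sum(p < 0.1 for p in samples) / len(samples):.2f}")
print(f"mass above 90%: {sum(p > 0.9 for p in samples) / len(samples):.2f}")
# Most of the probability mass sits far from the headline expected value.
```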

Expand full comment

You’re just pointing out that there is both model and data uncertainty.

Expand full comment

And intrinsic uncertainty in the territory, yes.

Of course, there is not always a clear dividing line between model and data uncertainty. I could treat two instances of standard model with a slightly different Higgs mass as two completely different model, or treat them as part of a continuous ensemble of models with a free parameter.

From my perspective, Bayes updates can not bridge multiple Kuhnian paradigm shifts. When Newton wrote down his theory of gravity and motion, nobody had a prior which included General Relativity. (Of course, all models are wrong, but some are useful. So the prior should not be "Newtonian Mechanics are true" but "Newtonian Mechanics describe a certain class of observations to a certain level of precision". General Relativity would then encompass Newtonian gravity as the approximation for v<<c etc. Of course, nobody formalized the caveats to Newton before Einstein came along, so to me it seems more natural to see Newton as falsified rather than refined.)

Expand full comment

A Kuhnian paradigm shift is literally a Bayesian update over model priors by many agents.

Expand full comment

I think one thing that trips people up is "normal" versus "weird" distributions.

Normal:

I'm throwing darts at a board. I'm pretty good and hit the bullseye exactly 25% of the time. The remaining throws follow a normal distribution. You can bet that my average throw will land closer to the center than my twin's; he hits the bullseye only 2% of the time.

Weird:

A robot throws darts at a board. It hits the bullseye exactly 25% of the time. Because of a strange programming error the other 75% of the time it throws a dart in a random direction.

If a prediction market gives a 60% chance of landing on Mars by 2050, some of the uncertainty follows a normal distribution. E.g., maybe there's a 50% chance by 2047, and a 67% chance by 2055. It's intuitive that if there's a 60% chance of success, then in the 40% of failures we should at least be close. But some of the "no" percentage follows a weird distribution. E.g., international nuclear conflict breaks out and extracurricular activities like space travel are put on indefinite hold.

I think weird outcomes lead to post hoc dismissal of predictions. If the dart thrower slips and throws a dart into the ground, we laugh at the 25% bullseye chance. If in 100 years all AI is chill and friendly, we'll think that the 20% chance of existential threat we give it now is nonsense.

But weird probabilities don't work that way.

Expand full comment

Apologies if someone has already posted this, but I think it's a fun, illuminating graphic connecting language usage and probability.

https://github.com/zonination/perceptions

Expand full comment

The problem is that lots of Bayesianists like to pretend that the probabilities are somehow scientific or objectively true, but if you take the philosophical underpinnings of Bayesian probabilities seriously it's clear that they are purely subjective opinions, not capable of being true or false. For any information you have, you could always choose the probability you'd like, then calculate backwards to figure out what your prior must have been. This means that unless there is some objectively correct prior, and there isn't, *any* probability is just as valid as any other for any prediction and set of information. When Bayesians are pressed on this, they fall back to the frequentist justifications they supposedly reject, just like Scott did on the first point about Samotsvety.

Expand full comment

On another look, the fourth point about Samotsvety is explicitly frequentist as well. It undermines rather than supports Scott's point.

Expand full comment

It's only frequentist if you are modeling "Samotsvety makes a prediction" as a repeating event, and somehow that isn't a statement about how subjective probability can still have objective properties (like calibration or sensitivity).

I guess you can claim that discovery of those properties are "beholden to frequentism" but that's a concession to the audience of non Subjective Bayesianism believers, but imo the frequentist is far more embarrassed by this than the Bayesian.

If it *is* true that you can model predictions as a repeatable event, where does the "this only happens once and so probabilities are meaningless" objection go? I cannot see a way where you can claim both, or not have your philosophical claims collapse into "just" subjective Bayesianism.

Expand full comment

I don't think I agree, but I haven't quite worked out exactly where my disagreement lies. I just want to say that you've made an interesting point to think about and I probably shouldn't have been as combative as I was in my initial post.

Expand full comment

The typical statistical perspective on non-repeating events is to frame the problem in a super-population. In the context of predictions that we seem to be in here, there are at least two ways to do this.

1. As far as we can tell, randomness is fundamental to the structure of reality. In the many worlds interpretation, which is convenient here, the population is the set of realities in the future and the probability that Trump wins is the (weighted) proportion of those realities that contain a Trump victory. There is therefore a real P(Trump) out there, so it makes sense to talk about it. Forecasters are then tasked with estimating it (\hat{P}(Trump)).

I think it is okay to critique whether \hat{P} is a good estimate of P, but even if it isn't, it still tells you what the forecaster thinks P is, which at least tells you something about them. Thus, we should ask two questions of a forecast. The first is: What is \hat{P}? The second is: Is \hat{P} a good estimator (i.e., based on good information)?

2. The second way to frame the super-population is to consider the set of forecasts that a forecaster makes. If they are well calibrated, then on the occasions when the forecaster gives a probability of 17%, the outcome will come out positive 17% of the time. That is to say that if we take the set of predictions a forecaster makes with probability 17%, 83% of those occasions will have a negative outcome and 17% will have a positive one.

When we encounter one of their predictions in the wild (say P(Trump)) we are essentially picking a prediction randomly from the superset of all their predictions, in which case the probability they give is literally the true probability that their prediction is correct, provided they are well calibrated.
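
Checking that property against a track record is mechanical, for what it's worth; a minimal sketch with invented data:

```python
# Minimal calibration check: bucket past predictions by stated probability
# and compare with the observed frequency of positive outcomes.
from collections import defaultdict

# (stated probability, did it happen?) -- invented illustrative data
predictions = [(0.17, False), (0.17, False), (0.17, True), (0.17, False),
               (0.80, True), (0.80, True), (0.80, False), (0.80, True)]

buckets = defaultdict(list)
for p, outcome in predictions:
    buckets[p].append(outcome)

for p, outcomes in sorted(buckets.items()):
    freq = sum(outcomes) / len(outcomes)
    print(f"stated {p:.0%}: happened {freq:.0%} of the time (n={len(outcomes)})")
```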

----

Note that both (1) and (2) are frequentist probability interpretations and not subjective probabilities. In that regard, I'm not sure that you are giving a defense of "non-frequentist" probabilities as much as a defense of imperfect "probability estimates."

Expand full comment

>In this case, demanding that people stop using probability and go back to saying things like “I think it’s moderately likely” is crippling their ability to communicate clearly for no reason.

I personally feel that in most cases, false precision is the greater danger. A small number of fuzzy categories often works quite well for quick assessments --

https://en.wikipedia.org/wiki/Words_of_estimative_probability

Expand full comment

I *think* there's a bit of a stolen valor argument that is hidden behind some of the probability objections.

For example, if I don't encounter people talking about probabilities at all, *except* in the context of Nate Silver, superforecasters, or hell, even fiction where probability-givers are depicted as smart and "accurate", and then I encounter some random lesswronger saying something like "23% likely", I'm going to assume this internet person is trying to LARP one of those accurate forecasters, and thus come off as if they're calibrated at the sub-1% level, even though (objectively!) most lesswrongers actually suck at calibration and giving good estimates, because they don't practice! (This was true a couple of years back I think, from lesswrong annual survey results.)

None of this means we should stop talking about probability, or that it's correct to assume this of normal people who speak in probabilities. But I think the way to combat this isn't to argue about probability definitions, but to sit on top of a giant utility pile from correct predictions, and look down at the blasphemer and say "how droll!".

Someone saying "you're being unnecessarily precise" and then being linked to a single person's set of one thousand well calibrated predictions, well, it's only mildly embarrassing if it happens once. But if entering rationalist spaces means that there is a wall of calibration, you have to be pretty shameless to continue saying things like that.

Of course, this should *only* be done if rationalists can exercise the virtue they claim to preach, and well, I think that's just hard, but I can dream.

Expand full comment

>Giving an integer percent probability gives you 100 options.

Don't you mean 101 options? :-)

Expand full comment

Excellent points. We would benefit from even more language to express degrees of certainty. I wish it were commonplace to add a formal confidence rating to one’s estimations. Lacking that, I’ll take the informal one.

That is, at least, until I run off and join the tribe that linguist Guy Deutscher describes in “Through the Language Glass,” whose epistemology of accounts is baked into their grammar. If you tell someone a bear has attacked the village, it is impossible to form the sentence without also disclosing whether your source is hearsay, evidence, or a brush with a bear.

(Every language requires some such disclosures, though most of them are less obvious. In English I can tell you I spent all day talking to my neighbor, without ever disclosing his or her gender—but not in French!)

Expand full comment

Further tangling the probability/confidence issue is that humans have serious cognitive deficits around low-risk, high-stakes predictions.

If a night of carousing is 20% likely to earn me a hangover, I have to weigh the benefits (ranging from certain to barely extant) against the costs (same range). That is more axes than we know how to deal with. And they don’t end there…suppose instead I am incurring a 2% chance of 10 hangovers (consecutive or concurrent)? Should my decision be the same in both cases? What if it were a 0.02% chance of 1000 hangovers, e.g. a brain hemorrhage?

We divide by zero. We buy lottery tickets, then take up smoking and scrimp on flood insurance.

Expand full comment

Our whole criminal system is an attempt to leverage incomplete information by multiplying stakes. You probably won’t be caught shoplifting, but if you are, you’ll be expected to pay not just for the thing you stole, but also for the take of the last ten people.

There are important ways in which probability *is* a function of knowledge. If all you know is the proportion of each color of ball in the urn, chances might be 40%; if you are allowed a split-second peek inside before drawing, and you notice the top layer of balls is 90% a single color, the equation changes—and your preference now plays a role in the odds. (See also: https://en.m.wikipedia.org/wiki/Monty_Hall_problem)

Countries with the least capacity to enforce law and order, tend to do so with the harshest methods. If you can’t lower the risk of people doing bad stuff, you have to up the stakes for the people considering it.

Expand full comment

Also, the obvious: the 100 percent knowledge of something coming to pass (or not) collapses the waveform of the future into the event horizon past (mixaphysically speaking), simplifying probability to either 100% or 0%.

That is, if you witnessed the thing happen or not happen (or faithfully measured it in some other way). Even then, our senses could deceive us blah blah, but *if* we really did see it happen, *then* its probability of happening is 100 percent. Was it always 100 percent? If hard determinism holds, then yes; regardless (says the fatalist), it will *now* always have been going to happen. (Time travelers would have to have so many more verb tenses. What is that—imperfect subjunctive?)

Expand full comment

Our priors continually adjust to account for the things that might have killed us but didn’t, and we get more and more foolhardy. Or some unlikely disaster upended our life one fateful Tuesday, so now we never leave the house on Tuesdays. We regularly overlearn our lesson, or learn the wrong one.

All of which is to say (as you corollaried much more succinctly): Probability *is definitely* a function of knowledge—but that just means our estimations have to be the beginning of a conversation, not the end of one.

Expand full comment

Aleatory probability and epistemic probability ARE conceptually different things. Yes, they follow the same math; yes, in practice you have both simultaneously; and yes, a lot of what is considered aleatory (rolling dice, to use the obvious example) is really epistemic (if you can measure the state of the dice precisely enough, you can predict the result ~perfectly); but they're different enough (Monty Hall is entirely epistemic, radioactive decay is ~entirely aleatory) that in most other contexts, they'd have different words.

Expand full comment

The thing about lightly-held probabilities sounded kind of off to me, because according to Bayes' Law you're meant to always update when you learn new information. But you only update on new information, not information you already know. So if there's not much more you could learn about the event before it happens (as with the coin flip), the probability is strongly held; and if there's a lot of information you haven't included in reaching your probability, then there are a lot of things still to update on and it's lightly held. This relates to my other comment on the precision of probabilities. If there are a lot of facts available that you don't know, then your probability distribution on what your probability would be if you knew all the relevant facts will have a wide range.

The thing that makes a probability useful information is a combination of it being well-calibrated (sort of a bare minimum of being correct, an easy bar to pass as Scott points out), and high-information in the sense of information theory (low entropy -p ln(p) - (1-p) ln(1-p), or roughly speaking, p close to 0 or 1, so high confidence). The latter seems weirdly missing in the section on Samotsvety. Information in this technical sense is closely related to information in the normal sense. If you perform a Bayesian update on a probability, this always increases the information in expectation, and the only way to increase the amount of information in your probability while remaining well-calibrated is to find more relevant facts to update on. (That is, at least if we neglect computational limits. The case of bounded rationality is harder to reason about, but I assume something similar is true there too.) Scoring functions in things like forecasting competitions will generally reward both good calibration and high confidence, because achieving either one without the other is trivial but useless.
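
One standard example of such a scoring function is the logarithmic score, which is only maximized in expectation by reporting your true belief, and which rewards confidence precisely when it is justified; a minimal sketch:

```python
import math

def log_score(p, outcome):
    # Log score for a stated probability p of an event; higher (less negative) is better.
    return math.log(p) if outcome else math.log(1 - p)

print(round(log_score(0.5, True), 2))    # -0.69  calibrated but uninformative
print(round(log_score(0.95, True), 2))   # -0.05  confident and right
print(round(log_score(0.95, False), 2))  # -3.0   confident and wrong: punished hard
```

Hedging everything at 50% earns a mediocre score no matter what happens, so the only way to score well over many questions is to be both calibrated and close to 0 or 1 when the evidence allows it.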

Expand full comment

Even needing to ask the questions "What Is Samotsvety Better Than You At?" and "Do People Use Probability 'As A Substitute For Reasoning'?" points to how inscrutable forecasts are, especially crowd forecasts as Scott points out.

Whether the rationale is weak or formal, as Scott describes, we'd all be better off if we had the rationales themselves to evaluate alongside the forecast.

Expand full comment

That schtick of Yglesias’s - I take your word for it, not being a reader - is funny because he sure took a different tack with his book title.

Expand full comment

This post feels strange to me because while you argue a Bayesian viewpoint, the examples you argue against point to an even more Bayesian approach!

For Bayesian inference, it makes a lot of difference whether you think a coin has a 50% chance to come up heads because it looks kind of symmetric to you, or because you've flipped it 10 times and it came up heads 5 times. It matters because it tells you what to think when a new observation comes in. Do you update a lot, to 2/3, or just a little, to 7/13 (I am using Laplace's rule here)? It depends.
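
For anyone who wants to check those numbers, a quick sketch using Laplace's rule of succession (posterior mean (heads + 1) / (flips + 2) under a uniform prior):

```python
from fractions import Fraction

def laplace(heads, flips):
    # Posterior mean of P(heads) under a uniform prior (Laplace's rule of succession)
    return Fraction(heads + 1, flips + 2)

# "It looks kind of symmetric" ~ uniform prior, no data; then one new head arrives:
print(laplace(1, 1))            # 2/3

# Already flipped 10 times with 5 heads; then the same new head arrives:
print(laplace(5 + 1, 10 + 1))   # 7/13
```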

You ideally need to know the entire distribution, instead of a single number. Of course, a single number is better than no number, no arguing there, but the opponents do have a point that "how tightly you hold this belief" is an important piece of information.

Expand full comment

> For example, you might think about something for hundreds of hours, make models, consider all the different arguments, and then decide it’s extremely unlikely, maybe only 1%. Then you would say “my probability of this happening is 1%”. ... Sometimes this reasoning process is very weak. You might have never thought about something before, but when someone demands an answer, you say “I don’t know, seems unlikely, maybe 1%”

That's exactly the problem: when you say "the probability of X is 1%", people tend to assume the former process, whereas in reality you employed the latter one. It is also possible that you've thought about everything extremely carefully, taken notice of every contingency and calculated all the numbers precisely... but your entire calculation rests on the core assumption that the Moon is made out of green cheese, so your results aren't worth much.

But a single number like "1%" doesn't contain enough information to make a distinction between these scenarios. It should not confer any more gravitas and authority on the predictor than merely saying "meh, I don't think it's likely but whatevs" -- not by itself. But because it is a number, and it sounds all scientific-like, it *does* tend to confer unwarranted authority, and that's what people (at least some of them) are upset about -- not the mere existence of probabilities as a mathematical tool.

Expand full comment

If the people who get upset about 1% sounding scientific-like were to read the post and adopt Scott Alexander's point of view, would you have any remaining objection?

Expand full comment

If Scott Alexander's point of view is something like, "numeric probability estimates are just expressions of your subjective certainty and do not necessarily indicate any level of intellectual or evidential rigor beyond that", and if most people internalized this concept, then yes, I'd be pretty happy.

Expand full comment

I think metaphysical determinism is true, so from that POV I don't think there ever was a "chance" that an event that obtained might NOT have obtained, that's sort of meaningless to even say. Either there was sufficient cause, or there was not, and if sufficient cause existed the event had to obtain. If a person's statement "this is 40% likely to happen" meant "there are some large number of possible universes that could exist, or in some way actually do exist, and in 40% of them Event X will obtain", I'd have a pretty serious problem with *that* claim, and some of the wilder AI discourse on LW veers into those types of fantasyland. But this isn't what's going on with most AI p(doom) discourse.

Most people without thinking too much about this will just translate percentages into a statement about your epistemic limitations in one way or another. Sometimes 40% means "I expect this not to happen, but I have pretty low confidence in that" and sometimes 40% means "this scenario has been designed to produce Event X approximately 40% of the time, within epistemic limits that cannot be overcome", and those are pretty different but in common everyday encounters humans switch between those modes all the time. I've had to give speeches sort of like this during jury selection, to get people to understand what "reasonable doubt" means, to get them to switch from thinking in percentages to thinking about the trial as accumulating enough knowledge to be confident in a conclusion (although the courts in some states don't like you explaining this.)

Expand full comment

>Probabilities Don’t Describe Your Level Of Information, And Don’t Have To

Importantly, I think this section is making a normative claim rather than a descriptive claim. And if that wasn't obvious to some, that may be part of what divides the people arguing about that online.

Which is to say: I agree, it would be best if both 'I think pretty likely not' and 'I assign 15% probability' said nothing at all about your level of information, and no one expected them to.

However, my strong impression is that *in the current reality*, most people do use contextual phrasing to convey something about their level of information when making an estimate.

The way most people actually speak and interpret speech, someone who says '23.5%' is implying they have more information than someone who says 'around 25%', who is implying they have more information than someone who says 'somewhat unlikely'.

Information is often conveyed through that type of rhetorical context rather than directly in words, and I think people who infer that someone is trying to convey that information based on the way they report their estimates are not wrong most of the time.

(think about movie characters who give precise percents on predictions of things that will happen, they are supposed to be conveying that they are geniuses or masterminds who have used lots of information to precisely calculate things, not trying to convey that they stylistically prefer talking in numbers but have no other information advantage)

Now, again, I agree that it would be better if people didn't use that particular channel of communication that way, because you're correct that everyone talking in percentages would usually be better, and for that to work it has to be actually disjoint from level of knowledge.

But again, it needs to be recognized that this is a normative claim about how people *should* communicate.

I think it fails as a descriptive claim about how people *do* communicate.

Expand full comment

To steelman this a bit, I think that, similar to Epistemic Learned Helplessness, this is a case where people's natural reactions are an evolved response to issues not captured in Spherical Cow Rationalism.

Reducing things to a single number will often be more misleading than illuminating. For example, suppose I made a Manifold market called "How will I resolve this market?", and asked you to estimate the probability and then bet on it. And then no matter what you choose, I'll bet the other way and then resolve it in my favor. **There is no probability you could offer that would actually be meaningful here.**

Expand full comment

A useful trick to distinguish a coin you are pretty sure is fair vs a coin that might be loaded by any amount, which has big implications as to how much you update your belief after seeing outcomes:

Consider that Probability(tails)=s, where s is a parameter whose value you might be uncertain about.

In the first case, Prob(s is close to 0.5) is (almost) 1, while in the second case, ProbDensity(s = x) = 1 for x in [0, 1].

In the second case, Prob(tails) = ∫ Prob(tails | s = x) · ProbDensity(s = x) dx = ∫₀¹ x dx = 1/2.

There is, however, a large difference in how you update your belief about s (and therefore about P(tails)) when you observe coin outcomes. In the first case you (almost) don't update (if you see 3 tails in a row, you still say P(tails) = 0.5). In the second case you update a lot (if you see 3 tails in a row, you say P(tails) = 0.8, which can be calculated exactly).
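
A quick sketch of those two updates, assuming "might be loaded by any amount" means a uniform prior on s and approximating "pretty sure it's fair" with a sharply peaked Beta prior:

```python
from fractions import Fraction

# Posterior predictive P(tails on the next flip) after observing `tails` tails
# in `n` flips, with a Beta(a, b) prior on s = P(tails):
def predictive(tails, n, a, b):
    return Fraction(a + tails, a + b + n)

# Uniform prior Beta(1, 1): three tails move you a lot.
print(predictive(0, 0, 1, 1))   # 1/2 before seeing anything
print(predictive(3, 3, 1, 1))   # 4/5 after three tails in a row

# "Almost certainly fair" as Beta(1000, 1000): the same three tails barely register.
print(float(predictive(3, 3, 1000, 1000)))  # ~0.5007
```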

Expand full comment

Try this argument on for size (NB, I posted this argument as a response further down the list, but I wanted to present it to the general audience of this comments section with a little more of it fleshed out).

I view the problem as involving the timelines of possible universes going forward from the present — and when it comes to the Mars example, what is the set of possible universes that will *prevent* us from getting to Mars in 2050 vs the set that will *allow* us to get to Mars in 2050? I think we can agree that there is an infinity of universes for the former, but a smaller infinity of universes for the latter. After all, the set of universes where the sequence of events happen in the proper order to get us to Mars would be smaller than the set of universes where the sequence of events didn't happen (or couldn't happen).

If these were sets of numbers we could create a bijection (a one-to-one correspondence) between their elements. For instance, the infinities of universes going forward from a coin flip coming up heads vs tails would be equal. The subsequent branchings from each of those sides of the flip would be infinite, but their infinities would be equal in size. Likewise, before we cast a die, the set of universes going forward from a roll where a six comes face-up would be one-sixth the size of the set where it doesn't. However, the set of universes where I don't bother to pick up the die to roll it is larger than the ones in which I do. But I can't make a bijection between the infinitude of universes where I don't roll the die and the infinitude of universes where I do and get a specific result.

Likewise, no such comparison is possible between the two sets of universes where we get to Mars and where we don't, and the only thing we can say with surety about them is that they don't have the same cardinality. Trying to calculate the proportionality of the two sets would be impossible, so determining the probability of universes where we get to Mars in 2050 versus universes where we don't would be a nonsensical question.

I don't intend to die on this hill, though. Feel free to shoot my arguments down. ;-)

Expand full comment

I would have expected more focus on the capitalist interpretation of probability as representing the odds you'd be willing to take on a bet. Which is not really about the money, but instead a very effective way to signal how your beliefs about the event will influence your future behavior. If you force me to bet on a coin toss, then seeing which side I bet on tells you which side I consider not-less likely. If you vary the odds you can see exactly how much more likely I consider that side. Once you know that you'll also have a sense about how I'm likely to prepare for future events that depend on the outcome of the coin toss.

Expand full comment

Jaynes's book "Probability Theory: The Logic of Science" gives a very good description of treating probabilities as measures of a state of belief. This is usually described as Bayesian statistics.

The wiki page on "Bayesian probability" is also pretty informative.

Savage also has good work showing that, under a suitable definition of rationality, people's choices maximize expected utility over subjective probability distributions:

https://en.wikipedia.org/wiki/Subjective_expected_utility

Expand full comment

>Does drinking monkey blood cause autism? Also no.

That's a relief.

Expand full comment

Often I find myself reading an ACX post where Scott wrestles with a problem that exists because he is too charitable. There's usually about 300 comments already by the time I get to it, and none of them will notice this. He himself will never read my comment, and if he did I'm not at all sure I'd want him to be less charitable in the future. Obviously there are writers I could prioritize that have different ratios of genius and charity, so my consistently reading Scott isn't an accident.

But seriously, come on now. Most people are really terrible at predicting anything and would do worse than random chance in a forecasting contest. I'm probably in this group myself. Anyone who criticizes using probabilities in this way is very obviously also in this group, and should be assumed to have no valid opinion. Being good at using probabilities is table stakes here. I try very hard to avoid using probabilities because I know I shouldn't. In domains in which I am actually an expert I will sometimes say "probably 80% of the time it's this way, but that's just Pareto; also I'm biased".

tl;dr: the only good response to "you shouldn't use probabilities this way" is "you're right that -you- shouldn't, yes."

Expand full comment

Well I for one actually won a forecasting contest and I think that Scott's post has numerous issues, as pointed out by the other commenters here.

Expand full comment

Ah excellent. I hadn't expected the 1% of disagreers to discover my comment -- but then of course I've already stipulated I'm not good at predicting things.

[note: the 1% in my comment is a reference to the occupy wall street slogan, not an attempt to estimate what percentage of people who have opinions about non-frequentist probabilities are actual superforecasters.]

Expand full comment

Another point I would make is that any use of probability for any real-world situation actually still requires making unprovable assumptions about the future. I think there's an illusion of mathematical rigor when you model coin flips and dice rolls with nice-looking distributions, but when you use the past flips of a coin to say how likely it is to come up heads on the next flip, you are making an assumption that the previous flips and future flips have a very specific relationship. There might be reasons why you think this--but these are real-world, empirical reasons like "I don't think the coin flipper is manipulating the outcome" and are not really different from "last presidential election and next presidential election have related dynamics, a similar voting populace, etc."

Obviously coin flips *feel* like they're necessarily from a consistent data generating process, while elections feel like they may not be, but this is not a mathematical justification for treating them differently.

To steal an example from 3blue1brown, consider the following sequence of questions:

1. I roll a d6. What is the expected value of the result?

2. I flip a coin and based on the result, roll a d6 or d12. What is the expected value of the result?

3. I select some number of dice at random from a d4, d6, d8, d10, d12, and d20, then roll them and add up the results. What is the expected value of this sum?

4. I have a collection of unknown numbers of each of the 6 dice above, and select 5 of them. What is E(sum of rolls)?

5. Same as 4, but I select anywhere from 1 to 10 dice.

6. Same as 5, but I select an unknown number of dice.

7. I have a box containing some number of dice, of unknown number of sides, with unknown values on the sides. What is E(sum of rolls)?

At what point do we go from "nice well defined problem where probability can be applied" to "non-repeating, hard-to-model event where it's irresponsible to give a probability"? You can of course keep going and add more and more fine-grained levels of uncertainty to this sequence of problems, which all just involve the repeatable well-defined process of dice rolling. There's no real distinction between "situations with well-defined distributions (so you can use probability)" and "situations without (so you can't)."
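
For concreteness, the first few rungs of that ladder stay perfectly computable; a minimal sketch (my reading of step 3, picking one die uniformly at random, is an assumption, since the original leaves the number of dice open):

```python
from fractions import Fraction

def e_die(sides):
    # Expected value of a single fair die with the given number of sides
    return Fraction(sides + 1, 2)

# 1. A single d6:
print(e_die(6))                                 # 7/2

# 2. Flip a fair coin, then roll a d6 or a d12:
print(Fraction(1, 2) * (e_die(6) + e_die(12)))  # 5

# 3. (One reading:) pick one die uniformly from the six listed, then roll it:
dice = [4, 6, 8, 10, 12, 20]
print(sum(e_die(s) for s in dice) / len(dice))  # 11/2
```

Each later rung just adds another layer of averaging over what you don't know; nothing in the machinery breaks, the inputs just get softer.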

Expand full comment

This is such a useless piece that I can’t fathom the sort of person it’s useful for.

Expand full comment

>>> If you actually had zero knowledge about whether AI was capable of destroying the world, your probability should be 50%

This is not true. You should read about the False Confidence Theorem and imprecise probability.

Expand full comment

When the outcome is either 1 or 0, it's better to just say, "I believe X will happen with a Y level of confidence".

Expand full comment

As a commenter I'm also cursed to return to the same response again and again:

Probabilities relate to probability models.

Heavily simplified:

Probabilities work fairly fine if you already know all the possibilities of what might be true and the logical relations between those possibilities (for example, there is no relevant math question you are unsure about, and so on). In that case you just need to assign probabilities to all the known possibilities and then you can plug them into Bayes' theorem and other cool formulae. Also then (and only then) there are cool mathematical guarantees that your initially arbitrary probabilities will come ever closer to the truth as new evidence comes in.

On the other hand probabilistic thinking leads to absolute trainwrecks if there are relevant possibilities you didn't think of at all, that is if there are "unknown unknowns" or, worse, mathematical facts you weren't aware of.

Using probabilities should relate to your level of information, because they are useful if your level of information is high and useless if it is low, in the sense of your uncertainty being dominated by unknown unknowns. For example, it is normally not a great idea to make bets with experts on things you have no clue about; they will just take your money. (Yeah, you could carefully examine the experts' opinions and then adjust for expected overconfidence and salience bias, but that is actually another example of my point: you need to get to fairly high information before probability stops being counterproductive.)

And yes, if you use probabilities outside their domain of usefulness, that will be a substitute for thinking. You will be plugging numbers into Bayes' theorem, which feels very math-y and sophisticated, just like it feels like learning when you do more of the math problems you already understand rather than facing the confusion of the hard ones.

Expand full comment

If this is all true, how is it that one of the commonalities superforecasters share with each other is knowledge of probability theory and their claims to use it? Is it really more likely that some people claim they use something when they actually use that way of thinking as a substitute for reasoning, so these people are worse off than those who don't know it, and somehow still have higher accuracy than those who don't?

That sure seems unlikely to the point of cope.

The fact that better forecasters actually do use the frame of probability theory screens off this type of reasoning, and the listed experience (of feeling the world is easier) is not an accurate characterization of how I see rationalists reason (which is closer to analysis paralysis of not understanding what Nth order effects should do to your beliefs).

Expand full comment

Note that I didn't say probability is always useless for all purposes; contrariwise, I made a distinction about when it is and is not useful.

Superforecasters are basically an institutionalized form of my parenthetical prebuttal ("Yeah, you could carefully examine the experts' opinions and then adjust for expected overconfidence and salience bias..."). That is in fact valuable, but it is also a refinement of subject-matter experts' work. Also, the demonstrated successes of superforecasters are all fairly close to the domain of well-understood sample spaces, and it is not reasonable to expect such successes to generalize to things like AI-doom, God, the MWI, &ct.

Again, note I didn't say superforecasters or probability theory are generally useless, and refuting that is not an argument against my point. As for the point of cope: rubber, glue.

Expand full comment

> Again, note I didn't say superforecasters or probability theory are generally useless, and refuting that is not an argument against my point.

It seems to me that posting this in contradiction to a post claiming that probability theory is useful for talking about geopolitical events or forecasting certain human advancements, and then claiming that "it would lead to trainwrecks if you think probabilistically and there are unknown unknowns", means that you think probability theory is useless for predicting geopolitical events. Since this is in fact a common objection against using forecasts, I think it was a reasonable interpretation for me to have. I also think that if anything were to have unknown unknowns, it'd be geopolitical events.

Well now I'm not sure what definition of "unknown unknowns" would both

1. Meaningfully explain the success of superforecasters

2. And yet still serve as a useful comment by predicting when probabilistic thinking helps or hurts.

Because it sure seems you're drawing an unprincipled line, post hoc, to explain why your theory is actually pretty good, despite being pretty bad.

I do not think you would have held this belief and also, ahead of the Good Judgment Project existing, said that superforecasters were exploring a fairly well understood space. Would you have? If so, what definition of unknown unknowns would you have used, without knowledge that geopolitics was "a well understood domain"? I'm willing to take your word on this, but I can't imagine this happened.

Otherwise, you're saying "bad guesses are bad because they are bad", which is obviously going to be true? I don't think any system of thought handles unknown unknowns well, which makes your point about them vacuous! What concrete bad actions occur when doing probabilistic thinking in information poor environments, that wouldn't happen if you were doing something else?

Expand full comment

While he does talk about geopolitical events in the justification, the probability Scott is actually trying to save is p(AI Doom). And that actually is a meaningless question.

As Scott and I both noted this happens to be a (depending on your mood) classic or sisyphean point of contention, so actually I just so happen to have an ancient blog post explaining exactly what I think about it: https://last-conformer.net/2015/08/31/the-problem-with-probabililities-without-models/

Expand full comment

I had to reread your blog post several times to understand how it was germane to the discussion; as far as I can tell, you say something like:

1. Probability theory is a branch of math, which means it relies on a bunch of axioms. (I agree)

2. One of those axioms is that your model includes every possibility. (Okay, fair enough; since probability is essentially a counting game, if you don't count it, you're no longer playing the same game.)

3. Now think about a situation where you don't consider something. (Okay, I expect the model not to work as well then)

4. THEN the situation exactly works out such that the thing you don't consider exactly happened (WTF!!! WHY WOULD THIS BE RELEVANT AT ALL)

5. You sure look pretty stupid here if you said something wrong beforehand.

6. Therefore probability theory is worthless when your model is impoverished.

7. (You don't say this but) And models being impoverished happens semi-frequently and can be predicted beforehand.

Why am I adding 7 even though you don't say it? Because if 7 WEREN'T true, why should anyone care? (This is my point 2 from before.) I think your point 4 attempts to silently gloss over the *incredibly important question* of how common this situation actually is, and whether your ability to tell if probabilities are worthwhile is a real thing or just a post hoc belief, where you can insult every single prediction you don't like, and then when it's shown your dislike was misplaced, you can just claim afterward that "they had a well behaved model".

Consider the analogous argument against using addition in constructing buildings:

1. Addition assumes that whatever you are modeling obeys the laws of addition, such that it doesn't matter in what order you combine things or which one comes first; you still get the result accurately.

2. One of those axioms is that adding things together does not cause the subcomponents to change.

3. Now consider the situation where addition doesn't work, for example, you add two components together and they react to expand

4. You build something where components reacted and they expanded

5. Your "addition" sure looks stupid then.

6. Therefore you should not use addition when building things that expand.

7. And therefore addition is a bad tool, because this type of situation happens often.

Like, this is a bad argument? I can just choose to not use components that react, or add in whatever the expansion length is afterward? Similarly, if I was wrong about the extent of my model, I can choose to make more circumscribed predictions or incorporate more into my model. The argument only has force if you can prove that these "obvious" interventions aren't obvious, or in practice aren't done at all, or that there exists some alternative that is strictly superior, instead of assuming that the opposing position is wrong.

If I SQUINT, I see the following possible more reasonable argument:

1. Insofar as predictions are good, you can say it's because of probability theory and having a good model.

2. However, looking at the massive amount of restrictions that probability theory has, I'm pretty sure the ratio is something more like 90% of the goodness is from having domain knowledge and good models and 10% probability theory.

3. Therefore, emphasizing probabilities is putting the cart before the horse.

4. And since no one can get good domain knowledge about things like AGI, or interpretations of quantum mechanics, Scott+ rationalists are trying to bamboozle you into thinking it's 90/10 the other way around, when in reality they get minimal gains.

And this is what I'm trying to respond to. See: https://www.astralcodexten.com/p/in-continued-defense-of-non-frequentist/comment/52249226

Superforecasters in fact do better than domain experts! They do better than analysts with private information! And they outperform them, in my view, by having a better understanding of probability theory! That seems like a direct answer to the thesis of your comment and blog post.

Expand full comment

For your point 2, I'm not sure if you may be interpreting "every possibility" as "every outcome I care about". Technically you can do that, you can say my sample space has two events, either AI-doom happens or it doesn't so I'll assign p to event 1 and 1-p to event 2, done. Mathematically I have no complaint with that, for any p between 0 and 1 that is a valid probability model, just a practically useless one.

But that is not what Samotsvety is doing; they will also be thinking about a lot of things they aren't directly trying to predict. Like (I'm pulling the example out of my backside here, no idea if they actually tried to predict that) if they want to give a probability for the Crimean Bridge still standing 5 years from now, they would probably be thinking about what ways it could be attacked and how likely those ways are to succeed, how the war might go, how likely western countries might be to give long range weapons to Ukraine, how likely Ukraine would be to aim them at the Crimean Bridge and how likely such weapons are to destroy a bridge, about possible regime change in either country and how it would affect the result, and so on. And then they would see whether probabilities for some of the relevant factors can be guessed from base rates, do a lot of conditional probability calculations, and so on.

At this point the "events" of the relevant implicit model are something more like all possible worlds with regard to all the relevant factors. (Event is a mathematical term here, in very simple cases like dice-throwing events are actually "things that can happen" but unfortunately the abstract mathematical meaning disconnects from a naive intuition if conditional probability is involved.) So in response to your point 7, for interesting models the rate of them being (in your terminology, which I will steal for the rest of this comment) "impoverished" is basically indistinguishable from 100%.

Now here's the thing: You don't need your point 4 for that to be a problem. Sure, if something you hadn't thought about happens, that is a fairly obvious failure mode. But the problem goes much deeper: You have a lot of helpful math in probability, like the Rev. T. Bayes' theorem, getting closer to truth as you process evidence, Aumann agreement &ct. If your model is impoverished and that impoverishment is at all relevant to your reasoning process, all of that goes down the toilet. This is the point of my Alien vice laser example in section IV of the old blog post: The aliens don't directly care about what glasses are good for, that is just one way possible worlds can vary in ways relevant to their reasoning process. But now if they go on doing perfectly good Bayesian reasoning, calculating base rates for accidents with or without glasses, reasoning to conditional probabilities, maybe adjusting for known confounders like driver age and car build and all the other good math, it doesn't bring them any closer to the truth. If they collect more evidence it doesn't help either; as long as they don't fix the model, all their probability calculations will be worthless.

And that's a general point, probabilities become useless if the model fails to account for something relevant. And there is absolutely no guarantee or rule that small modeling errors would only make small differences in the calculated probability.

So basically there are always two components of your uncertainty: 1) The part you do probability calculus on (the "known unknowns") and 2) the possibility that your model might be relevantly misspecified (the "unknown unknowns").

Now for many purposes it is helpful to assume that unknown unknowns won't matter. (And no, assigning a small probability to them mattering doesn't work, at least not if you want to have useful conditional probabilities. That part of your risk-judgment is always non-probabilistic.) For example, insurance mostly works by trusting in probability models, and that is a good thing. Still, when insurers go bankrupt it is usually not because the situation they calculated to be once-in-a-century actually happened, but because their model was wrong. But generally we sort of feel (in a totally non-probabilistic way) that the known unknowns are the main show for most insurers, which can also be phrased as "we trust in the model" or "we are comfortable modelling this probabilistically".

By and large I'm ok with saying uncertainty in geopolitical events is dominated by known unknowns. Unforeseen things mattering a lot more than anybody thought is a theme in history, but not the main theme, so yeah good chance (but not in a probabilistic sense!) of getting away with ignoring that possibility.

On the other hand, suppose a time-traveler taught Aristotle modern probability theory (and mathematical prerequisites) but not the rest of modern science and technology. And then he asked him to guess the population of some cities in 2024. Aristotle would then duly do the Samotsvety thing and probably have fun doing it, but he would still not think of things like a modern division of labor, the invention of underground sewer systems, or the abolition of slavery. So he would basically be multiplying probabilities of totally irrelevant scenarios and coming up with a completely bullshit distribution going to epsilon somewhere around a hundred thousand inhabitants. That's because in his situation the unknown unknowns would be so much more important than the known ones that the probability calculations wouldn't add any value at all. Fictional Aristotle would be less wrong if he just refused to give any probability distribution.

And presently we know about as much about ML and AI as Aristotle knew about big cities. (To be fair, a lot more than nothing; Aristotle wrote books about cities!) So we should take a page from fictional Aristotle and admit that naming a number and calling it p(AI-doom) is not a helpful exercise.

Expand full comment

Btw, does anyone know anything about the ACX prediction contest emails? I haven't gotten mine. Should I have more patience, or did I misremember submitting my predictions?

Expand full comment

E-Prime, anyone?

Expand full comment

> If someone wants to know how much evidence/certainty is behind my “no”, they can ask, and I’ll tell them.

> But it’s your job to ask each person how much thought they put in

I really, really want there to be clear numerical ways to capture this. Giving the full probability distribution (or even, to a lesser degree, summarizing it with variance or standard deviation) captures *some* of it, eg it would help with expressing your certainty about vaccines-cause-autism vs monkey-blood-causes-autism. But that doesn't do anything to capture the top-philosopher vs random-teenager case.

This comes up a lot for me when thinking about doing (mostly informal) Bayesian updating. It strongly seems as though I should update more on the top-philosopher's answer than on the random-teenager's answer, but it's not clear to me how (if at all) the difference is represented in Bayes' theorem.

This may just be my own ignorance, and I'd really love for someone to correct me! But if not, it seems like a failing of how we think about probability.

Expand full comment

Similarly with the essay's later use of probabilities being more or less 'lightly held'. This seems to be crying out for having a principled way to talk about *how* lightly held a probability is (perhaps with a range between 0 and 100, or 0 and 1).

Expand full comment

In theory we should do it like E.T. Jaynes writes conditional probability, where he always writes P(A|BC) instead of P(A|B), where C represents existing knowledge about the world and B represents the specific update you are receiving.

When we ask for raw probability estimates, we're asking P(A), no mention of B or C at all! When Scott says "it's your job to ask" he's saying something like "ask someone to decompose as many of their Bs or Cs as possible".

As an aside, I think you still need to be careful in something like the philosopher vs teenager case, because to a large extent your ideas about who is correct rest on assumptions about the correctness of the arguments they would make. So for example, if you take both a professor and a teenager, your expectation is that the professor would say more correct things than the teenager does. But if you start hearing the professor say obviously false things and the teenager say obviously true things, well, 1. you should check there isn't something like a halo effect or some other cognitive bias making you think this, and then 2. their backgrounds should no longer influence your judgment. It's very tempting to let them do so, because status!
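
A toy illustration of how B and C interact in practice: the same testimony moves you by different amounts depending on what you already know about the source (all numbers invented):

```python
# Bayes' rule for updating on testimony, where the likelihoods encode C,
# your background knowledge about how reliable the source is.
def update(prior, p_say_if_true, p_say_if_false):
    num = prior * p_say_if_true
    return num / (num + (1 - prior) * p_say_if_false)

prior = 0.5

# C = "careful philosopher": asserts A far more often when A is actually true.
print(round(update(prior, 0.9, 0.2), 2))    # 0.82

# C = "random teenager": barely more likely to assert A when it's true.
print(round(update(prior, 0.55, 0.45), 2))  # 0.55
```

The caveat above still stands: if the philosopher's actual arguments turn out to be bad, that evidence should overwrite the reliability assumption rather than be absorbed by it.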

Expand full comment

One of these years I just really need to set aside the time and energy to read Jaynes. Thanks!

Expand full comment

Hold on a damn second here! I fear you may be making the mistake of thinking I have read Jaynes at length, and that shall not do! I am much dumber and lazier than that!

Mostly owe my insight to this LW post summarizing Jaynes:

https://www.lesswrong.com/posts/KN3BYDkWei9ADXnBy/e-t-jaynes-probability-theory-the-logic-of-science-i

Or, to say the above in a less weird and self-effacing way: Hope this helps.

Expand full comment

<3

Expand full comment

I think an analogous argument happens regarding using cost-benefit analysis to quantitatively evaluate projects or policies to implement. For example, do we build a new bridge, or regulate some kind of industrial pollution?

A first school of thought says "we should try and estimate all the costs and benefits using some economic model and proceed with the project if it exceeds some benefit-to-cost ratio threshold".

A second school counters: "we don't actually know the true costs and benefits, you're just modelling or guessing them, plus you can't count environmental benefits and time savings and regulatory compliance costs and safety benefits etc all together. These are categorically different or unquantifiable and your benefit-to-cost ratio is meaningless."

But like with probabilities - where does the second school leave us to judge how we should proceed?

Similarly to calling a probability 'very unlikely', we're left with vague qualitative statements like: 'it's absolutely essential', or 'it's very costly for not much gain', or 'there are both significant costs and benefits'.

In both cases I believe the number is telling us something useful - even if we concede that there are inaccuracies and biases that come with estimating. And if one disagrees with the number, they can go ahead and critique the reasoning in it or undertake their own analysis.

Expand full comment

« I agree that, because of the thorniness of the question, probabilities about AI are more lightly held than probabilities about Mars or impeachment »

It’s not clear to me what you mean by lightly-held, and the most likely possibility IMO is that of the meta-probability, which you rejected earlier.

Expand full comment

One thing that bothers me with using probabilities in casual conversation is that they are opaque conclusions, not arguments. Scott writes long, evidence-based blog posts. I wouldn't find a tweet saying "To sum up, I think my p(doom) is 20%" to be an adequate substitute for a blog post. An argument can be built on stories about things I'd never heard of before, which I often find interesting even when I disagree with the conclusion.

But that doesn't mean asking for a p(doom) as an opening gambit in a conversation is necessarily bad. The question is, where do you go from there? Do you have any interesting evidence to share, or is it just another unadorned opinion?

Surveys and elections and prediction markets have similar problems, but in aggregate. Each data point is an opaque opinion, and we can't go back to ask someone what they really meant when they picked choice C. (Maybe survey questions should have a comment box where people can explain what they meant, if they care to? It seems like it would be useful for debugging.)

But then again, these things happen in a larger context. An election is not just about voting. It's also about the millions of conversations and many thousands of articles written about the election. I believe prediction markets often have conversations alongside them too? It can be pretty repetitive, but there might be some pretty good arguments in there.

I wonder if they could be combined somehow. Suppose that, when voting for a candidate, it was also a vote for a particular argument that you found convincing? There might be a lot of "concurring opinions" to pick from, but knowing which arguments people liked the most would give us better insight into what people think.

(There is a privacy issue to work around, since the concurring opinion you pick might be identifying.)

Expand full comment

Don't see any comments about the difference between stochastic and epistemic uncertainty, but I believe it's a large part of this debate. Stochastic uncertainty is like the uncertainty around rolling a die; epistemic uncertainty is about not knowing how many sides the die has.

Perhaps we need norms around communicating both. For example, I'm 50% sure Biden will win the election and my epistemic uncertainty is low (meaning it would take very strong evidence to change my probability significantly). I'm also about 50% sure it will be sunny tomorrow, but my epistemic uncertainty is high because any piece of evidence could cause my estimate to vary widely.

Expand full comment

I would add to your list of reasons why Samotsvety's "probabilities" of unique events should be called probabilities. Apart from calibration properties you already mentioned, they behave like probabilities in other respects, for example

- they are numbers between 0 and 1

- behave as probabilities under logical operations, such as negation, AND, and OR, for example p(not(A)) = 1 - p(A)

- behave like probabilities in conditioning on events.

It would be weird if humanity did not have a word for sets of numbers with these properties and this word happens to be "probability distribution" for the full set and "probability" for individual numbers in the set.

Expand full comment

This is all wrong. Prices replace probability and trading replaces forecasting when faced with single shot events. Markets are the technological solution to mediating contingency when the notion of possibility is incoherent.

Expand full comment

I find the beta distribution a really nice intuition pump for this sort of thing. If one person says 50% and their beta distribution is beta(1, 1), then their 50% doesn't really mean much. But if someone says 50% and their beta distribution is beta(1000, 1000), then that's much more meaningful!
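
A quick way to see the difference is to compare the spread of the two priors, which follows directly from the Beta variance formula; a minimal sketch:

```python
import math

def beta_std(a, b):
    # Standard deviation of a Beta(a, b) distribution
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

print(round(beta_std(1, 1), 3))        # 0.289 -- "50%" with essentially no evidence behind it
print(round(beta_std(1000, 1000), 3))  # 0.011 -- "50%" backed by ~2000 observations' worth of evidence
```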

Expand full comment

Pretty sure that I agree with the defense given here, but felt some whiplash going from "Probabilities Are Linguistically Convenient" to "Probabilities Don’t Describe Your Level Of Information, And Don’t Have To"

Wouldn't it be linguistically convenient for probabilities to describe your level of information?

Maybe not, if the disconnect revolves around a speaker's wish to have options and the listener's wish to be able to understand a speaker's level of information when attempting to update their priors. Non-Experts hate this one weird trick.

Expand full comment

> Some people get really mad if you cite that Yoshua Bengio said the probability of AI causing a global catastrophe is 20%. They might say “I have this whole argument for why it’s much lower, how dare you respond to an argument with a probability!” This is a type error. Saying “Yoshua Bengio’s p(doom) is 20%” is the same type as saying “Climatologists believe global warming is real”. If you give some long complicated argument against global warming, it’s perfectly fine to respond with “Okay, but climatologists have said global warming is definitely real, so I think you’re missing something”. That’s not an argument. It’s a pointer to the fact that climatologists have lots of arguments, and the fact that these arguments have convinced climatologists (who are domain experts) ought to be convincing to you. If you want to know why the climatologists think this, read their papers. Likewise, if you want to find out why Yoshua Bengio thinks there’s 20% chance of AI catastrophe, you should read his blog, or the papers he’s written, or listen to any of the interviews he’s given on the subject - not just say “Ha ha, some dumb people think probabilities are a substitute for thinking!”

But the thing that makes superforecasters good at their job (forecasting) IS NOT domain expertise! That was the major conclusion of Expert Political Judgment, and it was reaffirmed in Superforecasting! The skill of "forecasting accurately" is about a specific set of forecasting-related skills, NOT domain expertise.

If Yoshua Bengio's arguments are presented to Samotsvety, I would trust their probability conclusion far, far more than Yoshua Bengio's. He may have domain expertise in AI, but he does not have domain expertise in forecasting. I remember reading an interview with a superforecaster, talking about professional virologists' forecasts during the early stages of Covid-19, where he would be baffled at them putting 5% odds on something that seemed to him to be way less than 1% probability, like results that would require the exponential growth rate to snap to zero without cause in the next month, well before vaccines. Numbers that, to a professional forecaster, are nonsense.

One of the main things that makes good forecasters good is that they understand what "20%" actually means, and answer the questions asked of them rather than semi-related questions or value judgements. e.g. "Will Hamas still exist in a year?" versus "Do I like Hamas?" or "Will Hamas return to political control of the Gaza strip in a year?" It is this capacity (among others), not domain expertise, that differentiates well-calibrated forecasters from poorly-calibrated ones.

Expand full comment

“Sometimes some client will ask Samotsvety for a prediction relative to their business, for example whether Joe Biden will get impeached, and they will give a number like “it’s 17% likely that this thing will happen”. This number has some valuable properties”

Does it really.

Apart from Samotsvety being a cool name for something (as are most Russian names), 17 percent belongs to a group of particularly weaselly forecasting numbers. It is more likely to happen than “is very unlikely to happen” (approx. 1-5%), but less likely to happen than “is unlikely to happen but I would not at all rule it out” (less than 40%).

The problem with such numbers relates to the ability of a critic to say, after the fact: “hey, you were quite far off the mark there, buddy.” With 17 percent, you have some degree of plausible deniability if Biden actually gets impeached and you are accused of being a bad forecaster: “I did not say it was very unlikely – I said it was 17 percent likely, implying that it actually had a non-negligible chance of happening”.

Related: When Trump was elected in 2016, people did not go “wow” at the forecasters who had given him a 30 percent chance beforehand because of the number 30 percent itself. They went “wow” at those forecasters because almost everybody else had put Trump winning at “very unlikely” (1-5 percent).

The really good forecasters, by the way, were those who put the likelihood of Trump winning above 50 percent. They were the ones who took a real risk of being falsified.

With “17 percent” you hedge your reputation as a good forecaster, with very little risk of being found out if you are the opposite.

…to be clear, I am talking about unique events that do not belong to a larger group of similar-type events. If you have 10,000 similar-type events, you can investigate whether some outcome happens 17 percent of the time, while other outcomes happen 83 percent of the time, and use that to forecast what will likely happen at the 10,001st event. I assume here that “Biden being impeached” does not belong to such a larger group of similar-type events – implying that you cannot falsify a “17 percent probability” prediction by collecting a lot of similar-type events.

Expand full comment

If you have a proper scoring rule, then saying a number like 17% because you want to avoid losing a lot of points if “you’re wrong” is exactly what you should do.

https://en.wikipedia.org/wiki/Scoring_rule?wprov=sfti1
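
A small illustration of why that is, using the Brier score (the 17% figure is just the example from upthread; the function name is mine):

```python
# Expected Brier loss when your true belief is p but you report q.
# Lower is better; the expectation is over whether the event happens.
def expected_brier_loss(p: float, q: float) -> float:
    return p * (1 - q) ** 2 + (1 - p) * q ** 2

belief = 0.17
for report in (0.0, 0.05, 0.17, 0.50, 1.00):
    print(f"report {report:.2f}: expected loss {expected_brier_loss(belief, report):.4f}")

# The minimum sits exactly at report = 0.17. Shading your answer toward 0%
# (to look bold) or toward 50% (to hedge) makes the expected loss strictly
# worse -- that is what makes the rule "proper".
```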

Expand full comment

Such rules make sense e.g. in meteorology, where you have time series data & the like from similar past events that can be used to make "real" predictions about future events, and/or to model the probability of future events.

Particularly likely to make sense in situations where the probability of future events is not dependent on human interference, since humans may otherwise react to predictions/forecasts by changing their behavior in ways that influence/negate the forecasted probabilities. (Again, meteorology is a good example where this possibility is not a problem, while finance - another area where such rules are used - is trickier.)

My concern is instead with such forecasting related to unique events, like impeaching Biden. (Assuming, at least for the sake of argument, that this is a unique event, not an event in a known series of impeachment-or-attempted-impeachment-of-presidents-events.)

If you think scoring rules also allow one to make sensible fine-grained (single-percentage-point differences!) unique-event predictions, I'd be interested in why & in references to texts with some practical examples. (Not a rhetorical question; I stay open to the possibility.)

Expand full comment

I’m pretty sure it was Superforecasting that said that forecasting competitions and superforecasters use proper scoring rules. And as mentioned in the original post (and Superforecasting), superforecasters meaningfully do better with extra digits of precision.

It doesn’t matter if the event being predicted is a one-off or repeated. If a forecaster is graded using a proper scoring rule, then they will want to give predictions that are shmrobabilities to maximize their score.

I don’t know how forecasters deal with their predictions having influence over the events themselves. I’d be interested in learning more. One possibility would be to use conditional probabilities: “I predict X% = P(bad thing Z | policy Y) and W% = P(bad thing Z | not policy Y).” If people listen to the warning, then a different prediction is in effect. I don’t know if this is actually what they do.

Expand full comment

“I don’t know how forecasters deal with their predictions having influence over the events themselves. I’d be interested in learning more.”

This is the Big Remaining Thing to be fixed, in my view, in order to move from rough predictions (which I concur are possible) to precise predictions (one-percentage-point predictions).

That is, the challenge is how to factor in how humans react to the predictions other humans make. Can that be modelled with some degree of accuracy? Or do we have to throw up our hands and say that such situations become too complex: “we can only make crude highly/somewhat likely/unlikely predictions about events where hermeneutics enter the picture”. (Hermeneutics here meaning: how to get an accurate grip on how humans interpret and react to what other humans say and how other humans behave; the many signalling-and-screening games & arms races we all play & participate in. Making public predictions/forecasts is part of it.)

Clark Glymour, a philosopher I deeply respect, made a similar point in a 3:AM interview with Richard Marshall:

3:AM (Richard Marshall): … do you think that in principle anyway your approach can handle any causal situation?

CG: … There are kinds of causal circumstances we don’t really have a good handle on. One is … agent-to-agent causation, as in social networks.

Link here, the whole interview is worth reading if you like Clark Glymour's (& Judea Pearl’s) take on philosophy/causation/probability/prediction:

https://www.3-16am.co.uk/articles/bayes-arrows

It is getting late in the evening in my time zone, and my wine glass is nearly empty. For both reasons I guess I must end the conversation today. Thanks for your thoughts!

Expand full comment

It seems like a premise (or an effect, maybe?) of prediction markets is to create a way in which non-frequentist probabilities for different events can be compared. If you use your probability of each event to calculate expected values for bets, and you make bets, and you are able to successfully make money in the long term, doesn't that give some more objective meaning to these probabilities?
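
That long-run test can be made concrete with a toy simulation (all numbers and noise levels here are made up for illustration): if your probabilities track reality better than the market's prices, mechanically betting the gap is profitable in expectation.

```python
import random

random.seed(0)

def simulate(n_events: int = 10_000) -> float:
    """Bet one $1 contract per event against a market that is noisier than you."""
    profit = 0.0
    for _ in range(n_events):
        p_true = random.random()                                       # hidden true chance
        p_mkt = min(max(p_true + random.gauss(0, 0.10), 0.01), 0.99)   # noisy market price
        p_you = min(max(p_true + random.gauss(0, 0.03), 0.01), 0.99)   # your sharper estimate
        outcome = 1.0 if random.random() < p_true else 0.0
        if p_you > p_mkt:
            profit += outcome - p_mkt   # buy at the market price
        else:
            profit += p_mkt - outcome   # sell at the market price
    return profit

print(f"profit over 10,000 contracts: ${simulate():,.0f}")  # comes out well above zero
```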

Expand full comment

It looks like section 4 is a response to my essay Probability Is Not A Substitute For Reasoning (https://www.benlandautaylor.com/p/probability-is-not-a-substitute-for). Or rather, it looks like a response to a different argument with the same title. It’s pretty unrelated to my argument, so I was wondering if other people had picked up my title as a catchphrase, which can happen, but I can’t find anyone using the phrase except for me and you.

Anyway, you’ll note that my essay never says “don’t use probability to express deep uncertainty”, or anything like that. (I do it myself, when I’m speaking to people who understand that dialect.) Instead I'm objecting to a rhetorical dodge, where you'll make a claim, and I'll ask why I should think your claim is true, and then instead of giving me reasons you'll reply that your claim is a reflection of your probability distribution.

This is especially galling because most people who do this aren't actually using probability distributions (which, after all, is a lot of work). But even for the few who are using probability distributions for real, saying that your claim is the output of a probability distribution is different from giving a reason. In the motivating example (AI timelines), this happens largely because the reasons people could give for their opinions are very flimsy and don't stand up to scrutiny.

More at the link: https://www.benlandautaylor.com/p/probability-is-not-a-substitute-for

Expand full comment

Do people use probabilities as a substitute for thinking? Yes: some people, some of the time. The fact that others don't doesn't negate that. A stated probability estimate can be backed by a process of thought, but it doesn't have to be.

If you have a subculture, "science", let's say, where probability estimates backed by nothing are regarded as worthless fluff, people are going to back their probability estimates with something. If you have a subculture, "Bayes", let's say, where probability swapping is regarded as intrinsically meaningful and worthwhile, you needn't bother.

Probabilities don't have to describe the speaker's state of information, but something needs to. If I am offered an opinion by some unknown person, it's worthless; likewise a probability estimate from someone in an unknown information state.

Expand full comment

These are all good points, but I don’t think this engages with the fundamental intuition that leads people to see “the odds of pulling a black ball are 45%” as different from “the odds humans will reach Mars by 2050 are 45%”: namely, that the former can be understood as an objective view of the situation, whereas the latter is a synthesis of known information.

Suppose we’re watching a Powerball drawing with one minute to go. All the balls are spinning around chaotically in a grand show of randomness. Asked the probability of a 9 being the first number picked, you say 1 in 100. Shortly thereafter they announce the first number was 9. “Dumb luck,” you think, but you play back the tape anyway and notice that when you made your prediction the 9 ball was actually wedged in the output area, making it all but certain that the 9 would come out first.

So were you correct in saying the probability was 1 in 100? Yes and no. You were correct in giving the odds based on the information you had, but if you had all the information you would have produced a different (and more accurate) prediction.

On the other hand, given 100 balls randomly distributed in an urn with 40 black, you can say objectively that the probability of pulling a black ball is 40%. This is because the unknown information is stipulated — there’s no real urn where I can point out that most of the black balls are on the bottom. To try to do so would be to violate a premise of the original question.

I think a big reason people dislike probabilities on real events is because they’re imagining that the prediction is supposed to be an objective description (“this is the actual probability, and god will roll a die”) rather than a synthesis of available information.

Expand full comment

Jaynes was shouting in my head the whole time reading this. The appendix of his "Probability Theory" mentions probability systems that don't have numeric values. Instead you can only compare events as being more or less likely (or unknown).

The punchline is that, in the limit of perfect information, these systems reduce to standard probability theory. In other words, it's always possible to just go ahead and assign probabilities from the get-go while remaining consistent, which seems like how a mathematician would write Scott's post.

Expand full comment

I completely agree with the case made here that it is useful and informative to convey actual numbers to express uncertainty over future outcomes, instead of vagaries like "probably".

That said, there's an adjacent mistaken belief that Scott is not promoting here but that I think is widely held enough to be worth rebutting. The belief is that there is, for any given future event, a true probability of that event occurring, such that if person A says "it's 23.5%!" and person B says "it's 34.7%!", person A may be correct and person B may be incorrect.

Here's a brief sketch of why this is wrong. First, observe that you can have two teams of superforecasters assign probabilities to the same 10,000 events, and it's possible for both teams to assign substantially-different probabilities to each of those events, and both teams to still get perfect calibration scores. I won't attempt to prove this in this comment, but it's easy to prove it to yourself via a toy example.
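
One version of that toy example, sketched with invented numbers: every event secretly has either a 20% or an 80% chance of happening; team A sees which, team B only knows the overall mix. They disagree on every single event, yet both come out perfectly calibrated.

```python
import random

random.seed(1)

true_rates = [random.choice([0.2, 0.8]) for _ in range(100_000)]
outcomes = [random.random() < p for p in true_rates]

team_a = true_rates               # predicts 0.2 or 0.8, event by event
team_b = [0.5] * len(true_rates)  # predicts 0.5 for everything

def calibration(preds, outs):
    """For each stated probability, what fraction of those events happened?"""
    buckets = {}
    for p, o in zip(preds, outs):
        buckets.setdefault(p, []).append(o)
    return {p: round(sum(os) / len(os), 3) for p, os in buckets.items()}

print("Team A:", calibration(team_a, outcomes))  # roughly {0.2: 0.2, 0.8: 0.8}
print("Team B:", calibration(team_b, outcomes))  # roughly {0.5: 0.5}
```

(In this toy, team A does have the better Brier score, but calibration alone cannot separate the two.)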

Second, let's say these two teams of superforecasters, call them team A and team B, say that, at the present moment, the probability of a human walking on Mars by 2050 is 23.5% and 34.7% respectively. Which team is correct? We can't judge them based on their past performance, because they have the same, perfect calibration score. How about by the result?

Well, let's say a human doesn't walk on Mars by 2050. Which team was right? There's a sense in which you can say team A was "less surprised" by that outcome, since they assigned a lower probability to a human walking on Mars. I think that's the intuition behind Brier scores. But they were still surprised by 23.5 percentage points! So it doesn't feel like this outcome can provide evidence that they were correct. In fact, Joe Idiot down the street, who isn't calibrated at all, said there was a 0.01% chance humans would walk on Mars, and I don't think we would want to say he was more correct than team A.

So it's pretty clear there is no knowably-objectively-correct probability for any given one-off event, outside of a toy system like a stipulated-fair coin (in the real world, are any coin flips actually fair?). That doesn't mean probabilities aren't useful as a means of communicating uncertainty, and it doesn't mean that we shouldn't use numbers instead of English-language fuzziness, but I do think it's worth acknowledging that there isn't an underlying fact of the matter that we're trying to establish when we give competing probabilities.

Expand full comment

I think probabilities are well defined, if the question is unambiguous. By this I mean, when looking at almost any world and asking "did the event happen" there is a clear yes/no answer.

Questions to which you can't really assign probabilities:

"will misinformation be widespread in 2026?"

Questions to which you can assign a probability:

"will the world misinformation organization publish a global misinformation level of more than 6.6 in 2026?"

But every time you make the question well defined, you risk the world misinformation organization going bankrupt and not publishing any stats.

Expand full comment

On the bright side, it's good that you keep having to explain this. It means that people who need to hear it are hearing it. Some of them are hearing it for the first time, which means you're reaching new people. It's a good sign, however frustrating it may be.

Expand full comment

I'll use section 2 as an opportunity to plug [my LessWrong post on unknown probabilities](https://www.lesswrong.com/posts/gJCxBXxxcYPhB2paQ/unknown-probabilities). We have a 50% probability that the biased coin comes up heads, while also having uncertainty over the probability we _would_ have, if we knew the bias. Personally I don't like talking about biased coins [because they don't exist](http://www.stat.columbia.edu/~gelman/research/published/diceRev2.pdf), but I address in detail the case of a "possibly trick coin": it is either an ordinary fair coin, or a trick coin where both sides are heads.
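
For anyone who wants the trick-coin arithmetic spelled out, a quick sketch (the 50/50 prior over "fair" vs "two-headed" is an assumption for illustration):

```python
from fractions import Fraction

# Prior: equally likely to be a fair coin or a two-headed trick coin.
prior_fair = Fraction(1, 2)
prior_trick = Fraction(1, 2)

# Before any flips: P(heads) = 1/2 * 1/2 + 1/2 * 1 = 3/4.
print(prior_fair * Fraction(1, 2) + prior_trick * 1)  # 3/4

def p_trick_after_heads(n: int) -> Fraction:
    """Posterior probability of the trick coin after seeing n heads in a row."""
    likelihood_fair = Fraction(1, 2) ** n
    likelihood_trick = Fraction(1)
    return (prior_trick * likelihood_trick) / (
        prior_trick * likelihood_trick + prior_fair * likelihood_fair
    )

for n in (1, 2, 5):
    print(n, p_trick_after_heads(n))  # 2/3, 4/5, 32/33
```

The single number quoted before flipping (75% heads in this setup) coexists with a whole distribution over what the coin actually is, which is the kind of distinction the comment is describing.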

Expand full comment

My main issue is that fractional probability estimates for one-off events are not falsifiable. Events either happen or they don't. If Joe Biden gets impeached, then the perfectly accurate prediction of his impeachment should have been 1. If he doesn't, then it should have been 0.

If I say that he will be impeached with a 17% probability and he does get impeached, well, I say, 17% is not 0. I was right. If he doesn't get impeached, well, I say, 17% is not 100%. I was right.

But if I predict that the L train will arrive on time 17% of the days, then post factum it can be said both whether I was accurate and how close I was to the accurate prediction (maybe, in reality, it arrives on time 16.7% or 95% of the days).

In the Samotsvety example, we are dealing with someone who predicts one-off events regularly. So, the probability we are talking about is not of the events happening but of Samotsvety's average performance over a series of events they predict. Basically, their numbers are bets which over the course of the prediction game spanning multiple years should yield the smallest difference from the de facto binary event probabilities when compared to other players.

If they say Joe Biden gets impeached with 17% probability, then in the case it actually happens they lose (100 - 17 = 83) points. If he doesn't get impeached they lose 17. The goal of the game is to lose the smallest possible number of points.

Thus, we are dealing with two fundamentally different numbers:

* expected frequency of a recurring event

* a bet on a one-off event in a series of predictions

We don't have a universally agreed-upon way to distinguish them linguistically. I think many people grasp this intuitively and object to calling both by the same word, even though both are indeed "probabilities": subjective expectations of an uncertain event.

I would like to take the historical performance of forecasting teams and round all of their predictions to 0 and 1. If this makes them lose more points than their original predictions, then fractional bets make sense in the context of repeated one-off predictions. Otherwise, there's no practical use for them.
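
That comparison is easy to run on simulated data. A rough sketch, using the Brier score rather than the linear point count described above, and assuming the forecaster's fractional numbers are roughly honest:

```python
import random

random.seed(2)

def brier(pred: float, outcome: int) -> float:
    return (pred - outcome) ** 2

# A forecaster whose stated probabilities match the actual chances.
preds = [random.uniform(0.05, 0.95) for _ in range(100_000)]
outcomes = [1 if random.random() < p else 0 for p in preds]

frac_loss = sum(brier(p, o) for p, o in zip(preds, outcomes))
rounded_loss = sum(brier(round(p), o) for p, o in zip(preds, outcomes))

print(f"fractional predictions: total loss {frac_loss:,.0f}")
print(f"rounded to 0 or 1:      total loss {rounded_loss:,.0f}")
# Rounding loses substantially more points, which is the sense in which
# fractional bets earn their keep even when every event is a one-off.
```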

Expand full comment

A quick Google search didn't take me to where you discuss this fully:

"Do vaccines cause autism? No. Does drinking monkey blood cause autism? Also no. My evidence on the vaccines question is dozens of well-conducted studies, conducted so effectively that we’re as sure about this as we are about anything in biology."

Could you please point me (or us) to your full post(s) on this so we can check the vaccine-autism studies ourselves? With RFK Jr's rise, this issue will rear its head even more often.

Expand full comment

I don't like the argument in point 2.

There's a big difference between "I know the distribution very well and it so happens that the mean is 0.5" and "I don't know the distribution, therefore I'll start with a conservative prior of equal probability on all outcomes, and it so happens the mean is 0.5". The difference is in how you update on new information. In the first case you basically don't update at all on each subsequent sample -- because you sampled a lot before and you're confident about the distribution -- and in the second, each sample should move your estimate heavily.

So again, for the first sample both distributions give you a best estimate of 0.5, but not for the second, and so on.
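
One standard way to make this precise is to give both speakers a beta prior with the same mean but very different strength, then show them the same new evidence; a sketch with made-up numbers:

```python
# Posterior mean of a beta(a, b) prior after observing some successes/failures.
def posterior_mean(a: float, b: float, successes: int, failures: int) -> float:
    return (a + successes) / (a + b + successes + failures)

well_sampled = (1000, 1000)  # "I know the distribution very well; the mean is 0.5"
ignorant = (1, 1)            # "No idea; uniform prior, mean also 0.5"

# Both now observe three successes in a row.
for name, (a, b) in [("well-sampled 0.5", well_sampled), ("ignorant 0.5", ignorant)]:
    print(name, round(posterior_mean(a, b, successes=3, failures=0), 3))

# well-sampled 0.5 -> 0.501  (barely moves)
# ignorant 0.5     -> 0.8    (moves a lot)
```

Same 0.5 before any samples, very different numbers after just a few, which is exactly the asymmetry being described.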

Expand full comment

Eliezer wrote the post "When (Not) to Use Probabilities" which is mildly relevant here, at least as further reading.

https://www.lesswrong.com/posts/AJ9dX59QXokZb35fk/when-not-to-use-probabilities

Expand full comment