The funniest thing about those arguments you're rebutting is that the average of a large number of past 0-or-1 events is only an estimate of the probability of drawing a 1. In other words, the probabilities they claim are the only ones that exist are themselves unknowable!

Okay, after years of this I think I have a better handle on what's going on. It's reasonable to pull probabilities, because you can obviously perform an operation where something clearly isn't 1% and clearly isn't 99%, so then you're just 'arguing price.' On the other hand, it's reasonable for people to call this out as secretly rolling in reference-class stuff while *not* having done the requisite moves for more complex reasoning about probabilities - namely, defining the set of counterfactuals you are reasoning about and their assumptions, and performing the costly cognitive operations of reasoning through those counterfactuals (what else would be different, what would we expect to see?). When people call BS on those not showing their work, they are being justly suspicious of summary statistics.

> you’re just forcing them to say unclear things like “well, it’s a little likely, but not super likely, but not . . . no! back up! More likely than that!”, and confusing everyone for no possible gain.

There's more to that than meets the eye. When you see a number like "95.436," you expect the number of digits printed to represent the precision of the measurement or calculation - that the 6 at the end means something. In conflict with that is the fact that even one significant digit is too many. 20%? 30%? Would anyone stake much on the difference?

That's why (in an alternate world where weird new systems were acceptable for Twitter dialogue), writing probabilities in fractional binary makes more sense. 0b0.01 expresses 25% with no false implication that it isn't 26%. Now, nobody will learn binary just for this, but if you read it from left to right, it says "not likely (0), but it might happen (1)." 0b0.0101 would be, "Not likely, but it might happen, but it's less likely than that might be taken to imply, but I can see no reason why it could not come to pass." That would be workable with a transition to writing normal decimal percentages after three binary digits, once the least significant figure fell below 1/10.
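For the curious, the conversion is trivial (a quick illustrative helper of my own, not an established notation):

```python
# Hypothetical parser for the fractional-binary probability notation
# proposed above: each bit after the point contributes a halving power of 2.
def binary_prob(s):
    """Convert a string like '0b0.0101' to a probability."""
    bits = s.removeprefix("0b0.")
    return sum(2.0 ** -i for i, b in enumerate(bits, start=1) if b == "1")

print(binary_prob("0b0.01"))    # 0.25
print(binary_prob("0b0.0101"))  # 0.3125
```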

This is one of those societal problems where the root is miscommunication. And frankly it's less a problem than just a fact of life. I remember Trevor Noah grilling Nate Silver about how Trump could have won the presidency when Silver had predicted Trump had only a 1-in-3 chance of winning. It was hilarious in some sense. That situation is the reverse of what Scott is describing - the person using the probability is using it accurately - but the dilemma is the same: lack of clear communication.

Sooner or later, everyone wants to interpret probability statements using a frequentist approach. So, sure you can say that the probability of reaching Mars is 5% to indicate that you think it's very difficult to do, and you're skeptical that this will happen. But sooner or later that 5% will become the basis for a frequentist calculation.

If you read through this article you'll see that probability statements drift between statements of degree of belief and actual frequentist interpretations. It's just inevitable.

It's also very obscure how to assign numerical probabilities to degrees of belief. For instance, suppose we all agree that there is a low probability that we will travel to Mars by 2050. What's the probability value for that? Is it 5%, 0.1%, or 0.000000001%? How do we adjudicate between those values? And how do I know that your 5% degree of belief represents the same thing as my 5% degree of belief?

I wonder if there's a bell-curve relationship between "how much you care about a thing" and "how accurately you can make predictions about that thing". E.g. do a football team's biggest fans predict its outcomes more accurately or less accurately than non-supporters? I would guess that the superfans would be less accurate.

If that's the case, "Person X has spent loads of time thinking about this question" may be a reason to weigh their opinion less than that of a generally well-calibrated person who has considered the question more briefly.

Your suggestion to call different concepts of probability different names ("shmrobability") for metaphysical reasons actually makes complete sense. Maybe call frequentist probability "frequency", and Bayesian probability "chance" or "belief", with "probability" as an umbrella term. The different concepts are different enough that this would be useful. "The frequency of picking white balls from this urn is 40%." Sounds good. "The frequency of AI destroying mankind by 2050 is 1%." Makes no sense, as it should; it happens or it doesn't. "The chance of AI destroying mankind by 2050 is 1%." OK, now it makes sense. There we go!

> Probabilities Don’t Describe Your Level Of Information, And Don’t Have To

This seems literally wrong to me. Probability and information both measure (in a very technical sense) how surprising various outcomes are. I think they may literally be isomorphic measures, with the only difference being that information is measured in bits rather than in percent.
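To make the correspondence concrete (a standard identity, nothing novel): the surprisal of an outcome in bits is just the negative log base 2 of its probability.

```python
import math

# Surprisal (self-information) in bits: I(p) = -log2(p).
# A 50% event carries 1 bit of information; rarer events carry more.
def surprisal_bits(p):
    return -math.log2(p)

print(surprisal_bits(0.5))       # 1.0
print(surprisal_bits(0.25))      # 2.0
print(surprisal_bits(1 / 1024))  # 10.0
```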

Your examples are also off-base here. The probability of a fair coin coming up heads when I'm in California is 1/2, and the probability of a fair coin coming up heads when I'm in New York is 1/2, and we wouldn't say that probability is not the same thing as information because 1/2 doesn't capture which state I'm flipping the coin in. Similarly, the difference between the first two scenarios is not E[# heads / # flips] but E[(# heads / # flips)^2] - (E[# heads / # flips])^2 - i.e. the variance of the distribution is different. This is because (1) is well modelled by *independent* samples from a known distribution, while in (2) the samples are correlated (i.e. you need a distribution over hyperparameters if you want to treat the events as conditionally independent).
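A quick simulation of that difference (my own toy setup - scenario (2) is modelled with a uniform hyperprior over the coin's bias, which is an illustrative choice, not anything from the post):

```python
import random

random.seed(0)
N_TRIALS, N_FLIPS = 20000, 100

def frac_heads_known_fair():
    # Scenario (1): independent flips of a coin known to be fair.
    return sum(random.random() < 0.5 for _ in range(N_FLIPS)) / N_FLIPS

def frac_heads_unknown_bias():
    # Scenario (2): bias drawn once per trial, so flips are marginally correlated.
    p = random.random()
    return sum(random.random() < p for _ in range(N_FLIPS)) / N_FLIPS

def mean_var(xs):
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / len(xs)

m1, v1 = mean_var([frac_heads_known_fair() for _ in range(N_TRIALS)])
m2, v2 = mean_var([frac_heads_unknown_bias() for _ in range(N_TRIALS)])
print(round(m1, 2), round(m2, 2))  # both means ~0.5
print(v1 < 0.005, v2 > 0.05)       # but the variances differ by a factor of ~30
```

Same expected fraction of heads, wildly different spread - which is exactly the information a single point probability throws away.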

I also noticed you didn't touch logical / computationally-limited probability claims here, like P(the 10^100th digit of pi is 1) = 1/10.

Parts of this are unobjectionable, and other parts are very clearly wrong.

It is perfectly fine to use probabilities to represent beliefs. It is unreasonable to pretend the probabilities are something about the world, instead of something about your state of knowledge. Probabilities are part of epistemology, NOT part of the true state of the world.

You say "there's something special about 17%". No! It's just a belief! Maybe the belief is better than mine, but please don't conflate "belief" with "true fact about the world".

If Samotsvety predicts that aliens exist with probability 11.2%, that means they *believe* aliens to exist to that extent. It does not make the aliens "truly there 11.2% of the time" in some metaphysical sense. I can feel free to disagree with Samotsvety, so long as I take into account their history of good predictions.

(Side note: that history of good predictions may be more about politics and global events than it is about aliens; predicting the former well does not mean you predict the latter well.)

----------

Also, a correction: you say

"It’s well-calibrated. Things that they assign 17% probability to will happen about 17% of the time. If you randomly change this number (eg round it to 20%, or invert it to 83%) you will be less well-calibrated."

This is false. It is easy to use a simple rounding algorithm that guarantees the output is calibrated if the input is calibrated (sometimes you can even *increase* calibration by rounding). If I round 17% to 20% but also round a different 23% prediction to 20%, then it is a mathematical guarantee that if the predictions were calibrated before, they are still calibrated.
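A toy check of that guarantee (my own construction; it relies on the rounded-together predictions averaging out to the bucket value, which is the case here):

```python
# Two calibrated prediction streams: 17% predictions that hit 17/100 times,
# and 23% predictions that hit 23/100 times. Round both to a common 20%.
preds_17 = [(0.17, y) for y in [1] * 17 + [0] * 83]
preds_23 = [(0.23, y) for y in [1] * 23 + [0] * 77]

rounded = [(0.20, y) for _, y in preds_17 + preds_23]

hits = sum(y for _, y in rounded)
print(hits / len(rounded))  # 0.2 -- the merged 20% bucket fires exactly 20% of the time
```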

Calibration is just a very very bad way to measure accuracy, and you should never use it for that purpose. You should internalize the fact that two people who predict very different probabilities on the same event (e.g. 10% and 90%) can both be perfectly calibrated at the same time.

Assume there is a one-shot event with two possible outcomes, A and B. A generous, trustworthy person offers you a choice between two free deals. With Deal 1, you get $100 if A occurs, but $0 if B occurs. With Deal 2, you get $X regardless of the outcome. By adjusting X, and under some mild(ish) assumptions, the threshold value of X at which you switch deals behaves a helluva lot like a probability (of A, times $100).
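One common way to make that concrete (my own formalization, assuming a risk-neutral agent; not something stated in the comment above): offer "$100 if A" against a sure payment of $X and bisect on X until the agent is indifferent - the threshold X/100 recovers the agent's probability for A.

```python
# Bisect on the sure payment X until it matches the expected value of
# "win $100 if A" for a risk-neutral agent who believes p(A) = p_A.
def indifference_point(p_A, lo=0.0, hi=100.0, tol=1e-6):
    ev_deal = 100.0 * p_A
    while hi - lo > tol:
        x = (lo + hi) / 2
        if x < ev_deal:  # sure payment too small: agent still prefers the deal
            lo = x
        else:
            hi = x
    return (lo + hi) / 2

print(round(indifference_point(0.37), 3))  # ~37.0, i.e. X/100 recovers p(A)
```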

I feel that it's in a sense a continuation of the argument about whether it's OK to say that there's a 50% chance that bloxors are greeblic (i.e. to share raw priors like that). The section "Probabilities Don’t Describe Your Level Of Information, And Don’t Have To" specifically leans into that, and I disagree with it.

Suppose I ask you what the chances are that a biased coin flips heads. You tell me 33%. It flips heads and I ask you again. In one world you say "50%", in another you say "34%" - because in the first world most of your estimate came from your prior, while in the second you actually have a lot of empirical data.
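Those two worlds are easy to exhibit with Beta pseudocounts (my illustrative parameterization: the mean of Beta(a, b) is a/(a+b), and observing heads takes it to (a+1)/(a+b+1)):

```python
from fractions import Fraction

def mean(a, b):
    return Fraction(a, a + b)

def mean_after_heads(a, b):
    return Fraction(a + 1, a + b + 1)

# World 1: mostly prior. Beta(1, 2) has mean 1/3, and one flip moves it a lot.
print(mean(1, 2), "->", mean_after_heads(1, 2))      # 1/3 -> 1/2

# World 2: lots of data. Beta(33, 66) also has mean 1/3, but barely budges.
print(mean(33, 66), "->", mean_after_heads(33, 66))  # 1/3 -> 17/50, i.e. 34%
```

Same 33% headline number, completely different sensitivity to the next observation.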

That's two very different worlds. It is usually very important for me to know which one I'm in, because sure if you put a gun to my head and tell me to bet immediately, I should go with your estimate either way, but in the real world "collect more information before making a costly decision" is almost always an option.

There's nothing abstruse or philosophical about this situation. You can convey the extra information in just this way - "33%, but will update to 50% if a thing happens" - with mathematical rigor. Though of course it would be nice if Bayesians recognized that it's important and useful, and tried to find a better way of conveying the ratio of prior to updates that went into an estimate, instead of insisting that a single number should be enough for everyone.

And so, I mean, sure, it's not anyone's job to also provide the prior/update ratio, however that might look, to go along with their estimates (unless they are specifically paid to do that, of course), and people can ask for it specifically if they are interested. But then you shouldn't be surprised that even people who have never heard of Bayes' theorem still might intuitively understand that a number like "50%" could come entirely from your prior and should be treated as such, and treat you with suspicion for not disclosing it.

I am glad there is no new example of: ' "Do bronkfels shadwimp?" is binary, no one knows, thus: 50% chance. ' Unlike the "coin which you suspect is biased but you're not sure to which side" - which IS 50%. If A asks about bronkfels and knows (or pretends to know) what they are: maybe 50%. If no one around knows: the chance of a specific verb applying to a specific noun, which is less than 1%. "Are the balls in this bag all red?": around 4% - no surprise if they are, even if you did not know. "Are 20% purple with Swissair-blue dots?": I'd be surprised, and would not believe you did not know beforehand. "Are they showing pictures of your first student?": 50%, really?

> Whenever something happens that makes Joe Biden’s impeachment more likely, this number will go up, and vice versa for things that make his impeachment less likely, and most people will agree that the size of the update seems to track how much more or less likely impeachment is.

There is a close parallel here to the same issue in polling, where the general sense is that the absolute level determined by any given poll is basically meaningless - it's very easy to run parallel polls with very similar questions that give you wildly different numbers - but such polls move in tandem, so the change in polled levels over time is meaningful.

Something's been bothering me for a while, related to an online dispute between Bret Devereaux and Matthew Yglesias. Yglesias took the position that, if history is supposed to be informative about the present, then that information should come with quantified probabilities attached. Devereaux took the position that Yglesias' position was stupid.

I think Devereaux is right. I want to draw an analogy to the Lorenz butterfly:

It is famous for the fact that its state cannot be predicted far in advance. I was very underwhelmed when I first found a presentation of the effect - it's very easy to predict what will happen, as long as you're vague about it. The point will move around whichever pole it is close to, until it gets close to the other pole, at which point it will flip. Over time, it broadly follows a figure 8.

You can make a lot of very informative comments this way. At any given time, the point is going to lie somewhere within a well-defined constant rectangle. That's already a huge amount of information when we're working with an infinite plane. And at any given time, the point is engaged in thoroughly characteristic orbital behavior. The things that are hard to predict are the details:

1. At time t, will the point be on the left or on the right?

2. How close will it be, within the range of possibilities, to the pole that currently dominates its movement?

3. How many degrees around that pole will it have covered? (In other words, what is the angle from the far pole, through the near pole, to the point?)

4. When the point next approaches the transition zone, will it repeat another orbit around its current pole, or will it switch to the opposite pole?

If you only have a finite amount of information about the point's position, these questions are unanswerable, even though you also have perfect information about the movement of the point. But that information does let us make near-term predictions. And just watching the simulation for a bit will also let you make near-term predictions.

This seems to me like an appropriate model for how the lessons of history apply to the present. There are many possible paths. You can't know which one you're on. But you can know what historical paths are similar to your current situation, and where they went. The demand for probabilities is ill-founded, because the system is not stable enough, *as to the questions you're asking*, for probabilities to be assessable given realistic limitations on how much information it's possible to have.
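The Lorenz claim is easy to check numerically (a minimal sketch with the standard parameters sigma=10, rho=28, beta=8/3 and plain Euler stepping - crude, but good enough for the qualitative point):

```python
# Two trajectories starting one part in a billion apart: their pointwise
# difference blows up (fine-grained prediction fails), yet both stay inside
# the same modest bounding box the whole time (coarse prediction succeeds).
def lorenz_step(x, y, z, dt=0.005, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    return (x + dt * sigma * (y - x),
            y + dt * (x * (rho - z) - y),
            z + dt * (x * y - beta * z))

a = (1.0, 1.0, 1.0)
b = (1.0, 1.0, 1.0 + 1e-9)
max_gap, max_coord = 0.0, 0.0
for _ in range(8000):  # integrate out to t = 40
    a, b = lorenz_step(*a), lorenz_step(*b)
    max_gap = max(max_gap, *(abs(p - q) for p, q in zip(a, b)))
    max_coord = max(max_coord, *(abs(v) for v in a + b))

print(max_gap > 1.0)      # the trajectories fully decorrelate...
print(max_coord < 100.0)  # ...but never leave the attractor's neighborhood
```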

Quoting: “My evidence on the vaccines question is dozens of well-conducted studies, conducted so effectively that we’re as sure about this as we are about anything in biology”

Sorry if this is a naive question, but if there are RCTs comparing vaccines to placebos (and not using other vaccines as placebos) with long enough follow-up to diagnose autism, I would be keen to see a reference. Just asking because, while I think all the people claiming there is a link are frauds, I didn't think we actually had evidence at the level of 'sure about this as we are about anything in biology'.

I don't have any objections to phrasing things as "x% likely", and I do this colloquially sometimes, and I know lots of people who take this very seriously and get all the quoted benefits from it, and my constant thought when asked to actually do it myself or to take any suggested "low probability" numbers at face value is "oh God", because I'm normed on genetic disorders.

Any given disorder is an event so low-probability that people use their at-birth prevalences as meme numbers when mentioning things they think are vanishingly unlikely ("Oh, there's no *way* that could happen, it's, like, a 0.1% chance"). It turns out "1 in 1000" chances happen! When I look at probability estimates as percentages, I think of them in terms of "number of universes diverging from this point for this thing to likely happen/not happen". Say a 5% chance: the probability it happens at least once across n diverging paths is p = 1 − (0.95)^n. The 50%-likelihood marker comes at about n = 14, which actually gives a little above 50%. So across 14 paths from this point, it'll "just above half the time" happen in one of those paths. ~64% likelihood in 20 paths, ~90% in 45 paths (often more than once, in 45 paths). The probabilities people intuitively ascribe to "5%" feel noticeably lower than this.
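That arithmetic, spelled out (just restating the numbers above):

```python
# Chance a 5% event happens at least once across n independent "paths":
# p = 1 - 0.95**n.
def at_least_once(p, n):
    return 1 - (1 - p) ** n

for n in (14, 20, 45):
    print(n, round(at_least_once(0.05, n), 3))
# 14 0.512
# 20 0.642
# 45 0.901
```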

"There's a 1 in 100,000 chance vaccines cause autism"? I've known pretty well, going about my daily life, absolutely unselected context, not any sort of reason to overrepresent medically unusual people, someone with a disorder *way* rarer than 1/100k! Probability estimates go much lower than people tend to intuit them. We think about "very unlikely" and say numbers that are more likely than people's "very unlikely" tends to be, when you push them on it, or when you watch how they react to those numbers (people estimating single-digit-percentage chances of a nuclear exchange this year don't seem to think of it as that likely).

There really isn't any possibility of opting out. Even if you reject using probabilities to reason about the future, you have to act as if you did. Clearly assuming a 50/50 chance on any outcome would not be particularly effective, and is not the way anyone actually thinks. I guess I can understand why some people might want to keep their probabilities implicit. Personally, I think reasoning explicitly in % terms helps me notice when some part of my world model is wrong.

This is one of the posts that gave me more sympathy for the opposing view than I had coming in. Now I think only superforecasters should be allowed to use percent-scale probabilities, and all the rest of us mooks shouldn't risk implying that we have the combination of information plus calibration necessary to make such statements.

Sorry if the following has already been said multiple times (it is an obvious observation, so it has probably been mentioned before):

Regarding item (3.) of the text: It is nonsensical, inconsistent and arbitrary to assign 50% probability in a situation where you have zero information and make up two outcomes.

Assume you have an object of unknown color. Then: 50% it is red, 50% it is not? But also: 50% it is green, 50% it is purple, 50% it is mauve, 50% it is reddish?

Assume you find in an alien spaceship a book in an unknown language that for some reason uses latin letters. What is its first sentence? 50% that it is "Hello earthling"? 50% for "Kjcwkhgq zqwgrzq blorg"?

What is the probability that all blx are blorgs? 50% of course! What is the probability that *some* blx are blorg? 50%! What is the probability that there is a blx? 50% That there are more than 10? 50% Less than 7? 50% More than a million? 50% More than 10^1000? 50%

"If you actually had zero knowledge about whether AI was capable of destroying the world, your probability should be 50%."

As you have anticipated, I object to this for two reasons. Firstly, as a matter of theory, the assignment of priors is arbitrary. The agent need only choose some set of positive numbers which sum to 1. There is nothing to say that one choice is better than another. In particular, there is no reason to think that all outcomes are equally likely and attempting to assign priors based on this rule (the indifference principle) leads to well known paradoxes.

Secondly, as a matter of practice, if our fully uninformed agent assigns a probability of 50% to the world being destroyed by AI, it should also assign the same probability to the world being destroyed by a nuclear war, a pandemic, an asteroid strike, awakening an ancient deity, an accident involving avocado toast, and so on. But the world can only be destroyed once, so these probabilities are inconsistent.

Maybe I'm missing something, but there seems to be an easier way to communicate one's level of certainty without inappropriately invoking probabilities. If instead of saying "There is a 50% probability of X happening", you say "I'm 50% sure X will happen," I don't think anyone would object.

It might be shocking to learn, but an Anti-Rationalist like myself will point out as evidence for a Frequentist position that saying "I think X is more likely than Y" is merely a rhetorical phrase that could mean any number of things, and one of the things that it is on average (oh look, a rhetorical use of the phrase "on average") very unlikely to mean is "I have calculated the probability of X and Y." Probability as a concept is not ontologically interchangeable with the manners of speech that may signify it. Rhetorically mentioning probability is not evidence that probability was used somewhere, or happened, or that the words are a proxy for a probability.

We see here that Scott, like with most things, doesn't understand what probability is. But there are other interpretations of what's going on with probability, so sure, we could say he just has a Bayesian interpretation. Except Scott also seemingly doesn't understand what the argument is about, because his arguments don't align with classical Bayesian probability either. They DO align with Rationalism. As usual, we find that Rationalists simply take it for granted that their extremely bizarre assumptions about things are correct, and we get weird arguments like, "If you give argument X, obviously you need to read what you're criticising, because I can't possibly be fundamentally missing the point of your argument." It is simply a tenet of Rationalism that Bayesian probability is not merely an interpretation of probability; it's a fundamental aspect of all human reasoning. Or as Scott says here, "A probability is the output of a reasoning process."

Sure, in the sense that mathematics is a reasoning process, and a percentage chance is a probability. But the very fact that "Maybe 1%?" means "I don't know; from the evidence I have off the top of my head it seems unlikely" shows that there are no probabilities happening here, non-Frequentist or otherwise. I could summarise, I suppose, by saying there's an equivocation happening between Bayesian probability, which Scott is pretending to defend, and Bayesian epistemology, which is what he's actually trying to defend.

Which brings about the bigger problem. This post is a motte and bailey. "Oh, when I say 20% I don't REALLY mean a 20% chance, I just mean it might happen but I think it might happen less than that guy who said 30%." If that were the case, we would never see attempted mathematical exegeses about AI risk. Unfortunately you see this all the time, including from Scott, and of course with things that aren't AI risk. The real criticism here is that the mathematical language is merely a patina meant to mask the fact that the arguments are fundamentally badly reasoned and rest on faulty assumptions. Accusing the social-media-ites of needing to just read the papers or trust the science assumes they didn't or don't. Instead one could assume Scott is missing the point completely.

No, the way I use language, terms like "slightly unlikely," "very unlikely", etc. don't translate well into numbers. Medical personnel are always asking patients to rank pain on a 1-10 scale. I know what they mean, but my mind rebels against doing this. It sounds way too exact. "Not the worst pain I've ever had, but deep and persistent," is how I described one, and I just don't know how to put that as a number.

Usually on the Internet, a debate about X isn't really about X. When someone argues that a bayesian probability is invalid, they're usually just lawyering for their position. More "screw the billionaires and their space hobby," less thoughtful attempt to converge on accurate world models.

Source: I have a lot of success changing gears in these conversations by a) switching the topic to their true objection, and b) expressing their opinion as a bayesian probability. So when I hear "there's no way you can know the probability of getting to Mars," I might respond with "I mean, sure, there's like a 95% chance that Musk is a jerk who deserves to live paycheck to paycheck like the rest of us."

Well. By "success" I mean watching them turn red when, after they respond, I point out their inconsistent treatment of bayesian probability: immediate recognition and scathing dismissal when it goes against their position, routine acceptance when it supports their position. This allows a "successful" disengagement when they respond and I say "you don't have a strong opinion on Mars, you just don't like musk. That's ok, but look - that person over there might want to talk about musk. I'm going to look for someone who wants to talk about Mars. Cheers!"

One day I'll do this to someone who responds with "see, there you go again, that 95% number doesn't mean anything" and the rest of the conversation will be either wonderful or terrible...

I often find, when, say, ordering something by phone, or otherwise asking for a service to be performed that will take some time, an absolute refusal to give a time estimate. Apparently they're afraid to give a number for fear it will be taken as a guarantee.

I treat this by asking about absurdly short and long periods. "Will it take ten minutes?" No, not that quick. "Will it take two years?" No, not that long. "OK, then, we have a range. Let's see if we can narrow it down a bit more." This is frequently successful, and I wind up with something like, for instance, "about 3 or 4 weeks." That's helpful.

Calibration troubles me because probabilities presuppose some sort of regularity in the underlying phenomena, but predictions may have nothing in common at all. As Scott himself once noted, you could game your way to perfect calibration at 17% by nominating 17 sure things and 83 sure misses. Superforecaster skill cannot be evenly distributed over all domains, so the trust you place in a superforecaster's 17% is laden with assumptions. Can anyone resolve this for me? Why should one trust calibration?
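The gaming trick is worth seeing in numbers (my own toy version of the 17-sure-things construction): calibration comes out perfect, while a proper scoring rule like the Brier score immediately exposes the difference between the gamed forecaster and an informed one.

```python
# 17 sure hits and 83 sure misses, all predicted at 17%.
outcomes = [1] * 17 + [0] * 83
gamed = [0.17] * 100                              # calibrated, but hides everything it knows
honest = [0.99 if y else 0.01 for y in outcomes]  # near-certain, also calibrated

def brier(preds, ys):
    # Mean squared error between stated probabilities and 0/1 outcomes.
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

print(sum(outcomes) / 100)                # 0.17 -- the 17% bucket fires 17% of the time
print(round(brier(gamed, outcomes), 4))   # 0.1411
print(round(brier(honest, outcomes), 4))  # 0.0001
```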

"Will AI destroy the world?" is the wrong question. The right question is whether we will use AI to destroy the world. Will people use AI's predictive power to manipulate us into killing one another more readily than we now do? If AI develops a theory of mind beyond that of any human, it will predict our desires better than we can predict them ourselves. That creates a new lethality for people to use against each other, as dangerous as the lethality of nuclear bombs. Will we do this to ourselves?

One counter-argument might be that people can use probabilities as a motte-and-bailey.

The motte is: I say "my probability of X is 71%", you say "you're no more of an expert than me, how can you claim such precise knowledge?", and I reply "I didn't say I could, a probability like that is such a shorthand for fairly but not completely confident".

The bailey is: I say "my probability of X is 71%", you say "okay, well I think X is fairly unlikely", and I reply "since I gave a precise probability, I've *clearly* thought about this much more than you! So it's clear whose opinion should be trusted."

I don't know for sure if people do that, but I bet they do.

There is something a bit weird about the whole quest-for-accurate-predictions thing. Perhaps it helps to ask: why do we care if someone states that something is very likely/likely/50-50/unlikely/very unlikely to happen?

In one situation, you care because you want to place a bet on an outcome. Then you hope that nothing will change after you placed the bet, that may change the initial probabilities you assigned to various outcomes.

In another situation, you care because you want to do preventive action, i.e. precisely because you want to motivate yourself – or others – to do something to change the probabilities.

These motives are very different, and so is your motive for offering predictions in the first place. Plus, they change your motive for listening to the probabilities other people assign to events (including how you infer/speculate about the motives they have for offering their probabilities).

Edit: And an added twist: Your own motive for consulting probabilities may be a desire to place bets on various outcomes. But others may interpret the same probabilities as a call to action, leading them to do stuff that changes the initial probabilities. Making it less probable that your initial probabilities are really good estimates to use to place bets. (Unless you can also factor in the probability that others will see the same initial probabilities as a call to action to change those probabilities.)

…all of the above concerns difficulties related to how to factor in the “social” in the biopsychosocial frame of reference when trying to predict (& assign probabilities to) future events where “future human action” has a bearing on what may happen.

I’m pretty sure a government bureaucrat labeling one side of a discussion misinformation and then shutting off debate has a high probability of making the world much worse than it is now…

counterpoint: 97.26% of the silliest things "rationalists" do could be avoided if they just resisted the temptation to assign arbitrary numbers to things that are unquantifiable and then do math to those numbers as if they're meaningful

I think most people are way too averse to probabilities, but also rationalists are too enthusiastic about them. I used to share the enthusiasm, so here's what convinced me that they don't deserve as much credit as rationalists give them:

You can think of reality as being determined by a bunch of qualitatively distinct dynamics, which in probabilistic terms you might think of as being represented by a large number of long-tailed, highly skewed random variables. The outcomes you ask about for forecasting are affected by all of these variables, and so the probabilistic forecasts represent an attempt to aggregate the qualitative factors by their effects on the variables you forecast for.

This is fine if you directly care about the outcome variables, which you sometimes do - but not as often as you'd think. The two main issues are interventions on the outcomes and the completeness of the outcomes.

For intervention, you want to know what the long-tailed causes are so you can tailor your interventions to them. For instance, if a political party is losing the favor of voters, they're going to need to know why - they are presumably already doing just about everything generic they can to gain popularity, so their main opportunity is to stay on top of unique challenges that pop up.

But even if you just care about prediction, the underlying causes likely matter. Remember, the uncertainty in the outcome is driven by qualitatively different long-tailed factors - the long-tailedness allows them to have huge effects on other variables than the raw outcomes, and the qualitative differences mean that those effects will hugely differ depending on the cause. (There's a difference between "Trump wins because he inflames ethnic tensions" and "Trump wins because black and hispanic voters get tired of the democrats".)

It appears to me that the crux of your argument is that probabilities about events which seem to be one-off events make sense because there's secretly a larger reference class they belong to.

So superforecasters have a natural reference class consisting of all the topics they feel qualified to make predictions on, and you calibrate them across this reference class they have chosen.

So it appears to me that your argument is much simpler than you make it out to be - it revolves around the fact that everyone gets to choose their reference class, and while sometimes there's an obvious reference class (like flipping a coin), usually the choice isn't so clear-cut.

Conversely, if you don't have many examples of predictions to judge people on, then their probability statements are indeed meaningless. If some stranger tells me that there is a 50% chance that aliens land tomorrow, then I really don't know how to integrate this information into my worldview.

> It seems like having these terms is strictly worse than using a simple percent scale

Lost you here. The idea that we can and should stack rank likelihoods of different events is a good point and seems right, but converting the ranking to percents seems to be where people are just pulling numbers out of thin air.

You sort of address that in the next paragraph, but I feel like I need more than a few sentences on this to be convinced. I was getting ready to agree based on the racing argument.

Just like probabilities are meaningful in that they compare two outcomes, confidence is only meaningful when it compares two questions. Since we like to think about probabilities as betting odds, certainty tells you how you would distribute your capital across a large number of questions. It's always smartest to bet your best guess, but you'll put more weight on the ones with higher confidence. That's how I think about it, anyway.

Can we reconcile frequentists and non-frequentists by saying that it's all about the frequencies across multiple events for a given forecaster? I use "forecaster" for anything that is able to make multiple forecasts, whether it's a single machine learning model, a human being, a group of people (like the wisdom of the crowd, or predictions from random people), or a process that combines any such inputs into a forecast. All the arguments you make about forecasts on non-repeatable events being useful revolve around the fact that if you look at a collection of such events, they are indeed useful. So why not talk about the collection itself and not single events in isolation, which inevitably makes some people uneasy? We can just consider that this collection of predictions is a specific forecaster and peacefully discuss it.

If I have a model that I use once, and no one has any info on how it's been built, and it tells you that it's going to rain tomorrow with 90% probability, what would you do with this? The only way to use this information is in relation to previous models submitted by random people.

What you are arguing for is one of three possible philosophical approaches to defining probability. There is the approach that it is mathematical--a probability of 0.5 for heads is embedded in the mathematical definition of a fair coin. There is the approach that is empirical--as you flip many coins, the frequency of heads approaches 50 percent. Finally, there is the notion that probability is subjective. I believe that the probability of heads is 0.5, but you could believe something else.

You are insisting on the subjective interpretation of probability. It is what one person believes, but it need not be verifiable scientifically. Other approaches make "the probability that Biden wins the election" an incoherent statement. The subjective definition has no problem with it.

You might also get a kick out of Cox's theorem, which "proves" that any "reasonable" view of probability is Bayesian. https://en.wikipedia.org/wiki/Cox%27s_theorem . There is a reasonably accessible proof in one of the appendices in *Probability Theory: The Logic of Science*.

You can still poke holes in this, particularly if you have some sort of Knightian uncertainty, but it's still pretty interesting.

Is it generally held that probabilities of the urn-kind are truly different in kind (rather than degree) from one-off predictions?

In assigning a probability to drawing a certain type of ball, I am taking the ultimately unique, once-in-history event of drawing a particular ball in a particular way from a particular urn, and performing a series of reductions to this until it becomes comparable to a class of similar events over which an explicit computation of probability is tractable. We take the reductions themselves to be implicit, not even worth spelling out (how often is it remarked on that the number of balls must stay constant?).

It seems like this reduce-until-tractable procedure would be enough to assign a probability to most one-off events, and is (roughly) what is actually done to get a probability (allowing here for the reduction to terminate at things like one's gut feeling, in addition to more objective considerations).

Is there something wrong about this account, such that urn-probabilities can really be set apart from one-offs in a fundamental way?

To go back to your example of fair coin flip vs biased coin flip vs unknown process with binary outcome, the reason you end up with 50% for all of them is because the probability is a summary statistic that is fundamentally lossy. It's true that if all you're asked to do is predict a single event from that process, your "50%" estimate is all you need. But the minute you need to do anything else, especially adjust your probability in light of new information, the description of the system starts mattering a lot: The known-fair coin's probability doesn't budge from 50%, the biased coin's probability shifts significantly based on whatever results you see, and the unknown process's probability shifts slowly at first, then more quickly if you notice a correlation between the outcome and the world around you (if the process turns out to be based on something you can observe).
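A minimal sketch of that divergence, assuming a uniform prior over the unknown coin's bias (the specific model is my own illustration, not anything from the post):

```python
from fractions import Fraction

def fair_coin_estimate(heads, flips):
    """Known-fair coin: no amount of evidence moves the estimate off 50%."""
    return Fraction(1, 2)

def unknown_bias_estimate(heads, flips):
    """Coin of unknown bias: uniform Beta(1,1) prior over the bias.
    Posterior mean after seeing `heads` in `flips` (Laplace's rule of succession)."""
    return Fraction(heads + 1, flips + 2)

# Both start at 50%, but after 8 heads in 10 flips they diverge:
print(fair_coin_estimate(8, 10))     # 1/2 - doesn't budge
print(unknown_bias_estimate(8, 10))  # 3/4 - shifts a lot
```

Both models give the same single-event answer at the start, which is exactly why the single number is a lossy summary.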

Summarizing to a single number loses most of that usefulness. It's less lossy than "probably not", and you're right to defend against people who want to go in that direction, but it's not that much less lossy. And in a world where we can send each other pages of text (even on Twitter now!) there's not much value in extreme brevity.

I tend to take more seriously people who offer at least a few conditional probabilities, or otherwise express how much to adjust their current estimate based on future information.

> Is it bad that one term can mean both perfect information (as in 1) and total lack of information (as in 3)? No. This is no different from how we discuss things when we’re not using probability.

I kind of disagree. I think many people have converged on a protocol where you give a probability when you have a decent amount of evidence, and if you don't have a decent amount of evidence then you say something like "I guess maybe" or "probably not".

Some people (people who are interested in x-risk) are trying to use a different protocol, where you're allowed to give a probability even if you have no actual evidence. Everyone else is pushing back, saying, "no, that's not the convention we established, giving probabilities in this context is an implicit assertion that you have more evidence than you actually do".

Scott is arguing "no, the protocol where you can give a probability with no evidence is correct" but I don't feel like his examples are convincing, *for the specific case where someone has no evidence but gives a probability anyway*, which is the case we seem to be arguing about.

I wish we could fix it by switching to a protocol where you're allowed to give a probability if you want, but you have to also say how much evidence you have.

The non-frequentist view of probability is useful even if it has no connection to the real world. It's a slider between 0 and 1 that quantifies the internal belief of the one making the statement. There are a myriad of biases that will skew that belief, though. And no two people's numbers will line up, because the "math" people do in their heads to come up with them is inconsistent: some will curve toward higher probabilities, others will go low - one person's 40% need not correspond one-to-one to another person's 40%. We should call it something like an "opinion-meter" unless there is a more formal data-aggregation process to come up with the numbers (thus ensuring consistency).

Part of the issue is that many people who complain use numbers as substitutes for other ideas:

- 50/50 does not mean even odds, it means 'I don't think this is something anyone can predict'

- 99.9% does not mean 999/1000, it means 'I think it's really likely'

These linguistic shortcuts are fine for everyday conversation, but when others use numbers trying to convey probabilistic reasoning, this first crowd often defaults to their own use of numbers and concludes - 'oh, you can't actually know stuff like this - and it is weird that you are using non-standard fake probability numbers'.

On a related note, I've dealt with many in the medical field who (perhaps for liability reasons, perhaps for other reasons) are strongly opposed to phrasing anything in a probabilistic way. My kid was preparing for a procedure which was recommended, but which I knew didn't always have great outcomes. I asked one doctor how likely it was that the procedure would work: 'oh, 50/50'. I asked another: 'there's a pretty good chance it will work, but no guarantees', but neither would provide anything more informative or any evidence behind their answer. But I learned that, at least with some doctors, if I asked: if this procedure were to be performed, say, 80 times in situations comparable to this one, how many of those 80 procedures would you expect to go well? With this sort of phrasing, most of the doctors could give answers like 50-65 (out of 80), which was a far more satisfying response for me. But I imagine that for the average patient this extra information wouldn't add much to the experience, so they're inclined to simply keep things vague.

> I don’t understand why you would want to do this. If you do, then fine, let’s call it shmrobability. My shmrobability that Joe Biden will be impeached is 17%...

Scott, maybe a new word is a better idea than you think? Imagine if you and other Bay Area rationalists pledged to adopt the top-voted new words for probability & evidence & rationalist, provided that 500 people with standing in your community vote. I would not be a voter, but I can offer suggestions:

- probability = subjective confidence (this clarifies it is not measurable like a frequency is);

- evidence = reason to believe (this avoids overlap with either frequentist science or the legal system);

- rationalist community = showing-our-work community (sounds less arrogant; leaves open the possibility that there is a better methodology of systematized winning; avoids confusion with either Descartes or Victorian-era freethinkers).

Then you wouldn't have to keep writing posts like this!

Instead, you could focus on spreading the word about superforecasters. (Based on their comments, some critics didn't even notice that section.) But if you seem to be engaging in a philosophical/definitional debate, then that's what you will usually get, perpetually.

> if you want to find out why Yoshua Bengio thinks there’s 20% chance of AI catastrophe, you should read his blog, or the papers he’s written, or listen to any of the interviews he’s given on the subject - not just say “Ha ha, some dumb people think probabilities are a substitute for thinking!”

On the other hand, if you want status, that may well be the optimal response. I think people care orders of magnitude more often about status than about why someone else thinks something.

"If you actually had zero knowledge about whether AI was capable of destroying the world, your probability should be 50%."

Recently, I saw an "expert" put the probability of AI-driven human extinction between 10% and 90%. This would average out to a 50% probability, but it means both a lot less and a lot more than a simple statement of 50%. It conveys that it is very unlikely that AI will be quite harmless, but that a bad outcome is also by no means certain. Also, all probabilities between 10% and 90% seem (incredibly) to be equally likely to him. This looks like a pretty strange belief system, but it's surely logically consistent. A straight-up 50% assessment, by contrast, would have the meaningful frequentist property of being right half the time, if well-calibrated - though in the context of human extinction, does that really matter? I guess the 10%-90% statement could mean that, based on the current evidence, the probability with the required long-run property of being right is equally likely to lie anywhere in the range from 0.1 to 0.9. (With the understanding that a long-run probability here requires some sort of multiverse to be meaningful.)

What if I said that my probability lies somewhere between 0% and 100%? By saying this, I will add no information to the debate (as I have 0 knowledge on the matter), but would still claim a 50% average probability of human extinction? I find this hard to believe...

I think you could remove all of my objections if you replaced "probability" with "confidence" in this post, and also assigned a confidence interval that is informed by your (well, not necessarily you personally, but whoever is providing these numerical values) skill as a forecaster.

Saying "my probability" is, to me, similar to saying "my truth".

The problem is that there is a massive equivocation here.

You're arguing for using the language of probability for situations in which we have little information or ability to rigorously model. You acknowledge that these situations are not quite the same as those (like balls in urns) where we have high information and ability to model. You even say one might want to call the former something different (e.g., a "shmrobability.")

But the simple fact is that the situations are not all the same. The low information ones really do involve people "pulling numbers out of their ass," and this happens all day long. How many conversations that begin "What's your P(doom)?" involve such shit-covered numbers?

For what it's worth, the following summarizes my position pretty well. Yes, I know you purport to address it in the post, but I don't think your discussion of it really does.

No two events are exactly the same, so we are always performing some kind of conceptual bundling or analogy, in order to use past events in creating probabilities.

Consider a coin flip. Persi Diaconis showed that a coin is more likely to land on the face that began pointing up. So a good frequentist should only use past instances of coin flips that started in the same orientation as theirs, in forming a probability.

But then, the size and shape of the coin also impact this effect, so they should only use flips that match those as well. And each coin has a unique pattern of wear and tear, so it better be this exact coin.

And actually, the angle of the flipper’s hand and the level of force they apply and the breeze in the room are also key…

As it turns out, Diaconis solved this too: he built a machine that precisely controls all elements of the flip, and produces the same result each time.

The probabilistic variance in the flip comes from the aggregation of these uncontrolled, disruptive factors.

Frequentists imagine they are insisting on using only past perfect reference classes, but to actually do this, you’d need to set up the entire universe in the same configuration. And if you did, then the result would be deterministic, making probability irrelevant.

The fact is that every probability is secretly Bayesian. You are always drawing your own conceptual boundaries around distinct and deterministic events in order to create an imperfect reference class which is useful to you.

Frequentists are just arguing for a tighter set of rules for drawing these Bayesian boundaries. But they are also Bayesians, because it’s the only conceptual framework that can support actual forecasting. They’re just especially unimaginative Bayesians.

And this is allowed! You can be a tight Bayesian. But if you want to call it frequentism and insist there is a bright metaphysical line, you need to explain exactly where the line is. And you simply can’t do so, without sacrificing probability altogether.

(Obviously this is weird metaphysical nitpicking, but people in weird metaphysical glass houses should not throw weird metaphysical stones.)

So the thing is that everything you've said is correct and this is a good article and people who disagree would have to work very hard to convince me it's wrong. The other side of the debate is making a type error and not communicating their objection well.

But the objection could be: "You've imposed language requirements unique to your culture on making arguments and then discounted arguments that don't use that language, ensuring that you'll discount any critique that comes from outside your bubble." If so, this isn't much different than objections to culture war stuff, or academic fields gatekeeping.

One teensy additional thought: Wharton business professor Phil Rosenzweig stresses the difference between probabilities where you can affect the outcome (e.g. NASA scientists thinking about moon landing) and those where we have no control (e.g. climate change). See a good breakdown in Don Moore and Paul Healy (2008): “The Trouble with Overconfidence”.

You seem to be defending not just non-frequentism about probabilities, but a sort of *objective* non-frequentism. At least, that’s what’s going on in the section about Samotsvety. At least, in the sentence where you say, “If someone asked you (rather than Samotsvety) for this number, you would give a less good number that didn’t have these special properties.”

I claim that the number .17 doesn’t have any of these special properties. The only numbers that could have objectively special properties for any question are 1 or 0. If you assign the number .17, and someone else assigns 1 and someone else assigns 0, then the person who does the best with regards to this particular claim is either the person who assigned 1 or the person who assigned 0. (They bought all the bets that paid off and none of the bets that didn’t, unlike the person who assigned the other extreme, who did the absolute worst, or the person who assigned .17, who bought some of the bets that paid off and some of the ones that didn’t.)

However, there are *policies* for assigning numbers that are better than others. Samotsvety has a good *policy* and most of us can’t come up with a better policy. No one could have a policy of assigning just 1s and 0s and do as well as Samotsvety unless they are nearly omniscient. (I say “nearly” because a person who is assigning values closer to 1 and 0 and is *perfectly* omniscient actually gets much *more* value than someone who assigns non-extreme probabilities no matter how well the latter does.)

But the goodness of the policy can’t be used to say that each particular number the policy issues is good. Two people who are equally good at forecasting overall may assign different numbers to a specific event. If you had the policy of always deferring to one of them, or the policy of always deferring to the other, you would very likely do better than using whatever other policy you have. But in this particular case, you can’t defer to both because they disagree. Neither of them is “right” or “wrong” because they didn’t assign 1 or 0. But they are both much more skilled than you or I.

This is no weirder than any other case where experts on a subject matter disagree. Experts who are actually good will disagree with each other less than random people do - even computer scientists who disagree about whether or not P=NP agree about lots of more ordinary questions (including many unproven conjectures). But there is nothing logically impossible about experts disagreeing, and thus the *number* can’t have the special properties you want to assign to it.
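One way to make the number-vs-policy distinction concrete is with a proper scoring rule like the Brier score (the .17 is just the figure from the quoted example):

```python
def brier(forecast, outcome):
    """Brier score for one binary event (outcome 0 or 1): lower is better."""
    return (forecast - outcome) ** 2

# On a single event that happens, the person who said 1 scores perfectly:
print(brier(1.0, 1), brier(0.17, 1), brier(0.0, 1))  # 0.0, ~0.69, 1.0

# But across many events that each truly occur 17% of the time, the
# always-say-1 *policy* loses badly to the 0.17 policy on average:
avg_always_one = 0.17 * brier(1.0, 1) + 0.83 * brier(1.0, 0)
avg_point17    = 0.17 * brier(0.17, 1) + 0.83 * brier(0.17, 0)
print(avg_always_one)  # 0.83
print(avg_point17)     # ~0.14 (= 0.17 * 0.83)
```

On any one event, an extreme forecaster can beat the .17 forecaster; only when you score the policy over many events does the .17 number show its quality.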

I prefer the term "credence" for the sort of number that you get from forecasting, "probability" for the sort of number that you get from math, and "frequency" for the sort of number you get from experiment. "The probability for an ideal coin to land heads is 50%. The observed frequency for this particular coin to land heads is 49%. My credence in the hypothesis that this is an unbiased coin is 99%."

About 23.2% of the things I ever say are quoted dialogue from movies. Not "you talking to me" but more "I have a lawyer acquaintance" or "it's my own invention." In the film High Fidelity the protagonist asks his former girlfriend what the chances are that they'll get back together, and she says something like "there is a 57% chance we'll get back together." I use this all the time. I tell my wife "there's a 27% chance I will remember to pick up bananas at the store," allowing her to make whatever adjustments she thinks are necessary.

People decide what they think are the rational prices for stock futures and options all the time, despite that each one is about a singular future event.

Frequentist probabilities are just model-based Bayesian probabilities where the subjective element is obscured.

Parable: I have an urn containing 30% red marbles and 70% blue marbles. I ask four people to tell me the odds that the first marble I'll draw out will be red. The first guy says 30% because he knows that 30% of the marbles are red. The second guy says 100% because he is the one who put the marbles in the urn, and he put in blue ones, followed by the red ones, so the red ones completely cover the blue ones, meaning the first one I touch will be red. The third guy says 30%, because he knows everything the other two guys know, but also knows that I always vigorously shake the urn for a solid minute before I draw out a marble. The fourth guy says 0% because he is a mentat who has calculated the exact dynamics of the marbles and of my searching hand given the initial conditions of the Big Bang and knows it will be blue.

All allegedly "Frequentist" probabilities are like this. You smuggle in your knowledge about the process to structure and screen off uncertainties, such that the remaining uncertainty is Bayesian, or, in other words, based entirely in unquantifiables. You then pretend that you have done something different than what Bayesians do.

> If you actually had zero knowledge about whether AI was capable of destroying the world, your probability should be 50%.

This approach has some problems. Following it, you would assign p(Atheism)=0.5, p(Christianity)=0.5, p(Norse Pantheon)=0.5.

Of course, these possibilities are mutually exclusive, so the probabilities can't add up to more than one. You could simply say "okay, then probabilities are inverse to the number of mutually exclusive options". But should Christianity really count as just one option? Different sects surely differ in mutually exclusive details of their theology! Perhaps you should have at least two options for Christianity. Then someone invents the Flying Spaghetti Monster as a joke. If you work from zero knowledge, it looks just as probable as Atheism.

Someone generates a random number. What is the probability that it is two? What is the probability that it is 1/pi? Is the expected imaginary part of the number zero?

Without additional information ("the number is returned as a signed 32 bit integer"), I think it is very hard to form defensible priors for such questions.

I've been trying for a while to explain some of the confusions identified in this article, and even wrote an EA Forum post on it last year: https://forum.effectivealtruism.org/posts/WSqLHsuNGoveGXhgz/disentangling-some-important-forecasting-concepts-terms . I've been struggling to get feedback and might find the distinctions aren't helpful, but every time I read an article like this I keep thinking "Geez I wish we had more standardized language around forecasting/probability."

The main point of my forum post is that people often conflate concepts when talking about the meaning or practicality of forecasts—perhaps most notoriously for EA when people say things like "We can't make a forecast on AI existential risk, since we have no meaningful data on it" or "Your forecasts are just made up; nobody can know the real probability." Instead, I recommend using different terms that try to more cleanly segment reality (compared to distinctions like "Knightian uncertainty vs. risk" or "situations where you don't 'know the probability' vs. do know the probability").

Slightly rephrased, people sometimes demand "definitive" evidence or estimates, but they don't know what they mean by "definitive" and haven't really considered whether that's a reasonable standard when doing risk assessments. I think it's helpful to define "definitiveness" in at least one of two dimensions:

1) how much do you expect future information will change your best estimate of X (e.g., you might now think the probability is 50%, but expect that tomorrow you will either believe it is 80% or 20%);

2) how difficult is it to demonstrate to some external party that your estimate of X was due to good-faith or "proper" analysis (e.g., "we followed the procedure that you told us to follow for assessing X!").

The terms I lay out in my forum post are basically:

• "The actual probability": I don't really explain this in the article because quantum randomness is a big can of worms, but the point of this term is to emphasize that we almost never know "the probability" of something. If we decide quantum randomness is merely "indeterminable by physical observers [but still governed by physical laws/causality]" as opposed to "actually having no cause / due to the whims of Lady Luck," the probability of some specific event is either 100% or 0%. For example, a flipped coin is either 100% going to land heads or 0% going to land heads; it's not true that the event's "actual/inherent probability is 50%."

• "Best estimate": This is what people often mean when they say "I think the probability is X." It's the best, expected-error-minimizing estimate that the person can give based on a supposed set of information and computational capabilities. In the case of a normal coin flip, this is ~50%.

• "Forecast resilience": How much do you expect your forecast to change prior to some (specified or implicit) point in time, e.g., the event occurring. For example, if you have a fair coin your 50% estimate is very resilient, but if you have a coin that you know is biased but don't know the direction of the bias, and are asked to forecast whether the 10th flip will be heads, your initial best estimate should still be 50% but that forecast has low resilience (you might find that it is clearly biased for/against heads). *This seems like a situation where someone might say "you don't know the probability," and that's true, but your best estimate is still 50% until you get more information.*

• "Forecast legibility/credibility": I don't have a great operationalization, but the current definition I prefer is something like “How much time/effort do I expect a given audience would require to understand the justification behind my estimate (even if they do not necessarily update their own estimates to match), and/or to understand my estimate is made in good-faith as opposed to resulting from laziness, incompetence, or deliberate dishonesty?” Forecasts on coin flips might have very high legibility even if they are only 50%, but many forecasts on AI existential risk will struggle to have high legibility (depending on your estimate and audience).
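The resilience point can be made concrete with a toy model (biases and numbers invented for illustration): a coin known to land heads with probability either 0.1 or 0.9, equally likely a priori, yields the same 50% best estimate as a fair coin, but a single observation moves it dramatically:

```python
def posterior_heads_next(results, biases=(0.1, 0.9)):
    """P(next flip is heads | observed results), for a coin whose heads
    probability is one of `biases`, each equally likely a priori.
    `results` is a string like "HHT"."""
    weights = []
    for p in biases:
        w = 1.0 / len(biases)  # uniform prior over the candidate biases
        for r in results:
            w *= p if r == "H" else (1 - p)
        weights.append(w)
    total = sum(weights)
    # Posterior-weighted average of the candidate biases:
    return sum(w / total * p for w, p in zip(weights, biases))

print(posterior_heads_next(""))   # 0.5 - same best estimate as a fair coin...
print(posterior_heads_next("H"))  # ~0.82 - ...but far less resilient
```

A fair coin's 50% stays at 50% after any flip; this coin's 50% jumps to about 0.82 after one heads. Same best estimate, very different resilience.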

I'm missing an information-theoretic answer here. There are two factors: first, the number of "poll options" you can choose from, and second, which poll option you chose (and how that translates to a probability).

For whole percentages, you've got 101 options, 0 to 100. For percentages with three fractional digits, you've got 100,001. For [probably | probably not | don't know] you've got three options, and for just [don't know] you've got one option.

So, estimating the probability of the three scenarios, I'd answer as follows:

1) Fair coin: the 101st option of 201 options. (e.g. the one in the middle)

2) Biased coin: the 5th option of 9 options. (e.g. the one in the middle)

3) Unspecified process: the 1st option of 1 option. (e.g. the one in the middle)

In all the cases, the best answer is "the one in the middle" which best corresponds to 50%, but the answers are not at all alike. The number of options I chose reflects my certainty or lack thereof.
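If it helps, the "number of options" intuition can be cashed out in bits: picking one option from a uniform scale of n options claims about log2(n) bits of precision. A rough sketch, using the three scales above:

```python
import math

# Bits of precision implicitly claimed by choosing one option uniformly
# from a scale of n options (scales taken from the three cases above):
for label, n in [("fair coin", 201), ("biased coin", 9), ("unknown process", 1)]:
    print(f"{label}: {math.log2(n):.2f} bits")
# fair coin: 7.65 bits, biased coin: 3.17 bits, unknown process: 0.00 bits
```

All three answers sit at "the one in the middle," but the claimed precision ranges from nearly eight bits down to zero.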

Great discussion. We use language ... it facilitates quite a lot, but we over-ascribe certain powers and utilities to it. As Wittgenstein concluded at one point, most of philosophy is a 'language game', and likely most other cherished beliefs are language games too. Games are good, but they are not 'truths' of the universe. At best, they are means of organizing, conveying, and working with vague notions and data.

I think you need to watch “Dumb and Dumber” again; what is the semantic content of 1 in a million?

In a similar vein, do people really act in a way that suggests we should take their non-zero predictions of disaster seriously? What would a "rational" actor do if, for example, he or she thought there was a real chance of, say, being in a plane accident in the next month? Would you expect to meet them at a conference overseas somewhere?

How does sharing or researching a probability affect that probability? Everything we make a "thing" affects the quality of what we've "thinged." Human mind, inquiry, and dialogue seems to me to have a significant observer effect on itself. I guess this gets at the metaphysical aspect you mention?

The gist is that rather than having a single number represent all the information you have about probabilities, you carry around probability ranges, and propagate them through belief networks through a fairly simple set of rules. The expectations might come out the same when looking at a 50% event whether you have a large or small interval around 50%, but other mechanics change as you add information or connect events together.

At least back then he was planning to make this the core of his approach to AI; I don't know if that has held up. It always seemed to me like an interesting idea, but I never quite dug deep enough to see if it was actually simplifying things enough to make it more useful than just carrying around full distributions and doing the Bayes thing properly (which can be computationally difficult when you're dealing with big networks).
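I don't remember the actual propagation rules either, but a toy version of interval arithmetic for independent events gives the flavor (my own sketch, not the system described above):

```python
def and_interval(a, b):
    """Bounds on P(A and B) for independent events with interval probabilities."""
    return (a[0] * b[0], a[1] * b[1])

def or_interval(a, b):
    """Bounds on P(A or B) for independent events: 1 - (1-pA)(1-pB) endpoint-wise."""
    return (1 - (1 - a[0]) * (1 - b[0]), 1 - (1 - a[1]) * (1 - b[1]))

wide  = (0.3, 0.7)    # "about 50%, but I know very little"
tight = (0.48, 0.52)  # "about 50%, backed by lots of evidence"

# Both intervals center near 50%, but they behave very differently
# once events are combined:
print(and_interval(wide, wide))    # roughly (0.09, 0.49) - ignorance compounds
print(and_interval(tight, tight))  # roughly (0.23, 0.27)
```

That's the "other mechanics change" point: the width of the interval, invisible in a single point estimate, dominates once you start connecting events together.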

One approach would be to recognize that there is a spectrum of kinds of uncertainties.

On the one side you have uncertainties in the territory, of the form "I don't know if this radioactive nucleus will decay within the next half-life". From our current physics understanding, even if you know all the physics there is, and the wave function of the universe, you will still be stuck making probabilistic predictions regarding quantifiable outcomes.

Then you have quantifiable uncertainties in the map like "we don't know the Higgs mass to arbitrary precision". At least to the degree that the standard model is true (spoiler: it is not), the Higgs mass has an exact value, which we can measure only imperfectly.

Before the Higgs was discovered, there was a different kind of uncertainty regarding its existence. This is of course much harder to quantify. I guess you would have to have an ensemble of theories which explain particle masses (with the Higgs mechanism being one of them) and penalize them by their complexity. Then guess whether there are any short, elegant mechanisms which the theoreticians have not thought of yet. (Of course, it might well be that an elegant mechanism reveals itself in the theory of everything, which we don't have yet.)

Both "Does the Higgs exist?" and "What is the exact mass of the measured Higgs?" are uncertainties which exist on our map, but I would argue that they are very different. The former is Knightian, the latter is well quantifiable.

For p(doom), there are two questions which are subject to Knightian uncertainty:

* How hard is it to create ASI?

* How hard is it to solve alignment before that?

It might be that we live in a world where ASI is practically impossible and AI will fizzle out. Or in a world where the present path of LLMs inevitably leads to ASI. Or in a world where getting to ASI by 2100 will require some "luck" or effort which mankind might or might not invest. p(ASI) is just the expected value over all these possible worlds. I would argue that the probability density function is a lot more interesting than the expected value.

In particular, I would expect that the regions around zero ("developing ASI is actually as hard as travelling to Andromeda") and one ("we will surely develop ASI unless we are wiped out by some other calamity in the next decade") have a significant fraction of the probability mass, with some more distributed in the middle region.

I think one thing that trips people up is "normal" versus "weird" distributions.

Normal:

I'm throwing darts at a board. I'm pretty good and hit the bullseye exactly 25% of the time. The remaining throws follow a normal distribution. You can bet that my average throw will land closer to the center than my twin's, who hits the bullseye only 2% of the time.

Weird:

A robot throws darts at a board. It hits the bullseye exactly 25% of the time. Because of a strange programming error the other 75% of the time it throws a dart in a random direction.

If a prediction market gives a 60% chance of landing on Mars by 2050, some of the prediction follows a normal distribution. E.g., maybe there's a 50% chance by 2047, and a 67% chance by 2055. It's intuitive that if there's a 60% chance of success, then in the 40% of failures we should at least be close. But some of the "no" percentage follows a weird distribution. E.g., international nuclear conflict breaks out and extracurricular activities like space travel are put on indefinite leave.

I think weird outcomes lead to post hoc dismissal of predictions. If the dart thrower slips and throws the dart into the ground, we laugh at the 25% bullseye chance. If in 100 years all AI is chill and friendly, we'll think that the 20% chance of existential threat we give it now is nonsense.
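The normal-vs-weird contrast can be simulated. A sketch with made-up distance distributions for the two throwers, both hitting the bullseye 25% of the time:

```python
import random
import statistics

random.seed(1)

def normal_miss():
    # "Normal" miss: just outside the bullseye (radius 1), clustered close to it.
    return 1 + abs(random.gauss(0, 1))

def weird_miss():
    # "Weird" miss: a uniformly random distance anywhere on a board of radius 10.
    return random.uniform(1, 10)

# Both throwers hit the bullseye (distance 0) exactly 25% of the time.
normal = [0.0 if random.random() < 0.25 else normal_miss() for _ in range(50_000)]
weird = [0.0 if random.random() < 0.25 else weird_miss() for _ in range(50_000)]

# Same headline P(bullseye), very different typical distance from center.
print(round(statistics.mean(normal), 1), round(statistics.mean(weird), 1))
```

The single number "25%" is true of both throwers; it just doesn't tell you how badly the failures fail.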

The problem is that lots of Bayesianists like to pretend that the probabilities are somehow scientific or objectively true, but if you take the philosophical underpinnings of Bayesian probabilities seriously it's clear that they are purely subjective opinions, not capable of being true or false. For any information you have, you could always choose the probability you'd like, then calculate backwards to figure out what your prior must have been. This means that unless there is some objectively correct prior, and there isn't, *any* probability is just as valid as any other for any prediction and set of information. When Bayesians are pressed on this, they fall back to the frequentist justifications they supposedly reject, just like Scott did on the first point about Samotsvety.

The typical statistical perspective on non-repeating events is to frame the problem in a super-population. In the context of predictions, which we seem to be in here, there are at least two ways to frame this.

1. As far as we can tell, randomness is fundamental to the structure of reality. In the many worlds interpretation, which is convenient here, the population is the set of realities in the future and the probability that Trump wins is the (weighted) proportion of those realities that contain a Trump victory. There is therefore a real P(Trump) out there, so it makes sense to talk about it. Forecasters are then tasked with estimating it (\hat{P}(Trump)).

I think it is okay to critique whether \hat{P} is a good estimate of P, but even if it isn't, it still tells you about what the forecaster thinks P is, which still at least tells you something about them. Thus, we should ask two questions of a forecast. The first is: what is \hat{P}? The second is: is \hat{P} a good estimator (i.e. based on good information)?

2. The second way to frame the super-population is to consider the set of forecasts that a forecaster makes. If they are well calibrated, then of the occasions when the forecaster gives a probability of 17%, the outcome will come out positive 17% of the time. That is to say, if we take the set of predictions a forecaster makes with probability 17%, 83% of those occasions will have a negative outcome and 17% will have a positive one.

When we encounter one of their predictions in the wild (say P(Trump)) we are essentially picking a prediction randomly from the superset of all their predictions, in which case the probability they give is literally the true probability that their prediction is correct, provided they are well calibrated.
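Calibration in this sense can be checked mechanically: group a forecaster's past predictions by stated probability and compare each bucket's hit rate. A minimal sketch with made-up records:

```python
from collections import defaultdict

# Hypothetical (stated probability, outcome) pairs from one forecaster.
records = [(0.17, 0), (0.17, 0), (0.17, 0), (0.17, 0), (0.17, 0), (0.17, 1),
           (0.8, 1), (0.8, 1), (0.8, 1), (0.8, 1), (0.8, 0)]

buckets = defaultdict(list)
for p, outcome in records:
    buckets[p].append(outcome)

# Well calibrated: within each bucket, the hit rate matches the stated probability.
for p, hits in sorted(buckets.items()):
    print(p, round(sum(hits) / len(hits), 2))
# prints: 0.17 0.17
#         0.8 0.8
```

With real forecasters you'd coarsen the buckets (e.g. deciles) and need many predictions per bucket before the comparison means much.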

----

Note that both (1) and (2) are frequentist probability interpretations and not subjective probabilities. In that regard, I'm not sure that you are giving a defense of "non-frequentist" probabilities as much as a defense of imperfect "probability estimates."

>In this case, demanding that people stop using probability and go back to saying things like “I think it’s moderately likely” is crippling their ability to communicate clearly for no reason.

I personally feel that in most cases, false precision is the greater danger. A small number of fuzzy categories often works quite well for quick assessments.

I *think* there's a bit of a stolen valor argument that is hidden behind some of the probability objections.

For example, suppose I don't encounter people talking about probabilities at all, *except* in the context of Nate Silver, superforecasters, or hell, even fiction, where probability givers are depicted as smart and "accurate". Then I encounter some random lesswronger saying something like "23% likely". I'm going to assume this internet person is trying to LARP one of those accurate forecasters, and thus come off as if they're calibrated at the sub-1% level, even though (objectively!) most lesswrongers actually suck at calibration and giving good estimates, because they don't practice! (This was true a couple of years back I think, from lesswrong annual survey results.)

None of this means we should stop talking about probability, or that it's correct to assume this of normal people who speak in probabilities. But I think the way to combat this isn't to argue about probability definitions, but to sit on top of a giant utility pile from correct predictions, and look down at the blasphemer and say "how droll!".

Someone saying "you're being unnecessarily precise" and then being linked to a single person's set of one thousand well calibrated predictions, well, it's only mildly embarrassing if it happens once. But if entering rationalist spaces means that there is a wall of calibration, you have to be pretty shameless to continue saying things like that.

Of course, this should *only* be done if rationalists can exercise the virtue they claim to preach, and well, I think that's just hard, but I can dream.

Excellent points. We would benefit from even more language to express degrees of certainty. I wish it were commonplace to add a formal confidence rating to one’s estimations. Lacking that, I’ll take the informal one.

That is, at least, until I run off and join the tribe that linguist Guy Deutscher describes in “Through the Language Glass,” whose epistemology of accounts is baked into their grammar. If you tell someone a bear has attacked the village, it is impossible to form the sentence without also disclosing whether your source is hearsay, evidence, or a brush with a bear.

(Every language requires some such disclosures, though most of them less obvious. In English I can tell you I spent all day talking to my neighbor, without ever disclosing his or her gender—but not in French!)

Further tangling the probability/confidence issue is that humans have serious cognitive deficits around low-risk, high-stakes predictions.

If a night of carousing is 20% likely to earn me a hangover, I have to weigh the benefits (ranging from certain to barely extant) against the costs (same range). That is more axes than we know how to deal with. And they don’t end there…suppose instead I am incurring a 2% chance of 10 hangovers (consecutive or concurrent)? Should my decision be the same in both cases? What if it were a 0.02% chance of 1000 hangovers, e.g. a brain hemorrhage?

We divide by zero. We buy lottery tickets, then take up smoking and scrimp on flood insurance.

Our whole criminal system is an attempt to leverage incomplete information by multiplying stakes. You probably won’t be caught shoplifting, but if you are, you’ll be expected to pay not just for the thing you stole, but also for the take of the last ten people.

There are important ways in which probability *is* a function of knowledge. If all you know is the proportion of each color of ball in the urn, chances might be 40%; if you are allowed a split-second peek inside before drawing, and you notice the top layer of balls is 90% a single color, the equation changes—and your preference now plays a role in the odds. (See also: https://en.m.wikipedia.org/wiki/Monty_Hall_problem)

Countries with the least capacity to enforce law and order, tend to do so with the harshest methods. If you can’t lower the risk of people doing bad stuff, you have to up the stakes for the people considering it.

Also, the obvious: the 100 percent knowledge of something coming to pass (or not) collapses the waveform of the future into the event horizon past (mixaphysically speaking), simplifying probability to either 100% or 0%.

That is, if you witnessed the thing happen or not happen (or faithfully measured it in some other way). Even then, our senses could deceive us blah blah, but *if* we really did see it happen, *then* its probability of happening is 100 percent. Was it always 100 percent? If hard determinism holds, then yes; regardless (says the fatalist), it will *now* always have been going to happen. (Time travelers would have to have so many more verb tenses. What is that—imperfect subjunctive?)

Our priors continually adjust to account for the things that might have killed us but didn’t, and we get more and more foolhardy. Or some unlikely disaster upended our life one fateful Tuesday, so now we never leave the house on Tuesdays. We regularly overlearn our lesson, or learn the wrong one.

All of which is to say (as you corollaried much more succinctly): Probability *is definitely* a function of knowledge—but that just means our estimations have to be the beginning of a conversation, not the end of one.

Aleatory probability and epistemic probability ARE conceptually different things. Yes, they follow the same math; yes, in practice you have both simultaneously; and yes, a lot of what is considered aleatory (rolling dice, to use the obvious example) is really epistemic (if you can measure the state of the dice precisely enough, you can predict the result ~perfectly); but they're different enough (Monty Hall is entirely epistemic, radioactive decay is ~entirely aleatory) that in most other contexts, they'd have different words.

The thing about lightly-held probabilities sounded kind of off to me, because according to Bayes' Law you're meant to always update when you learn new information. But the thing is, you only update for new information, not information you already know. So if there's not much more you could learn about the event until it happens (like with the coin flip), the probability is strongly held; and if there is a lot of information that you haven't included in reaching your probability, then there are a lot of things still to update on and it's lightly held. This relates to my other comment on the precision of probabilities. If there are a lot of facts available that you don't know, then your probability distribution on what your probability would be if you knew all the relevant facts will have a wide range.

The thing that makes a probability useful information is a combination of it being well-calibrated (sort of a bare minimum of being correct, an easy bar to pass as Scott points out), and high-information in the sense of information theory (low entropy: -p ln(p) - (1-p) ln(1-p) close to zero, or roughly speaking, p close to 0 or 1, so high confidence). The latter seems weirdly missing in the section on Samotsvety. Information in this technical sense is closely related to information in the normal sense. If you perform a Bayesian update on a probability, this always increases the information in expectation, and the only way to increase the amount of information in your probability while remaining well-calibrated is to find more relevant facts to update on. (That is, at least if we neglect computational limits. The case of bounded rationality is harder to reason about, but I assume something similar is true there too.) Scoring functions in things like forecasting competitions will generally reward both good calibration and high confidence, because achieving either one without the other is trivial but useless.
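The scoring-rule point can be made concrete with the Brier score, one common proper scoring rule: on the same made-up outcomes, a forecaster who always hedges at 50% scores far worse than a confident, accurate one, even though hedging is "safe":

```python
def brier(preds, outcomes):
    # Brier score: mean squared error between probabilities and 0/1 outcomes.
    # Lower is better; 0.25 is what always-saying-50% earns on anything.
    return sum((p - o) ** 2 for p, o in zip(preds, outcomes)) / len(preds)

outcomes = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]   # made-up: 7 of 10 events happened

hedger = [0.5] * 10   # maximally hedged, zero information
confident = [0.9, 0.9, 0.9, 0.1, 0.9, 0.1, 0.9, 0.9, 0.1, 0.9]  # confident and right

print(round(brier(hedger, outcomes), 4))     # 0.25
print(round(brier(confident, outcomes), 4))  # 0.01
```

A proper scoring rule like this is what makes "high confidence plus good calibration" the thing that actually gets rewarded, rather than either alone.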

Even needing to ask the questions "What Is Samotsvety Better Than You At?" and "Do People Use Probability 'As A Substitute For Reasoning'?" points to how inscrutable forecasts are, especially crowd forecasts as Scott points out.

Whether the rationale is weak or formal as Scott describes it, we'd all be better off if we had all of them, to evaluate alongside the forecast.

This post feels strange to me because while you argue a Bayesian viewpoint, the examples you argue against point to an even more Bayesian approach!

For Bayesian inference, it makes a lot of difference whether you think a coin has a 50% chance to come up heads because it looks kind of symmetric to you, or because you've flipped it 10 times and it came up heads 5 times. It matters because it tells you what to think when a new observation comes in. Do you update a lot, to 2/3, or just a little, to 7/13 (I am using Laplace's rule here)? It depends.

You ideally need to know the entire distribution, instead of a single number. Of course, a single number is better than no number, no arguing there, but the opponents do have a point that "how tightly you hold this belief" is an important piece of information.
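The two updates mentioned above can be checked directly. A sketch using Laplace's rule of succession (which assumes a uniform prior on the coin's bias):

```python
from fractions import Fraction

def laplace(heads, flips):
    # Laplace's rule of succession: P(next flip is heads) after observing
    # `heads` heads in `flips` flips, starting from a uniform prior on the bias.
    return Fraction(heads + 1, flips + 2)

# No flips seen, then one head comes up: big update.
print(laplace(1, 1))    # 2/3

# Ten flips already seen (5 of them heads), then one more head: tiny update.
print(laplace(6, 11))   # 7/13
```

Same stated "50%" beforehand in both cases; wildly different sensitivity to the next observation, which is exactly the "how tightly you hold this belief" information.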

> For example, you might think about something for hundreds of hours, make models, consider all the different arguments, and then decide it’s extremely unlikely, maybe only 1%. Then you would say “my probability of this happening is 1%”. ... Sometimes this reasoning process is very weak. You might have never thought about something before, but when someone demands an answer, you say “I don’t know, seems unlikely, maybe 1%”

That's exactly the problem: when you say "the probability of X is 1%", people tend to assume the former process, whereas in reality you employed the latter one. It is also possible that you've thought about everything extremely carefully, took notice of every contingency and calculated all the numbers precisely... but your entire calculation rests on the core assumption that the Moon is made out of green cheese, so your results aren't worth much.

But a single number like "1%" doesn't contain enough information to make a distinction between these scenarios. It should not confer any more gravitas and authority on the predictor than merely saying "meh, I don't think it's likely but whatevs" -- not by itself. But because it is a number, and it sounds all scientific-like, it *does* tend to confer unwarranted authority, and that's what people (at least some of them) are upset about -- not the mere existence of probabilities as a mathematical tool.

I think metaphysical determinism is true, so from that POV I don't think there ever was a "chance" that an event that obtained might NOT have obtained, that's sort of meaningless to even say. Either there was sufficient cause, or there was not, and if sufficient cause existed the event had to obtain. If a person's statement "this is 40% likely to happen" meant "there are some large number of possible universes that could exist, or in some way actually do exist, and in 40% of them Event X will obtain", I'd have a pretty serious problem with *that* claim, and some of the wilder AI discourse on LW veers into those types of fantasyland. But this isn't what's going on with most AI p(doom) discourse.

Most people without thinking too much about this will just translate percentages into a statement about your epistemic limitations in one way or another. Sometimes 40% means "I expect this not to happen, but I have pretty low confidence in that" and sometimes 40% means "this scenario has been designed to produce Event X approximately 40% of the time, within epistemic limits that cannot be overcome", and those are pretty different but in common everyday encounters humans switch between those modes all the time. I've had to give speeches sort of like this during jury selection, to get people to understand what "reasonable doubt" means, to get them to switch from thinking in percentages to thinking about the trial as accumulating enough knowledge to be confident in a conclusion (although the courts in some states don't like you explaining this.)

>Probabilities Don’t Describe Your Level Of Information, And Don’t Have To

Importantly, I think this section is making a normative claim rather than a descriptive claim. And if that wasn't obvious to some, that may be part of what divides the people arguing about that online.

Which is to say: I agree, it would be best if both 'I think pretty likely not' and 'I assign 15% probability' said nothing at all about your level of information, and no one expected them to.

However, my strong impression is that *in the current reality*, most people do use contextual phrasing to convey something about their level of information when making an estimate.

The way most people actually speak and interpret speech, someone who says '23.5%' is implying they have more information than someone who says 'around 25%', who is implying they have more information than someone who says 'somewhat unlikely'.

Information is often conveyed through that type of rhetorical context rather than directly in words, and I think people who infer that someone is trying to convey that information based on the way they report their estimates are not wrong most of the time.

(think about movie characters who give precise percents on predictions of things that will happen, they are supposed to be conveying that they are geniuses or masterminds who have used lots of information to precisely calculate things, not trying to convey that they stylistically prefer talking in numbers but have no other information advantage)

Now, again, I agree that it would be better if people didn't use that particular channel of communication that way, because you're correct that everyone talking in percentages would usually be better, and for that to work it has to be actually disjoint from level of knowledge.

But again, it needs to be recognized that this is a normative claim about how people *should* communicate.

I think it fails as a descriptive claim about how people *do* communicate.

To steelman this a bit, I think that similar to Learned Epistemological Helplessness, this is a case where people's natural reactions are an evolved response to issues not captured in Spherical Cow Rationalism.

Reducing things to a single number will often be more misleading than illuminating. For example, suppose I made a Manifold market called "How will I resolve this market?", and asked you to estimate the probability and then bet on it. And then no matter what you choose, I'll bet the other way and then resolve it in my favor. **There is no probability you could offer that would actually be meaningful here.**

A useful trick to distinguish a coin you are pretty sure is fair vs a coin that might be loaded by any amount, which has big implications as to how much you update your belief after seeing outcomes:

Consider that Probability(tails)=s, where s is a parameter whose value you might be uncertain about.

In the first case, Prob(s ≈ 0.5) is (almost) 1, while in the second case, ProbDensity(s = x) = 1 for all x in [0, 1].

In the second case, Prob(tails) = \int Prob(tails given s=x ) * ProbDensity(s=x) dx = \int x * 1*dx = 1/2

There is, however, a large difference in how you update your belief about s (and therefore about P(tails)) when you observe coin outcomes. In the first case you (almost) don't update (if you see 3 tails in a row, you still say P(tails) = 0.5). In the second you update a lot (if you see 3 tails in a row, you say P(tails) = 0.8, which can be calculated).
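For the uniform-density case, the update is just Laplace's rule of succession (posterior mean of s after the observations); the exact value after three tails comes out to 4/5. A quick sketch:

```python
from fractions import Fraction

def p_tails_uniform_prior(tails_seen, flips):
    # Posterior-mean P(tails) under a uniform prior density on the bias s.
    # This is Laplace's rule of succession.
    return Fraction(tails_seen + 1, flips + 2)

# Before any flips, both priors agree: P(tails) = 1/2.
print(p_tails_uniform_prior(0, 0))   # 1/2

# After three tails in a row, the uniform prior updates a lot:
print(p_tails_uniform_prior(3, 3))   # 4/5

# The "surely fair" prior would still say (almost exactly) 1/2 here,
# since (nearly) all of its mass sits at s = 0.5.
```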

Try this argument on for size. (NB: I posted this argument as a response further down the list, but I wanted to present it to the general audience of this comments section with a little more of it fleshed out.)

I view the problem as involving the timelines of possible universes going forward from the present — and when it comes to the Mars example, what is the set of possible universes that will *prevent* us from getting to Mars in 2050 vs the set that will *allow* us to get to Mars in 2050? I think we can agree that there is an infinity of universes for the former, but a smaller infinity of universes for the latter. After all, the set of universes where the sequence of events happen in the proper order to get us to Mars would be smaller than the set of universes where the sequence of events didn't happen (or couldn't happen).

If these were sets of numbers we could create a bijection (a one-to-one correspondence) between their elements. For instance, the infinite sets of universes going forward where a coin flip was heads vs tails would be equal. The subsequent branchings from each of those sides of the flip would be infinite, but their infinities would be equal in size. Likewise, before we cast a die, the set of universes going forward where a six will come face-up would be one-sixth the size of the infinity of universes going forward where it doesn't. However, the set of universes where I don't bother to pick up the die to roll it is larger than the ones in which I do. But I can't make a bijection of the infinitude of universes where I don't roll the die vs the infinitude of universes where I do and get a specific result.

Likewise, no such comparison is possible between the two sets of where we get to Mars and where we don't get to Mars, and the only thing we can say with surety about them is that they don't have the same cardinality. Trying to calculate the proportionality of the two sets would be impossible, so determining the probability of universes where we get to Mars in 2050 and universes that don't get us to Mars in 2050 would be a nonsensical question.

I don't intend to die on this hill, though. Feel free to shoot my arguments down. ;-)

I would have expected more focus on the capitalist interpretation of probability as representing the odds you'd be willing to take on a bet. Which is not really about the money, but instead a very effective way to signal how your beliefs about the event will influence your future behavior. If you force me to bet on a coin toss, then seeing which side I bet on tells you which side I consider not-less likely. If you vary the odds you can see exactly how much more likely I consider that side. Once you know that you'll also have a sense about how I'm likely to prepare for future events that depend on the outcome of the coin toss.

Jaynes' book "Probability Theory: The Logic of Science" gives a very good account of treating probabilities as measures of a state of belief. This is usually described as Bayesian statistics.

The wiki page on "Bayesian probability" is also pretty informative.

Savage also has good work showing that, under a suitable definition of rationality, people's choices maximize expected utility over subjective probability distributions.

Often I find myself reading an ACX post where Scott wrestles with a problem that exists because he is too charitable. There's usually about 300 comments already by the time I get to it, and none of them will notice this. He himself will never read my comment, and if he did I'm not at all sure I'd want him to be less charitable in the future. Obviously there are writers I could prioritize that have different ratios of genius and charity, so my consistently reading Scott isn't an accident.

But seriously, come on now. Most people are really terrible at predicting anything and would do worse than random chance in a forecasting contest. I'm probably in this group myself. Anyone who criticizes using probabilities in this way is very obviously also in this group, and should be assumed to have no valid opinion. Being good at using probabilities is table stakes here. I try very hard to avoid using probabilities because I know I shouldn't. In domains in which I am actually an expert I will sometimes say "probably 80% of the time it's this way, but that's just Pareto; also I'm biased".

tl;dr: the only good response to "you shouldn't use probabilities this way" is "you're right that -you- shouldn't, yes."

Another point I would make is that any use of probability for any real-world situation actually still requires making unprovable assumptions about the future. I think there's an illusion of mathematical rigor when you model coin flips and dice rolls with nice-looking distributions, but when you use the past flips of a coin to say how likely it is to come up heads on the next flip, you are making an assumption that the previous flips and future flips have a very specific relationship. There might be reasons why you think this--but these are real-world, empirical reasons like "I don't think the coin flipper is manipulating the outcome" and are not really different from "last presidential election and next presidential election have related dynamics, a similar voting populace, etc."

Obviously coin flips *feel* like they're necessarily from a consistent data generating process, while elections feel like they may not be, but this is not a mathematical justification for treating them differently.

To steal an example from 3blue1brown, consider the following sequence of questions:

1. I roll a d6. What is the expected value of the result?

2. I flip a coin and based on the result, roll a d6 or d12. What is the expected value of the result?

3. I select some number of dice at random from a d4, d6, d8, d10, d12, and d20, then roll them and add up the results. What is the expected value of this sum?

4. I have a collection of unknown numbers of each of the 6 dice above, and select 5 of them. What is E(sum of rolls)?

5. Same as 4, but I select anywhere from 1 to 10 dice.

6. Same as 5, but I select an unknown number of dice.

7. I have a box containing some number of dice, of unknown number of sides, with unknown values on the sides. What is E(sum of rolls)?

At what point do we go from "nice well defined problem where probability can be applied" to "non-repeating, hard-to-model event where it's irresponsible to give a probability"? You can of course keep going and add more and more fine-grained levels of uncertainty to this sequence of problems, which all just involve the repeatable well-defined process of dice rolling. There's no real distinction between "situations with well-defined distributions (so you can use probability)" and "situations without (so you can't)."
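The first couple of steps in this sequence can be computed directly, which makes the point that the later steps differ only in how much we must model or assume, not in kind:

```python
from fractions import Fraction

def ev_die(sides):
    # Expected value of a fair die: the average of 1..sides.
    return Fraction(sides + 1, 2)

# 1. A single d6:
print(ev_die(6))   # 7/2 (= 3.5)

# 2. A fair coin chooses d6 or d12; expectation averages over the coin:
print(Fraction(1, 2) * ev_die(6) + Fraction(1, 2) * ev_die(12))   # 5

# Steps 3-7 just add further layers of averaging over unknown quantities;
# wherever our knowledge runs out, a prior over those unknowns takes its place.
```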

As a commenter I'm also cursed to return to the same response again and again:

Probabilities relate to probability models.

Heavily simplified:

Probabilities work fairly fine if you already know all the possibilities of what might be true and the logical relations between those possibilities (for example, there is no relevant math question you are unsure about, and so on). In that case you just need to assign probabilities to all the known possibilities and then you can plug them into Bayes' theorem and other cool formulae. Also then (and only then) there are cool mathematical guarantees that your initially arbitrary probabilities will come ever closer to the truth as new evidence comes in.

On the other hand probabilistic thinking leads to absolute trainwrecks if there are relevant possibilities you didn't think of at all, that is if there are "unknown unknowns" or, worse, mathematical facts you weren't aware of.

Using probabilities should relate to your level of information because they are useful if your level of information is high and useless if it is low, in the sense of your uncertainty being dominated by unknown unknowns. For example, it is normally not a great idea to make bets with experts on things you have no clue about; they will just take your money. (Yeah, you could carefully examine the experts' opinions and then adjust for expected overconfidence and salience bias, but that is actually another example of my point: you need to get to fairly high information before probability stops being counterproductive.)

And yes, if you use probabilities outside their domain of usefulness that will be a substitute for thinking. You will be plugging numbers into Bayes' theorem which feels very math-y and sophisticated, just like it feels like learning to do more of the math problems you already understand rather than face the confusion of the hard ones.

Btw, does anyone know anything about the ACX prediction contest emails? I haven't gotten mine. Should I have more patience, or did I misremember submitting my predictions?

> If someone wants to know how much evidence/certainty is behind my “no”, they can ask, and I’ll tell them.

> But it’s your job to ask each person how much thought they put in

I really, really want there to be clear numerical ways to capture this. Giving the full probability distribution (or even, to a lesser degree, summarizing it with variance or standard deviation) captures *some* of it, eg it would help with expressing your certainty about vaccines-cause-autism vs monkey-blood-causes-autism. But that doesn't do anything to capture the top-philosopher vs random-teenager case.

This comes up a lot for me when thinking about doing (mostly informal) Bayesian updating. It strongly seems as though I should update more on the top-philosopher's answer than on the random-teenager's answer, but it's not clear to me how (if at all) the difference is represented in Bayes' theorem.

This may just be my own ignorance, and I'd really love for someone to correct me! But if not, it seems like a failing of how we think about probability.
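For what it's worth, one standard answer is that the difference lives in the likelihood term: P(this person asserts X | X is true) vs P(this person asserts X | X is false). The philosopher's testimony is stronger evidence because their assertions track the truth more reliably. A hedged sketch with entirely made-up reliability numbers:

```python
def posterior(prior, p_assert_if_true, p_assert_if_false):
    # Bayes' theorem for testimony: P(H | they asserted H).
    # The informant's reliability enters only through the two likelihoods.
    num = prior * p_assert_if_true
    return num / (num + (1 - prior) * p_assert_if_false)

prior = 0.5  # hypothetical prior on some claim

# Top philosopher: asserting the claim tracks its truth strongly (assumed numbers).
print(round(posterior(prior, 0.8, 0.2), 2))    # 0.8

# Random teenager: the assertion barely correlates with the truth.
print(round(posterior(prior, 0.55, 0.45), 2))  # 0.55
```

So Bayes' theorem does represent the difference, but only if you model the *source* of the evidence rather than treating both statements as the same observation.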

I think an analogous argument happens regarding using cost-benefit analysis to quantitatively evaluate projects or policies to implement. For example, do we build a new bridge, or regulate some kind of industrial pollution?

A first school of thought says "we should try and estimate all the costs and benefits using some economic model and proceed with the project if it exceeds some benefit-to-cost ratio threshold"

A second school counters: "we don't actually know the true costs and benefits, you're just modelling or guessing them, plus you can't count environmental benefits and time savings and regulatory compliance costs and safety benefits etc all together. These are categorically different or unquantifiable and your benefit-to-cost ratio is meaningless."

But like with probabilities - where does the second school leave us to judge how we should proceed?

Similarly to calling a probability 'very unlikely', we're left with vague qualitative statements like: 'it's absolutely essential', or 'it's very costly for not much gain', or 'there are both significant costs and benefits'.

In both cases I believe the number is telling us something useful - even if we concede that there are inaccuracies and biases that come with estimating. And if one disagrees with the number, they can go ahead and critique the reasoning in it or undertake their own analysis.

One thing that bothers me with using probabilities in casual conversation is that they are opaque conclusions, not arguments. Scott writes long, evidence-based blog posts. I wouldn't find a tweet saying "To sum up, I think my p(doom) is 20%" to be an adequate substitute for a blog post. An argument can be built on stories about things I'd never heard of before, which I often find interesting even when I disagree with the conclusion.

But that doesn't mean asking for a p(doom) as an opening gambit in a conversation is necessarily bad. The question is, where do you go from there? Do you have any interesting evidence to share, or is just another unadorned opinion?

Surveys and elections and prediction markets have similar problems, but in aggregate. Each data point is an opaque opinion, and we can't go back to ask someone what they really meant when they picked choice C. (Maybe survey questions should have a comment box where people can explain what they meant, if they care to? It seems like it would be useful for debugging.)

But then again, these things happen in a larger context. An election is not just about voting. It's also about the millions of conversations and many thousands of articles written about the election. I believe prediction markets often have conversations alongside them too? It can be pretty repetitive, but there might be some pretty good arguments in there.

I wonder if they could be combined somehow. Suppose that, when voting for a candidate, it was also a vote for a particular argument that you found convincing? There might be a lot of "concurring opinions" to pick from, but knowing which arguments people liked the most would give us better insight into what people think.

(There is a privacy issue to work around, since the concurring opinion you pick might be identifying.)

I don't see any comments about the difference between stochastic and epistemic uncertainty, but I believe it's a large part of this debate. Stochastic uncertainty is like the uncertainty around rolling a die; epistemic uncertainty is about not knowing how many sides the die has.

Perhaps we need norms around communicating both. For example, I'm 50% sure Biden will win the election and my epistemic uncertainty is low (meaning it would take very strong evidence to change my probability significantly). I'm also about 50% sure it will be sunny tomorrow, but my epistemic uncertainty is high because any piece of evidence could cause my estimate to vary widely.

I would add to your list of reasons why Samotsvety's "probabilities" of unique events should be called probabilities. Apart from calibration properties you already mentioned, they behave like probabilities in other respects, for example

- they are numbers between 0 and 1

- behave as probabilities under logical operations, such as negation, AND, and OR, for example p(not(A)) = 1 - p(A)

- behave like probabilities in conditioning on events.

It would be weird if humanity did not have a word for sets of numbers with these properties and this word happens to be "probability distribution" for the full set and "probability" for individual numbers in the set.

This is all wrong. Prices replace probability and trading replaces forecasting when faced with single shot events. Markets are the technological solution to mediating contingency when the notion of possibility is incoherent.

I find the beta distribution a really nice intuition pump for this sort of thing. If one person says 50% and their beta distribution is beta(1, 1), then their 50% doesn't really mean much. But if someone says 50% and their beta distribution is beta(1000, 1000), then that's much more meaningful!
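A quick stdlib-only sketch of that intuition, using the closed-form mean and variance of the beta distribution: both forecasters report "50%", but the spread around that number is wildly different.

```python
import math

def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = (a * b) / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)

# Two forecasters both say "50%", with very different evidence behind it.
vague_mean, vague_sd = beta_mean_sd(1, 1)       # uniform prior: anything goes
firm_mean, firm_sd = beta_mean_sd(1000, 1000)   # backed by ~2000 observations

print(vague_mean, round(vague_sd, 3))   # 0.5, sd ≈ 0.289
print(firm_mean, round(firm_sd, 4))     # 0.5, sd ≈ 0.0112
```

Same point estimate, but the beta(1000, 1000) forecaster's uncertainty about the underlying frequency is roughly 25 times narrower.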

Pretty sure that I agree with the defense given here, but felt some whiplash going from "Probabilities Are Linguistically Convenient" to "Probabilities Don’t Describe Your Level Of Information, And Don’t Have To"

Wouldn't it be linguistically convenient for probabilities to describe your level of information?

Maybe not, if the disconnect revolves around a speaker's wish to have options and the listener's wish to be able to understand a speaker's level of information when attempting to update their priors. Non-experts hate this one weird trick.

> Some people get really mad if you cite that Yoshua Bengio said the probability of AI causing a global catastrophe everybody is 20%. They might say “I have this whole argument for why it’s much lower, how dare you respond to an argument with a probability!” This is a type error. Saying “Yoshua Bengio’s p(doom) is 20%” is the same type as saying “Climatologists believe global warming is real”. If you give some long complicated argument against global warming, it’s perfectly fine to respond with “Okay, but climatologists have said global warming is definitely real, so I think you’re missing something”. That’s not an argument. It’s a pointer to the fact that climatologists have lots of arguments, and the fact that these arguments have convinced climatologists (who are domain experts) ought to be convincing to you. If you want to know why the climatologists think this, read their papers. Likewise, if you want to find out why Yosuha Bengio thinks there’s 20% chance of AI catastrophe, you should read his blog, or the papers he’s written, or listen to any of the interviews he’s given on the subject - not just say “Ha ha, some dumb people think probabilities are a substitute for thinking!”

But the thing that makes superforecasters good at their job (forecasting) IS NOT domain expertise! That was the major conclusion of Expert Political Judgement, and was reaffirmed in Superforecasting! The skill of "forecasting accurately" is about a specific set of forecasting-related skills, NOT domain expertise.

If Yoshua Bengio's arguments are presented to Samotsvety, I would trust their probability conclusion far, far more than Yoshua Bengio's. He may have domain expertise in AI, but he does not have domain expertise in forecasting. I remember reading an interview with a superforecaster, talking about professional virologists' forecasts during the early stages of Covid-19, where he would be baffled at them putting 5% odds on something that seemed to him to be way less than 1% probability, like results that would require the exponential growth rate to snap to zero without cause in the next month, well before vaccines. Numbers that, to a professional forecaster, are nonsense.

One of the main things that makes good forecasters good is that they understand what "20%" actually means, and answer the questions asked of them rather than semi-related questions or value judgements. e.g. "Will Hamas still exist in a year?" versus "Do I like Hamas?" or "Will Hamas return to political control of the Gaza strip in a year?" It is this capacity (among others), not domain expertise, that differentiates well-calibrated forecasters from poorly-calibrated ones.

“Sometimes some client will ask Samotsvety for a prediction relative to their business, for example whether Joe Biden will get impeached, and they will give a number like “it’s 17% likely that this thing will happen”. This number has some valuable properties”

Does it really.

Apart from Samotsvety being a cool name for something (as are most Russian names), 17 percent is one of a group of particularly weaselly forecasting-numbers. It is something that is more likely to happen than "is very unlikely to happen" (approx. 1-5%), but it is less likely to happen than "is unlikely to happen but I would not at all rule it out" (less than 40%).

The problem with such numbers relates to the ability/inability of a critic to say after the fact has happened: "hey, you were quite far off the mark there, buddy." With 17 percent, you have some degree of plausible deniability if Biden actually gets impeached and you are accused of being a bad forecaster: "I did not say it was very unlikely – I said it was 17 percent likely, implying that it actually had a non-negligible chance of happening".

Related: when Trump was elected in 2016, people did not go "wow" at the forecasters who had argued beforehand that he had a 30 percent chance of being elected because of the number 30 itself. They went "wow" at those forecasters because almost everybody else put Trump winning as "very unlikely" (1-5 percent).

The really good forecasters, by the way, were those who put the likelihood of Trump winning at above 50 percent. They were the ones who took a real risk of being falsified.

With “17 percent” you hedge your reputation as a good forecaster, with very little risk of being found out if you are the opposite.

…to be clear, I am talking of unique events that do not belong in a larger group of similar-type events. If you have 10,000 similar-type events, you can investigate if some outcome happens 17 percent of the time, while other outcomes happen 83 percent of the time, and use that to forecast what will likely happen at the 10,001st event. I assume here that "Biden being impeached" does not belong in such a larger group of similar-type events – implying that you cannot falsify a "17 percent probability" prediction by collecting a lot of similar-type events.

It seems like a premise (or an effect, maybe?) of prediction markets is to create a way in which non-frequentist probabilities for different events can be compared. If you use your probability of each event to calculate expected values for bets, and you make bets, and you are able to successfully make money in the long term, doesn't that give some more objective meaning to these probabilities?

It looks like section 4 is a response to my essay Probability Is Not A Substitute For Reasoning (https://www.benlandautaylor.com/p/probability-is-not-a-substitute-for). Or rather, it looks like a response to a different argument with the same title. It’s pretty unrelated to my argument so I was wondering if other people had picked up my title as a catchphrase, which can happen, but I can’t find anyone using the phrase except for me and you.

Anyway, you’ll note that my essay never says “don’t use probability to express deep uncertainty”, or anything like that. (I do it myself, when I’m speaking to people who understand that dialect.) Instead I'm objecting to a rhetorical dodge, where you'll make a claim, and I'll ask why I should think your claim is true, and then instead of giving me reasons you'll reply that your claim is a reflection of your probability distribution.

This is especially galling because most people who do this aren't actually using probability distributions (which, after all, is a lot of work). But even for the few who are using probability distributions for real, saying that your claim is the output of a probability distribution is different from giving a reason. In the motivating example (AI timelines), this happens largely because the reasons people could give for their opinions are very flimsy and don't stand up to scrutiny.

Do people use probabilities as a substitute for thinking? Yes: some people, some of the time. The fact that others don't doesn't negate that. A stated probability estimate can be backed by a process of thought, but doesn't have to be.

If you have a subculture, "science", let's say, where probability estimates backed by nothing are regarded as worthless fluff, people are going to back their probability estimates with something. If you have a subculture, "Bayes", let's say, where probability swapping is regarded as intrinsically meaningful and worthwhile, you needn't bother.

Probabilities don't have to describe the speaker's state of information, but something needs to. If I am offered an opinion by some unknown person, it's worthless: likewise a probability estimate from someone in an unknown information state.

These are all good points, but I don’t think this engages with the fundamental intuition that leads people to see “the odds of pulling a black ball is 45%” as different from “the odds humans will reach Mars by 2050 is 45%”. Which is that the former can be understood as an objective view of the situation, whereas the latter is a synthesis of known information.

Suppose we’re watching a power ball drawing with one minute to go. All the balls are spinning around chaotically in a grand show of randomness. Asked the probability of a 9 being the first number picked, you say 1 in 100. Shortly thereafter they announce the first number was 9. “Dumb luck” you think, but you play back the tape anyway and notice that when you made your prediction the 9 ball was actually wedged in the output area, making it all but certain that the 9 would come out first.

So were you correct in saying the probability was 1 in 100? Yes and no. You were correct in giving the odds based on the information you had, but if you had all the information you would have produced a different (and more accurate) prediction.

On the other hand, given 100 balls randomly distributed in an urn with 40 black, you can say objectively that the probability of pulling a black ball is 40%. This is because the unknown information is stipulated — there’s no real urn where I can point out that most of the black balls are on the bottom. To try to do so would be to violate a premise of the original question.

I think a big reason people dislike probabilities on real events is because they’re imagining that the prediction is supposed to be an objective description (“this is the actual probability, and god will roll a die”) rather than a synthesis of available information.

Jaynes was shouting in my head the whole time reading this. The appendix of his "Probability Theory" mentions probability systems that don't have numeric values. Instead you can only compare events as being more or less likely (or unknown).

The punchline is that, in the limit of perfect information, these systems reduce to standard probability theory. In other words, it's always possible to just go ahead and assign probabilities from the get-go while remaining consistent, which seems like how a mathematician would write Scott's post.

I completely agree with the case made here that it is useful and informative to convey actual numbers to express uncertainty over future outcomes, instead of vagaries like "probably".

That said, there's an adjacent mistaken belief that Scott is not promoting here but I think is widely-held enough to be worth rebutting. The belief is that there is, for any given future event, a true probability of that event occurring, such that if person A says "it's 23.5%!" and person B says "it's 34.7%!", person A may be correct and person B may be incorrect.

Here's a brief sketch of why this is wrong. First, observe that you can have two teams of superforecasters assign probabilities to the same 10,000 events, and it's possible for both teams to assign substantially-different probabilities to each of those events, and both teams to still get perfect calibration scores. I won't attempt to prove this in this comment, but it's easy to prove it to yourself via a toy example.
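One such toy example, sketched in code (the numbers are constructed for illustration): both teams come out perfectly calibrated while never agreeing on any single event.

```python
from collections import defaultdict

def calibration(forecasts, outcomes):
    """Group events by stated probability; return the hit rate per bucket."""
    buckets = defaultdict(list)
    for p, o in zip(forecasts, outcomes):
        buckets[p].append(o)
    return {p: sum(os) / len(os) for p, os in sorted(buckets.items())}

# 100 events, 50 of which occur. Team A is ignorant; Team B has extra information.
outcomes = [1] * 45 + [0] * 5 + [1] * 5 + [0] * 45
team_a = [0.5] * 100                 # says 50% to everything
team_b = [0.9] * 50 + [0.1] * 50    # says 90% to the first half, 10% to the second

print(calibration(team_a, outcomes))  # {0.5: 0.5} -> perfectly calibrated
print(calibration(team_b, outcomes))  # {0.1: 0.1, 0.9: 0.9} -> also perfectly calibrated
```

Team B is clearly doing something Team A isn't (its forecasts are more informative), yet calibration alone cannot distinguish them.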

Second, let's say these two teams of superforecasters, call them team A and team B, say that, at the present moment, the probability of a human walking on Mars by 2050 is 23.5% and 34.7% respectively. Which team is correct? We can't judge them based on their past performance, because they have the same, perfect calibration score. How about by the result?

Well, let's say a human doesn't walk on Mars by 2050. Which team was right? There's a sense in which you can say team A was "less surprised" by that outcome, since they assigned a lower probability to a human walking on Mars. I think that's the intuition behind Brier scores. But they were still surprised by 23.5 percentage points! So it doesn't feel like this outcome can provide evidence that they were correct. In fact, Joe Idiot down the street, who isn't calibrated at all, said there was a 0.01% chance humans would walk on Mars, and I don't think we would want to say he was more correct than team A.

So it's pretty clear there is no knowably-objectively-correct probability for any given one-off event, outside of a toy system like a stipulated-fair coin (in the real world, are any coin flips actually fair?). That doesn't mean probabilities aren't useful as a means of communicating uncertainty, and it doesn't mean that we shouldn't use numbers vs English-language fuzziness, but I do think it's worth acknowledging that there isn't an underlying-fact-of-the-matter that we're trying to establish when we give competing probabilities.

I think probabilities are well defined, if the question is unambiguous. By this I mean, when looking at almost any world and asking "did the event happen" there is a clear yes/no answer.

Questions to which you can't really assign probabilities:

"will misinformation be widespread in 2026?"

Questions to which you can assign a probability:

"will the world misinformation organization publish a global misinformation level of more than 6.6 in 2026?"

But every time you make the question well defined, you risk that world misinformation organization going bankrupt and not publishing any stats.

On the bright side, it's good that you keep having to explain this. It means that people who need to hear it are hearing it. Some of them are hearing it for the first time, which means you're reaching new people. It's a good sign, however frustrating it may be.

I'll use section 2 as an opportunity to plug [my LessWrong post on unknown probabilities](https://www.lesswrong.com/posts/gJCxBXxxcYPhB2paQ/unknown-probabilities). We have a 50% probability that the biased coin comes up heads, while also having uncertainty over the probability we _would_ have, if we knew the bias. Personally I don't like talking about biased coins [because they don't exist](http://www.stat.columbia.edu/~gelman/research/published/diceRev2.pdf), but I address in detail the case of a "possibly trick coin": it is either an ordinary fair coin, or a trick coin where both sides are heads.

My main issue is that fractional probability estimates for one-off events are not falsifiable. Events either happen or they don't. If Joe Biden gets impeached, then the perfectly accurate prediction of his impeachment should have been 1. If he doesn't, then it should have been 0.

If I say that he will be impeached with a 17% probability and he does get impeached, well, I say, 17% is not 0. I was right. If he doesn't get impeached, well, I say, 17% is not 100%. I was right.

But if I predict that the L train will arrive on time 17% of the days, then post factum it can be said both whether I was accurate and how close I was to the accurate prediction (maybe, in reality, they arrive 16.7% or 95% of the days).

In the Samotsvety example, we are dealing with someone who predicts one-off events regularly. So, the probability we are talking about is not of the events happening but of Samotsvety's average performance over a series of events they predict. Basically, their numbers are bets which over the course of the prediction game spanning multiple years should yield the smallest difference from the de facto binary event probabilities when compared to other players.

If they say Joe Biden gets impeached with 17% probability, then in the case it actually happens they lose (100 - 17 = 83) points. If he doesn't get impeached they lose 17. The goal of the game is to lose the smallest possible number of points.

Thus, we are dealing with two fundamentally different numbers:

* expected frequency of a recurring event

* a bet on a one-off event in a series of predictions

We don't have a universally agreed upon way to distinguish them linguistically. I think many people grasp this intuitively and disagree to call both with the same word. Even though both are indeed "probabilities": subjective expectations of an uncertain event.

I would like to take historical performance of forecasting teams and round all of their predictions to 0 and 1. If this makes them lose more points than their original predictions then fractional bets make sense in the context of repeated one-off predictions. Otherwise, there's no practical use for them.
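One caveat to that experiment: the linear point scheme described above is not a proper scoring rule, and under it, rounding to 0/1 can actually score better. Here is a sketch of the proposed comparison using the Brier score (which is proper) instead, with a simulated calibrated forecaster; under it, honest fractional forecasts beat their rounded versions.

```python
import random

random.seed(0)

def brier(forecasts, outcomes):
    """Mean squared error between forecasts and binary outcomes."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(outcomes)

# A calibrated forecaster: each event has a true probability, and the
# forecast equals it. Rounding to 0/1 discards the fractional information.
true_probs = [random.uniform(0, 1) for _ in range(100_000)]
outcomes = [1 if random.random() < q else 0 for q in true_probs]

honest = brier(true_probs, outcomes)
rounded = brier([round(q) for q in true_probs], outcomes)
print(honest, rounded)  # honest ≈ 1/6, rounded ≈ 1/4
```

So under a proper scoring rule, fractional bets on one-off events do carry measurable value over repeated predictions, which is roughly the point of the Samotsvety example.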

A quick Google search didn't take me to where you discuss this fully:

"Do vaccines cause autism? No. Does drinking monkey blood cause autism? Also no. My evidence on the vaccines question is dozens of well-conducted studies, conducted so effectively that we’re as sure about this as we are about anything in biology."

Could you please point me (or us) to your full post(s) on this so we can check the autism~vaccines studies ourselves? With RFK Jr's rise, this issue will rear its head even further.

There's a big difference between "I know the distribution very well and it so happens that the mean is 0.5" and "I don't know the distribution, therefore I'll start with a conservative prior of equal probability on all outcomes, and it so happens the mean is 0.5". The difference is in how you update on new information. In the first case you basically don't update at all on each subsequent sampling -- because you sampled a lot before and you're confident about the distribution -- and in the second, each sampling should impact you heavily.

So again, for the first sampling both distributions give you a best estimate of 0.5. But not for the second, etc.


Mentioning Yoshua Bengio attracts typos:

"Yoshua Bengio said the probability of AI causing a global catastrophe everybody is 20%"

"Yosuha Bengio thinks there’s 20% chance of AI catastrophe"


This is one of those societal problems where the root is miscommunication. And frankly it's less a problem than a fact of life. I remember Trevor Noah grilling Nate Silver over how Trump could have won the presidency when Silver had given him only a 1/3 chance of winning. It was hilarious in some sense. This situation is the reverse of what Scott is describing - here the person using the probability is using it accurately - but the dilemma is the same: lack of clear communication.

Sooner or later, everyone wants to interpret probability statements using a frequentist approach. So, sure you can say that the probability of reaching Mars is 5% to indicate that you think it's very difficult to do, and you're skeptical that this will happen. But sooner or later that 5% will become the basis for a frequentist calculation.

If you read through this article you'll see that probability statements drift between statements of degree of belief and actual frequentist interpretations. It's just inevitable.

It's also very obscure how to assign numerical probabilities to degrees of belief. For instance, suppose we all agree that there is a low probability that we will travel to Mars by 2050. What's the probability value for that? Is it 5%, 0.1%, or 0.000000001%? How do we adjudicate between those values? And how do I know that your 5% degree of belief represents the same thing as my 5% degree of belief?

I wonder if there's a bell curve relationship of "how much you care about a thing" versus "how accurately you can make predictions about that thing". E.g. do a football teams' biggest fans predict their outcomes more accurately or less accurately than non-supporters? I would guess that the superfans would be less accurate.

If that's the case, "Person X has spent loads of time thinking about this question" may be a reason to weigh their opinion less than that of a generally well-calibrated person who has considered the question more briefly.

What comments triggered this? I saw Yann LeCun has made the Popperian argument that no probability calculus is possible.

A great project would be to trace how language has evolved to talk about probability, from before Pascal to now.

Your suggestion to call different concepts of probability different names ("shmrobability") for metaphysical reasons actually makes complete sense. Maybe call frequentist probability "frequency", and Bayesian probability "chance" or "belief", with "probability" as an umbrella term. The different concepts are different enough that this would be useful. "The frequency of picking white balls from this urn is 40%." Sounds good. "The frequency of AI destroying mankind by 2050 is 1%" Makes no sense, as it should; it happens or not. "The chance of AI destroying mankind by 2050 is 1%." OK, now it makes sense. There we go!

> Probabilities Don’t Describe Your Level Of Information, And Don’t Have To

This seems literally wrong to me. Probability and information both measure (in a very technical sense) how surprising various outcomes are. I think they may literally be isomorphic measures, with the only difference being that information is measured in bits rather than percent.

Your examples are also off-base here. The probability of a fair coin coming up heads when I'm in California is 1/2, and the probability of a fair coin coming up heads when I'm in New York is 1/2, and we wouldn't say that probability is not the same thing as information because 1/2 does not capture what state I'm flipping the coin in. Similarly, the difference between the first two scenarios is not E[# heads / # flips] but E[(# heads / # flips)^2] - (E[# heads / # flips])^2, i.e. the variance of the distribution is different. This is because (1) is well-modelled by *independent* samples from a known distribution, while in (2) the samples are correlated (aka you need a distribution over hyperparameters if you want to treat the events as independent).

I also noticed you didn't touch logical / computationally-limited probability claims here, like P(the 10^100 digit of pi is 1) = 1/10.

Parts of this are unobjectionable, and other parts are very clearly wrong.

It is perfectly fine to use probabilities to represent beliefs. It is unreasonable to pretend the probabilities are something about the world, instead of something about your state of knowledge. Probabilities are part of epistemology, NOT part of the true state of the world.

You say "there's something special about 17%". No! It's just a belief! Maybe the belief is better than mine, but please don't conflate "belief" with "true fact about the world".

If Samotsvety predicts that aliens exist with probability 11.2%, that means they *believe* aliens to exist to that extent. It does not make the aliens "truly there 11.2% of the time" in some metaphysical sense. I can feel free to disagree with Samotsvety, so long as I take into account their history of good predictions.

(Side note: that history of good predictions may be more about politics and global events than it is about aliens; predicting the former well does not mean you predict the latter well.)

----------

Also, a correction: you say

"It’s well-calibrated. Things that they assign 17% probability to will happen about 17% of the time. If you randomly change this number (eg round it to 20%, or invert it to 83%) you will be less well-calibrated."

This is false. It is easy to use a simple rounding algorithm that guarantees the output is calibrated if the input is calibrated (sometimes you can even *increase* calibration by rounding). If I round 17% to 20% but also round a different 23% prediction to 20%, then it is a mathematical guarantee that if the predictions were calibrated before, they are still calibrated.

Calibration is just a very very bad way to measure accuracy, and you should never use it for that purpose. You should internalize the fact that two people who predict very different probabilities on the same event (e.g. 10% and 90%) can both be perfectly calibrated at the same time.
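A toy instance of the rounding claim (constructed numbers; the merge preserves calibration exactly here because the equal-sized 17% and 23% pools average to the rounded 20%):

```python
# Two calibrated prediction pools: events forecast at 17% happen 17% of the
# time, and events forecast at 23% happen 23% of the time.
pool_17 = [1] * 17 + [0] * 83   # 100 events forecast at 0.17
pool_23 = [1] * 23 + [0] * 77   # 100 events forecast at 0.23

# Round both forecasts to 20% and merge the pools.
merged = pool_17 + pool_23
hit_rate = sum(merged) / len(merged)
print(hit_rate)  # 0.2 -- the rounded "20%" forecasts are still perfectly calibrated
```

The rounded forecaster is just as calibrated but strictly less informative, which is the sense in which calibration alone is a weak measure of accuracy.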

I'm curious about the demand that probabilities come with meta-probabilities. Would it not anyway be satisfied by Jaynes's A_p distribution?

Assume there is a one-shot event with two possible outcomes, A and B. A generous, trustworthy person offers you a choice between two free deals. With Deal 1, you get $X if A occurs, but $0 if B occurs. With Deal 2, you get $0 if A occurs, but $X if B occurs. By adjusting X, and under some mild(ish) assumptions, the threshold value of X behaves a helluva lot like a probability.

I feel that it's in a sense a continuation of the argument about whether it's OK to say that there's a 50% chance that bloxors are greeblic (i.e. to share raw priors like that). The section "Probabilities Don’t Describe Your Level Of Information, And Don’t Have To" specifically leans into that, and I disagree with it.

Suppose I ask you what are the chances that a biased coin flips heads. You tell me, 33%. It flips heads and I ask you again. In one world you say "50%", in another you say "34%", because in the first world most of your probability estimate came from your prior, while in the second you actually have a lot of empirical data.

That's two very different worlds. It is usually very important for me to know which one I'm in, because sure if you put a gun to my head and tell me to bet immediately, I should go with your estimate either way, but in the real world "collect more information before making a costly decision" is almost always an option.

There's nothing abstruse or philosophical about this situation. You can convey this extra information right this way, "33% but will update to 50% if a thing happens", with mathematical rigor. Though of course it would be nice if Bayesians recognized that it's important and useful and tried to find a better way of conveying the ratio of prior/updates that went into the estimate instead of insisting that a single number should be enough for everyone.
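The two worlds can be made concrete with a beta-binomial sketch (the pseudo-count parameters below are illustrative, chosen to reproduce the 33% / 50% / 34% numbers):

```python
def beta_update(heads, tails):
    """Mean of a Beta(heads, tails) belief, before and after observing one more head."""
    before = heads / (heads + tails)
    after = (heads + 1) / (heads + tails + 1)
    return before, after

# World 1: "33%" comes almost entirely from a weak prior (pseudo-counts 1 and 2).
print(beta_update(1, 2))    # (0.333..., 0.5) -- one flip swings the estimate to 50%

# World 2: the same "33%" is backed by ~100 observations.
print(beta_update(33, 67))  # (0.33, 0.3366...) -- one flip barely moves it
```

Same reported number, very different sensitivity to the next observation, which is exactly the extra information the single "33%" hides.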

And so, I mean, sure, it's not anyone's job to also provide the prior/updates ratio, however it might look, to go along with their estimates (unless they are specifically paid to do that, of course), and people can ask for it specifically if they are interested. But then you shouldn't be surprised that even people who have never heard of Bayes' theorem might still intuitively understand that a number like "50%" could come entirely from your prior and should be treated as such, and treat you with suspicion for not disclosing it.

I am glad there is no new example of: ' "Do bronkfels shadwimp?" is binary, no one knows, thus: 50% chance. ' As in the "coin which you suspect is biased but you’re not sure to which side" - which IS 50%. - If A asks about bronkfels and knows/pretends to know: may be 50%. If no one around knows: the chance of a specific verb working with a specific noun, which is less than 1%. - "Are the balls in this bag all red?": around 4%. No surprise if they are, even if you did not know. - "Are 20% purple with swissair-blue dots?" I'd be surprised. And would not believe you did not know before. - "Are they showing pictures of your first student?": 50% really?

> Whenever something happens that makes Joe Biden’s impeachment more likely, this number will go up, and vice versa for things that make his impeachment less likely, and most people will agree that the size of the update seems to track how much more or less likely impeachment is.

There is a close parallel here to the same issue in polling, where the general sense is that the absolute level determined by any given poll is basically meaningless - it's very easy to run parallel polls with very similar questions that give you wildly different numbers - but such polls move in tandem, so the change in polled levels over time is meaningful.

Something's been bothering me for a while, related to an online dispute between Bret Devereaux and Matthew Yglesias. Yglesias took the position that, if history is supposed to be informative about the present, then that information should come with quantified probabilities attached. Devereaux took the position that Yglesias' position was stupid.

I think Devereaux is right. I want to draw an analogy to the Lorenz butterfly:

The butterfly is a system that resembles a planet orbiting around and between two stars. There is a visualization here: https://marksmath.org/visualization/LorenzExperiment/

It is famous for the fact that its state cannot be predicted far in advance. I was very underwhelmed when I first found a presentation of the effect - it's very easy to predict what will happen, as long as you're vague about it. The point will move around whichever pole it is close to, until it gets close to the other pole, at which point it will flip. Over time, it broadly follows a figure 8.

You can make a lot of very informative comments this way. At any given time, the point is going to lie somewhere within a well-defined constant rectangle. That's already a huge amount of information when we're working with an infinite plane. And at any given time, the point is engaged in thoroughly characteristic orbital behavior. The things that are hard to predict are the details:

1. At time t, will the point be on the left or on the right?

2. How close will it be, within the range of possibilities, to the pole that currently dominates its movement?

3. How many degrees around that pole will it have covered? (In other words, what is the angle from the far pole, through the near pole, to the point?)

4. When the point next approaches the transition zone, will it repeat another orbit around its current pole, or will it switch to the opposite pole?

If you only have a finite amount of information about the point's position, these questions are unanswerable, even though you also have perfect information about the movement of the point. But that information does let us make near-term predictions. And just watching the simulation for a bit will also let you make near-term predictions.
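The pole-flipping behavior described above can be sketched in a few lines (standard Lorenz parameters sigma=10, rho=28, beta=8/3; a crude Euler integrator, purely for illustration): two starting points differing by one part in a million stay inside the same bounded region, but which lobe each occupies at a given time soon becomes unpredictable.

```python
# Minimal Euler integration of the Lorenz system -- illustrative only,
# not a high-accuracy integrator.
def step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = state
    return (x + dt * sigma * (y - x),
            y + dt * (x * (rho - z) - y),
            z + dt * (x * y - beta * z))

a = (1.0, 1.0, 1.0)
b = (1.000001, 1.0, 1.0)  # differs by one part in a million
for _ in range(3000):     # ~30 time units
    a, b = step(a), step(b)

# Both trajectories stay inside the same bounded region (the "constant
# rectangle"), but which lobe (sign of x) each is on is effectively
# unpredictable this far out.
print(a[0], b[0])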

This seems to me like an appropriate model for how the lessons of history apply to the present. There are many possible paths. You can't know which one you're on. But you can know what historical paths are similar to your current situation, and where they went. The demand for probabilities is ill-founded, because the system is not stable enough, *as to the questions you're asking*, for probabilities to be assessable given realistic limitations on how much information it's possible to have.

Just buy a hardcover copy of ET Jaynes' book and throw it on those people's heads, duh

Quoting: “My evidence on the vaccines question is dozens of well-conducted studies, conducted so effectively that we’re as sure about this as we are about anything in biology”

Sorry if this is a naive question, but if there are RCTs comparing vaccines to placebos (and not using other vaccines as placebos) with a long enough follow-up to diagnose autism, I would be keen to see a reference. Just asking because I thought that, while all the people claiming there is a link are frauds, we didn't actually have evidence at the level of 'as sure about this as we are about anything in biology'.

I don't have any objections to phrasing things as "x% likely", and I do this colloquially sometimes, and I know lots of people who take this very seriously and get all the quoted benefits from it, and my constant thought when asked to actually do it myself or to take any suggested "low probability" numbers at face value is "oh God", because I'm normed on genetic disorders.

Any given disorder is an event so low-probability that people use their at-birth prevalences as meme numbers when mentioning things they think are vanishingly unlikely ("Oh, there's no *way* that could happen, it's, like, a 0.1% chance"). It turns out "1 in 1000" chances happen! When I look at probability estimates as percentages, I think of them in terms of "number of universes diverging from this point for this thing to likely happen/not happen". Say a 5% chance, so p = 1 - (0.95)^n. The 50%-likelihood-this-happens-in-one-universe marker is about p = 1 - (0.95)^14, which is actually a little above 50%. So 14 paths from this, it'll "just above half the time" happen in one of those paths? ~64% likelihood in 20 paths, ~90% in 45 paths (often more than once, in 45 paths). The probabilities people intuitively ascribe to "5%" feel noticeably lower than this.
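The arithmetic above is just the complement rule, p = 1 - (1 - 0.05)^n, and is easy to check:

```python
def p_at_least_once(p_single, n_trials):
    """Chance that an event of probability p_single occurs at least once
    across n_trials independent trials."""
    return 1 - (1 - p_single) ** n_trials

# A "5%" event, across independent paths/universes:
print(round(p_at_least_once(0.05, 14), 3))  # 0.512 -- just above half at 14 paths
print(round(p_at_least_once(0.05, 20), 3))  # 0.642
print(round(p_at_least_once(0.05, 45), 3))  # 0.901
```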

"There's a 1 in 100,000 chance vaccines cause autism"? I've known pretty well, going about my daily life, absolutely unselected context, not any sort of reason to overrepresent medically unusual people, someone with a disorder *way* rarer than 1/100k! Probability estimates go much lower than people tend to intuit them. We think about "very unlikely" and say numbers that are more likely than people's "very unlikely" tends to be, when you push them on it, or when you watch how they react to those numbers (people estimating single-digit-percentage chances of a nuclear exchange this year don't seem to think of it as that likely).

There really isn't any possibility of opting out. Even if you reject using probabilities to reason about the future, you have to act as if you did. Clearly assuming a 50/50 chance on any outcome would not be particularly effective, and is not the way anyone actually thinks. I guess I can understand why some people might want to keep their probabilities implicit. Personally, I think reasoning explicitly in % terms helps me notice when some part of my world model is wrong.

This is one of the posts that gave me more sympathy for the opposing view than I had coming in. Now I think only superforecasters should be allowed to use percent-scale probabilities, and all the rest of us mooks shouldn't risk implying that we have the combination of information plus calibration necessary to make such statements.

Sorry if the following has already been said multiple times (it is an obvious observation, so it has probably been mentioned before):

Regarding item (3.) of the text: It is nonsensical, inconsistent and arbitrary to assign 50% probability in a situation where you have zero information and make up two outcomes.

Assume you have an object of unknown color. Then: 50% it is red, 50% it is not? But also: 50% it is green, 50% it is purple, 50% it is mauve; 50% it is reddish?

Assume you find in an alien spaceship a book in an unknown language that for some reason uses latin letters. What is its first sentence? 50% that it is "Hello earthling"? 50% for "Kjcwkhgq zqwgrzq blorg"?

What is the probability that all blx are blorgs? 50%, of course! What is the probability that *some* blx are blorgs? 50%! What is the probability that there is a blx? 50%. That there are more than 10? 50%. Less than 7? 50%. More than a million? 50%. More than 10^1000? 50%.

"If you actually had zero knowledge about whether AI was capable of destroying the world, your probability should be 50%."

As you have anticipated, I object to this for two reasons. Firstly, as a matter of theory, the assignment of priors is arbitrary. The agent need only choose some set of positive numbers which sum to 1. There is nothing to say that one choice is better than another. In particular, there is no reason to think that all outcomes are equally likely and attempting to assign priors based on this rule (the indifference principle) leads to well known paradoxes.

Secondly, as a matter of practice, if our fully uninformed agent assigns a probability of 50% to the world being destroyed by AI, it should also assign the same probability to the world being destroyed by a nuclear war, a pandemic, an asteroid strike, awakening an ancient deity, an accident involving avocado toast, and so on. But the world can only be destroyed once, so these probabilities are inconsistent.
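The inconsistency is easy to make concrete (scenario list taken from the comment above):

```python
# If a fully uninformed agent assigns 50% to each distinct doomsday scenario,
# the probabilities of mutually exclusive outcomes sum to far more than 1.
scenarios = ["AI", "nuclear war", "pandemic", "asteroid strike",
             "ancient deity", "avocado toast accident"]
total = sum(0.5 for _ in scenarios)
print(total)  # 3.0 -- but mutually exclusive outcomes can sum to at most 1
```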

Maybe I'm missing something, but there seems to be an easier way to communicate one's level of certainty without inappropriately invoking probabilities. If instead of saying "There is a 50% probability of X happening", you say "I'm 50% sure X will happen," I don't think anyone would object.

Why.

It might be shocking to learn, but an Anti-Rationalist like myself will point out, as evidence for a Frequentist position, that saying "I think X is more likely than Y" is merely a rhetorical phrase that could mean any number of things, and one of the things that it is on average (oh look, a rhetorical use of the phrase "on average") very unlikely to mean is "I have calculated the probability of X and Y." Probability as a concept is not ontologically interchangeable with the manners of speech that may signify it. Rhetorically mentioning probability is not evidence that probability was used somewhere, or happened, or that the words are a proxy for a probability.

We see here that Scott, as with most things, doesn't understand what probability is. But there are other interpretations of what's going on with probability, so sure, we could say he just has a Bayesian interpretation. Except Scott also seemingly doesn't understand what the argument is about, because his arguments don't align with classical Bayesian probability either. They DO align with Rationalism. As usual, we find that Rationalists simply take it for granted that their extremely bizarre assumptions about things are correct, and we get weird arguments like, "If you give argument X, obviously you need to read what you're criticising, because I can't possibly be fundamentally missing the point of your argument." It is simply a tenet of Rationalism that Bayesian probability is not merely an interpretation of probability; it's a fundamental aspect of all human reasoning. Or as Scott says here, "A probability is the output of a reasoning process."

Sure, in the sense that mathematics is a reasoning process, and a percentage chance is a probability. But the very fact that "Maybe 1%?" means "I don't know; going by the evidence I have off the top of my head, it seems unlikely" shows that there are no probabilities happening here, non-Frequentist or otherwise. I could summarise, I suppose, by saying there's equivocation happening between Bayesian probability, which Scott is pretending to defend, and Bayesian epistemology, which is what he's actually trying to defend.

Which brings about the bigger problem. This post is a motte and bailey. "Oh, when I say 20% I don't REALLY mean a 20% chance, I just mean it might happen, but I think it might happen less than that guy who said 30%." If that were the case, I would never see attempted mathematical exegeses about AI risk. Unfortunately you see this all the time, including from Scott, and of course with things that aren't AI risk. The real criticism here is that the mathematical language is merely a patina meant to mask the fact that the arguments are fundamentally badly reasoned and rest on faulty assumptions. Accusations that the social media-ites need to just read the papers or trust the science assume they didn't or don't. Instead one could assume Scott is missing the point completely.

No, the way I use language, terms like "slightly unlikely," "very unlikely", etc. don't translate well into numbers. Medical personnel are always asking patients to rank pain on a 1-10 scale. I know what they mean, but my mind rebels against doing this. It sounds way too exact. "Not the worst pain I've ever had, but deep and persistent," is how I described one, and I just don't know how to put that as a number.

I can attest that drinking monkey blood does not cause autism. Don't ask

"But with humanity landing on Mars, aren’t you just making this number up?" No Scott, because I calibrate.

Usually on the Internet, a debate about X isn't really about X. When someone argues that a Bayesian probability is invalid, they're usually just lawyering for their position. More "screw the billionaires and their space hobby," less thoughtful attempt to converge on accurate world models.

Source: I have a lot of success changing gears in these conversations by a) switching the topic to their true objection, and b) expressing their opinion as a Bayesian probability. So when I hear "there's no way you can know the probability of getting to Mars," I might respond with "I mean, sure, there's like a 95% chance that Musk is a jerk who deserves to live paycheck to paycheck like the rest of us."

Well. By "success" I mean watching them turn red when, after they respond, I point out their inconsistent treatment of Bayesian probability: immediate recognition and scathing dismissal when it goes against their position, routine acceptance when it supports their position. This allows a "successful" disengagement when they respond and I say "you don't have a strong opinion on Mars, you just don't like Musk. That's ok, but look - that person over there might want to talk about Musk. I'm going to look for someone who wants to talk about Mars. Cheers!"

One day I'll do this to someone who responds with "see, there you go again, that 95% number doesn't mean anything" and the rest of the conversation will be either wonderful or terrible...

There’s a value claim being made here: that there is symmetric risk between false confidence and false humility.

I don’t know about you, but my experience is that false confidence is far more destructive than false humility.

On the other hand ...

I often find, when, say, ordering something by phone, or otherwise asking for a service to be performed that will take some time, an absolute refusal to give a time estimate. Apparently they're afraid to give a number for fear it will be taken as a guarantee.

I treat this by asking about absurdly short and long periods. "Will it take ten minutes?" No, not that quick. "Will it take two years?" No, not that long. "OK, then, we have a range. Let's see if we can narrow it down a bit more." This is frequently successful, and I wind up with something like, for instance, "about 3 or 4 weeks." That's helpful.

Calibration troubles me because probabilities presuppose some sort of regularity to the underlying phenomena, but predictions may have nothing in common at all. As Scott himself once said, you could game for perfect calibration at 17% by nominating 17 sure things and 83 sure misses. Superforecaster skill cannot be evenly distributed over all domains, so the level of trust you place on a superforecaster’s 17% is laden with assumptions. Can anyone resolve this for me? Why should one trust calibration?
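The gaming trick mentioned above is trivial to demonstrate (a toy sketch, numbers from the comment):

```python
# Perfect calibration at 17% with zero forecasting skill:
# predict "17%" on 100 questions, of which 17 are sure things
# and 83 are sure misses.
predictions = [0.17] * 100
outcomes = [True] * 17 + [False] * 83
observed = sum(outcomes) / len(outcomes)
print(observed == predictions[0])  # True -- the 17% bucket resolves True exactly 17% of the time
```

This is why calibration alone is a weak guarantee: it is a property of the collection of forecasts, and says nothing about any single question, or about domains outside the forecaster's chosen set.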

"Will AI destroy the world?" is the wrong question. The right question is whether we will use AI to destroy the world. Will people use AI's predictive power to manipulate us into killing one another more readily than we now do? If AI has developed a theory of mind beyond that of any human, it will predict our desires better than we can predict them ourselves. That creates a new lethality for people to use against each other, as dangerous as nuclear-bomb lethality. Will we do this to ourselves?

One counter-argument might be that people can use probabilities as a motte-and-bailey.

The motte is: I say "my probability of X is 71%", you say "you're no more of an expert than me, how can you claim such precise knowledge?", and I reply "I didn't say I could, a probability like that is such a shorthand for fairly but not completely confident".

The bailey is: I say "my probability of X is 71%", you say "okay, well I think X is fairly unlikely", and I reply "since I gave a precise probability, I've *clearly* thought about this much more than you! So it's clear whose opinion should be trusted."

I don't know for sure if people do that, but I bet they do.

There is something a bit weird about the whole quest-for-accurate-predictions thing. Perhaps it helps to ask: why do we care if someone states that something is very likely/likely/50-50/unlikely/very unlikely to happen?

In one situation, you care because you want to place a bet on an outcome. Then you hope that nothing will change after you placed the bet, that may change the initial probabilities you assigned to various outcomes.

In another situation, you care because you want to do preventive action, i.e. precisely because you want to motivate yourself – or others – to do something to change the probabilities.

These motives are very different, and so is your motive for offering predictions in the first place. Plus, they change your motive for listening to the probabilities other people assign to events (including how you infer/speculate about the motives they have for offering their probabilities).

Edit: And an added twist: Your own motive for consulting probabilities may be a desire to place bets on various outcomes. But others may interpret the same probabilities as a call to action, leading them to do stuff that changes the initial probabilities. Making it less probable that your initial probabilities are really good estimates to use to place bets. (Unless you can also factor in the probability that others will see the same initial probabilities as a call to action to change those probabilities.)

…all of the above concerns difficulties related to how to factor in the “social” in the biopsychosocial frame of reference when trying to predict (& assign probabilities to) future events where “future human action” has a bearing on what may happen.

I’m pretty sure a government bureaucrat labeling one side of a discussion misinformation and then shutting off debate has a high probability of making the world much worse than it is now…

counterpoint: 97.26% of the silliest things "rationalists" do could be avoided if they just resisted the temptation to assign arbitrary numbers to things that are unquantifiable and then do math to those numbers as if they're meaningful

I think most people are way too averse to probabilities, but also rationalists are too enthusiastic about them. I used to share the enthusiasm, so here's what convinced me that they don't deserve so much credit as rationalists give them:

You can think of reality as being determined by a bunch of qualitatively distinct dynamics, which in probabilistic terms you might think of as being represented by a large number of long-tailed, highly skewed random variables. The outcomes you ask about for forecasting are affected by all of these variables, and so the probabilistic forecasts represent an attempt to aggregate the qualitative factors by their effects on the variables you forecast for.

This is fine if you directly care about the outcome variables, which you sometimes do - but not as often as you'd think. The two main issues are interventions on the outcomes and the completeness of the outcomes.

For intervention, you want to know what the long-tailed causes are so you can special-design your interventions for them. For instance, if a political party is losing the favor of voters, they're going to need to know why, as they are presumably already doing just about everything generic they can to gain popularity, and so their main opportunity is to stay on top of unique challenges that pop up.

But even if you just care about prediction, the underlying causes likely matter. Remember, the uncertainty in the outcome is driven by qualitatively different long-tailed factors - the long-tailedness allows them to have huge effects on other variables than the raw outcomes, and the qualitative differences mean that those effects will hugely differ depending on the cause. (There's a difference between "Trump wins because he inflames ethnic tensions" and "Trump wins because black and hispanic voters get tired of the democrats".)

It appears to me that the crux of your argument is that probabilities about events which seem to be one-off events make sense because there's secretly a larger reference class they belong to.

So superforecasters have a natural reference class consisting of all the topics they feel qualified to make predictions on, and you calibrate them across this reference class they have chosen.

So it appears to me that your argument is much simpler than you make it out be - it revolves around the fact that everyone gets to choose their reference class, and while sometimes there's an obvious reference class (like flipping a coin), this isn't so material.

Conversely, if you don't have many examples of predictions to judge people on, then their probability statements are indeed meaningless. If some stranger tells me that there is a 50% chance that aliens land tomorrow, then I really don't know how to integrate this information into my worldview.

> It seems like having these terms is strictly worse than using a simple percent scale

Lost you here. The idea that we can and should stack-rank likelihoods of different events is a good point and seems right, but converting the ranking to percents seems to be where people are just pulling numbers out of thin air.

You sort of address that in the next paragraph, but I feel like I need more than a few sentences on this to be convinced. I was getting ready to agree based on the racing argument.

Just like probabilities are meaningful in that they compare two outcomes, confidence is only meaningful when it compares two questions. Since we like to think about probabilities as betting odds, certainty tells you how you would distribute your capital between a large number of questions. It’s always smartest to bet your best guess, but you’ll put more weight on the ones with higher confidence. That’s how I think about it anyways.

Can we reconcile frequentists and non-frequentists by saying that it's all about the frequencies across multiple events for a given forecaster? I use forecaster as a concept of something that is able to make multiple forecasts, whether it's a single machine learning model, a human being, a group of people (like the wisdom of crowd, or predictions from random people), or a process to combine any such input to make a forecast. All the arguments you make about forecasts on non repeatable events being useful revolve around the fact that if you look at a collection of such events, they are indeed useful. So why not talk about the collection itself and not single events in isolation, which inevitably gets some people uneasy? We can just consider that this collection of predictions is a specific forecaster and peacefully discuss it.

If I have a model that I use once, no one has any info on how it's been built, and it tells you that there's a 90% probability of rain tomorrow, what would you do with this? The only way to use this information is in relation to previous models submitted by random people.
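One way to "discuss the collection itself" is a proper scoring rule computed over a forecaster's whole track record. A minimal sketch (the history of (stated probability, outcome) pairs is made up for illustration):

```python
def brier_score(history):
    """Mean squared gap between stated probabilities and 0/1 outcomes.
    Only meaningful over a collection of forecasts, not a single one."""
    return sum((p - outcome) ** 2 for p, outcome in history) / len(history)

# Hypothetical track record: (probability given, what actually happened).
history = [(0.9, 1), (0.8, 1), (0.3, 0), (0.6, 1), (0.2, 0)]
print(round(brier_score(history), 3))  # 0.068 -- lower is better; 0.25 is coin-flip guessing
```

A single 90% rain forecast has no score; a forecaster (model, person, or crowd) with a history does, which is exactly the reconciliation the comment proposes.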

What you are arguing for is one of three possible philosophical approaches to defining probability. There is the approach that it is mathematical--a probability of 0.5 for heads is embedded in the mathematical definition of a fair coin. There is the approach that is empirical--as you flip many coins, the frequency of heads approaches 50 percent. Finally, there is the notion that probability is subjective. I believe that the probability of heads is 0.5, but you could believe something else.

You are insisting on the subjective interpretation of probability. It is what one person believes, but it need not be verifiable scientifically. Other approaches make "the probability that Biden wins the election" an incoherent statement. The subjective definition has no problem with it.

You might also get a kick out of Cox's theorem, which "proves" that any "reasonable" view of probability is Bayesian. https://en.wikipedia.org/wiki/Cox%27s_theorem . There is a reasonably accessible proof in one of the appendices in *Probability Theory: The Logic of Science*.

You can still poke holes in this, particularly if you have some sort of Knightian uncertainty, but it's still pretty interesting.

Is it generally held that probabilities of the urn-kind are truly different in kind (rather than degree) from one-off predictions?

In assigning a probability to drawing a certain type of ball, I am taking the ultimately unique, once-in-history event of drawing a particular ball in a particular way from a particular urn, and performing a series of reductions to this until it becomes comparable to a class of similar events over which an explicit computation of probability is tractable. We take the reductions themselves to be implicit, not even worth spelling out (how often is it remarked on that the number of balls must stay constant?).

It seems like this reduce-until-tractable procedure would be enough to assign a probability to most one-off events, and is (roughly) what is actually done to get a probability (allowing here for the reduction to terminate at things like one's gut feeling, in addition to more objective considerations).

Is there something wrong about this account, such that urn-probabilities can really be set apart from one-offs in a fundamental way?

To go back to your example of fair coin flip vs biased coin flip vs unknown process with binary outcome, the reason you end up with 50% for all of them is because the probability is a summary statistic that is fundamentally lossy. It's true that if all you're asked to do is predict a single event from that process, your "50%" estimate is all you need. But the minute you need to do anything else, especially adjust your probability in light of new information, the description of the system starts mattering a lot: The known-fair coin's probability doesn't budge from 50%, the biased coin's probability shifts significantly based on whatever results you see, and the unknown process's probability shifts slowly at first, then more quickly if you notice a correlation between the outcome and the world around you (if the process turns out to be based on something you can observe).

Summarizing to a single number loses most of that usefulness. It's less lossy than "probably not", and you're right to defend against people who want to go in that direction, but it's not that much less lossy. And in a world where we can send each other pages of text (even on Twitter now!) there's not much value in extreme brevity.

I tend to take more seriously people who offer at least a few conditional probabilities, or otherwise express how much to adjust their current estimate based on future information.

> Is it bad that one term can mean both perfect information (as in 1) and total lack of information (as in 3)? No. This is no different from how we discuss things when we’re not using probability.

I kind of disagree. I think many people have converged on a protocol where you give a probability when you have a decent amount of evidence, and if you don't have a decent amount of evidence then you say something like "I guess maybe" or "probably not".

Some people (people who are interested in x-risk) are trying to use a different protocol, where you're allowed to give a probability even if you have no actual evidence. Everyone else is pushing back, saying, "no, that's not the convention we established, giving probabilities in this context is an implicit assertion that you have more evidence than you actually do".

Scott is arguing "no, the protocol where you can give a probability with no evidence is correct" but I don't feel like his examples are convincing, *for the specific case where someone has no evidence but gives a probability anyway*, which is the case we seem to be arguing about.

I wish we could fix it by switching to a protocol where you're allowed to give a probability if you want, but you have to also say how much evidence you have.

The non-frequentist view of probability is useful even if it has no connection to the real world. It's a slider between 0 and 1 that quantifies the internal belief of the one making the statement. There are a myriad of biases that will skew the belief, though. Moreover, no two people's probabilities would mean the same thing, because the "math" people do in their heads to come up with these numbers is inconsistent. Some will curve towards higher probabilities, others will go low - one person's 40% need not correspond one-to-one to another person's 40%. We should call it something like an "opinion-meter" unless there is a more formal data-aggregation process to come up with the numbers (thus ensuring consistency).

Part of the issue is that many people who complain use numbers as substitutes for other ideas:

-50/50 does not mean even odds, it means 'I don't think this is something anyone can predict'

-99.9% does not mean 999/1000, it means 'I think it's really likely'

These linguistic shortcuts are fine for everyday conversation, but when others use numbers trying to convey probabilistic reasoning, this first crowd often defaults to their own use of numbers and concludes - 'oh, you can't actually know stuff like this - and it is weird that you are using non-standard fake probability numbers'.

On a related note, I've dealt with many in the medical field who (perhaps for liability reasons, perhaps for other reasons) are strongly opposed to phrasing anything in probabilistic terms. My kid was preparing for a procedure which was recommended, but which I knew didn't always have great outcomes. I asked one doctor how likely it was that the procedure would work: 'oh, 50/50'. I asked another: 'there's a pretty good chance it will work, but no guarantees', but neither would provide anything more informative or any evidence behind their output. But I learned that, at least with some doctors, if I asked "if this procedure were to be performed, say, 80 times in situations comparable to this one, how many of those 80 procedures would you expect to go well?", most of them could give answers like 50-65 (out of 80), which was a far more satisfying response for me. But I imagine for the average patient this extra information wouldn't add much to the experience, so they are inclined to simply keep things vague.

> I don’t understand why you would want to do this. If you do, then fine, let’s call it shmrobability. My shmrobability that Joe Biden will be impeached is 17%...

Scott, maybe a new word is a better idea than you think? Imagine if you and other Bay Area rationalists pledged to adopt the top-voted new words for probability & evidence & rationalist, provided that 500 people with standing in your community vote. I would not be a voter, but I can offer suggestions:

- probability = subjective confidence (this clarifies it is not measurable like a frequency is);

- evidence = reason to believe (this avoids overlap with either frequentist science or the legal system);

- rationalist community = showing-our-work community (sounds less arrogant; leaves open the possibility that there is a better methodology of systematized winning; avoids confusion with either Descartes or Victorian-era freethinkers).

Then you wouldn't have to keep writing posts like this!

Instead, you could focus on spreading the word about superforecasters. (Based on their comments, some critics didn't even notice that section.) But if you seem to be engaging in a philosophical/definitional debate, then that's what you will usually get, perpetually.

> if you want to find out why Yoshua Bengio thinks there’s 20% chance of AI catastrophe, you should read his blog, or the papers he’s written, or listen to any of the interviews he’s given on the subject - not just say “Ha ha, some dumb people think probabilities are a substitute for thinking!”

On the other hand, if you want status, that may well be the optimal response. I think people care orders of magnitude more often about status than about why someone else thinks something.

"If you actually had zero knowledge about whether AI was capable of destroying the world, your probability should be 50%."

Recently, I saw an "expert" put the probability of AI-driven human extinction between 10% and 90%. Now, this would average as a 50% probability, but it would mean both a lot less and a lot more than a simple statement of a 50% probability. It conveys that it is very unlikely that AI will be quite harmless, but a bad outcome is also by no means certain. Also, all probabilities between 10% and 90% seem (incredibly) to be equally likely to him. This looks like a pretty strange belief system, but it's surely logically consistent. But then, a straight-up 50% assessment would have the meaningful frequentist property of being right half of the time, if well-calibrated. But then, in the context of human extinction, does it really matter? I guess, the 10%-90% statement could mean that based on the current evidence, it is equally likely to find a probability that has the required long-run properties of being right in the range of probabilities from 0.1 to 0.9. (With the understanding that a long-run probability here requires some sort of multiverse to be meaningful.)

What if I said that my probability lies somewhere between 0% and 100%? By saying this, I will add no information to the debate (as I have 0 knowledge on the matter), but would still claim a 50% average probability of human extinction? I find this hard to believe...

I think you could remove all of my objections if you replaced "probability" with "confidence" in this post, and also assigned a confidence interval that is informed by your (well, not necessarily you personally, but whoever is providing these numerical values) skill as a forecaster.

Saying "my probability" is, to me, similar to saying "my truth".

The problem is that there is a massive equivocation here.

You're arguing for using the language of probability for situations in which we have little information or ability to rigorously model. You acknowledge that these situations are not quite the same as those (like balls in urns) where we have high information and ability to model. You even say one might want to call the former something different (e.g., a "shmrobability.")

But the simple fact is that the situations are not all the same. The low information ones really do involve people "pulling numbers out of their ass," and this happens all day long. How many conversations that begin "What's your P(doom)?" involve such shit-covered numbers?

For what it's worth, the following summarizes my position pretty well. Yes, I know you purport to address it in the post, but I don't think your discussion of it really does.

https://www.benlandautaylor.com/p/probability-is-not-a-substitute-for

Frequentism doesn’t actually exist, metaphysically.

No two events are exactly the same, so we are always performing some kind of conceptual bundling or analogy, in order to use past events in creating probabilities.

Consider a coin flip. Persi Diaconis showed that a coin is more likely to land on the face that began pointing up. So a good frequentist should only use past instances of coin flips that started in the same orientation as theirs, in forming a probability.

But then, the size and shape of the coin also impact this effect, so they should only use flips that match those as well. And each coin has a unique pattern of wear and tear, so it better be this exact coin.

And actually, the angle of the flipper’s hand and the level of force they apply and the breeze in the room are also key…

As it turns out, Diaconis solved this too: he built a machine that precisely controls all elements of the flip, and produces the same result each time.

The probabilistic variance in the flip comes from the aggregation of all these small disruptive factors.

Frequentists imagine they are insisting on using only past perfect reference classes, but to actually do this, you’d need to set up the entire universe in the same configuration. And if you did, then the result would be deterministic, making probability irrelevant.

The fact is that every probability is secretly Bayesian. You are always drawing your own conceptual boundaries around distinct and deterministic events in order to create an imperfect reference class which is useful to you.

Frequentists are just arguing for a tighter set of rules for drawing these Bayesian boundaries. But they are also Bayesians, because it’s the only conceptual framework that can support actual forecasting. They’re just especially unimaginative Bayesians.

And this is allowed! You can be a tight Bayesian. But if you want to call it frequentism and insist there is a bright metaphysical line, you need to explain exactly where the line is. And you simply can’t do so, without sacrificing probability altogether.

(Obviously this is weird metaphysical nitpicking, but people in weird metaphysical glass houses should not throw weird metaphysical stones.)

So the thing is that everything you've said is correct and this is a good article and people who disagree would have to work very hard to convince me it's wrong. The other side of the debate is making a type error and not communicating their objection well.

But the objection could be: "You've imposed language requirements unique to your culture on making arguments and then discounted arguments that don't use that language, ensuring that you'll discount any critique that comes from outside your bubble." If so, this isn't much different than objections to culture war stuff, or academic fields gatekeeping.

All probabilities are metaphors - to a greater or lesser extent. Perfectly serviceable as such.

One teensy additional thought: Wharton business professor Phil Rosenzweig stresses the difference between probabilities where you can affect the outcome (e.g. NASA scientists thinking about moon landing) and those where we have no control (e.g. climate change). See a good breakdown in Don Moore and Paul Healy (2008): “The Trouble with Overconfidence”.

You seem to be defending not just non-frequentism about probabilities, but a sort of *objective* non-frequentism. At least, that’s what’s going on in the section about Samotsvety, specifically in the sentence where you say, “If someone asked you (rather than Samotsvety) for this number, you would give a less good number that didn’t have these special properties.”

I claim that the number .17 doesn’t have any of these special properties. The only numbers that could have objectively special properties for any question are 1 or 0. If you assign the number .17, and someone else assigns 1 and someone else assigns 0, then the person who does the best with regards to this particular claim is either the person who assigned 1 or the person who assigned 0. (They bought all the bets that paid off and none of the bets that didn’t, unlike the person who assigned the other extreme, who did the absolute worst, or the person who assigned .17, who bought some of the bets that paid off and some of the ones that didn’t.)

However, there are *policies* for assigning numbers that are better than others. Samotsvety has a good *policy* and most of us can’t come up with a better policy. No one could have a policy of assigning just 1s and 0s and do as well as Samotsvety unless they are nearly omniscient. (I say “nearly” because a person who is assigning values closer to 1 and 0 and is *perfectly* omniscient actually gets much *more* value than someone who assigns non-extreme probabilities no matter how well the latter does.)

But the goodness of the policy can’t be used to say that each particular number the policy issues is good. Two people who are equally good at forecasting overall may assign different numbers to a specific event. If you had the policy of always deferring to one of them, or the policy of always deferring to the other, you would very likely do better than using whatever other policy you have. But in this particular case, you can’t defer to both because they disagree. Neither of them is “right” or “wrong” because they didn’t assign 1 or 0. But they are both much more skilled than you or I.

This is no weirder than any other case where experts on a subject matter disagree. Experts who are actually good will disagree with each other less than random people do - even computer scientists who disagree about whether or not P=NP agree about lots of more ordinary questions (including many unproven conjectures). But there is nothing logically impossible about experts disagreeing, and thus the *number* can’t have the special properties you want to assign to it.

I prefer the term "credence" for the sort of number that you get from forecasting, "probability" for the sort of number that you get from math, and "frequency" for the sort of number you get from experiment. "The probability for an ideal coin to land heads is 50%. The observed frequency for this particular coin to land heads is 49%. My credence in the hypothesis that this is an unbiased coin is 99%."

About 23.2% of the things I ever say are quotes of dialogue from a movie. Not "you talking to me" but more "I have a lawyer acquaintance" or "it's my own invention". In the film High Fidelity the protagonist asks his former girlfriend what the chances are that they'll get back together, and she says something like "there is a 57% chance we'll get back together". I use this all the time. I tell my wife "there's a 27% chance I will remember to pick up bananas at the store", allowing her to make whatever adjustments she thinks are necessary.

People decide what they think are the rational prices for stock futures and options all the time, despite that each one is about a singular future event.

Frequentist probabilities are just model-based Bayesian probabilities where the subjective element is obscured.

Parable: I have an urn containing 30% red marbles and 70% blue marbles. I ask four people to tell me the odds that the first marble I'll draw out will be red. The first guy says 30% because he knows that 30% of the marbles are red. The second guy says 100% because he is the one who put the marbles in the urn, and he put in blue ones, followed by the red ones, so the red ones completely cover the blue ones, meaning the first one I touch will be red. The third guy says 30%, because he knows everything the other two guys know, but also knows that I always vigorously shake the urn for a solid minute before I draw out a marble. The fourth guy says 0% because he is a mentat who has calculated the exact dynamics of the marbles and of my searching hand given the initial conditions of the Big Bang and knows it will be blue.

All allegedly "Frequentist" probabilities are like this. You smuggle in your knowledge about the process to structure and screen off uncertainties, such that the remaining uncertainty is Bayesian, or, in other words, based entirely in unquantifiables. You then pretend that you have done something different than what Bayesians do.

> If you actually had zero knowledge about whether AI was capable of destroying the world, your probability should be 50%.

This approach has some problems. Following it, you would assign p(Atheism)=0.5, p(Christianity)=0.5, p(Norse Pantheon)=0.5.

Of course, these possibilities are mutually exclusive, so the probabilities can't add up to more than one. You could simply say "okay, then probabilities are inverse to the number of mutually exclusive options". But should Christianity really just count as one option? Different sects surely differ in mutually exclusive details of their theology! Perhaps you should have at least two options for Christianity. Then someone invents the Flying Spaghetti Monster as a joke. If you work from zero knowledge, it would look just as probable as Atheism.

Someone generates a random number. What is the probability that it is two? What is the probability that it is 1/pi? Is the expected imaginary part of the number zero?

Without additional information ("the number is returned as a signed 32 bit integer"), I think it is very hard to form defensible priors for such questions.
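The option-counting problem in the theology example can be made concrete with a toy sketch (all labels and numbers made up purely for illustration):

```python
# Assigning 50% to each of several mutually exclusive worldviews is
# incoherent: the probabilities sum to more than 1.
naive = {"atheism": 0.5, "christianity": 0.5, "norse_pantheon": 0.5}
assert sum(naive.values()) > 1.0  # 1.5 -- can't all hold at once

# The "repair" of splitting probability evenly over the options makes
# the answer depend entirely on how finely you carve up the options:
def uniform_prior(options):
    return {o: 1.0 / len(options) for o in options}

coarse = uniform_prior(["atheism", "christianity", "norse_pantheon"])
fine = uniform_prior(["atheism", "catholicism", "protestantism", "norse_pantheon"])
print(coarse["atheism"])  # ~0.333
print(fine["atheism"])    # 0.25 -- same ignorance, different "probability"
```

Which is exactly the Flying Spaghetti Monster problem: inventing one more option silently changes every zero-knowledge prior.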

I've been trying for a while to explain some of the confusions identified in this article, and even wrote an EA Forum post on it last year: https://forum.effectivealtruism.org/posts/WSqLHsuNGoveGXhgz/disentangling-some-important-forecasting-concepts-terms . I've been struggling to get feedback and might find the distinctions aren't helpful, but every time I read an article like this I keep thinking "Geez, I wish we had more standardized language around forecasting/probability."

The main point of my forum post is that people often conflate concepts when talking about the meaning or practicality of forecasts—perhaps most notoriously for EA when people say things like "We can't make a forecast on AI existential risk, since we have no meaningful data on it" or "Your forecasts are just made up; nobody can know the real probability." Instead, I recommend using different terms that try to more cleanly segment reality (compared to distinctions like "Knightian uncertainty vs. risk" or "situations where you don't 'know the probability' vs. do know the probability").

Slightly rephrased, people sometimes demand "definitive" evidence or estimates, but they don't know what they mean by "definitive" and haven't really considered whether that's a reasonable standard when doing risk assessments. I think it's helpful to define "definitiveness" in at least one of two dimensions:

1) how much do you expect future information will change your best estimate of X (e.g., you might now think the probability is 50%, but expect that tomorrow you will either believe it is 80% or 20%);

2) how difficult is it to demonstrate to some external party that your estimate of X was due to good-faith or "proper" analysis (e.g., "we followed the procedure that you told us to follow for assessing X!").

The terms I lay out in my forum post are basically:

• "The actual probability": I don't really explain this in the article because quantum randomness is a big can of worms, but the point of this term is to emphasize that we almost never know "the probability" of something. If we decide quantum randomness is merely "indeterminable by physical observers [but still governed by physical laws/causality]" as opposed to "actually having no cause / due to the whims of Lady Luck," the probability of some specific event is either 100% or 0%. For example, a flipped coin is either 100% going to land heads or 0% going to land heads; it's not true that the event's "actual/inherent probability is 50%."

• "Best estimate": This is what people often mean when they say "I think the probability is X." It's the best, expected-error-minimizing estimate that the person can give based on a supposed set of information and computational capabilities. In the case of a normal coin flip, this is ~50%.

• "Forecast resilience": How much do you expect your forecast to change prior to some (specified or implicit) point in time, e.g., the event occurring. For example, if you have a fair coin your 50% estimate is very resilient, but if you have a coin that you know is biased but don't know the direction of the bias, and are asked to forecast whether the 10th flip will be heads, your initial best estimate should still be 50% but that forecast has low resilience (you might find that it is clearly biased for/against heads). *This seems like a situation where someone might say "you don't know the probability," and that's true, but your best estimate is still 50% until you get more information.*

• "Forecast legibility/credibility": I don't have a great operationalization, but the current definition I prefer is something like “How much time/effort do I expect a given audience would require to understand the justification behind my estimate (even if they do not necessarily update their own estimates to match), and/or to understand my estimate is made in good-faith as opposed to resulting from laziness, incompetence, or deliberate dishonesty?” Forecasts on coin flips might have very high legibility even if they are only 50%, but many forecasts on AI existential risk will struggle to have high legibility (depending on your estimate and audience).
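The resilience point about the biased coin can be sketched with a tiny Bayes computation (hypothetical numbers: the bias is either 0.8 or 0.2 toward heads, with equal prior weight):

```python
# Biased coin of unknown direction: P(heads) is either 0.8 or 0.2,
# each equally likely a priori. The best initial estimate is still
# 50%, but it has low resilience: one flip moves it a lot.
def p_heads(posterior):
    # marginal probability of heads given weights over the two biases
    return posterior[0.8] * 0.8 + posterior[0.2] * 0.2

def update(posterior, flip_is_heads):
    # Bayes' rule over the two bias hypotheses
    like = {b: (b if flip_is_heads else 1 - b) for b in posterior}
    z = sum(posterior[b] * like[b] for b in posterior)
    return {b: posterior[b] * like[b] / z for b in posterior}

posterior = {0.8: 0.5, 0.2: 0.5}
print(p_heads(posterior))       # 0.5 -- same number as a fair coin

posterior = update(posterior, True)  # observe a single head
print(p_heads(posterior))       # 0.68 -- one flip moved it a lot
```

A fair coin's 50% would barely budge after one flip; this coin's 50% jumps to 68%. Same number, very different resilience.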

I'm missing an information-theoretic answer here. There are two factors: first, the number of "poll options" you can choose from, and second, which poll option you chose (and how that translates to the probability).

For whole percentages, you've got 101 options, 0 to 100. For percentages with three fractional digits, you've got 100,001. For [probably | probably not | don't know] you've got three options, and for just [don't know] you've got one option.

So, estimating the probability of the three scenarios, I'd answer as follows:

1) Fair coin: the 101st option of 201 options (i.e. the one in the middle).

2) Biased coin: the 5th option of 9 options (i.e. the one in the middle).

3) Unspecified process: the 1st option of 1 option (i.e. the one in the middle).

In all the cases, the best answer is "the one in the middle" which best corresponds to 50%, but the answers are not at all alike. The number of options I chose reflects my certainty or lack thereof.
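For what it's worth, those option counts translate directly into bits of information conveyed (a rough sketch, using the counts assumed above):

```python
import math

# Each answer format carries log2(N) bits of channel capacity; picking
# "the one in the middle" of a bigger menu communicates more precision.
for label, n_options in [("fair coin", 201), ("biased coin", 9), ("unspecified", 1)]:
    print(f"{label}: {math.log2(n_options):.2f} bits")
# fair coin ~7.65 bits, biased coin ~3.17 bits, unspecified 0 bits
```

So "50%" from a one-option menu carries literally zero bits, even though it is the same number as the fair-coin answer.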

You got em Scott Alexander. You went out there and got em.

great discussion. We use language ... it facilitates quite a lot, but we over-ascribe certain powers and utilities to language. As Wittgenstein concluded at one point, most of philosophy is a 'language game', and likely most other cherished beliefs are language games. Games are good, but they are not 'truths' of the universe. At best, they are means of organizing, conveying, and working with vague notions and data.

Scott,

I think you need to watch “Dumb and Dumber” again; what is the semantic content of 1 in a million?

In a similar vein, do people really act in a way that suggests one should take seriously their non-zero predictions of disaster? What would a “rational” actor do if, for example, he or she thought there was a real chance of, say, being in a plane accident in the next month? Would you expect to meet them at a conference overseas somewhere?

How does sharing or researching a probability affect that probability? Everything we make a "thing" affects the quality of what we've "thinged." Human mind, inquiry, and dialogue seems to me to have a significant observer effect on itself. I guess this gets at the metaphysical aspect you mention?

Ben Goertzel took a stab a while back at turning the idea that you can have uncertainty in your probability estimates into an approach to computation: https://en.wikipedia.org/wiki/Probabilistic_logic_network

The gist is that rather than having a single number represent all the information you have about probabilities, you carry around probability ranges, and propagate them through belief networks through a fairly simple set of rules. The expectations might come out the same when looking at a 50% event whether you have a large or small interval around 50%, but other mechanics change as you add information or connect events together.

At least back then he was planning to make this the core of his approach to AI; I don't know if that has held up. It always seemed to me like an interesting idea, but I never quite dug deep enough to see if it was actually simplifying things enough to make it more useful than just carrying around full distributions and doing the Bayes thing properly (which can be computationally difficult when you're dealing with big networks).
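For flavor, here is a minimal sketch of interval-valued probability (not Goertzel's actual PLN rules, just the classical Fréchet bounds for a conjunction when the dependence between events is unknown):

```python
# Interval-valued probabilities: each belief is a (lower, upper) pair.
# With unknown dependence between A and B, the Frechet bounds give the
# tightest possible interval for P(A and B).
def and_interval(a, b):
    a_lo, a_hi = a
    b_lo, b_hi = b
    return (max(0.0, a_lo + b_lo - 1.0), min(a_hi, b_hi))

# Two events, both "around 50%" but with different confidence widths:
narrow = (0.45, 0.55)
wide = (0.10, 0.90)
print(and_interval(narrow, narrow))  # (0.0, 0.55)
print(and_interval(wide, wide))      # (0.0, 0.9)
```

Both inputs have the same midpoint, but the widths propagate differently, which is the whole point of carrying the interval around instead of a single number.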

One approach would be to recognize that there is a spectrum of kinds of uncertainties.

On the one side you have uncertainties in the territory of the form "I don't know if this radioactive nucleus will decay within the next half life". From our current physics understanding, even if you know all the physics there is, and the wave function of the universe, you will still be stuck making probabilistic predictions regarding quantifiable outcomes.

Then you have quantifiable uncertainties in the map like "we don't know the Higgs mass to arbitrary precision". At least to the degree that the standard model is true (spoiler: it is not), the Higgs mass has an exact value, which we can measure only imperfectly.

Before the Higgs was discovered, there was a different kind of uncertainty regarding its existence. This is of course much harder to quantify. I guess you would have to have an ensemble of theories which explain particle masses (with the Higgs mechanism being one of them) and penalize them by their complexity. Then guess if there are any short, elegant mechanisms which the theoreticians did not think of yet. (Of course, it might well be that an elegant mechanism reveals itself in the theory of everything, which we don't have yet.)

Both "Does the Higgs exist?" and "What is the exact mass of the measured Higgs?" are uncertainties which exist on our map, but I would argue that they are very different. The former is Knightian, the latter is well quantifiable.

For p(doom), there are two questions which are subject to Knightian uncertainty:

* How hard is it to create ASI?

* How hard is it to solve alignment before that?

It might be that we live in a world where ASI is practically impossible and AI will fizzle out. Or in a world where the present path of LLMs inevitably leads to ASI. Or in a world where getting to ASI by 2100 will require some "luck" or effort which mankind might or might not invest. p(ASI) is just the expected value over all these possible worlds. I would argue that the probability density function is a lot more interesting than the expected value.

In particular, I would expect that the regions around zero ("developing ASI is actually as hard as travelling to Andromeda") and one ("we will surely develop ASI unless we are wiped out by some other calamity in the next decade") have a significant fraction of the probability mass, with some more distributed in the middle region.

I think one thing that trips people up is "normal" versus "weird" distributions.

Normal:

I'm throwing darts at a board. I'm pretty good and hit the bullseye exactly 25% of the time. The remaining throws follow a normal distribution. You can bet that my average throw will land closer to the center than my twin's, who hits the bullseye only 2% of the time.

Weird:

A robot throws darts at a board. It hits the bullseye exactly 25% of the time. Because of a strange programming error the other 75% of the time it throws a dart in a random direction.

If a prediction market gives a 60% chance of landing on Mars by 2050, some of the prediction follows a normal distribution. Eg, maybe there's a 50% chance by 2047, and a 67% chance by 2055. It's intuitive if there's a 60% chance of success, in the 40% of failures we should at least be close. But some of the "no" percentage follows a weird distribution. Eg, international nuclear conflict breaks out and extracurricular activities like space travel are put on indefinite leave.

I think weird outcomes lead to post hoc dismissal of predictions. If the dart thrower slips and throws into the ground, we laugh at the 25% bullseye chance. If in 100 years all AI is chill and friendly, we'll think that the 20% chance of existential threat we give it now is nonsense.

But weird probabilities don't work that way.
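A toy mixture model of the "weird" robot makes the point (numbers assumed: the bullseye covers 1% of the board, and wild throws land uniformly on the board):

```python
# Mixture: a fraction of throws are aimed and certain to hit the
# bullseye; the rest land uniformly at random somewhere on the board.
def p_bullseye(aim_rate, bullseye_area_fraction):
    return aim_rate * 1.0 + (1 - aim_rate) * bullseye_area_fraction

print(p_bullseye(0.25, 0.01))  # 0.2575
```

Conditional on a miss, the landing spot tells you nothing about the aimed component: the wild throws have no cluster of near-misses. That is why laughing at one dart in the ground misjudges the original 25%.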

Apologies if someone has already posted this, but I think it's a fun, illuminating graphic connecting language usage and probability.

https://github.com/zonination/perceptions

The problem is that lots of Bayesianists like to pretend that the probabilities are somehow scientific or objectively true, but if you take the philosophical underpinnings of Bayesian probabilities seriously it's clear that they are purely subjective opinions, not capable of being true or false. For any information you have, you could always choose the probability you'd like, then calculate backwards to figure out what your prior must have been. This means that unless there is some objectively correct prior, and there isn't, *any* probability is just as valid as any other for any prediction and set of information. When Bayesians are pressed on this, they fall back to the frequentist justifications they supposedly reject, just like Scott did on the first point about Samotsvety.

The typical statistical perspective on non-repeating events is to frame the problem in a super-population. In the context of predictions that we seem to be in here, there are at least two ways to frame this.

1. As far as we can tell, randomness is fundamental to the structure of reality. In the many worlds interpretation, which is convenient here, the population is the set of realities in the future and the probability that Trump wins is the (weighted) proportion of those realities that contain a Trump victory. There is therefore a real P(Trump) out there, so it makes sense to talk about it. Forecasters are then tasked with estimating it (\hat{P}(Trump)).

I think it is okay to critique whether \hat{P} is a good estimate of P, but even if it isn't, it still tells you about what the forecaster thinks P is, which still at least tells you something about them. Thus, we should ask two questions of a forecast. The first is: What is \hat{P}? The second is: Is \hat{P} a good estimator (i.e. based on good information).

2. The second way to frame the super-population is to consider the set of forecasts that a forecaster makes. If they are well calibrated, then among the predictions to which they assign a probability of 17%, the outcome will come out positive 17% of the time and negative 83% of the time.

When we encounter one of their predictions in the wild (say P(Trump)) we are essentially picking a prediction randomly from the superset of all their predictions, in which case the probability they give is literally the true probability that their prediction is correct, provided they are well calibrated.

----

Note that both (1) and (2) are frequentist probability interpretations and not subjective probabilities. In that regard, I'm not sure that you are giving a defense of "non-frequentist" probabilities as much as a defense of imperfect "probability estimates."
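The calibration property in framing (2) is mechanically checkable; here is a minimal sketch with fabricated data:

```python
from collections import defaultdict

# Empirical calibration check: group (forecast, outcome) pairs by the
# stated probability; among all the "17%" predictions, roughly 17%
# should resolve positive if the forecaster is well calibrated.
def calibration_table(forecasts):
    buckets = defaultdict(list)
    for p, outcome in forecasts:
        buckets[p].append(outcome)
    return {p: sum(o) / len(o) for p, o in sorted(buckets.items())}

# Fake record of a perfectly calibrated forecaster (for illustration):
data = [(0.17, 1)] * 17 + [(0.17, 0)] * 83 + [(0.80, 1)] * 8 + [(0.80, 0)] * 2
print(calibration_table(data))  # {0.17: 0.17, 0.8: 0.8}
```

Real records won't match this exactly, of course; the question is whether the bucket frequencies track the stated probabilities within sampling noise.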

> In this case, demanding that people stop using probability and go back to saying things like “I think it’s moderately likely” is crippling their ability to communicate clearly for no reason.

I personally feel that in most cases, false precision is the greater danger. Some small number of fuzzy categories often work quite well for quick assessments --

https://en.wikipedia.org/wiki/Words_of_estimative_probability

I *think* there's a bit of a stolen valor argument that is hidden behind some of the probability objections.

For example, if I don't encounter people talking about probabilities at all, *except* in the context of Nate Silver, superforecasters, or hell, even fiction where probability-givers are depicted as smart and "accurate", and then I encounter some random lesswronger saying something like "23% likely", I'm going to assume this internet person is trying to LARP one of those accurate forecasters, and thus come off as if they're calibrated at the sub-1% level, even though (objectively!) most lesswrongers actually suck at calibration and giving good estimates, because they don't practice! (This was true a couple of years back I think, from the LessWrong annual survey results.)

None of this means we should stop talking about probability, or that it's correct to assume this of normal people who speak in probabilities. But I think the way to combat this isn't to argue about probability definitions, but to sit on top of a giant utility pile from correct predictions, and look down at the blasphemer and say "how droll!".

Someone saying "you're being unnecessarily precise" and then being linked to a single person's set of one thousand well calibrated predictions, well, it's only mildly embarrassing if it happens once. But if entering rationalist spaces means that there is a wall of calibration, you have to be pretty shameless to continue saying things like that.

Of course, this should *only* be done if rationalists can exercise the virtue they claim to preach, and well, I think that's just hard, but I can dream.

>Giving an integer percent probability gives you 100 options.

Don't you mean 101 options? :-)

Excellent points. We would benefit from even more language to express degrees of certainty. I wish it were commonplace to add a formal confidence rating to one’s estimations. Lacking that, I’ll take the informal one.

That is, at least, until I run off and join the tribe that linguist Guy Deutscher describes in “Through the Language Glass,” whose epistemology of accounts is baked into their grammar. If you tell someone a bear has attacked the village, it is impossible to form the sentence without also disclosing whether your source is hearsay, evidence, or a brush with a bear.

(Every language requires some such disclosures, though most of them less obvious. In English I can tell you I spent all day talking to my neighbor, without ever disclosing his or her gender—but not in French!)

Further tangling the probability/confidence issue is that humans have serious cognitive deficits around low-risk, high-stakes predictions.

If a night of carousing is 20% likely to earn me a hangover, I have to weigh the benefits (ranging from certain to barely extant) against the costs (same range). That is more axes than we know how to deal with. And they don’t end there…suppose instead I am incurring a 2% chance of 10 hangovers (consecutive or concurrent)? Should my decision be the same in both cases? What if it were a 0.02% chance of 1000 hangovers, e.g. a brain hemorrhage?

We divide by zero. We buy lottery tickets, then take up smoking and scrimp on flood insurance.

Our whole criminal system is an attempt to leverage incomplete information by multiplying stakes. You probably won’t be caught shoplifting, but if you are, you’ll be expected to pay not just for the thing you stole, but also for the take of the last ten people.

There are important ways in which probability *is* a function of knowledge. If all you know is the proportion of each color of ball in the urn, chances might be 40%; if you are allowed a split-second peek inside before drawing, and you notice the top layer of balls is 90% a single color, the equation changes—and your preference now plays a role in the odds. (See also: https://en.m.wikipedia.org/wiki/Monty_Hall_problem)

Countries with the least capacity to enforce law and order tend to do so with the harshest methods. If you can’t lower the risk of people doing bad stuff, you have to up the stakes for the people considering it.

Also, the obvious: the 100 percent knowledge of something coming to pass (or not) collapses the waveform of the future into the event horizon past (mixaphysically speaking), simplifying probability to either 100% or 0%.

That is, if you witnessed the thing happen or not happen (or faithfully measured it in some other way). Even then, our senses could deceive us blah blah, but *if* we really did see it happen, *then* its probability of happening is 100 percent. Was it always 100 percent? If hard determinism holds, then yes; regardless (says the fatalist), it will *now* always have been going to happen. (Time travelers would have to have so many more verb tenses. What is that—imperfect subjunctive?)

Our priors continually adjust to account for the things that might have killed us but didn’t, and we get more and more foolhardy. Or some unlikely disaster upended our life one fateful Tuesday, so now we never leave the house on Tuesdays. We regularly overlearn our lesson, or learn the wrong one.

All of which is to say (as you corollaried much more succinctly): Probability *is definitely* a function of knowledge—but that just means our estimations have to be the beginning of a conversation, not the end of one.

Aleatory probability and epistemic probability ARE conceptually different things. Yes, they follow the same math; yes, in practice you have both simultaneously; and yes, a lot of what is considered aleatory (rolling dice, to use the obvious example) is really epistemic (if you can measure the state of the dice precisely enough, you can predict the result ~perfectly); but they're different enough (Monty Hall is entirely epistemic, radioactive decay is ~entirely aleatory) that in most other contexts, they'd have different words.

The thing about lightly-held probabilities sounded kind of off to me, because according to Bayes' Law you're meant to always update when you learn new information. But the thing is, you only update on new information, not information you already know. So if there's not much more you could learn about the event until it happens (like with the coin flip), the probability is strongly held; and if there is a lot of information you haven't included in reaching your probability, then there are a lot of things still to update on, and it's lightly held. This relates to my other comment on the precision of probabilities. If there are a lot of facts available that you don't know, then your probability distribution on what your probability would be if you knew all the relevant facts will have a wide range.

The thing that makes a probability useful information is a combination of it being well-calibrated (sort of a bare minimum of being correct, an easy bar to pass as Scott points out), and high-information in the sense of information theory (low entropy, i.e. -p ln(p) - (1-p)ln(1-p) close to 0, or roughly speaking, p close to 0 or 1, so high confidence). The latter seems weirdly missing in the section on Samotsvety. Information in this technical sense is closely related to information in the normal sense. If you perform a Bayesian update on a probability, this always increases the information in expectation, and the only way to increase the amount of information in your probability while remaining well-calibrated is to find more relevant facts to update on. (That is, at least if we neglect computational limits. The case of bounded rationality is harder to reason about, but I assume something similar is true there too.) Scoring functions in things like forecasting competitions will generally reward both good calibration and high confidence, because achieving either one without the other is trivial but useless.
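The calibration/confidence trade-off described above can be made concrete with a Brier score, which rewards exactly that combination (a quick sketch; the 0.1/0.9 true event probabilities are made up for illustration):

```python
import random

random.seed(0)

def brier(forecasts, outcomes):
    """Mean squared error between probability forecasts and 0/1 outcomes (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(outcomes)

# Simulate 10,000 events whose true probabilities are either 0.1 or 0.9.
truths = [random.choice([0.1, 0.9]) for _ in range(10_000)]
outcomes = [1 if random.random() < t else 0 for t in truths]

# Forecaster A is well-calibrated but uninformative: always says the base rate, 50%.
score_a = brier([0.5] * len(outcomes), outcomes)
# Forecaster B is well-calibrated AND confident: reports the true 0.1 / 0.9.
score_b = brier(truths, outcomes)

print(score_a)  # exactly 0.25
print(score_b)  # ~0.09
```

Both forecasters are calibrated here, but only B's confident forecasts carry information, and the scoring rule reflects it.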

Even needing to ask the questions "What Is Samotsvety Better Than You At?" and "Do People Use Probability 'As A Substitute For Reasoning'?" points to how inscrutable forecasts are, especially crowd forecasts as Scott points out.

Whether the rationale is weak or formal as Scott describes it, we'd all be better off if we had all of them, to evaluate alongside the forecast.

That schtick of Yglesias’s - I take your word for it, not being a reader - is funny because he sure took a different tack with his book title.

This post feels strange to me because while you argue a Bayesian viewpoint, the examples you argue against point to an even more Bayesian approach!

For Bayesian inference, it makes a lot of difference whether you think a coin has 50% chance to come up heads because it looks kind of symmetric to you, or because you've flipped it 10 times and it came up heads 5 times. It matters because it tells you what to think when a new observation comes in. Do you update a lot, to 2/3, or just a little, to 7/13 (I am using Laplace's rule here)? It depends.
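Both updates in that example fall out of Laplace's rule of succession, posterior mean (successes + 1) / (trials + 2) under a uniform prior (a small sketch using only the comment's own numbers):

```python
from fractions import Fraction

def laplace(heads, flips):
    """Laplace's rule of succession: posterior mean of P(heads)
    under a uniform prior, after seeing `heads` heads in `flips` flips."""
    return Fraction(heads + 1, flips + 2)

# 5 heads in 10 flips: posterior mean 1/2, matching the symmetry guess.
print(laplace(5, 10))  # 1/2
# One more head: the well-tested coin barely moves ...
print(laplace(6, 11))  # 7/13
# ... while "it looks symmetric" plus one single observed head jumps to 2/3.
print(laplace(1, 1))   # 2/3
```

Same headline number, 50%, but very different responses to the next flip.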

You ideally need to know the entire distribution, instead of a single number. Of course, a single number is better than no number, no arguing there, but the opponents do have a point that "how tightly you hold this belief" is an important piece of information.

> For example, you might think about something for hundreds of hours, make models, consider all the different arguments, and then decide it’s extremely unlikely, maybe only 1%. Then you would say “my probability of this happening is 1%”. ... Sometimes this reasoning process is very weak. You might have never thought about something before, but when someone demands an answer, you say “I don’t know, seems unlikely, maybe 1%”

That's exactly the problem: when you say "the probability of X is 1%", people tend to assume the former process, whereas in reality you employed the latter one. It is also possible that you've thought about everything extremely carefully, took notice of every contingency and calculated all the numbers precisely... but your entire calculation rests on the core assumption that the Moon is made out of green cheese, so your results aren't worth much.

But a single number like "1%" doesn't contain enough information to make a distinction between these scenarios. It should not confer any more gravitas and authority on the predictor than merely saying "meh, I don't think it's likely but whatevs" -- not by itself. But because it is a number, and it sounds all scientific-like, it *does* tend to confer unwarranted authority, and that's what people (at least some of them) are upset about -- not the mere existence of probabilities as a mathematical tool.

I think metaphysical determinism is true, so from that POV I don't think there ever was a "chance" that an event that obtained might NOT have obtained, that's sort of meaningless to even say. Either there was sufficient cause, or there was not, and if sufficient cause existed the event had to obtain. If a person's statement "this is 40% likely to happen" meant "there are some large number of possible universes that could exist, or in some way actually do exist, and in 40% of them Event X will obtain", I'd have a pretty serious problem with *that* claim, and some of the wilder AI discourse on LW veers into those types of fantasyland. But this isn't what's going on with most AI p(doom) discourse.

Most people without thinking too much about this will just translate percentages into a statement about your epistemic limitations in one way or another. Sometimes 40% means "I expect this not to happen, but I have pretty low confidence in that" and sometimes 40% means "this scenario has been designed to produce Event X approximately 40% of the time, within epistemic limits that cannot be overcome", and those are pretty different but in common everyday encounters humans switch between those modes all the time. I've had to give speeches sort of like this during jury selection, to get people to understand what "reasonable doubt" means, to get them to switch from thinking in percentages to thinking about the trial as accumulating enough knowledge to be confident in a conclusion (although the courts in some states don't like you explaining this.)

>Probabilities Don’t Describe Your Level Of Information, And Don’t Have To

Importantly, I think this section is making a normative claim rather than a descriptive claim. And if that wasn't obvious to some, that may be part of what divides the people arguing about that online.

Which is to say: I agree, it would be best if both 'I think pretty likely not' and 'I assign 15% probability' said nothing at all about your level of information, and no one expected them to.

However, my strong impression is that *in the current reality*, most people do use contextual phrasing to convey something about their level of information when making an estimate.

The way most people actually speak and interpret speech, someone who says '23.5%' is implying they have more information than someone who says 'around 25%', who is implying they have more information than someone who says 'somewhat unlikely'.

Information is often conveyed through that type of rhetorical context rather than directly in words, and I think people who infer that someone is trying to convey that information based on the way they report their estimates are not wrong most of the time.

(think about movie characters who give precise percents on predictions of things that will happen, they are supposed to be conveying that they are geniuses or masterminds who have used lots of information to precisely calculate things, not trying to convey that they stylistically prefer talking in numbers but have no other information advantage)

Now, again, I agree that it would be better if people didn't use that particular channel of communication that way, because you're correct that everyone talking in percentages would usually be better, and for that to work it has to be actually disjoint from level of knowledge.

But again, it needs to be recognized that this is a normative claim about how people *should* communicate.

I think it fails as a descriptive claim about how people *do* communicate.

To steelman this a bit, I think that similar to Learned Epistemological Helplessness, this is a case where people's natural reactions are an evolved response to issues not captured in Spherical Cow Rationalism.

Reducing things to a single number will often be more misleading than illuminating. For example, suppose I made a Manifold market called "How will I resolve this market?", and ask you to estimate the probability and then bet on it. And then no matter what you choose, I'll bet the other way and then resolve it in my favor. **There is no probability you could offer that would actually be meaningful here.**

A useful trick to distinguish a coin you are pretty sure is fair vs. a coin that might be loaded by any amount, which has big implications as to how much you update your belief after seeing outcomes:

Consider that Probability(tails)=s, where s is a parameter whose value you might be uncertain about.

In the first case Prob(s is (close to) 0.5) is (almost) 1, while in the second case, ProbDensity(s=x) = 1 for x in [0, 1].

In the second case, Prob(tails) = \int Prob(tails given s=x ) * ProbDensity(s=x) dx = \int x * 1*dx = 1/2

There is, however, a large difference in how you update your belief about s (and therefore about P(tails)) when you observe coin outcomes. In the 1st case you (almost) don't update (if you see 3 tails in a row, you still say P(tails)=0.5). In the 2nd you update a lot (if you see 3 tails in a row, you say P(tails)=0.8, which can be calculated exactly).
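The second case is a standard Beta-Bernoulli update, so the three-tails number can be computed exactly (a sketch: the uniform prior is Beta(1, 1), and the posterior after t tails and h heads is Beta(1 + t, 1 + h)):

```python
from fractions import Fraction

def posterior_mean(tails, heads):
    """Posterior mean of s = P(tails) under a uniform prior:
    Beta(1 + tails, 1 + heads) has mean (1 + tails) / (2 + tails + heads)."""
    return Fraction(1 + tails, 2 + tails + heads)

print(posterior_mean(0, 0))  # 1/2 before any flips
print(posterior_mean(3, 0))  # 4/5 after three tails in a row

# In the first case the prior is (almost) a point mass at s = 0.5,
# so the posterior is (almost) the same point mass: P(tails) stays ~0.5.
```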

Try this argument on for size (NB, I posted this argument as a response further down the list, but I wanted to present it to the general audience of this comments section with it fleshed out a little more).

I view the problem as involving the timelines of possible universes going forward from the present — and when it comes to the Mars example, what is the set of possible universes that will *prevent* us from getting to Mars in 2050 vs the set that will *allow* us to get to Mars in 2050? I think we can agree that there is an infinity of universes for the former, but a smaller infinity of universes for the latter. After all, the set of universes where the sequence of events happen in the proper order to get us to Mars would be smaller than the set of universes where the sequence of events didn't happen (or couldn't happen).

If these were sets of numbers we could create a bijection (a one-to-one correspondence) between their elements. For instance, the infinite sets of universes going forward where a coin flip was heads vs. tails would be equal. The subsequent branchings from each of those sides of the flip would be infinite, but their infinities would be equal in size. Likewise, before we cast a die, the set of universes going forward where a six will come face-up would be one-sixth the size of the infinity of universes going forward where it doesn't. However, the set of universes where I don't bother to pick up the die to roll it is larger than the ones in which I do. But I can't make a bijection between the infinitude of universes where I don't roll the die and the infinitude of universes where I do and get a specific result.

Likewise, no such comparison is possible between the two sets of where we get to Mars and where we don't get to Mars, and the only thing we can say with surety about them is that they don't have the same cardinality. Trying to calculate the proportionality of the two sets would be impossible, so determining the probability of universes where we get to Mars in 2050 and universes that don't get us to Mars in 2050 would be a nonsensical question.

I don't intend to die on this hill, though. Feel free to shoot my arguments down. ;-)

I would have expected more focus on the capitalist interpretation of probability as representing the odds you'd be willing to take on a bet. Which is not really about the money, but instead a very effective way to signal how your beliefs about the event will influence your future behavior. If you force me to bet on a coin toss, then seeing which side I bet on tells you which side I consider not-less likely. If you vary the odds you can see exactly how much more likely I consider that side. Once you know that you'll also have a sense about how I'm likely to prepare for future events that depend on the outcome of the coin toss.

Jaynes's book "Probability Theory: The Logic of Science" gives a very good description of treating probabilities as measures of states of belief. This is usually described as Bayesian statistics.

The wiki page on "Bayesian probability" is also pretty informative.

Savage also has good work showing that, under a suitable definition of rationality, people's choices maximize expected utility over subjective probability distributions.

https://en.wikipedia.org/wiki/Subjective_expected_utility

>Does drinking monkey blood cause autism? Also no.

That's a relief.

Often I find myself reading an ACX post where Scott wrestles with a problem that exists because he is too charitable. There's usually about 300 comments already by the time I get to it, and none of them will notice this. He himself will never read my comment, and if he did I'm not at all sure I'd want him to be less charitable in the future. Obviously there are writers I could prioritize that have different ratios of genius and charity, so my consistently reading Scott isn't an accident.

But seriously, come on now. Most people are really terrible at predicting anything and would do worse than random chance in a forecasting contest. I'm probably in this group myself. Anyone who criticizes using probabilities in this way is very obviously also in this group, and should be assumed to have no valid opinion. Being good at using probabilities is table stakes here. I try very hard to avoid using probabilities because I know I shouldn't. In domains in which I am actually an expert I will sometimes say "probably 80% of the time it's this way but that's just pareto; also i'm biased".

tl;dr: the only good response to "you shouldn't use probabilities this way" is "you're right that -you- shouldn't, yes."

Another point I would make is that any use of probability for any real-world situation actually still requires making unprovable assumptions about the future. I think there's an illusion of mathematical rigor when you model coin flips and dice rolls with nice-looking distributions, but when you use the past flips of a coin to say how likely it is to come up heads on the next flip, you are making an assumption that the previous flips and future flips have a very specific relationship. There might be reasons why you think this--but these are real-world, empirical reasons like "I don't think the coin flipper is manipulating the outcome" and are not really different from "last presidential election and next presidential election have related dynamics, a similar voting populace, etc."

Obviously coin flips *feel* like they're necessarily from a consistent data generating process, while elections feel like they may not be, but this is not a mathematical justification for treating them differently.

To steal an example from 3blue1brown, consider the following sequence of questions:

1. I roll a d6. What is the expected value of the result?

2. I flip a coin and based on the result, roll a d6 or d12. What is the expected value of the result?

3. I select some number of dice at random from a d4, d6, d8, d10, d12, and d20, then roll them and add up the results. What is the expected value of this sum?

4. I have a collection of unknown numbers of each of the 6 dice above, and select 5 of them. What is E(sum of rolls)?

5. Same as 4, but I select anywhere from 1 to 10 dice.

6. Same as 5, but I select an unknown number of dice.

7. I have a box containing some number of dice, of unknown number of sides, with unknown values on the sides. What is E(sum of rolls)?

At what point do we go from "nice well defined problem where probability can be applied" to "non-repeating, hard-to-model event where it's irresponsible to give a probability"? You can of course keep going and add more and more fine-grained levels of uncertainty to this sequence of problems, which all just involve the repeatable well-defined process of dice rolling. There's no real distinction between "situations with well-defined distributions (so you can use probability)" and "situations without (so you can't)."
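For the early steps in that sequence the arithmetic is perfectly well defined (a sketch of steps 1-3; beyond that you would have to put a prior on the unknowns, which is exactly the point):

```python
from fractions import Fraction

def ev(sides):
    """Expected value of one fair die with faces 1..sides: (sides + 1) / 2."""
    return Fraction(sides + 1, 2)

# 1. A single d6:
print(ev(6))  # 7/2
# 2. A fair coin picks d6 or d12, then we roll whichever was picked:
print(Fraction(1, 2) * ev(6) + Fraction(1, 2) * ev(12))  # 5
# 3. One die drawn uniformly from {d4, d6, d8, d10, d12, d20}:
dice = [4, 6, 8, 10, 12, 20]
print(sum(ev(s) for s in dice) / len(dice))  # 11/2
# Steps 4-7 need a prior over how many dice / which dice / what faces --
# the probability machinery still applies, you just have to supply the prior.
```

Each step only adds another layer of averaging over uncertainty; there is no bright line where the calculation stops being legitimate.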

This is such a useless piece that I can’t fathom the sort of person it’s useful for.

>>> If you actually had zero knowledge about whether AI was capable of destroying the world, your probability should be 50%

This is not true. You should read about the False Confidence Theorem and imprecise probability.

When the outcome is either 1 or 0, it's better to just say, "I believe X will happen with a Y level of confidence".

As a commenter I'm also cursed to return to the same response again and again:

Probabilities relate to probability models.

Heavily simplified:

Probabilities work fairly fine if you already know all the possibilities of what might be true and the logical relations between those possibilities (for example, there is no relevant math question you are unsure about, and so on). In that case you just need to assign probabilities to all the known possibilities, and then you can plug them into Bayes' theorem and other cool formulae. Also then (and only then) there are cool mathematical guarantees that your initially arbitrary probabilities will come ever closer to the truth as new evidence comes in.

On the other hand probabilistic thinking leads to absolute trainwrecks if there are relevant possibilities you didn't think of at all, that is if there are "unknown unknowns" or, worse, mathematical facts you weren't aware of.

Using probabilities should relate to your level of information because they are useful if your level of information is high and useless if it is low, in the sense of your uncertainty being dominated by unknown unknowns. For example, it is normally not a great idea to make bets with experts on things you have no clue about; they will just take your money. (Yeah, you could carefully examine the experts' opinions and then adjust for expected overconfidence and salience bias, but that is actually another example of my point: you need to get to fairly high information before probability stops being counterproductive.)

And yes, if you use probabilities outside their domain of usefulness that will be a substitute for thinking. You will be plugging numbers into Bayes' theorem which feels very math-y and sophisticated, just like it feels like learning to do more of the math problems you already understand rather than face the confusion of the hard ones.

Btw, does anyone know anything about the ACX prediction contest emails? I haven't gotten mine. Should I have more patience, or did I misremember submitting my predictions?

E-Prime, anyone?

> If someone wants to know how much evidence/certainty is behind my “no”, they can ask, and I’ll tell them.

> But it’s your job to ask each person how much thought they put in

I really, really want there to be clear numerical ways to capture this. Giving the full probability distribution (or even, to a lesser degree, summarizing it with variance or standard deviation) captures *some* of it, e.g. it would help with expressing your certainty about vaccines-cause-autism vs monkey-blood-causes-autism. But that doesn't do anything to capture the top-philosopher vs random-teenager case.

This comes up a lot for me when thinking about doing (mostly informal) Bayesian updating. It strongly seems as though I should update more on the top-philosopher's answer than on the random-teenager's answer, but it's not clear to me how (if at all) the difference is represented in Bayes' theorem.
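One standard (if partial) answer: in Bayes' theorem the source's reliability enters through the likelihood terms, P(source asserts X | X true) vs P(source asserts X | X false). A tiny sketch, with made-up reliability numbers for the two sources:

```python
def update(prior, p_assert_if_true, p_assert_if_false):
    """Posterior P(X | source asserts X) by Bayes' theorem.
    The source's reliability lives entirely in the two likelihoods."""
    num = prior * p_assert_if_true
    return num / (num + (1 - prior) * p_assert_if_false)

prior = 0.5
# Hypothetical numbers: the philosopher's verdict tracks the truth well ...
print(update(prior, 0.8, 0.3))   # ~0.73
# ... while the teenager's verdict is barely correlated with it.
print(update(prior, 0.55, 0.5))  # ~0.52
```

So the difference does live inside Bayes' theorem, but only if you model the source, not just the claim, which informal updating rarely does explicitly.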

This may just be my own ignorance, and I'd really love for someone to correct me! But if not, it seems like a failing of how we think about probability.

I think an analogous argument happens regarding using cost-benefit analysis to quantitatively evaluate projects or policies to implement. For example, do we build a new bridge, or regulate some kind of industrial pollution?

A first school of thought says: "we should try and estimate all the costs and benefits using some economic model and proceed with the project if it exceeds some benefit-to-cost ratio threshold."

A second school counters: "we don't actually know the true costs and benefits, you're just modelling or guessing them, plus you can't count environmental benefits and time savings and regulatory compliance costs and safety benefits etc all together. These are categorically different or unquantifiable and your benefit-to-cost ratio is meaningless."

But like with probabilities - where does the second school leave us to judge how we should proceed?

Similarly to calling a probability 'very unlikely', we're left with vague qualitative statements like: 'it's absolutely essential', or 'it's very costly for not much gain', or 'there are both significant costs and benefits'.

In both cases I believe the number is telling us something useful - even if we concede that there are inaccuracies and biases that come with estimating. And if one disagrees with the number, they can go ahead and critique the reasoning in it or undertake their own analysis.

« I agree that, because of the thorniness of the question, probabilities about AI are more lightly held than probabilities about Mars or impeachment »

It’s not clear to me what you mean by lightly-held, and the most likely possibility IMO is that of the meta probability, which you rejected earlier.

Just relink to https://slatestarcodex.com/2013/05/02/if-its-worth-doing-its-worth-doing-with-made-up-statistics/ every month

One thing that bothers me with using probabilities in casual conversation is that they are opaque conclusions, not arguments. Scott writes long, evidence-based blog posts. I wouldn't find a tweet saying "To sum up, I think my p(doom) is 20%" to be an adequate substitute for a blog post. An argument can be built on stories about things I'd never heard of before, which I often find interesting even when I disagree with the conclusion.

But that doesn't mean asking for a p(doom) as an opening gambit in a conversation is necessarily bad. The question is, where do you go from there? Do you have any interesting evidence to share, or is just another unadorned opinion?

Surveys and elections and prediction markets have similar problems, but in aggregate. Each data point is an opaque opinion, and we can't go back to ask someone what they really meant when they picked choice C. (Maybe survey questions should have a comment box where people can explain what they meant, if they care to? It seems like it would be useful for debugging.)

But then again, these things happen in a larger context. An election is not just about voting. It's also about the millions of conversations and many thousands of articles written about the election. I believe prediction markets often have conversations alongside them too? It can be pretty repetitive, but there might be some pretty good arguments in there.

I wonder if they could be combined somehow. Suppose that, when voting for a candidate, it was also a vote for a particular argument that you found convincing? There might be a lot of "concurring opinions" to pick from, but knowing which arguments people liked the most would give us better insight into what people think.

(There is a privacy issue to work around, since the concurring opinion you pick might be identifying.)

Don't see any comments about the difference between stochastic and epistemic uncertainty, but I believe it's a large part of this debate. Stochastic uncertainty is like the uncertainty around rolling a die; epistemic uncertainty is about not knowing how many sides the die has.

Perhaps we need norms around communicating both. For example, I'm 50% sure Biden will win the election and my epistemic uncertainty is low (meaning it would take very strong evidence to change my probability significantly). I'm also about 50% sure it will be sunny tomorrow, but my epistemic uncertainty is high because any piece of evidence could cause my estimate to vary widely.

I would add to your list of reasons why Samotsvety's "probabilities" of unique events should be called probabilities. Apart from calibration properties you already mentioned, they behave like probabilities in other respects, for example

- they are numbers between 0 and 1

- behave as probabilities under logical operations, such as negation, AND, and OR, for example p(not(A)) = 1 - p(A)

- behave like probabilities in conditioning on events.

It would be weird if humanity did not have a word for sets of numbers with these properties and this word happens to be "probability distribution" for the full set and "probability" for individual numbers in the set.

This is all wrong. Prices replace probability and trading replaces forecasting when faced with single shot events. Markets are the technological solution to mediating contingency when the notion of possibility is incoherent.

I find the beta distribution a really nice intuition pump for this sort of thing. If one person says 50% and their beta distribution is beta(1, 1), then their 50% doesn't really mean much. But if someone says 50% and their beta distribution is beta(1000, 1000), then that's much more meaningful!
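The difference between those two cases shows up directly in the spread of the Beta distribution (a minimal sketch using the closed-form mean and variance):

```python
import math

def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)

print(beta_mean_sd(1, 1))        # (0.5, ~0.289): "50%", held very lightly
print(beta_mean_sd(1000, 1000))  # (0.5, ~0.011): "50%", held very firmly
```

Both report 50%, but the standard deviation quantifies "how tightly you hold this belief" in a way the point estimate cannot.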

Pretty sure that I agree with the defense given here, but felt some whiplash going from "Probabilities Are Linguistically Convenient" to "Probabilities Don’t Describe Your Level Of Information, And Don’t Have To"

Wouldn't it be linguistically convenient for probabilities to describe your level of information?

Maybe not, if the disconnect revolves around a speaker's wish to have options and the listener's wish to be able to understand a speaker's level of information when attempting to update their priors. Non-Experts hate this one weird trick.

> Some people get really mad if you cite that Yoshua Bengio said the probability of AI causing a global catastrophe is 20%. They might say “I have this whole argument for why it’s much lower, how dare you respond to an argument with a probability!” This is a type error. Saying “Yoshua Bengio’s p(doom) is 20%” is the same type as saying “Climatologists believe global warming is real”. If you give some long complicated argument against global warming, it’s perfectly fine to respond with “Okay, but climatologists have said global warming is definitely real, so I think you’re missing something”. That’s not an argument. It’s a pointer to the fact that climatologists have lots of arguments, and the fact that these arguments have convinced climatologists (who are domain experts) ought to be convincing to you. If you want to know why the climatologists think this, read their papers. Likewise, if you want to find out why Yoshua Bengio thinks there’s a 20% chance of AI catastrophe, you should read his blog, or the papers he’s written, or listen to any of the interviews he’s given on the subject - not just say “Ha ha, some dumb people think probabilities are a substitute for thinking!”

But the thing that makes superforecasters good at their job (forecasting) IS NOT domain expertise! That was the major conclusion of Expert Political Judgement, and was reaffirmed in Superforecasting! The skill of "forecasting accurately" is about a specific set of forecasting-related skills, NOT domain expertise.

If Yoshua Bengio's arguments are presented to Samotsvety, I would trust their probability conclusion far, far more than Yoshua Bengio's. He may have domain expertise in AI, but he does not have domain expertise in forecasting. I remember reading an interview with a superforecaster, talking about professional virologists' forecasts during the early stages of Covid-19, where he would be baffled at them putting 5% odds on something that seemed to him to be way less than 1% probability, like results that would require the exponential growth rate to snap to zero without cause in the next month, well before vaccines. Numbers that, to a professional forecaster, are nonsense.

One of the main things that makes good forecasters good is that they understand what "20%" actually means, and answer the questions asked of them rather than semi-related questions or value judgements. e.g. "Will Hamas still exist in a year?" versus "Do I like Hamas?" or "Will Hamas return to political control of the Gaza strip in a year?" It is this capacity (among others), not domain expertise, that differentiates well-calibrated forecasters from poorly-calibrated ones.

“Sometimes some client will ask Samotsvety for a prediction relative to their business, for example whether Joe Biden will get impeached, and they will give a number like “it’s 17% likely that this thing will happen”. This number has some valuable properties”

Does it really.

Apart from Samotsvety being a cool name for something (as are most Russian names), 17 percent is one of a group of particularly weaselly forecasting numbers. It is something that is more likely to happen than “is very unlikely to happen” (approx. 1-5%), but less likely to happen than “is unlikely to happen but I would not at all rule it out” (less than 40%).

The problem with such numbers relates to the ability/inability of a critic to say, after the fact: “hey, you were quite far off the mark there, buddy.” With 17 percent, you have some degree of plausible deniability if Biden actually gets impeached and you are accused of being a bad forecaster: “I did not say it was very unlikely – I said it was 17 percent likely, implying that it actually had a non-negligible chance of happening”.

Related: When Trump was elected in 2016, people did not go “wow” on the forecasters who argued beforehand he had a 30 percent chance of being elected because these forecasters put the percentage at 30 percent. They went “wow” at those forecasters because almost everybody else put Trump winning as “very unlikely” (1-5 percent).

The really good forecasters, by the way, were those who put the likelihood of Trump winning at above 50 percent. They were the ones who took a real risk of being falsified.

With “17 percent” you hedge your reputation as a good forecaster, with very little risk of being found out if you are the opposite.

…to be clear, I am talking of unique events that do not belong in a larger group of similar-type events. If you have 10,000 similar-type events, you can investigate if some outcome happens 17 percent of the time, while other outcomes happen 83 percent of the time, and use that to forecast what will likely happen at the 10,001st event. I assume here that “Biden being impeached” does not belong in such a larger group of similar-type events – implying that you cannot falsify a “17 percent probability” prediction by collecting a lot of similar-type events.
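That said, forecasters can still be graded in aggregate even when no single event repeats: pool every event they ever called "17%" and check the pooled hit rate (a toy simulation, assuming the forecasts were in fact calibrated):

```python
import random

random.seed(1)

# Pool many one-off events that all received a "17%" forecast.
# No single event is falsifiable on its own, but the pooled hit rate is.
n = 10_000
hits = sum(random.random() < 0.17 for _ in range(n))
print(hits / n)  # ~0.17 if the forecaster was calibrated
```

The dispute is whether the forecaster actually accumulates enough "17%" calls across different domains for this pooled check to ever bite.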

It seems like a premise (or an effect, maybe?) of prediction markets is to create a way in which non-frequentist probabilities for different events can be compared. If you use your probability of each event to calculate expected values for bets, and you make bets, and you are able to successfully make money in the long term, doesn't that give some more objective meaning to these probabilities?
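The betting logic here can be made concrete with a tiny sketch (the numbers are hypothetical, chosen just for illustration): a prediction-market contract pays 1 if the event happens and sells at the market's implied probability, so your subjective probability directly yields an expected value for buying it.

```python
# Toy sketch: a contract pays 1.0 if the event occurs and costs the
# market price up front. If your subjective probability exceeds the
# price, buying has positive expected value.
def expected_value(subjective_p, market_price):
    """Expected profit from buying one contract that pays 1 if the event occurs."""
    return subjective_p * 1.0 - market_price

# Hypothetical numbers: you think the event is 30% likely; the market
# prices it at 17 cents on the dollar.
ev = expected_value(0.30, 0.17)
print(round(ev, 2))  # expected profit per contract
```

If your probabilities are better than the market's, repeating bets like this makes money in the long run, which is the "more objective meaning" the comment gestures at.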

It looks like section 4 is a response to my essay Probability Is Not A Substitute For Reasoning (https://www.benlandautaylor.com/p/probability-is-not-a-substitute-for). Or rather, it looks like a response to a different argument with the same title. It's pretty unrelated to my argument, so I wondered whether other people had picked up my title as a catchphrase, which can happen, but I can't find anyone using the phrase except for me and you.

Anyway, you’ll note that my essay never says “don’t use probability to express deep uncertainty”, or anything like that. (I do it myself, when I’m speaking to people who understand that dialect.) Instead I'm objecting to a rhetorical dodge, where you'll make a claim, and I'll ask why I should think your claim is true, and then instead of giving me reasons you'll reply that your claim is a reflection of your probability distribution.

This is especially galling because most people who do this aren't actually using probability distributions (which, after all, is a lot of work). But even for the few who are using probability distributions for real, saying that your claim is the output of a probability distribution is different from giving a reason. In the motivating example (AI timelines), this happens largely because the reasons people could give for their opinions are very flimsy and don't stand up to scrutiny.

More at the link: https://www.benlandautaylor.com/p/probability-is-not-a-substitute-for

Do people use probabilities as a substitute for thinking? Yes: some people, some of the time. The fact that others don't doesn't negate that. A stated probability estimate can be backed by a process of thought, but doesn't have to be.

If you have a subculture, "science", let's say, where probability estimates backed by nothing are regarded as worthless fluff, people are going to back their probability estimates with something. If you have a subculture, "Bayes", let's say, where probability swapping is regarded as intrinsically meaningful and worthwhile, you needn't bother.

Probabilities don't have to describe the speaker's state of information, but something needs to. If I am offered an opinion by some unknown person, it's worthless; likewise a probability estimate from someone in an unknown information state.

These are all good points, but I don’t think this engages with the fundamental intuition that leads people to see “the odds of pulling a black ball is 45%” as different from “the odds humans will reach Mars by 2050 is 45%”. Which is that the former can be understood as an objective view of the situation, whereas the latter is a synthesis of known information.

Suppose we’re watching a Powerball drawing with one minute to go. All the balls are spinning around chaotically in a grand show of randomness. Asked the probability of a 9 being the first number picked, you say 1 in 100. Shortly thereafter they announce the first number was 9. “Dumb luck” you think, but you play back the tape anyway and notice that when you made your prediction the 9 ball was actually wedged in the output area, making it all but certain that the 9 would come out first.

So were you correct in saying the probability was 1 in 100? Yes and no. You were correct in giving the odds based on the information you had, but if you had all the information you would have produced a different (and more accurate) prediction.

On the other hand, given 100 balls randomly distributed in an urn with 40 black, you can say objectively that the probability of pulling a black ball is 40%. This is because the unknown information is stipulated — there’s no real urn where I can point out that most of the black balls are on the bottom. To try to do so would be to violate a premise of the original question.

I think a big reason people dislike probabilities on real events is because they’re imagining that the prediction is supposed to be an objective description (“this is the actual probability, and god will roll a die”) rather than a synthesis of available information.

Jaynes was shouting in my head the whole time reading this. The appendix of his "Probability Theory" mentions probability systems that don't have numeric values. Instead you can only compare events as being more or less likely (or unknown).

The punchline is that, in the limit of perfect information, these systems reduce to standard probability theory. In other words, it's always possible to just go ahead and assign probabilities from the get-go while remaining consistent, which seems like how a mathematician would write Scott's post.

I completely agree with the case made here that it is useful and informative to convey actual numbers to express uncertainty over future outcomes, instead of vagaries like "probably".

That said, there's an adjacent mistaken belief that Scott is not promoting here but that I think is widely held enough to be worth rebutting. The belief is that there is, for any given future event, a true probability of that event occurring, such that if person A says "it's 23.5%!" and person B says "it's 34.7%!", person A may be correct and person B may be incorrect.

Here's a brief sketch of why this is wrong. First, observe that you can have two teams of superforecasters assign probabilities to the same 10,000 events, and it's possible for both teams to assign substantially-different probabilities to each of those events, and both teams to still get perfect calibration scores. I won't attempt to prove this in this comment, but it's easy to prove it to yourself via a toy example.
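One possible toy example of the kind gestured at above (my construction, not the commenter's): each event is driven by two hidden fair coins, the event happens if either comes up 1, and each team observes only one of the coins. Both teams end up perfectly calibrated while frequently disagreeing on individual events.

```python
import random

random.seed(0)

# Toy model: each event has hidden fair coins a and b; the event occurs
# if a OR b is 1. Team 0 sees only a, team 1 sees only b. Conditional on
# seeing a 1, the event is certain (predict 1.0); conditional on seeing
# a 0, it happens half the time (predict 0.5). The teams disagree on
# every event where a != b, yet both are calibrated.
N = 100_000
events = [(random.randint(0, 1), random.randint(0, 1)) for _ in range(N)]

def calibration(team):
    """Empirical frequency of the event within each prediction bucket."""
    buckets = {0.5: [], 1.0: []}
    for a, b in events:
        pred = 1.0 if (a, b)[team] == 1 else 0.5
        buckets[pred].append(1 if (a or b) else 0)
    return {p: sum(v) / len(v) for p, v in buckets.items()}

print(calibration(0))  # both teams: ~50% in the 0.5 bucket,
print(calibration(1))  # exactly 100% in the 1.0 bucket
```

With enough events, each team's 0.5 bucket converges on 50% and its 1.0 bucket is exactly 100%, so both score as perfectly calibrated despite giving different numbers for the same events.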

Second, let's say these two teams of superforecasters, call them team A and team B, say that, at the present moment, the probability of a human walking on Mars by 2050 is 23.5% and 34.7% respectively. Which team is correct? We can't judge them based on their past performance, because they have the same, perfect calibration score. How about by the result?

Well, let's say a human doesn't walk on Mars by 2050. Which team was right? There's a sense in which you can say team A was "less surprised" by that outcome, since they assigned a lower probability to a human walking on Mars. I think that's the intuition behind Brier scores. But they were still surprised by 23.5 percentage points! So it doesn't feel like this outcome can provide evidence that they were correct. In fact, Joe Idiot down the street, who isn't calibrated at all, said there was a 0.01% chance humans would walk on Mars, and I don't think we would want to say he was more correct than team A.

So it's pretty clear there is no knowably-objectively-correct probability for any given one-off event, outside of a toy system like a stipulated-fair coin (in the real world, are any coin flips actually fair?). That doesn't mean probabilities aren't useful as a means of communicating uncertainty, and it doesn't mean that we shouldn't use numbers vs English-language fuzziness, but I do think it's worth acknowledging that there isn't an underlying fact of the matter that we're trying to establish when we give competing probabilities.

I think probabilities are well defined, if the question is unambiguous. By this I mean, when looking at almost any world and asking "did the event happen" there is a clear yes/no answer.

Questions to which you can't really assign probabilities:

"will misinformation be widespread in 2026?"

Questions to which you can assign a probability:

"will the world misinformation organization publish a global misinformation level of more than 6.6 in 2026?"

But every time you make the question well defined, you risk that world misinformation organization going bankrupt and not publishing any stats.

On the bright side, it's good that you keep having to explain this. It means that people who need to hear it are hearing it. Some of them are hearing it for the first time, which means you're reaching new people. It's a good sign, however frustrating it may be.

I'm late to this, but here's my response:

Probability Is Not Frequentist, Nor Math, Nor Real, Nor Subjective

https://open.substack.com/pub/wmbriggs/p/probability-is-not-frequentist-nor?r=b9swm&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true

I'll use section 2 as an opportunity to plug [my LessWrong post on unknown probabilities](https://www.lesswrong.com/posts/gJCxBXxxcYPhB2paQ/unknown-probabilities). We have a 50% probability that the biased coin comes up heads, while also having uncertainty over the probability we _would_ have, if we knew the bias. Personally I don't like talking about biased coins [because they don't exist](http://www.stat.columbia.edu/~gelman/research/published/diceRev2.pdf), but I address in detail the case of a "possibly trick coin": it is either an ordinary fair coin, or a trick coin where both sides are heads.
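The "possibly trick coin" case mentioned above can be sketched in a few lines, assuming the version where the coin is either ordinary and fair or double-headed, with equal prior odds between the two hypotheses:

```python
# Two hypotheses with a 50/50 prior: the coin is fair (P(heads) = 0.5)
# or double-headed (P(heads) = 1.0).
p_fair, p_trick = 0.5, 0.5

# Predictive probability of heads before any flips:
p_heads = p_fair * 0.5 + p_trick * 1.0  # 0.75

# Bayes update after observing one head:
posterior_trick = (p_trick * 1.0) / p_heads  # 2/3
posterior_fair = 1 - posterior_trick         # 1/3

# New predictive probability of heads for the next flip:
p_heads_next = posterior_fair * 0.5 + posterior_trick * 1.0  # 5/6

print(p_heads, posterior_trick, p_heads_next)
```

This shows the two levels of uncertainty at once: a definite predictive probability for the next flip, alongside uncertainty over which "true" chance generated it (and a single observed tail would collapse the trick hypothesis to zero).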

My main issue is that fractional probability estimates for one-off events are not falsifiable. Events either happen or they don't. If Joe Biden gets impeached, then the perfectly accurate prediction of his impeachment should have been 1. If he doesn't, then it should have been 0.

If I say that he will be impeached with a 17% probability and he does get impeached, well, I say, 17% is not 0. I was right. If he doesn't get impeached, well, I say, 17% is not 100%. I was right.

But if I predict that the L train will arrive on time 17% of the days, then post factum it can be said both whether I was accurate and how close I was to the accurate prediction (maybe, in reality, they arrive 16.7% or 95% of the days).

In the Samotsvety example, we are dealing with someone who predicts one-off events regularly. So the probability we are talking about is not of the events happening but of Samotsvety's average performance over the series of events they predict. Basically, their numbers are bets which, over a prediction game spanning multiple years, should deviate less from the de facto binary outcomes than other players' bets do.

If they say Joe Biden gets impeached with 17% probability, then in the case it actually happens they lose (100 - 17 = 83) points. If he doesn't get impeached they lose 17. The goal of the game is to lose the smallest possible number of points.

Thus, we are dealing with two fundamentally different numbers:

* expected frequency of a recurring event

* a bet on a one-off event in a series of predictions

We don't have a universally agreed-upon way to distinguish them linguistically. I think many people grasp this intuitively and object to calling both by the same word, even though both are indeed "probabilities": subjective expectations of an uncertain event.

I would like to take the historical predictions of forecasting teams and round all of them to 0 and 1. If this makes the teams lose more points than their original predictions did, then fractional bets make sense in the context of repeated one-off predictions. Otherwise, there's no practical use for them.
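A simulation of this experiment is easy to sketch, with one caveat: it only works under a proper scoring rule such as the quadratic (Brier) rule, because under the linear point scheme described earlier, rounding toward the nearer of 0 and 1 never hurts on average. The 17% frequency below is an assumed toy value, not real forecaster data.

```python
import random

random.seed(1)

# Quadratic (Brier) loss for a single prediction.
def brier_loss(prediction, outcome):
    return (prediction - outcome) ** 2

# Toy world: the event really does occur 17% of the time across many
# independent one-off questions.
q = 0.17
outcomes = [1 if random.random() < q else 0 for _ in range(100_000)]

# Average loss of the honest fractional forecast vs the rounded one.
honest = sum(brier_loss(0.17, o) for o in outcomes) / len(outcomes)
rounded = sum(brier_loss(0.0, o) for o in outcomes) / len(outcomes)

print(honest < rounded)  # the fractional forecast loses fewer points
```

Analytically, the honest forecast's expected loss is 0.17 × 0.83² + 0.83 × 0.17² ≈ 0.141, while rounding to 0 gives 0.17, so in this toy setup the fractional bets do earn their keep.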

A quick Google search didn't take me to where you discuss this fully:

"Do vaccines cause autism? No. Does drinking monkey blood cause autism? Also no. My evidence on the vaccines question is dozens of well-conducted studies, conducted so effectively that we’re as sure about this as we are about anything in biology."

Could you please point me (or us) to your full post(s) on this so we can check the vaccine-autism studies ourselves? With RFK Jr.'s rise, this issue will rear its head even further.

I don't like the argument in point 2.

There's a big difference between "I know the distribution very well and it so happens that the mean is 0.5" and "I don't know the distribution, therefore I'll start with a conservative prior of equal probability on all outcomes, and it so happens the mean is 0.5". The difference is in how you update on new information. In the first case you basically don't update at all on each subsequent sample (you sampled a lot before, so you're confident about the distribution), while in the second, each sample should move you heavily.

So again, for the first sample both distributions give you a best estimate of 0.5, but not for the second and beyond.
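This difference in updating can be made concrete with a minimal sketch, assuming the standard conjugate Beta-Bernoulli update for the "unknown distribution" case:

```python
# Case 1: a coin known to be fair. The estimate is 0.5 and stays 0.5
# no matter what a single flip shows.
known_estimate = 0.5

# Case 2: unknown coin with a uniform Beta(1, 1) prior over its bias.
# The prior mean is also 0.5, but it moves with each observation.
alpha, beta = 1, 1
prior_mean = alpha / (alpha + beta)      # 0.5, same as the known coin

alpha += 1                               # observe a single head
posterior_mean = alpha / (alpha + beta)  # jumps to 2/3

print(known_estimate, prior_mean, posterior_mean)
```

Both agents quote 0.5 before the first flip, yet one observed head leaves the first at 0.5 and moves the second to 2/3, which is exactly the asymmetry the comment describes.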

Eliezer wrote the post "When (Not) to Use Probabilities", which is mildly relevant here, at least as further reading.

https://www.lesswrong.com/posts/AJ9dX59QXokZb35fk/when-not-to-use-probabilities