175 Comments
Dec 16, 2022·edited Dec 16, 2022

1. Will an existing, validated personality assessment tool or validated questions be used in the upcoming survey to assess personality?

2. If it became widely known that people with X (personality) trait are poor predictors (or at least widely known by prediction model participants), will those people self-select out of prediction markets, especially those associated with actual money (some people already decline to participate in polls and some people don't "gamble")?

3. Could it be that some personalities predict some things well, but other things poorly? Would that correlation be looked for?

author

I'm using that as a stand-in for "the kind of thing on the ACX Survey". See past surveys (eg https://docs.google.com/forms/d/e/1FAIpQLScGVSQbvDiqGMoTQbOP4Fyj07rQ3c50i58cuNIy8rpY0QIa8A/viewform) for examples of what that might be.


I edited my answer while Scott was answering (face-palm).

My first original question (I think?!) was how the personalities would be assessed.

Scott, straighten me out if this was wrong.


Your definition of "major US political figure" in #24 would exclude Donald Trump (or Mike Pence). Did you mean to do that?

author

No but it's not worth changing the question around for this now.

Dec 16, 2022·edited Dec 16, 2022

How is the contest scored please?

Hugely excited to participate!

Edit: Sorry, I see the contest rules are deliberately ambiguous ("probably Brier score"), probably to prevent me from doing exactly the sort of gaming of the system I was planning :) However, just to say: on my own forecasting tournament, Brier and log-loss produce quite different results in some years.
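To illustrate how Brier and log-loss can disagree, here is a minimal Python sketch with made-up forecasts (the two forecasters and every number are hypothetical, not contest data). A bold forecaster who is right nine times at 99% and badly wrong once at 1% beats a steady 65% forecaster on Brier score but loses on log-loss, because log-loss punishes the single confident miss far more heavily than Brier, whose per-question penalty is capped at 1.

```python
import math

def brier(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(outcomes)

def log_loss(forecasts, outcomes):
    """Mean negative log of the probability given to what actually happened (lower is better)."""
    return -sum(math.log(p if o == 1 else 1 - p) for p, o in zip(forecasts, outcomes)) / len(outcomes)

# Hypothetical example: ten questions that all resolved "yes".
outcomes = [1] * 10
bold = [0.99] * 9 + [0.01]   # confident everywhere, one confident miss
steady = [0.65] * 10         # moderately confident everywhere

for name, forecasts in [("bold", bold), ("steady", steady)]:
    print(f"{name}: Brier={brier(forecasts, outcomes):.4f}  log-loss={log_loss(forecasts, outcomes):.4f}")
# bold wins on Brier (0.0981 vs 0.1225) but loses on log-loss (0.4696 vs 0.4308)
```

Which forecaster "wins" depends entirely on the rule, which is why the choice matters for a ranking contest.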

founding

was the 2022 one not official?

author

It was official for Eric and Sam, but I'm not going to do anything particular with the results, and I don't think other prediction markets officially participated (except sort of Manifold).

founding

Didn't a lot of readers participate? Will you be releasing how well calibrated we are?

author
Dec 16, 2022·edited Dec 16, 2022

Yes, though most of this will be asking Eric and Sam to do the hard work and publishing what they find.

Dec 16, 2022·edited Dec 16, 2022

Might I suggest giving a slightly below-average score for skipped questions? Maybe 40th percentile or so? As someone who'll probably skip some questions, but hopefully not too many, I don't want to feel like I'm incentivized to skip questions I'm less confident on.
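One way the 40th-percentile suggestion could work, as a rough sketch: for each question, give every skipper the 40th-percentile score among the people who did answer. This assumes a higher-is-better score (for example negative Brier) and uses Python's standard `statistics.quantiles`; the function name `fill_skips` and the sample scores are hypothetical, and nothing here reflects how the contest is actually scored.

```python
from statistics import quantiles

def fill_skips(scores, pct=40):
    """Replace skipped entries (None) with the pct-th percentile of the answered scores.

    `scores` holds one question's per-player scores, where higher is better.
    At least two players must have answered for quantiles() to return cut points.
    """
    answered = [s for s in scores if s is not None]
    cut_points = quantiles(answered, n=100)   # 99 cut points: 1st..99th percentile
    fallback = cut_points[pct - 1]
    return [fallback if s is None else s for s in scores]

# Hypothetical scores (negative Brier, so higher is better) for one question.
question_scores = [-0.05, -0.30, None, -0.10, -0.50, -0.20]
print(fill_skips(question_scores))
```

Using a percentile of the people who answered, rather than a fixed number, means a skip costs more on questions most people found easy and less on questions everyone got wrong.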


On the other hand, as someone with no vested interest in either US or UK politics and no intuition for how their political systems work, I skipped a lot of questions. Honestly, I don't mind being penalised for it: I didn't participate because I think I'm good at this, I participated out of curiosity. But I'm not sure it ought to be penalised?


Yeah, some questions I felt like I had no idea, and 5 minutes of research wouldn't help at all.


I feel not predicting things should count against someone in a prediction competition. It seems reasonable that the person who wins the prediction competition oughtn't be someone who says "dunno" to a bunch of questions.


I didn't skip, but I gave 50% as a "heck if I know" guess for questions I had no idea about and which would have taken longer than five minutes to properly research.


I did the same thing. For some reason 50% seemed like a better solution than leaving it blank.


Scoring skipped questions at the 50th percentile encourages people not to cheat. If it were the 40th percentile, I figure some people might look up a market for a question rather than just skipping it.


Worth noting here that the market consensus would almost certainly outperform the average participant, so this incentive is still there.


How do you plan to prevent the blind mode prizes from going to people who cheated by spending more than 5 minutes on research or looking at Manifold or Metaculus?

https://manifold.markets/IsaacKing/in-the-acx-2023-prediction-contest

author

Honor system; I will be naming the names of winners and I don't think $500 is enough incentive to cheat. If I notice that someone's answers are exactly identical to the markets I'll take notice. And I think there are enough people, and the markets are imperfect enough, that even if someone tried this strategy it might not win.


$500 may not be enough incentive, but being seen by the community as a good forecaster might be

Dec 17, 2022·edited Dec 17, 2022

If they don't mind the stigma of being a dirty rotten cheater, then they're not good forecasters 😁

But this contest is not going to find good forecasters, it'll find people who are good at working out strategies to maximise their placings. So if they get into the Top Three, it won't matter if they really believe that (say) the Third World War is going to happen by February 2023, they'll only care if that is the answer that gets them a higher placing than the truth.

This is why I think prediction markets are nonsense to find out 'truth'. If there's a reward to be made, people will game answers to get the reward. If there's no reward, people are not going to bother participating (except for fun). Would you set national policy on housing/health care/foreign affairs by how a bunch of Wordle players decided it should go?


Actual prediction markets, or contests that give prizes in proportion to a proper scoring rule, fix this, but your criticism of prizes like the one Scott is giving is correct.


"Will any new country join NATO?"

Does this question include Sweden and Finland as "new countries" or no? They've signed accession protocols so their membership is imminent, but they aren't strictly speaking members of NATO yet.

author

Yes.


So, is this one factually resolved as yes, given that Finland has joined?


For new countries joining NATO, I assume that's beyond Finland and Sweden? I'm pretty sure they're not formal members yet so it might be good to clarify if they count.


It's mostly about Finland and Sweden. Turkey and Hungary have not yet agreed to/ratified their accession. Hungary is expected to do so in January 2023; Turkey... Erdogan is even less predictable than Orbán. Plus there are technicalities after ratification. But yeah, I'm pretty sure it will happen in 2023. Any others? Ukraine? Much less likely, especially as even Sweden was not the expected shoo-in.

Dec 17, 2022·edited Dec 17, 2022
Comment deleted

Indeed, so both have to ratify the admission of FI+SW to NATO. All other NATO members have already done so, most within a few days, the US in 4 weeks, Slovakia after 8 weeks. https://www.nato-pa.int/content/finland-sweden-accession

Dec 18, 2022·edited Dec 18, 2022
Comment deleted

Actually, I am Mark. Not remark. And I do that kinda typo, too. ;) (Plus I can't ever resist making a bad joke.) Wishing you a new year full of peace and joy. May your holidays be full of warmth and cheer!


7. Does this have to be an accident? As opposed to say an intentional strike, or sabotage? What about intentional mismanagement leading to a predictable but plausibly deniable "accident"? If accident just means nuclear related event leading to evacuation (due to the nuclear part as opposed to say some fighting/strikes in the area), I think my predictions are different.

author

Good point, I have changed this to "issue". If there's a non-accidental issue I may disqualify this question.


Small request on the survey, based on looking at last year's — I don't remember ever taking a formal IQ test, and I don't have my SAT score, but I still have my ACT score on hand. Can we use ACT as an entry in that section? (If not, should I just leave the entire intelligence section blank?)

author

I'll figure this out when I do the survey; I think in the past there's been a section for ACTs.


If you're collecting IQ test data, will you have a category for "we don't do official IQ tests round here but here is my Crappy Online Test result"? I have one, which is minorly crappy but based on a cut-down version of the Cattell Culture Free Intelligence Test, and the result feels intuitively within the ballpark of where I'd estimate my mathematical/pattern matching ability to lie.

Dec 16, 2022·edited Dec 16, 2022

This should be fun! I entered your 2022 contest, and was inspired to host my own contest for friends, derived from the questions you sent out plus more fun/dumb cultural questions about sports, James Bond, etc., to try to capture a category of friends who wouldn't typically try this kind of thing. Every few months I send out an email with a rundown of the questions that resolved, the questions in the news, and the current leaderboard. I plan to do the same this year but make it more community-oriented by crowdsourcing at least a few questions from participants; maybe we'll have a t-shirt or something dumb like that. There isn't really a way to scale this without losing some specificity (it takes time, and it's more fun among people who know each other), but I do think there's more buy-in knowing that it's a year-long thing, that your friends are doing it, and that there's some minimal interaction with the content.

I wish there was some way to easily plug in the data I collect back into the data you all are collecting.


Oh also, at least 10% of my friends messed up the scoring (submitted the opposite of what they thought) despite aggressive attempts at explaining the system.


> For the forecasting questions, you just have to give a percent chance, from 1 - 99.

Since people are rounding to the nearest integer percentage, a response of 0 or 100 is not necessarily irrational, and I think you should allow it.

author

The main reasons I did this were:

1. I don't know what scoring rule we'll use, and it might be one that assigns an infinite negative penalty to an answer of zero that turns out wrong. I think people would be angry if this happened unexpectedly, and I don't want to explain it to them.

2. If I allow answers below 1, then someone is going to put in 0.5 to a question that seems about even odds, and I'm not going to know whether they meant it to be 50% or 0.5%. I would rather just make this failure mode impossible.

3. I tried to choose things that were at least somewhat uncertain. I think if you put below 1% or above 99%, you are wrong.
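Points 1 and 2 amount to a simple input rule: whole percentages from 1 to 99 only. A minimal sketch, assuming entries arrive as text; `parse_forecast` is a hypothetical name, not the survey's real code. The constant at the end shows what the 1% floor buys under a log rule: a bounded worst case instead of an infinite one.

```python
import math

def parse_forecast(text: str) -> int:
    """Hypothetical validator: accept only whole percentages from 1 to 99."""
    value = float(text)  # non-numeric input raises ValueError here
    if not value.is_integer():
        # Rejects "0.5", which could mean either 50% or 0.5%
        raise ValueError(f"{text!r}: enter a whole number of percent, e.g. 50")
    pct = int(value)
    if not 1 <= pct <= 99:
        raise ValueError(f"{text!r}: forecasts must be between 1 and 99")
    return pct

# With 1-99 as the allowed range, the worst log-loss on a single question is
# ln(100), about 4.61, instead of infinity for a 0% forecast that turns out wrong.
WORST_CASE_LOG_LOSS = math.log(100)

for entry in ["50", "0.5", "0", "99"]:
    try:
        print(entry, "->", parse_forecast(entry))
    except ValueError as err:
        print("rejected:", err)
```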


I don't mind having expected score of negative infinity if it means I can improve my max score by ~1%. Since you only reward the best predictor, I am incentivized to optimize my chance of winning rather than my expected score.

Dec 17, 2022·edited Dec 17, 2022

"I don't know what scoring rule we'll use ... "

Hmm. I thought competent experimental design requires that you make such a decision IN ADVANCE of the experiment.


Theoretically, as long as Scott chooses one of the many strict proper scoring rules, people are still incentivized to guess their true probability.

This is complicated slightly by the fact that due to the skewed payout structure of the contest, you want to maximize some complicated combination of both variance and expected value. But really: most people here are not doing it for the money, so I doubt this matters too much.
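The claim that a strictly proper scoring rule makes honest reporting optimal can be checked numerically. A minimal sketch, assuming you genuinely believe an event has a 30% chance (an arbitrary example): scanning every allowed whole-percent report, both the expected Brier score and the expected log-loss are lowest when you report exactly 30%. This only covers expected score; it says nothing about the winner-takes-all effect in the second paragraph.

```python
import math

def expected_brier(report: float, p: float) -> float:
    """Expected Brier penalty for reporting `report` when the event truly happens with probability p."""
    return p * (1 - report) ** 2 + (1 - p) * report ** 2

def expected_log_loss(report: float, p: float) -> float:
    """Expected negative log-probability of the realized outcome."""
    return -(p * math.log(report) + (1 - p) * math.log(1 - report))

p_true = 0.30                                  # your actual belief (hypothetical)
reports = [r / 100 for r in range(1, 100)]     # every whole-percent answer from 1 to 99

best_brier = min(reports, key=lambda r: expected_brier(r, p_true))
best_log = min(reports, key=lambda r: expected_log_loss(r, p_true))
print(best_brier, best_log)  # 0.3 0.3
```

For Brier the expected penalty works out to p - 2pr + r², which is minimized at r = p, so shading your answer in either direction can only hurt your expected score.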


Some things are truly 0%. It's a mistake to impose your own forecasts on the participants rather than just letting the results come out naturally


>3. I tried to choose things that were at least somewhat uncertain. I think if you put below 1% or above 99%, you are wrong.

Or they know something you don't?


TBH none of those questions sound like anyone could reasonably answer < 2% or > 98% based on publicly available evidence alone.


Fun guessing game but I expect to be completely wrong on everything.


Perfect! Just answer the opposite of what you truly believe on every question and you'll get a great score.


I had a great CS professor who gave tests in true/false format and said if you got <10% you'd get an automatic A!

Dec 16, 2022·edited Dec 16, 2022

TW: Very pedantic. The questions are written as questions, not statements, but the answers are probability percentages which (to my understanding) correspond to an affirmative statement. I find it a bit unintuitive, but I assume this follows some convention I'm not aware of. The 2022 survey had questions in the form of statements.

E.g. Question 1. "Will A be B of C". The actual statement I am giving a probability for is instead "A will be B of C".

Last year's was fun, I took a look back and think I did alright. I was a bit too bullish on the GOP and the economy, but correctly bearish on Crypto

38. Apple Pay (Apple) already allows users to pay using USDC, a stablecoin. Would this resolve as true, assuming the partnership isn't broken off?


I believe Tesla also accepts payment in Dogecoin. I would expect the question to resolve to positive if this remains the case on the resolution date, but it would be nice to confirm.


50 and arguably 47 already resolved as true in 2022. For 50, Imagen Video (https://imagen.research.google/video/) is currently the best, but Meta has a similar model. For 47, early in the Russia-Ukraine war, there was a deepfake of Zelenskyy calling for Ukrainian soldiers to surrender, and I find it highly probable someone was upset by this, though I can't name a specific person.


There is also this:

https://www.theguardian.com/world/2022/jun/25/european-leaders-deepfake-video-calls-mayor-of-kyiv-vitali-klitschko

> The mayors of several European capitals have been duped into holding video calls with a deepfake of their counterpart in Kyiv, Vitali Klitschko.

Does this count? But the mayors were only "harmed" in having egg on their face and being in a negative article about deepfakes.


This is super cool. As a machine learning scientist I recommend cross entropy loss as a scoring mechanism. Cross entropy loss is a principled and mathematically pleasing way to evaluate probability estimates. Let x be the probability you assigned to the true eventual outcome. The cross entropy loss of this forecast is -log(x). So if you said x is 100% your loss is zero. If you said x is 0% your loss is infinite. (Perhaps why Scott didn't allow 0% or 100% predictions - if you're wrong then in theory no one would ever be able to trust you again.) Cross entropy loss is how all the best machine learning models are trained and it would be a great way to score this competition as well.
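The per-question cross-entropy loss described above fits in a few lines. A minimal sketch; using the natural log is an assumption (any base scales every loss by the same constant, so rankings do not change), and `cross_entropy` is an illustrative name.

```python
import math

def cross_entropy(x: float) -> float:
    """Loss -log(x), where x is the probability assigned to the outcome that happened."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be a probability between 0 and 1")
    if x == 0.0:
        return math.inf            # said 0% and it happened: infinite loss
    return math.log(1.0 / x)       # same as -log(x), but gives 0.0 rather than -0.0 at x = 1

for x in (1.0, 0.99, 0.5, 0.01, 0.0):
    print(f"x={x:<5} loss={cross_entropy(x):.3f}")
```

A contestant's total would be the sum of these losses over all resolved questions, lower being better.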

Dec 16, 2022·edited Dec 17, 2022

That was fun. Kinda. Five minutes of research did a lot to shape my answers, in the few cases where I did any. :)

... - deleted not to violate blind mode -


Reading this comment technically violates Blind Mode. So

(1) Please don't comment on your specific predictions here;

(2) Scott, please create a thread for people to share comments once they've entered their blind mode predictions, since the impulse to do so is almost irresistible, and move the bulk of Mark's comment over there.


I agree! Let's not give any answers here, and only ask for question clarification.


It won't take 5 minutes to read my info about the train station, I figure. So... but on the other hand, I see your point, and will oblige.


> 43. Will a new version of COVID be substantially able to escape Omicron vaccines?

> This question will resolve positive if in the opinion of the judges the scientific consensus is that getting all currently-recommended vaccines, including the two original vaccines and the Omicron booster, decreases risk of the new variant by less than 50%.

This seems underspecified. What risk do you mean? The risk of being hospitalized? The risk of dying? The risk of contracting the virus? The risk of transmitting the virus? And do you mean absolute or relative risk reduction? If a new variant emerges that is far less lethal and results in far fewer hospitalizations, but is totally unaffected by the vaccine, did risk increase or decrease?


We probably shouldn't be nitpicking, but I agree, this one seems quite vague to me.

Because, depending on the criteria, Omicron already “escapes” the vaccines—CDC data doesn’t indicate robust sterilizing immunity in most cases. It also varies a whole lot over age ranges.

https://www.cdc.gov/mmwr/volumes/71/wr/mm7148e1.htm?s_cid=mm7148e1_w#T1_down


To be fair, the injected polio vaccine doesn't produce sterilizing immunity either, but can't be said to be ineffective.

That aspect can pose a problem when population vaccination levels drop, if the virus is introduced and passed on asymptomatically till it reaches the unvaccinated. (But as long as vaccination levels stay high, it's safer to use.)

With Covid, the mRNA vaccines probably reduce transmission by 20-40%, but so far they are most effective against hospitalization and severe disease. Those last two seem like the criteria to measure escape against at this point.

(There's work on next gen vaccines hoped to be better against transmission. But at current likely levels of funding and the vanishing unlikelihood of a second Warp Speed, even if they're possible they won't be seen in 2023.)


Was about to ask the same. Usually newspapers give three different percentages about vaccines: risk reduction of

- infection

- severe disease (aka hospitalization)

- death

Which of the three is meant?


I am also waiting for an answer to this question.

The recent variants (e.g. BQ1.1 and BXX) very likely dodge the omicron BA.5 vaccine in terms of symptomatic infection. It's plausible they dodge for hospitalization too. It would help if Scott said what he believes the scientific consensus to be right now -- what does the scientific consensus supposedly say about BQ1.1 and BXX?

Another complication: I view it as very likely that there will be updated boosters by 1/1/2024 (but they'd still be called "omicron booster"). Does, say, BXX count as dodging omicron boosters if it turns out to dodge the current BA5 boosters but not some future BXX booster?

Yet another complication: most people already had previous omicron infection. Are we measuring the efficacy of boosters in the previously-infected population or in the immunonaive population?

Basically I'm quite bearish about vaccine efficacy in the future, but I don't understand resolution criteria and I'm afraid it will resolve against me due to weird interpretations of the scientific consensus.


The boosters have been doing reasonably well on ER visits, hospitalization, and death during the fall, which is a mix of mostly BA.5, BQ.1, and BQ1.1.

https://yourlocalepidemiologist.substack.com/p/fall-bivalent-boosters-science-update-0a9?r=96jx7&utm_medium=ios&utm_campaign=post

We're always going to be chasing data on the most recent variants, which we'll generally have only when they're on the wane. But a recent booster of any sort seems to be helpful and the closer the match, the better.


The studies mentioned in this article are all predominantly about BA.5. It is true that one study covered Sept 2022 to mid Nov 2022, a period which had a bit of BQ1 and BQ1.1 at the end; but it did not do a separate analysis on those variants, and according to the study itself, BA.5 predominated.

So all those results are consistent with the vaccines only being effective against BA.5 and not against BQ1 at all. We just have no information about efficacy against BQ1, except for antibody studies which would imply a ton of immune evasion.

I do expect the vaccines to help, to be clear. But (1) not against infection, only against hospitalization/death, and (2) even there, the efficacy is likely modest, not like the 90% we had at the beginning (I would not be surprised at 50%, though it could also be higher, depending on time since last infect/booster).

(I still recommend getting all boosters. I got my omicron BA5 booster. Later I also got omicron (BA5 probably, given timing), but it was reasonably mild.)


Also, is it the protection as measured soon after the booster or later? This can make a huge difference.

https://www.nejm.org/doi/full/10.1056/NEJMoa2119451


Yep, this one gave me trouble. My answers are *very* different depending on whether 'risk' = risk of infection or risk of death.


Also, is the decrease in risk measured as compared to an (increasingly hypothetical) immunologically naive person? Compared to the average person among those who haven't got the omicron booster (but who may have got the original vaccine and/or one or more earlier bouts of covid)? Compared to the average unvaccinated person (who has probably caught covid at some point)?


Wish some of this were specified better. Meta already has an AR headset coming out too. Does a new Covid variant count as a new pandemic?


This is a winner-takes-all game without punishment for bad guesses. Doesn’t this push us away from making our best-calibrated guess toward a more extreme higher-risk strategy?


This is a potential problem! Maybe he should give everyone who does better than average a payout that is somehow proportional to their score?


I've mulled it over a bit, and I think I should probably just play my best-calibrated guesses, given there are 50 questions. I dunno--maybe there are some interesting correlations in the nuclear->Russia/Putin/NATO questions that would benefit a high-risk strategy.

I'd be interested to hear your opinion.

Dec 16, 2022·edited Dec 16, 2022

Am I supposed to try and win the contest or make the most honest/accurate predictions? They aren't the same thing.


Oh try to win (IMHO)


Pick the girl who everyone else thinks is the most beautiful.


If you think you can win, try for that (this seems to be what most people here are trying).

If you think you have no chance of winning, try for accuracy.

If you don't think you're likely to be accurate, be honest (that's my strategy) 😀


Sorry if this is a stupid question but why aren't they the same thing? (If you can answer in a way that doesn't give away your winning strategy)


No.

In a contest with many players, going for more extreme, less accurate predictions will produce a better result. Say the real chance of each thing happening is either 75% or 25%, and you are perfect and guess the right probability every time. You look uncertain.

Meanwhile I guess 100 or 0. If there are lots of people playing, the winner is more likely to have pursued my strategy and been lucky, than your strategy. The more players there are the more "false certainty + luck" you likely need to win.


Ok, so "I" would get a higher score *on average* than the people following your strategy, but what you're saying is that doesn't matter because it's approximately a winner-takes-all competition, and it's likely that somebody will have got enough lucky guesses to get a higher score than "me"?

This doesn't seem right, though, if there is a reasonably large number of questions. The probability of guessing every question right goes down exponentially as the number of questions increases, whereas the number of people adopting a guessing strategy only goes up linearly with the number of contestants. For 50 questions where you have a 75% chance of being right, the odds of getting them all right is 0.75^50, which is less than 10^-6. I accept that you don't need to get *every* question right under your system to beat hypothetical perfect me, but my intuition is that the exponential beats the linear.

(I also think this depends on the scoring system Scott decides to go with.)
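The arithmetic above checks out. A quick sketch: with a 75% chance per question, the chance of getting all 50 right is about 5.7 × 10^-7, and you would need on the order of a million independent players for even odds that one of them sweeps. As the comment itself notes, beating a calibrated forecaster does not require a perfect sweep, so this only illustrates the exponential-versus-linear intuition.

```python
import math

p_right = 0.75   # chance of calling any single question correctly (from the comment)
n = 50           # number of questions

p_all_right = p_right ** n
print(f"P(all {n} right) = {p_all_right:.2e}")    # about 5.66e-07, i.e. below 10**-6

# Players needed for a 50% chance that at least one of them sweeps all n questions:
# solve 1 - (1 - p_all_right)**k >= 0.5 for k.
players_needed = math.ceil(math.log(0.5) / math.log1p(-p_all_right))
print(f"players needed for a 50% chance of a sweep: {players_needed:,}")
```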


It does depend on the scoring system, but I think you need to be pretty cagey to have one that isn't going to reward exaggerating your confidence.

Yes there are a lot of outcomes, but you just need to perform the "best". If you guess 75/25 and I guess 100/0, and the real answer is 50/50 each time, despite your estimates being WILDLY better, it doesn't take too many of me before one of me beats you (once again depending on how exactly we score).
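The 50/50 example can be computed exactly rather than argued about. A minimal sketch under stated assumptions: Brier scoring (lower is better), 50 questions that are all genuine coin flips, one moderate player reporting 75% and a crowd of extreme players reporting 99% (the contest caps answers at 99), with everyone picking which side to back at random. The function names are illustrative, and the answer would differ under log-loss or with real questions.

```python
from math import comb

N = 50        # number of questions
P = 0.5       # commenter's example: every event is really a coin flip

def brier_total(report: float, n_right: int) -> float:
    """Total Brier score for a player who puts `report` on one side of every question
    and that side comes true on n_right of the N questions (lower is better)."""
    return n_right * (1 - report) ** 2 + (N - n_right) * report ** 2

def pmf(k: int) -> float:
    """Binomial probability of exactly k backed sides coming true out of N."""
    return comb(N, k) * P**k * (1 - P) ** (N - k)

def p_beaten(n_extreme: int, you: float = 0.75, extreme: float = 0.99) -> float:
    """Chance that at least one of n_extreme players reporting `extreme` finishes with a
    strictly better Brier total than a single player reporting `you`. Every player picks
    sides independently at random, so each player's hit count is an independent Binomial(N, P)."""
    total = 0.0
    for v in range(N + 1):
        your_score = brier_total(you, v)
        # q: chance that a single extreme player beats your_score
        q = sum(pmf(w) for w in range(N + 1) if brier_total(extreme, w) < your_score)
        total += pmf(v) * (1 - (1 - q) ** n_extreme)
    return total

# Expected totals: the moderate player (15.625) beats an extreme one (24.505) on average...
print("expected totals:", N * P * (1 - 0.75) ** 2 + N * (1 - P) * 0.75**2,
      N * P * (1 - 0.99) ** 2 + N * (1 - P) * 0.99**2)
# ...but the chance that somebody in a crowd of extreme players beats them grows with the crowd.
for crowd in (1, 10, 100, 1000):
    print(crowd, round(p_beaten(crowd), 3))
```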


Some pedantic nitpicking about questions:

Question 9 is very specific about nuclear weapons, then question 10 is exceptionally vague. It's not clear if an act of terrorism involving a dirty bomb would qualify for question 10 and this could have a significant effect on the answer. It's also not clear what "used" means in question 10. You could argue that technically, Russia is already "using" its nuclear weapons in war, since their existence and its threatening posture is limiting the Western response.

In question 49, an "average big tech company employee" probably works in Amazon warehouse fulfillment or Meta content moderation. This probably should be limited to software engineers.


Question 19 (Will the Supreme Court rule against Affirmative Action) needs much more clarification, in my opinion.

The outcomes that might be considered to resolve this as true range from "It is illegal to overtly and explicitly admit someone on the basis of their race, but it is perfectly fine to do so as long as you don't say you are doing it" to "any evidence whatsoever of preference on the basis of race alone is actionable."


I agree. Perhaps "explicitly overrule any part of Grutter v. Bollinger" would be sufficiently concrete and similar in spirit.


Rather than going off a predefined list of predictions, it would be more fun to let people come up with their own unlikely predictions. You could then choose, let's say, 20 most interesting ones and put them up for a vote. The winner would be the person whose prediction was judged as least likely to occur by voters, while also being judged as "interesting" by at least 50% of voters and you.


Can someone explain (or link me to) some potential ways to score something like this?


Here are 2 common ways:

1. For each question, take the probability you assigned to the "wrong" answer and square it. Take the average of your scores on all questions. Best possible score is 0, worst possible score is 1.

2. For each question, take the log of the probability you assigned to the "wrong" answer (this log is negative). Add the scores of all individual questions. Best possible score is 0, worst possible score is -∞.


I think the logarithmic rule involves the probability you assigned to the right answer.


You're right.
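Here are the two rules from the exchange above as a minimal sketch, with the reply's correction applied to the log rule (it uses the probability given to the answer that actually happened). Forecasts are whole percentages as in the contest; the example entries and resolutions are made up, and the function names are illustrative.

```python
import math

def brier_score(forecasts_pct, outcomes):
    """Rule 1: square the probability given to the answer that did NOT happen,
    then average. 0 is best, 1 is worst."""
    wrong = [(1 - p / 100) if happened else p / 100 for p, happened in zip(forecasts_pct, outcomes)]
    return sum(w ** 2 for w in wrong) / len(wrong)

def log_score(forecasts_pct, outcomes):
    """Rule 2 (corrected): sum the log of the probability given to the answer that
    DID happen. 0 is best, minus infinity is worst."""
    right = [p / 100 if happened else 1 - p / 100 for p, happened in zip(forecasts_pct, outcomes)]
    return sum(math.log(r) for r in right)

forecasts = [80, 30, 60]         # hypothetical entries, in percent
outcomes = [True, False, False]  # hypothetical resolutions
print(brier_score(forecasts, outcomes), log_score(forecasts, outcomes))
```

For a yes/no question, squaring the probability on the wrong answer is the same as squaring the gap between your forecast and the 0/1 outcome, so this matches the usual binary Brier score.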


I basically guessed most of the non-political questions. If I somehow win, either prediction is nonsense or God decided to briefly speak through me.


This probably won't be relevant, but by "midnight" do you mean 0000 hours on the specified day?

Dec 18, 2022·edited Dec 18, 2022

Seems likely Scott means 00:00:00 on day X. If he didn't, the end of the range of dates describing "2023" would be a full 24 hours into 2024, which he probably doesn't mean.

For what it's worth, I noticed "midnight on blah date" in (Australian) government legislation recently and looked it up. It turns out that, in legal convention here at least, midnight on day X refers to "the last moment of day X prior to 00:00:00 the next day". So "midnight <today's date>" is in the future, not 12AM this morning.

This matches colloquial usage, which is good, but the maths student in me notes that, if moments are real numbers, there is no last moment of the day, and the supremum is the next day, so it's all a bit unsatisfying. The legal definition should just be something like "midnight day X is defined as 00:00:00 on day X+1". Grumble grumble.
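The convention proposed in the last paragraph maps directly onto a half-open time interval, which sidesteps the "last moment of the day" problem entirely: an event counts if it happens strictly before 00:00:00 on day X+1. A minimal sketch using Python's standard `datetime`; `midnight_on` and `counts` are illustrative names, and the 2023 deadline is just an example.

```python
from datetime import datetime, timedelta

def midnight_on(day: datetime) -> datetime:
    """Proposed convention: 'midnight on day X' means 00:00:00 on day X+1."""
    start_of_day = day.replace(hour=0, minute=0, second=0, microsecond=0)
    return start_of_day + timedelta(days=1)

deadline = midnight_on(datetime(2023, 12, 31))   # hypothetical resolution deadline
print(deadline)                                   # 2024-01-01 00:00:00

def counts(event_time: datetime) -> bool:
    # Half-open interval: anything strictly before the deadline counts,
    # so no "last moment" of Dec 31 ever needs to be named.
    return event_time < deadline
```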


The legislature could satisfy even you by inserting "measurable" before "moment." That would also future-proof the law, since as timekeeping got better the "last measurable moment" could automatically creep closer and closer to 00:00:00 of the next day.


For question 47: "The harm must come directly from the victim believing the deepfake..."

So, to clarify, this specifically means the VICTIM (person harmed) has to be the one believing the deepfake, not anyone else?

For instance, suppose Alice makes a deepfake video of Bob committing a crime (that Bob did not actually commit), shows this video to the police, the police arrest Bob, and Bob is convicted of the crime. Is it the case that this does not count as positive resolution, because the harm did not come from **Bob** believing the deepfake (it came from the police and jury believing the deepfake)?


I suspect the aim is to say that “Bob” must actually be harmed and think he was harmed. Not that some third party is concerned for Bob on his behalf.


Think of the Hustler trial. Falwell claimed (and may well have believed) that the allegation he committed incest with his mother was harmful and damaging to him, the decision was that it was not, because no reasonable person would believe this was other than a parody.

So Falwell believed himself to be harmed (even if simply on grounds of "people will be repeating this false story about me") but the Supreme Court did not:

https://en.wikipedia.org/wiki/Hustler_Magazine_v._Falwell


Re question 13, on wars other than the Russia-Ukraine. If the Russia-Ukraine war escalates to WWIII and starts to involve many other participants, does it count as the other war or as the same war?


Also do civil wars count?


Oh, this is fun. When can we talk about these questions and their resolutions, i.e. where the data used to decide them comes from? I feel that for some of the questions, the source used for the final result might be more important than the question itself. I also feel like some of the questions need more detail.

For question 14 I need clarification on units. Is this 25 million over a year, or 25 million in one day or one week, cumulatively forever, or at a single instant?


Decided to participate for the lolz. Answered most of the questions, but using my gut instinct and mostly binning into one of a few categories:

- 75 is "almost sure"

- 50 is "tossup"

- 30 is "probably not"

- 10 is "almost sure not"

Skipped the ones I don't have any background to answer even with a gut guess.

Meta-prediction--I'm unlikely to be in the top 10 forecasters. Maybe 10% chance?


Given the number of likely participants, many of whom will have given thought to gaming their responses to avoid clumping with others (as I did not), I'd put my own chances at 1% by the "must be an integer" rules, and rather less than that in reality.


The talk about gaming in the comments is interesting, since I think that demonstrates a failure mode for "prediction markets as ways of finding out the truth and setting policy by results".

If you take the Platonic ideal prediction market where people are *only* interested in finding out THE TRUE PROBABILITY, and everyone is studious and answers honestly, then fine.

But even with a (relatively) small prize on offer here, even if it's only for the bragging rights, people are still saying "Yeah, I was going to game my answers depending on how you score this". They're not looking for true probabilities or "will this event really happen?"; they're trying to estimate how the bulk of others will answer and then what they need to do to make sure they place above them.

So it would be all the same if the questions were not "Will the Ukraine take back the Crimea?" but "Will we discover a herd of unicorns in Broceliande?" or "How many wishes will the genie of this lamp I bought in a quaint old curiosity shop grant me?"

Dec 17, 2022·edited Dec 17, 2022

That failure mode applies only to this kind of setting: 1) one-shot, with 2) skewed returns (no possibility of losing money). A long-lasting prediction market with real money shouldn't have these problems, right?

Gaming for top placement still requires that you actually be pretty good at prediction. In particular, just using a naive variance-maxing strategy where you assign 1 and 99 to things you think are less and more likely than 50% likely is quite unlikely to work; you would need to be judicious about how much accuracy you wanted to trade for variance...

Dec 17, 2022·edited Dec 17, 2022

I went much the same way - 50 is "maybe/maybe not" since I have no idea which way it's likely to go and both alternatives seem equally likely.

Anything 55-70 was "It could happen, I don't think it will but I'm definitely not sure it won't and there is a chance it might"

Above 70 was "Pretty good chance that will happen"

45 and below was "Pretty good chance it won't happen, even if there is a slim chance it might/it could". The lower the number, the more it meant "probably not/fairly sure it won't happen".

(45) I may be misunderstanding, but is Google Glass not an example?

Whenever I take a test, I want to know how I did.

So, please provide a method come 2023 of retrieving my submissions (presumably based on email address or bespoke ID code) so that I can compare them to the resolutions.

My strategy: bet on the "nothing really changes" side of each question with some unreasonably high probability. This doesn't maximise my expected score but might maximise my chances of winning some money.

My initial strategy was to vote 99 on everything but I think you deliberately structured the questions in such a way as to be resistant to this. In hindsight, I should've alternated between 1 and 99 instead.

Are we allowed multiple submissions? I want to see if random patterns can beat humans. I'd like to submit a coinflip attempt, where I flip a coin to decide each answer.

Inspired by this video [0], it would be interesting to see a set of answers chosen by an animal picking between foods. Can prediction markets beat a dog?

[0] https://youtu.be/USKD3vPD6ZA

I don't understand your "vote 99 on everything" strategy, regardless of how the questions are structured. Won't that get you great scores on the ones that come true, and very poor scores on the ones that don't come true, averaging out to a score no better than if you'd voted 50 on everything?

Actually, voting 99 on something false (or 1 on something true) hurts you much worse than voting 99 on something true (or 1 on something false) helps you. Depending on the exact scoring rule being used, you have to guess between 38/50 and 43/50 questions correct (as in, 99 for the true ones and 1 for the false ones) before you'll get a better score than guessing 50 for every question. That's somewhere between one-in-ten-thousand and one-in-ten-million odds of beating a very poor baseline.
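A quick way to check those thresholds, as a sketch: it assumes a per-question Brier score of (p - o)^2 (lower is better) and a log score of ln(probability given to the actual outcome) (higher is better), each summed over 50 questions, since the contest's exact rule wasn't fixed. The function names are illustrative.

```python
import math

N_QUESTIONS = 50

def brier(p: float, outcome: int) -> float:
    """Squared error between forecast p and outcome (1 = happened, 0 = didn't). Lower is better."""
    return (p - outcome) ** 2

def log_score(p: float, outcome: int) -> float:
    """Natural log of the probability assigned to what actually happened. Higher is better."""
    return math.log(p if outcome == 1 else 1 - p)

def break_even_correct(score, lower_is_better: bool) -> int:
    """Smallest number of 'right way' 99%/1% calls that beats answering 50% everywhere."""
    baseline = N_QUESTIONS * score(0.5, 1)
    for k in range(N_QUESTIONS + 1):
        # k confident calls land (99% on a true event); the rest miss (99% on a false event)
        total = k * score(0.99, 1) + (N_QUESTIONS - k) * score(0.99, 0)
        beats = total < baseline if lower_is_better else total > baseline
        if beats:
            return k
    return -1  # never beats the baseline

def prob_at_least(k: int) -> float:
    """Chance of getting at least k of 50 calls right if each one is a coin flip."""
    hits = sum(math.comb(N_QUESTIONS, j) for j in range(k, N_QUESTIONS + 1))
    return hits / 2 ** N_QUESTIONS

for name, score, lower in [("Brier", brier, True), ("log", log_score, False)]:
    k = break_even_correct(score, lower)
    print(f"{name}: need {k}/50 right, chance about 1 in {1 / prob_at_least(k):,.0f}")
```

Under these assumptions the Brier break-even comes out at 38/50 and the log-score break-even at 43/50, matching the figures above.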

Maximum uncertainty I think compels a 50% guess.

How would a single coin flip help you come up with a number from 1 to 99? But you could flip 9 coins and take the percentage of heads (rounding 0 up to 1, and 100 down to 99). The number of heads follows a binomial distribution with mean 4.5 and standard deviation 1.5, so ±3 standard deviations spans 0 to 9 heads, i.e. 0% to 100%, nicely.
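The nine-coin procedure might look like this in Python; the function names and the seed are illustrative, and `random.Random` stands in for physical coins.

```python
import random

def coin_forecast(heads: int, coins: int = 9) -> int:
    """Turn a count of heads into a 1-99 forecast, clamping the 0% and 100% extremes."""
    pct = round(100 * heads / coins)
    return min(max(pct, 1), 99)

def random_forecast(rng: random.Random, coins: int = 9) -> int:
    """Flip `coins` fair coins and convert the number of heads into a forecast."""
    heads = sum(rng.random() < 0.5 for _ in range(coins))
    return coin_forecast(heads, coins)

rng = random.Random(2023)  # fixed seed so the "coin flip" submission is reproducible
answers = [random_forecast(rng) for _ in range(50)]
print(answers)
```

Note that nine coins only ever produce ten distinct forecasts: 1, 11, 22, 33, 44, 56, 67, 78, 89 and 99.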

From a game-theoretic perspective, you maximize your expected score by answering with your true beliefs on probabilities, but maximize your chances of winning the contest by biasing towards overconfidence, because that increases the standard deviation of your score. Being the top scorer in a large group of people requires a high mean, high variance, and then getting lucky such that the variance turns out positive rather than negative. So even if some event genuinely has a 70% probability of happening (whatever that means) given the best information available, it may be worthwhile to guess 80% or 90% anyway, or even 99%, on the assumption that if it doesn't happen you were going to lose anyway, so you might as well go all in.

This is significantly mitigated by having 50 questions. Someone who makes a habit of guessing 99% at random is going to have too low a chance, and there aren't 2^50 people in the competition to brute-force it that way, but I would not at all be surprised to find that the winner is someone who does stuff like guessing 99% on what ought to be 90% questions and gets away with it, while more principled people end up dominating the high-but-not-top scores.
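To make the mean/variance trade-off concrete, here is a sketch for a single question under Brier scoring (an assumption, since the rule wasn't final), using the example of an event with a true 70% chance. For forecast q and true probability p, the expected loss is p(1-q)^2 + (1-p)q^2, which is smallest at q = p, while the variance is p(1-p)(2q-1)^2, which keeps growing as q moves toward 0 or 1. `brier_mean_var` is a hypothetical helper name.

```python
def brier_mean_var(p: float, q: float) -> tuple[float, float]:
    """Expected Brier loss and its variance for one yes/no question.

    p: probability the event really happens; q: the forecast submitted.
    """
    loss_if_yes = (1 - q) ** 2
    loss_if_no = q ** 2
    mean = p * loss_if_yes + (1 - p) * loss_if_no
    # Two-point distribution: variance = p(1-p) * (difference between the two losses)^2
    var = p * (1 - p) * (loss_if_yes - loss_if_no) ** 2
    return mean, var

for q in (0.7, 0.8, 0.9, 0.99):
    mean, var = brier_mean_var(0.7, q)
    print(f"forecast {q:.2f}: expected loss {mean:.4f}, variance {var:.4f}")
```

Moving the forecast from 0.7 to 0.99 raises the expected loss from 0.21 to about 0.294, but raises the variance roughly sixfold (from 0.0336 to about 0.202). If the 50 questions were independent (itself an assumption), these per-question means and variances would simply add.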

It might also be effective to bias your answers in the opposite direction from what you think other people will say - e.g. if you think the true probability of X is 90%, and that everybody else also thinks that, then you can guess 1% and get a 10% chance of scoring far ahead of the pack and a 90% chance of falling far behind the pack, which may be worth it to maximize your probability of getting the prize. Again this effect is mitigated by having many questions.

Dec 17, 2022·edited Dec 17, 2022

> Scoring will be through some proper scoring rule, probably Brier score

Shouldn’t this be established before the start of the contest? People who make different assumptions about this are playing different games, no?
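Why the choice matters: the two rules can rank the same pair of forecasters differently, because the log score punishes a confident miss far more harshly than the Brier score does. A sketch with made-up forecasts on ten questions that all resolve YES (the forecasters and numbers are invented for illustration):

```python
import math

def total_brier(forecasts, outcomes):
    """Sum of squared errors; lower is better."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes))

def total_log(forecasts, outcomes):
    """Sum of log probabilities given to what actually happened; higher is better."""
    return sum(math.log(p if o == 1 else 1 - p) for p, o in zip(forecasts, outcomes))

outcomes = [1] * 10          # hypothetical: ten questions that all resolve YES
bold = [0.99] * 9 + [0.01]   # nine confident hits, one confident miss
steady = [0.65] * 10         # moderate on everything

print("Brier:", total_brier(bold, outcomes), total_brier(steady, outcomes))
print("log:  ", total_log(bold, outcomes), total_log(steady, outcomes))
```

Here the bold forecaster has the better (lower) Brier total, about 0.98 versus 1.225, but the worse (lower) log total, about -4.70 versus -4.31.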

I agree. Probably not a huge issue but feels a bit unserious.

"unserious" may not need the qualifier "a bit"

Great idea, however I am a bit worried about the following: 'If you skip, we’ll give you an “average” score (that is, you lose the average amount of points that forecasters lost on that question).'

Scores are usually not linear, so a few very bad predictions (e.g. 1% and 99% ones that turn out wrong) can skew the average quite a lot. I feel like it might be dangerous to skip questions even if you know nothing about them. It could be better to default to 50%.

Could the loss for a skipped question be changed to min(current scheme loss, loss of a 50% prediction) next year?
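A sketch of the two skip rules under Brier scoring (assumed), with a made-up crowd in which most forecasters leaned NO. `skip_penalty` is a hypothetical helper, not the contest's code:

```python
def brier(p, outcome):
    """Squared error for one forecast; lower is better."""
    return (p - outcome) ** 2

def skip_penalty(crowd_forecasts, outcome, proposed=True):
    """Loss charged to someone who skipped a question.

    Current rule (as described): the average loss of everyone who answered.
    Proposed rule (from the comment): never worse than simply answering 50%.
    """
    average = sum(brier(p, outcome) for p in crowd_forecasts) / len(crowd_forecasts)
    return min(average, brier(0.5, outcome)) if proposed else average

crowd = [0.2, 0.3, 0.1, 0.01, 0.6]  # hypothetical forecasts
print(skip_penalty(crowd, 1, proposed=False))  # event happened: average loss ≈ 0.616
print(skip_penalty(crowd, 1))                  # capped at the 50% loss of 0.25
print(skip_penalty(crowd, 0))                  # event didn't happen: average ≈ 0.100, already below 0.25
```

When the crowd is wrong, the average can be far worse than a 50% answer; when the crowd is right, the cap never binds, so the proposed rule only ever helps skippers.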

For Full Mode (I just imagine this is a definite no for Blind), would you permit group entries if these were explicitly flagged as such? I'm really interested to see how groups perform in general compared with individuals, and I have a group I definitely want to do this with. My reading of the rules of Full Mode suggests this is allowed, because, in a sense, my group is just some evidence I've consulted for my entry, but it feels marginal in spirit so thought I'd ask. Would still do it if ineligible for prizes as a result, but don't want to mess stuff up by doing something you didn't expect.

Shouldn't there be a question like: "how many jelly beans are in the jar pictured above?"

The most important aspect of prediction "markets" is the wisdom of the crowds, which is basically sampling theory applied to a convenience sample, at best (or to confounded, self-selected participation).

Many, here and elsewhere, seem to think it is about the monetization aspect or skin in the game. But to me this is a fallacy. First, skin in the game analysis would, to my way of thinking, require that we know about the wealth of the gambler. A $100 bet by a billionaire is not the same "skin" as the $100 bet by the pauper. Yet, so far as I can tell there is never any attempt to capture that information by the skin in the game folks.

Guess the jelly beans in the jar - free - random sample

Guess the jelly beans in the jar - free - self selection participation

Guess the jelly beans in the jar - de minimis entry fee - like a raffle

Guess the jelly beans in the jar - entry fee as surrogate for confidence - bet as little or as much.

Guess the jelly beans in the jar - entry fee as surrogate for confidence where bet size is proportion of wealth and/or income.

Who does better? Which "average" result is better?
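One way to compare the aggregation designs above is to compute the crowd's "average" under each weighting. This is a sketch with invented entries; weighting by the fraction of wealth staked is only one possible reading of the last variant, and the code says nothing about which aggregate is more accurate:

```python
def weighted_mean(values, weights):
    """Weighted average of the guesses."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical entries: (guess, stake in $, bettor's wealth in $)
entries = [
    (800, 100, 1_000),               # a pauper betting $100
    (1200, 100, 1_000_000_000),      # a billionaire betting the same $100
    (950, 10, 10_000),
]
guesses = [g for g, _, _ in entries]

plain = sum(guesses) / len(guesses)                                # every entry counts equally
by_stake = weighted_mean(guesses, [s for _, s, _ in entries])      # dollars as confidence
by_share = weighted_mean(guesses, [s / w for _, s, w in entries])  # fraction of wealth at risk

print(round(plain, 1), round(by_stake, 1), round(by_share, 1))
```

With equal $100 stakes, the pauper and the billionaire count the same under stake weighting, but under wealth-share weighting the billionaire's guess barely registers.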

Maybe an obvious point, but most of these questions are very specific to the interests of the Bay Area rationalist community, such that giving clueful responses (especially in Blind Mode) will often be more dependent on familiarity with recent SpaceX and OpenAI press releases, hailing taxis in SF, how stablecoins are supposed to work, and similar niche topics than any kind of generalizable "forecasting accuracy" trait.

Accordingly, anyone using this dataset will have to take care to distinguish hypotheses like "Using LessWrong tends to make you less wrong" from the less exciting "rationalists are relatively up-to-date about the topics rationalists like to talk about".

There are a lot of half measures the US Supreme Court might take in the case on affirmative action; that one could probably use some preemptive clarification to avoid grey area resolution down the road.

My 2023 prediction is that it's going to somehow suck even more than 2022 did.

And 2022 started with me working for a company that was, at the time, headquartered in Ukraine. Things went downhill from there.

*sigh*

There were a number of questions where my uncertainty was mostly driven by trying to guess what the evaluator's judgement would be, rather than the ground truth: 5, 38, 41, 43, 47, 49, 50. Some of these could be improved with more specifics in the resolution criteria, but for the most part I felt I needed more info about how the evaluator has judged similar questions in the past; e.g., was there a "cease fire" between Ukraine and Russia in 2015-2019?

("The evaluator" would be Scott for most of these, but not for e.g. 41)

Yeah, I think 14 is about how many infections China reports.

I think it's how many infections Our World in Data shows for China - if China doesn't officially report them, it may choose to draw on a different data source. (I'm guessing someone will be trying to do an estimate.)

I have no idea, but at the moment they are reporting numbers from China, which show cases going down, which looks totally wrong, as Zvi pointed out.

Having watched OWID for a couple of years, I trust them to eventually sort it out. If I'm wrong, it certainly won't be the only thing on the survey I misjudged. 🙂

Apologies if this was clearly answered somewhere else, but if we provide our email address will our individual results/scores be sent to us after the completion of the contest?

If that's not planned, what would it take to make that happen?

Yeah, I just realized (after I submitted) that I have no record of my predictions. So I would also like to get my score mailed to me (or something?).

For anyone who hasn't yet submitted, printing to PDF before doing so works to preserve a record.

I came here to say this as well; I'm having a sub-contest with my brother to see who performs better

We're happy to score our own results, but can only do so if we receive our answers (e.g. by email)

One more vote hoping to get my results emailed. I did a speedrun and would love to have my gut-instinct guesses preserved.

To my compatriots with whom I share this boat:

If you filled out the ACX survey, you can likely figure out which answers you gave in the prediction contest, by searching the csv file of everyone's answers, linked here:

https://open.substack.com/pub/astralcodexten/p/stage-2-of-prediction-contest?utm_source=share&utm_medium=android

E.g. your age, location, gender, etc might be enough to pick you out from the crowd.

If you didn't do ACX survey, it's more difficult, but maybe you can find yourself in the list some other way.

Hope this helps!
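If you'd rather not eyeball the spreadsheet, a short script can filter the file on the details you remember. This is a minimal sketch using Python's standard `csv` module: the column names (`Age`, `Country`, `Gender`, `Q1`, ...) are placeholders, not the real file's headers, so check its header row and substitute those names.

```python
import csv
import io

def find_my_rows(csv_file, **criteria):
    """Return rows whose columns match every given value (compared as stripped strings)."""
    reader = csv.DictReader(csv_file)
    return [
        row for row in reader
        if all(row.get(col, "").strip() == str(val) for col, val in criteria.items())
    ]

# Tiny stand-in for the real file; the real column names will differ.
sample = io.StringIO(
    "Age,Country,Gender,Q1,Q2\n"
    "34,Canada,F,60,20\n"
    "34,Germany,M,75,10\n"
    "51,Canada,M,50,50\n"
)
matches = find_my_rows(sample, Age=34, Country="Canada")
print(matches)  # ideally exactly one row: your own answers
```

For the real file, open it with `open(path, newline="", encoding="utf-8")` and pass that in instead of the sample; a header containing spaces can be matched with something like `find_my_rows(f, **{"What is your age?": "34"})` (a hypothetical header).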

I would argue that the Quest Pro is already an AR headset, as released by Meta.

If Quest 3 has color cameras and passthrough would that count?

Quest already has passthrough but it's way too shitty for anything. I guess it depends if there is software to make use of it?

OK, I made my guesses. I don't understand the scoring, though: for questions I knew nothing about, like all the UK politics ones, I just put down 50%. Why is this the wrong (or right) thing to do?

The rational strategy is a high-variance approach aimed at getting the highest score out of anyone, even though it also increases the likelihood of getting a below-average score. Even if this approach leads to a lower average score, it will be rewarded more than trying to make well-calibrated predictions with a higher expected score. This undermines the usefulness of such an exercise.

Do you expect Twitter to continue reporting net income and mDAU on a quarterly basis, even though they are a private company and have no obligation to publish quarterly reports?

I expect Statista to keep reporting numbers, even if they have to make them up. They have a very consistent record.

Well, they have stopped. They do have a series for total monthly users (as opposed to monetizable daily users), but that's rather different.

> I’m planning to give out at least 4 x $500 prizes: one for winner of Blind Mode, one for winner of Full Mode

One potential issue with this is that playing to win could end up a little at odds with giving your best prediction. I.e., someone who expects a non-top-tier performance from giving their best guesses would likely have a higher chance of winning by maximising their variance, even at the cost of accuracy. An average score can't win, but by making overconfident guesses they increase the chance they'll get lucky, even though their expected score would be lower.

Admittedly, I don't think you can really practically solve that - ideally you'd want a payout proportional to score, rather than a winner-takes-all setup, but the logistics of that would be incredibly impractical (and might also encourage other forms of gaming like spamming duplicate accounts/entries). Probably not a major issue, since I think most will be playing for fun / aiming just for a good performance, rather than solely aiming for the prize.

Is there a way to download a transcript of how we answered?

Peter Robinson

I, too, would like a copy of my answers. I kind of expected to get one emailed to me after I pushed submit, which didn't happen. The back button from the post-submission page led back to an empty survey :(

I predicted a 72% chance of getting answers emailed to me right after submitting the form. And 88% chance of seeing my answers eventually. Now it's starting to sink in that I might not be good at predictions.

Yeah I'm in the same boat as the rest of you. I'm hoping Scott hears the message and emails everyone a copy of their responses.

On the tangent that you can't predict anything: the World Cup final is being played right now. Argentina were leading 2-0 right up to the 80th minute, then Kylian Mbappé scored two goals.

So now the match really has caught fire again, because whoever gets the next goal in the next ten minutes (if anyone does) is going to win this, and Messi may never get that elusive World Cup winner's medal after all (after eighty minutes in which it looked done and dusted) 🤣

Several people mention that the winner-take-all structure of the contest does not incentivize reporting honest preferences. Does anyone know of an example of a forecasting contest where someone successfully exploited this? Maybe you should increase confidence, but I suspect most people would overdo it and sabotage themselves.

It seems to me that the more relevant strategy is to exploit the correlation between the questions. If you think there's a 30% chance of nuclear war, you should gamble on it actually happening. Put down 99 and analyze what it does to the rest of the questions. But if you think there's a 30% chance of nuclear war, you should spend a lot more than $500 optimizing your life around the risk. (I got the 30% figure from Tegmark, who did upend his life.) Far short of this, it might make sense to bet on Ukraine escalating or de-escalating, but that probably doesn't affect all the other questions as much.

Do you know any examples of people exploiting correlation between questions in forecasting contests?

Is it a percent yes, or percent no?

Can I change my prediction on whether Musk will still be head of twitter? lol

As a curmudgeonly blogger, I felt compelled to write a blog post expounding on my various predictions.

https://www.newslettr.com/p/the-2023-prediction-dispatch - don't click that link if you plan to enter in Blind Mode!

Also: don't comment here on my comments there! I have disabled comments (a blessing for the comments: may השם bless and keep the comments ... far away from us), but will enable them on that post so you aren't tempted to reply with spoilers.

6. Would a kick scooter count as a vehicle?

8. You mean worldwide, not just in Ukraine and not even just on land, right?

43. Risk of what? Infection? Any noticeable symptoms? Symptoms bad enough to be unable to work for ≥1 day? Hospitalization? Death?

Dec 20, 2022·edited Dec 20, 2022

I did gauntlet mode, no internet and 30 seconds pondering max. Being non-USA will hurt with some answers but we'll see. I'm confident that I'll be the dark horse in this :D

Dec 21, 2022·edited Dec 21, 2022

General question: if any of the "did it happen" questions actually resolve in 2022 (eg 43 with BQ1.1, or say Meta surprises us with a "Quest Pro 2" on 12/25/22), will it resolve positively?

1-4. Are these point or duration questions?

4. The main train station in Zaporizhzhia, Zaporizhzhia-Live, including its rail yard, appears to be large enough (10 km wide) that there could plausibly be Ukrainian control over the west side and Russian control over the east, and there is a platform 4 km from the "0 km" platform that is still labeled as Zaporizhzhia-Live. Can you be more specific about which point must be controlled?

7. Is this at any point during the year, and if the evacuation is precautionary, and turns out to have been unnecessary, ie, people go back home before 1/1/24, will it resolve positively?

8. Does a fizzle (attempted to detonate a nuke but only detonates as a dirty bomb) resolve positively?

11. Which step at https://www.nato.int/cps/en/natolive/topics_49212.htm will cause this to resolve positively? After completion of 5 seems best to me, but during/after 6 or 7 seems reasonable as well. Asking because I think there's a reasonable chance 5 finishes late next year, not to be pedantic.

19. This is really, really vague. A ruling upholding the status quo could be interpreted as "against affirmative action". Unfortunately I don't have quite enough familiarity with the subject to know which precedents being overturned or ignored should count towards a positive resolution.

22. Is this a point, duration, or at-any-point question?

30. Will this be reevaluated when subsequent numbers come out a couple months later, or just based on preliminary numbers in January?

37. Ditto.

40. The first "orbital flight test" is currently intended to stop just short of orbital velocity, and it will not complete an entire orbit. Technically it's still on an orbit, just a guaranteed decaying one: the perigee will be above the surface of the Earth, but at ~70 km (https://twitter.com/planet4589/status/1411781300063813648), below the Kármán line, for mission simplicity, not because the vehicle will be incapable of getting the last 30 m/s. So there are 4 potential boundaries for resolution here: A. getting into LEO (the Orbital Flight Test alone would not resolve positively); B. any perigee above the Kármán line resolves positively (the OFT would only meet this if Starship goes unexpectedly fast while still staying below LEO); C. reaching the target perigee (within a reasonable deviation) or higher (only a nominal OFT or a higher perigee resolves positively); or D. getting into any orbit with a perigee above sea level (even a sub-nominal OFT resolves positively, as long as the perigee condition is met by the end of the almost-orbital-insertion burn, even if Starship ends up deorbiting unpowered). Again, I'm asking because I think there's a reasonable chance that only the OFT, and not subsequent flights, will take place before 2024.

48. This question asks about "Global Health Emergency"s, but gives the previous 3 "Public Health Emergency of International Concern"s as examples instead, while there have been several other "Global Health Emergency"s in 2022, and the odds of a GHE seem much higher than of a PHEIC.

Thanks in advance.

Q.40, how do you define reaching orbit? Does it need to complete a full revolution around the Earth?

Dec 23, 2022·edited Dec 23, 2022

The problem with question 14 is that it's not actually a question about whether there will be a large Covid outbreak in China, but about whether they will bother going through the trouble of recording and reporting it officially. Just today the government officially said it estimates that there were 37 million cases in a single day. Even though this comes directly from the CCP, it doesn't count in this circumstance. Link to article: https://www.bloomberg.com/news/articles/2022-12-23/china-estimates-covid-surge-is-infecting-37-million-people-a-day?srnd=premium-europe&leadSource=uverify%20wall

Does a North Korean nuclear test qualify for question 8, "Will a nuclear weapon be detonated (including tests and accidents)?"

> Will Twitter's average monetizable daily users be higher in 2023 than in 2022?

Which specific dates are being compared?

I meant to save my predictions for future reference, but I forgot to do so. Is there any way to access that?

I had that thought the moment I hit submit, assuming I would be able to take a copy for myself. Welp. If there's a way to send me a copy of mine, Scott, please please do.

I cannot answer Q13: the term "any other war" is very open-ended. If we're talking about literally any other war, then no, absolutely not. WW2 alone saw 16+ million military personnel die, just amongst the Allies, iirc.

Any other war in 2023? I'm not sure there's going to be one.

That leaves a huge range to work with.

This seems like a good opportunity to see how Metaculus and Manifold perform against an array of human experts + non-experts. It might make sense to pre-register a date and time (midnight GMT on Feb 1st?) at which their current prices count as their answers for the purpose of the contest. (Or is there some better way of doing it, like looking at the average price over some time interval?)

Registering my own prediction... my bet is that the markets outperform the median participant, because the lowest-effort participants in Scott's contest will likely make some wild guesses and are less likely to participate in the markets. But I bet they under-perform the top participants and act more as an averaging function across the guesses of people who bothered to participate in the markets (call it maybe 75th percentile performance in the total contest), because I don't think people more likely to be right will meaningfully correlate with people likely to spend more

Open question for anyone, not just Scott: is there a condition you can imagine which would cause you to update towards prediction markets being less important, less valuable, less relevant, etc?

Is there a way to get a copy of my responses?

For "Will any other war have more casualties than Russia-Ukraine?" does the "other war" have to be ongoing in 2023 (otherwise surely World War II wins)?

Does it have to *start* in 2023? Are the casualties being compared just 2023 casualties vs. 2023 casualties, or all casualties for that war? I think any of the Tigray War, Yemeni Civil War, Syrian Civil War, Somali Civil War, Boko Haram Insurgency, or War in Darfur could beat Ukraine War if it's ok for the war to not have started in 2023 and/or we don't compare 2023 vs 2023 only.

Can you work with someone else for blind mode or no?

Are you able to send us an email (if we included it) with our responses? I was expecting we would receive some record of our guesses and I did not make a copy of them.

Would love a copy also, now almost a year later!

I have the same question as others: can we get a copy of our responses?

Comment/critique on specific questions: the Covid-19 statistics, especially for China, seem hard to pin down accurately at any point in time (except perhaps long afterwards, through independent investigations), and hence don't seem fair or useful for assessing results.

As others have mentioned, question 13 is phrased extremely ambiguously. The rest seemed basically fine to me, although at least one will probably be ambiguous by the time it resolves (e.g. true Covid deaths, worldwide and in China, are going to be higher than the official count).

Wait, blind mode is closed? It says "until January 10" here – I took that to be inclusive. Did you already release the "anonymized versions of all Blind Mode predictions"?

(The form says blind mode will no longer count.)

1. Is there any way to get a copy of our submission before the end of 2023?

2. Will a “running tally” of events that have/have not occurred be kept visible over the course of the year? For instance, GPT-4 has been released; is there anywhere that's marked as “has happened”?

Hi, did this happen? I’d like to see what I predicted. Wishing I’d screenshot it. Is there a way to go back and look?

I need to talk to experts in the world of 3D printing technology, the future of manufacturing, and additive manufacturing (AM).
