So, now we need a meta alcohol account to participate?
It appears so. I have not participated this year for that reason.
Same, I will skip rather than setting up another account on the internet.
Meta alcohol is excellent
I am baffled by the website on mobile and have given up
Fair warning. I did 2 of Tetlock's forecasting contests. I found myself bothering my friends who were subject matter experts and spent way too much time on it (tho kinda proud I was top 25 in the COVID one, until I quit).
Some people like the grind. I don't.
The Tetlock contest is a huge amount of work because you can keep updating all year long. I didn't read the rules for this one: Is it just one time winging it or are you expected to grind for 11.4 months?
I didn't read them either. Turned the page on forecasting contests. It is like a job after a while.
This one is just spot scoring - whatever your prediction is on Jan 31 is what gets evaluated; no need to keep updating.
Thanks.
I saw this on Metaculus before it got posted here and wondered if I had missed something. Glad to see this is up now, the prediction contests are one of my favorite parts of ACX.
May as well roll the dice. Anyone know of "group efforts" where people share their political shitposting and some basic research for every topic (off site)?
I recently noticed a potential contradiction in Metaculus's rules:
- In the official Terms of Use (see https://www.metaculus.com/terms-of-use/ ) it states that you are not allowed to view the source code or make changes to it.
- HOWEVER, there is actually a GitHub repository (at https://github.com/Metaculus/metaculus ) which is released under the BSD license, and which I presume is more-or-less the production website. So that would seem to imply that you *can* view the source code and also make changes to it.
So which of these is correct?
So first of all I imagine that Terms of Use line is absolutely not ever going to be enforced.
But like, I do not see a contradiction here at all? Just because you can doesn't mean you're allowed to. Maybe they only want people who don't have a Metaculus account to contribute to the website—I don't think that's very likely, but it's not self-contradictory.
Probably that clause was written before Metaculus got open-sourced and was then forgotten about. I've opened an issue: https://github.com/Metaculus/metaculus/issues/2036
I thought I'd give it a try, but...."Wrong captcha" every time I try to sign up.
Tried both Edge and Firefox.
Scott said bots would be allowed to compete this year. Have you tried getting the captchas wrong?
Actually, there is no captcha other than the Cloudflare Turnstile widget that non-interactively approves itself.
https://developers.cloudflare.com/turnstile/concepts/widget/#non-interactive
Hey sorry about this, we had a bug last night that was preventing signups that has now been fixed. If you try to register again it should work, but if it doesn't please let me know!
Ouch! I've had the exact same sort of "oh noes we had a push for signups and signups are broken" type of bug before and it sucks!
Yeah not a fun one to wake up to!
I'm sorry you had to find out this way.
Could someone explain how scoring works or send me a link to an explanation? I find the scoring details inadequate as someone who's never done a prediction competition and has no clue how scores are calculated. I just have a slight suspicion that different scoring methods could favor different answering strategies.
The score is the (natural) logarithm of the probability which you assigned to the actual outcome. So, for example, if you said event A will happen with 70% probability, then:
If A happens, you assigned 0.7 to the actual outcome, so your score is –0.357.
If A doesn't happen, you assigned 0.3 to the actual outcome (the complement of 0.7), so your score is –1.2.
The first score is higher (keep in mind they're both negative numbers), so that would be better.
The logarithmic score is always negative, except if you predicted 100% for an event that did actually happen, in which case the score is 0. But you should never do that, because if the event doesn't happen, you assigned 0 to the actual outcome, and the logarithm of 0 is negative infinity, so you've immediately lost forever.
People find it a bit weird that all scores are negative, so Metaculus introduced the "peer score", which centers scores on the average: anyone with a better-than-average score gets a positive peer score. This doesn't change the ranking of the predictions at all.
Logarithmic scoring is a so-called *proper scoring rule*: a scoring rule mathematically designed so that it's in your interest to report your true probability.
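A minimal sketch in Python of the binary case as described above (the function names and the peer-score centering here are my own illustration, not Metaculus's actual code):

```python
import math

def log_score(p_event: float, happened: bool) -> float:
    """Natural log of the probability assigned to the actual outcome."""
    p_outcome = p_event if happened else 1.0 - p_event
    if p_outcome == 0.0:
        return float("-inf")  # predicted 0 for what happened: unrecoverable
    return math.log(p_outcome)

# The 70% example from above:
print(log_score(0.7, happened=True))   # -0.357 (A happened)
print(log_score(0.7, happened=False))  # -1.204 (A didn't happen)

def peer_scores(log_scores: list[float]) -> list[float]:
    """Center everyone's log score on the group average; above-average
    predictors come out positive, and the ranking is unchanged."""
    avg = sum(log_scores) / len(log_scores)
    return [s - avg for s in log_scores]

print(peer_scores([-0.357, -1.204, -0.105]))  # [0.198, -0.649, 0.450]
```

Subtracting the group average from every score shifts all the numbers but preserves their order, which is why the peer score changes nothing about rankings.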
Thanks.
I've read that Brier scores are also a proper scoring rule. If this is true, why do you think Metaculus prefers log/peer scores? And why do others prefer Brier?
> So by our metric, the log scoring rule is the best of the three commonly used rules at incentivizing precision.
from https://ericneyman.wordpress.com/2020/04/24/scoring-rules-part-3-incentivizing-precision/
Log scoring is the only scoring rule that satisfies both of the following conditions:
- Additivity: your score for a composite prediction of independent events is the sum of your scores for the components - that is, if we are flipping two independent coins, it doesn't matter if we score them separately as two events with two outcomes each, or as one event with four outcomes.
- Monotonicity: if the probability that Alice assigned to the outcome that actually happened is higher than the probability that Bob assigned to it, then Alice will score higher than Bob; you can never increase your score by making a worse prediction.
To see this, observe that in order for a score to be monotonic, it must be a monotonic function of the probability you assigned to the observed event - in other words, the simplest possible scoring system is "Alice said that there was a 0.3 chance of seeing the combination of things we actually saw, Bob said there was a 0.2 chance, and so Alice scores higher than Bob", and all other monotonic scores are just recalibrations of that. "Observed probability" is multiplicative: for independent things P(a,b) = P(a)P(b), and the way to rescale a multiplicative function into an additive one is to take the logarithm, which gives us the log score.
Brier scores are additive but not monotonic; to be honest I've never been sure why people use them either - from a purely mathematical perspective log score is clearly the "correct" way to evaluate predictions, but it may be that there are messy real-world applications where that isn't what you want.
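To make the two conditions concrete, here's a small numeric check in Python (my own illustration, with made-up probabilities): the first part verifies additivity of the log score for two independent coins, and the second shows the sense in which the Brier score is not monotonic, since two forecasters who assign the same probability to the realized outcome can get different Brier scores.

```python
import math

# Additivity of the log score: heads at 0.6 on coin 1, 0.7 on coin 2;
# both coins land heads.
p1, p2 = 0.6, 0.7
print(math.log(p1) + math.log(p2))  # scored as two separate questions
print(math.log(p1 * p2))            # scored as one four-outcome question
# Both print about -0.868: the log score doesn't care how you carve it up.

# Non-monotonicity of the Brier score: a three-outcome question where
# outcome 0 occurs. Alice and Bob both put 0.5 on outcome 0 but spread
# the remainder differently.
def brier(probs, winner):
    """Multi-outcome Brier score (lower is better)."""
    return sum((p - (1.0 if i == winner else 0.0)) ** 2
               for i, p in enumerate(probs))

alice = [0.5, 0.5, 0.0]
bob = [0.5, 0.25, 0.25]
print(brier(alice, 0), brier(bob, 0))  # 0.5 vs 0.375: Bob scores better
# despite assigning the same 0.5 to what happened; the log score would
# give both of them ln(0.5).
```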
Thanks, this is extremely helpful. Now I know not to put 100% for anything 😅
Proofreading: the parenthetical (no “AI benchmark #44523, really!) is missing its closing quotation mark.
Good questions. I'm stumped by all of them.
I'd recommend changing the wording of "Will Iran possess a nuclear weapon before 2026?" to "Will Iran test a nuclear weapon before 2026?" That would be less arguable.
I'd like to see a separate forecasting contest in which contestants submit 250-word essays forecasting something that will turn out to be a very big deal in 2025 but isn't specifically asked about in any of the regular quantitative questions, with Scott's choice of the winner being final.
These kinds of Tetlockian contests, such as Scott's collaboration with Metaculus, are excellent, but there is also something to be said for the flash of prophetic intuition in which somebody comes up with a forecast so few people are thinking about that no questions about it are ever framed.
For example, in 1790, when the French Revolution seemed to be proceeding constructively, British politician Edmund Burke forecast that it would lead to the execution of the monarchs, terror, inflation, and end in a military dictatorship (e.g., Bonaparte six or seven years later). That's a famously impressive forecast.
Of course, these kinds of forecasts are hard to score fairly. Fortunately, we possess in Scott a famously fair-minded host, so I would value his judgment.
Another problem is that it's harder to get the timing right on off-the-wall forecasts than on Tetlockian ones. A lot of becoming a Tetlockian super-forecaster is not underestimating how long things can bump along before they become a crisis.
For example, a massive war between Turkey and Greece over Cyprus, with disastrous consequences for NATO, is not likely to happen in 2025. But reasonable people can disagree over how worrisome it is on a longer timespan. Tetlockian super-forecasters are good at reading articles by experts on Cyprus saying, "Hey, everybody, it's important to pay attention to my topic of expertise for reasons," remembering that those experts said the same thing many times in the past when it turned out it wasn't hugely important to listen to them, and thus properly discounting the likelihood that Cyprus will matter in the next 12 months.
But that doesn't mean a Cyprus crisis might not happen in, say, my ever-shrinking years left.
So, I'd also like to see Scott scan over 2025 forecast essays again in 2030 and see if any of the losers in 2026 now look like premature prophets.
Manifold has these "add your own answer" questions. They're "headline-sized" rather than 250-word essays, though. Here is an example: https://manifold.markets/Bayesian/what-will-happen-during-trumps-seco
Hmm... I'm frustrated by the 12 month limit (if I'm understanding it).
I just did a check of how well ChatGPT o1 is currently doing, see https://www.astralcodexten.com/p/open-thread-365/comment/87433836
>tl;dr: ChatGPT o1, 1/18/2025, 7 questions; results:
>a) correct
>b) partially correct (initially evaded answering part of the question, 1st prod gave wrong answer, 2nd prod gave right answer)
>c) mostly correct (two errors)
>d) correct
>e) initially incorrect, one prod gave correct result
>f) misses a lot, argues incorrectly that some real compounds don't exist
>g) badly wrong
From the rate of improvement that I've been seeing, I'm roughly 75% confident that by 1/1/2027 ChatGPT should get all 7 questions right. Kind of a "What is an AGI to me?" (to the tune of "What is America to me?")
I think of this as roughly what a bright, conscientious Chemistry and Physics undergraduate should be able to do (with internet access, which current ChatGPT has, IIRC). This isn't everything: incremental learning is important too, and so is data efficiency.
edit: I just saw https://openai.com/index/announcing-the-stargate-project/ which explicitly aims for AGI, and has explicit White House endorsement. So I'm bumping up my odds guess from 75% to 80%
Similar to my idea: https://www.astralcodexten.com/p/who-predicted-2022/comment/12164323
Is it not possible to use the mobile website to make predictions? I tried signing in via my Google account successfully but I can’t understand how I’m supposed to make predictions in the contest. Possibly I’m just an idiot…
You should be able to make predictions on mobile by moving around the slider under the question. Sadly as far as I know you can't just type in a number, which might be easier.
Don't know about anyone else, but I have a lot more uncertainty on the questions this year compared to last. I hate predicting at 50% but there are several where I'm close to that.
No thank you. I don't play fantasy football either.
Metaculus became unusable for normal people after their UI redesign. The mobile experience is atrocious to the point of looking like it's a broken webpage. Their business model is explicitly designed to harvest predictor intelligence. $10,000 * .001 probability of success isn't worth the headache. I was top 10 for years and won't return until their UI and compensation models change.
They changed the prize payout so you no longer have to gamble to be in the money. It's more like $10,000 * 1.0% = $100 if you're in the top 20, tapering down to ~$30 if you're in the top 100. Last year 1300-1400 people participated, and this year probably over 3000.
I liked the previous contests. This time, it was, well, less fun.
Before, it was "think, click, and go to the next question"; now it was ... "find your way, register ..."
Ok, did that. Then: "think, click, wait for processing (just a few secs, but hey, who likes that, if it was meant to be fun and not grinding), then find a way to get to the next question ... there's no single button for that. Tried some ways; the best was to press that return-arrow ... then wait again for the questions to appear, find the next one ... click, wait, and only then repeat the procedure ..." I cut back on the "thinking part" to compensate, and man, was I glad when it was over!
I do hope the collaboration will bring better statistical insights than the old one - I am sure Scott saved and will save some of his valuable time; I sure wasted some of mine, while being absolutely sure I have no chance of getting near any prize money.
I did this because I am an embarrassing fan-boy/disciple. Many are not. Will the number of people who take the full contest go up or down? I bet 95% on: down.
I'm still looking for people who want to talk about their answers before the contest closes.