What did we do?

In January 2024, ErrorMargin[1] and I worked together on the 36 questions of the 2024 ACX Prediction Contest. We wanted to maximise the value we got from collaborating, so we came up with uncorrelated forecasts first, after which we discussed our models and updated our predictions. We recorded both the initial and the updated predictions in a spreadsheet, which this post is based on.

We submitted our forecasts individually on Metaculus. We also submitted the average of our predictions on the shared l1z1x account, to see if the wisdom of a crowd of two would outperform us individually. Don’t do this, by the way: it’s against Metaculus’s T&Cs. More on that at the end of this post.

To get through all the questions reasonably quickly, we paced ourselves with a repeating timer: 7 minutes for individual research and 3 minutes for discussion and updating. We didn’t force ourselves to stick to those timings and we often spent longer on questions that were harder or required more data wrangling.

Getting through the 2024 questions took about a weekend, with generous time for breaks. I have included our predictions for each question at the end of this post.

How well did we do?

Pretty well! The shared account came 3rd on the leaderboard, ErrorMargin came 4th and I came 6th. Since the shared account has now been deactivated, the final ranks for ErrorMargin and me are 3rd and 5th, respectively. Our Brier scores were approximately -0.16.

I am very happy with that result, especially considering that it was the first time I had done a significant bit of forecasting.

Analysis

Notes on Metrics

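I will be comparing our individual predictions using Brier scores, which, as I just discovered, are simply negated mean squared errors (MSEs) between the forecast and the indicator variable of a question resolving positive. With this sign convention, 0 is the best possible score and -1 is the worst.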
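To make the metric concrete, here is a minimal Python sketch of a Brier score under the sign convention used in this post (negated, so 0 is best). The function name `brier_score` and the toy inputs are illustrative assumptions, not the code or data behind the analysis.

```python
def brier_score(forecasts, outcomes):
    """Negated mean squared error between forecasts and outcomes.

    forecasts: probabilities in [0, 1]
    outcomes:  1 if the question resolved positive, 0 otherwise
    Sign convention of this post: 0 is best, -1 is worst.
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    if not forecasts:
        raise ValueError("need at least one forecast")
    squared_errors = [(p - o) ** 2 for p, o in zip(forecasts, outcomes)]
    return -sum(squared_errors) / len(squared_errors)


# Hypothetical toy inputs, not contest data.
toy_score = brier_score([0.9, 0.2, 0.5], [1, 0, 1])

# A single confident miss: 90% on a question that resolves negative.
confident_miss = brier_score([0.9], [0])

# Answering 50% on everything always scores -0.25.
coin_flip = brier_score([0.5, 0.5], [1, 0])
```

For scale: a single 90% forecast on a question that resolves negative scores -0.81, and a forecaster who answers 50% on everything scores -0.25, so an average of about -0.16 is a clear improvement on knowing nothing.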

I will also represent our predictions as odds, not probabilities, for clarity of presentation. When plotted, a 1% and a 2% forecast look almost the same, even though the second one is twice as confident. Odds of roughly 1:100 versus 1:50 are clearer. I will drop the “:1” bit: odds of 50 mean 50:1, odds of 0.02 mean 1:50.
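The conversion between probabilities and odds is short enough to sketch. The helper names `prob_to_odds` and `odds_to_prob` are hypothetical, chosen for this illustration only.

```python
def prob_to_odds(p):
    """Convert a probability in (0, 1) to odds in favour, so 0.5 -> 1.0."""
    if not 0 < p < 1:
        raise ValueError("probability must be strictly between 0 and 1")
    return p / (1 - p)


def odds_to_prob(odds):
    """Convert odds in favour (a positive number) back to a probability."""
    if odds <= 0:
        raise ValueError("odds must be positive")
    return odds / (1 + odds)


# Print a few conversions to see how small probabilities spread out as odds.
for p in (0.01, 0.02, 0.5, 0.98):
    print(f"{p:.0%} -> odds {prob_to_odds(p):.4f}")
```

Strictly speaking, 1% corresponds to odds of 1:99 and 2% to 1:49, so the round figures quoted above are approximations. The ratio between the two is still very close to 2, which is the point of plotting odds rather than probabilities.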

Basic statistics

Of the 36 questions in the contest, 13 (36%) resolved positive[2]. Here are the average Brier scores for me (jshapiro), ErrorMargin, and our shared account, l1z1x, before and after we compared notes.

             | ErrorMargin | jshapiro | l1z1x
Initial      | -0.1605     | -0.1933  | -0.1569
After Update | -0.1595     | -0.1590  | -0.1567

The discussions were about 30 times as beneficial to me as they were to ErrorMargin: my Brier score improved by +0.03, while theirs improved by only +0.001. I don’t understand exactly why I scored higher than ErrorMargin despite being ranked a fair bit lower in the tournament. It will be due either to Metaculus not using raw Brier scores or to errors in the data on which I’m basing this analysis[3].
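The arithmetic behind those improvements can be rechecked from the rounded values in the table above. This is only a sketch: the dictionary layout and variable names are illustrative, and the inputs are the four-decimal figures from the table, not the underlying spreadsheet.

```python
# Average Brier scores (initial, after update), copied from the table above.
scores = {
    "ErrorMargin": (-0.1605, -0.1595),
    "jshapiro": (-0.1933, -0.1590),
    "l1z1x": (-0.1569, -0.1567),
}

# Positive numbers mean the score moved towards 0, i.e. got better.
improvement = {name: after - before for name, (before, after) in scores.items()}

# How much more the discussion helped jshapiro than ErrorMargin.
ratio = improvement["jshapiro"] / improvement["ErrorMargin"]

for name, delta in improvement.items():
    print(f"{name}: {delta:+.4f}")
print(f"jshapiro / ErrorMargin improvement ratio: {ratio:.1f}")
```

Because the table is rounded to four decimal places, ErrorMargin’s improvement could be anywhere between roughly 0.0009 and 0.0011, so the ratio is only pinned down to somewhere in the low-to-high 30s. That is consistent with “about 30 times”.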

The shared account outperformed both of us. Discussion improved its score, but only by +0.0002.
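Part of this is built into the scoring rule. Under a squared-error score, the average of two forecasts never scores worse than the average of the two individual scores, and the gap on each question is exactly the squared disagreement divided by four. The Python sketch below checks this on hypothetical numbers (not contest data); `brier` and `compare` are illustrative names.

```python
def brier(p, outcome):
    """Negated squared error for one forecast; 0 is best, -1 is worst."""
    return -((p - outcome) ** 2)


def compare(p_a, p_b, outcome):
    """Return (score of the averaged forecast, mean of the two individual scores)."""
    shared = brier((p_a + p_b) / 2, outcome)
    mean_individual = (brier(p_a, outcome) + brier(p_b, outcome)) / 2
    return shared, mean_individual


# Hypothetical forecasts from two people on a single question.
p_a, p_b = 0.8, 0.3
for outcome in (0, 1):
    shared, mean_individual = compare(p_a, p_b, outcome)
    gap = shared - mean_individual
    # The gap equals (p_a - p_b)^2 / 4 whichever way the question resolves.
    print(f"outcome={outcome}: shared={shared:.4f}, "
          f"mean individual={mean_individual:.4f}, gap={gap:.4f}")
```

Note what this does and does not guarantee: the averaged forecast beats the mean of the two individual scores, but it can still lose to the better forecaster on any given question. Beating both of us over the whole contest was not automatic.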

Breakdown by question

Plot of all odds and updates.

Here is a breakdown of all of our predictions. Arrows indicate the update from the initial to the post-discussion forecast. Questions that resolved positive are shaded green. The black line in the centre represents 50:50 odds.

Plot of Brier score change for both of us.

And here is the same data again, this time sorted by ErrorMargin’s update.

Plot of Brier score change for both of us.

And finally, the actual Brier scores for the individual questions. Remember that 0 is the best result.

Plot of Brier scores for both of us.

I think a lot of the value I got out of the discussions was avoiding blunders, mostly from misunderstanding the question in some way or from misreading the resolution criteria. At least that’s what I remember causing the biggest updates for me.

Why didn’t we get prizes?

We entered the competition with three accounts: two individual ones and a shared one. We did not realise this at the time, but shared accounts are against the Metaculus T&Cs. We did not expect to win anything, but we decided that if we did, that would be a happy problem, and we would contact Metaculus to make sure we didn’t get an unfair share of the prize pool. We thought about checking with Metaculus at the time but decided against it, since it felt silly to discuss our share of winnings without knowing if we would even rank in the top half of the leaderboard.

Once the initial winning announcement was out, that decision did not hold up.

January 17 Update: We are currently investigating some irregularities that will affect the final rankings. We will provide an updated announcement soon.

Uh oh. After scrambling to get an email out to them as soon as we could, we were able to convince Metaculus that we were not being malicious. We agreed to forgo our prizes but were able to keep our rankings on the leaderboard.

I am very grateful to Metaculus for listening to us with an open mind and for showing and assuming good will. They could have easily doubled down on their T&Cs and disqualified us from the contest but they let themselves be convinced to let us keep our rankings. For Metaculus’s account of what happened, see the updated winner announcement, in particular the note at the bottom of it.

Takeaways

  • I will be writing down explicit models for future forecasts so that I can better learn from mistakes. Predicting that Starship will reach orbit with 90% probability lost me a lot of points but it’s hard to update on that since I don’t remember what I was thinking at the time.
  • We overran a lot, so this time we planned four weekend days to work on the questions without too much stress.
  • We should have definitely sent that clarifying email to Metaculus even though it would have felt silly.
  • Being a news junkie seems to help with forecasting, so I’m going to increase how much news I consume (from 0 to slightly above 0).
  • I seem to remember that I wasn’t very confident about any of the space questions, which I could have used as a signal to research for longer. In future contests, I will update all my predictions right before the cutoff time, and spend extra time on the ones I feel the worst about.
  • We would often duplicate effort: for example, we would both spend 5 minutes getting the data from a weird Excel file indexed by year and adjusted for inflation. This is somewhat alleviated by AI now, since it can do the wrangling for us, but it would be nice to avoid duplicating effort somehow, without losing the initial uncorrelated forecasts. Not sure what the solution is here.
  • Making predictions together is a good way to stay in touch with people later. We would often send each other news articles as things kept happening. And since things are always happening, we ended up messaging each other quite often.[4]
  • Also, making predictions made me more invested in global events, which was fun in its own right and made for good conversation. On the flip side, it creates a weird relationship between me and the news. When bad things happened in the world, often my first reaction was to think about how that affects my forecasts, which troubles me.
  • Metaculus notifications can be terrifying. I sometimes got emails like “Question just resolved: Will Russia use nuclear weapons…” where the … was “by yesterday’s date. Resolved: No.” Those would always get my heart beating.

Our predictions

Question | jshapiro: initial (%) | jshapiro: after_update (%) | ErrorMargin: initial (%) | ErrorMargin: after_update (%)
Will the FDA or EMA withdraw approval of semaglutide for the treatment of obesity or diabetes in 2024? | 2 | 1 | 2 | 1
Will there be 100 or more military conflict deaths between Ethiopia and Eritrea in 2024? | 1 | 1 | 3 | 3
Will a nuclear weapon detonation kill at least 10 people in 2024? | 1 | 1.2 | 0.1 | 0.5
Will there be 10 or more armed forces conflict deaths between China and Taiwan in 2024? | 3 | 3 | 10 | 5
Will there be 10 or more armed forces conflict deaths between India and Pakistan in 2024? | 2 | 3 | 3 | 2
Will an AI win a coding contest on Codeforces in 2024? | 4 | 4 | 10 | 3.5
Will X declare bankruptcy in 2024? | 5 | 5 | 15 | 7
Will there be a serious radiation incident at any nuclear plant in Ukraine before 2025? | 12 | 5 | 4 | 4
In 2024 will there be any change in the composition of the US Supreme Court? | 4 | 10 | 13 | 10
Will the 2024 light duty electric vehicle sales share exceed 11% in the US through November 2024? | 60 | 10 | 10 | 10
Will Ukraine control central Bakhmut at the end of 2024? | 20 | 17 | 15 | 17
Will a crewed Artemis II flight approach the moon in 2024? | 60 | 20 | 8 | 15
Will Ali Khamenei cease to be supreme leader of Iran in 2024? | 38 | 22 | 20 | 20
Will the Fed Funds Rate on December 31, 2024 be below 4%? | 25 | 29 | 35 | 31
Will there be a bilateral cease-fire or peace agreement in the Russo-Ukraine conflict in 2024? | 40 | 30 | 50 | 40
Will cannabis be removed from Schedule I of the Controlled Substance Act before 2025? | 18 | 30 | 57.1 | 40
Will SpaceX attempt to catch a Starship booster with the tower in 2024? | 25 | 30 | 60 | 35
Will there be a US government shutdown before January 1, 2025? | 28 | 33 | 30 | 30
Will OpenAI publish information describing Q* (Q-Star) in 2024? | 18 | 35 | 95 | 80
Will the WHO declare a global health emergency (PHEIC) in 2024? | 38 | 39 | 42 | 42
Will Donald Trump be convicted of a felony before the 2024 presidential election? | 40 | 40 | 50 | 50
Will annual US core CPI inflation be above 3% in December 2024? | 75 | 45 | 15 | 35
Will a debate be held between Joe Biden and Donald Trump before the 2024 US presidential election? | 8 | 49.5 | 40 | 48
Will the Shanghai (SSE) Composite Index go up over 2024? | 65 | 53 | 45 | 45
Will Ilya Sutskever still lead OpenAI’s Superalignment team at the end of 2024? | 82 | 58 | 35 | 40
Will the Democratic Party win the 2024 US presidential election? | 60 | 69 | 60 | 57
Will the New Glenn launch vehicle reach an altitude of 100 kilometers in 2024? | 75 | 70 | 30 | 55
Will the S&P 500 index go up over 2024? | 68 | 73 | 75 | 78
Will there be faithless electors in the 2024 US Presidential election? | 82 | 78 | 15 | 70
Will the US unemployment rate be above 4% in November 2024? | 55 | 78 | 70 | 70
Will US refugee admissions exceed 100,000 in fiscal year 2024? | 75 | 80 | 85 | 80
Will Bitcoin go up over 2024? | 80 | 82 | 75 | 77
Will a member of the United States Congress introduce legislation limiting the use of LLMs in 2024? | 65 | 85 | 95 | 95
Will Benjamin Netanyahu remain Prime Minister of Israel throughout 2024? | 70 | 85 | 94 | 92
Will SpaceX’s Starship reach orbit in 2024? | 78 | 90 | 90 | 90
Will Mike Johnson remain Speaker for all of 2024? | 96 | 94 | 90 | 94

  1. To avoid confusion: they entered the contest under a different alias initially. 

  2. For the purposes of this analysis, I have rephrased “Which party will win the 2024 US presidential election?” as “Will the Democratic Party win the 2024 US presidential election?”. I am excluding the question about the Mexican election since we messed up the data collection on that one.

  3. I omitted one question and there might be discrepancies between our spreadsheet and what we actually submitted to Metaculus. 

  4. Though admittedly there were other factors making it likely that ErrorMargin and I would stay in touch.