My notes from doing the contest first, then some thoughts on this year’s competition. Thanks to Metaculus for organizing!

This year ErrorMargin and I followed the same procedure as last year: researching the questions individually, then sharing notes. I estimate that I spent around 15 hours on the competition.

Questions

1. Will NASA’s Artemis II complete its mission successfully before 2027?

Before discussion: 77%

After discussion: 85%

Lots of attention, monthly windows, they’ll get lots of chances. Delays seem to be OTOO 3 months, so they can stomach 2-3 of those, especially given they want to accelerate timelines (must allow for time to return, ~10 days).

Say the chance of another delay is 32%, .32^2.5=6%.

Space is hard though, p(launch) = 82%.

Only resolves positive if everyone makes it alive. Out of the 160 manned space missions since 2000, there was only one accident. So 0.6%. Let’s say the risk is 10x for a moon landing, that’s 6%.

Final: .82 * .94 = .77
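For anyone who wants to retrace the arithmetic, here is a minimal sketch of the steps above. The variable names are mine, and the 82% launch probability is the judgment call from the notes, not something the code derives.

```python
# Chance that delays eat up the whole year: roughly 2.5 delays would have to happen.
p_delay = 0.32
delays_tolerated = 2.5
p_too_many_delays = p_delay ** delays_tolerated  # about 6%

# Crew safety: one accident in 160 crewed missions, scaled up 10x for this mission.
p_accident_base = 1 / 160               # about 0.6%
p_accident = 10 * p_accident_base       # about 6%, so p(safe) is taken as 94%

# Final forecast uses the hand-adjusted launch probability, not 1 - p_too_many_delays.
p_launch = 0.82
p_safe = 0.94
final = p_launch * p_safe

print(f"p(too many delays) = {p_too_many_delays:.3f}")
print(f"p(accident)        = {p_accident:.4f}")
print(f"final              = {final:.2f}")
```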

2. Will Donald Trump cease to exercise presidential powers for 48 hours during 2026?

Before discussion: 10%

After discussion: 9.5%

If D win the House, they’ll try to impeach. They’ll probably fail (90%) unless they also control the Senate.

According to Polymarket: p(D house) = 78%, p(D senate) = 33%

Also, elections are on November 3rd. p(impeachment succeeds in time | the votes are there) = 15%

p(Trump removed in 2026 through impeachment) = .78 * (.15 * .67 + .85 * .33) * .1 = .03

What about through death?

5.48% according to actuarial tables. I’ll adjust down to 4.8% for wealth but no further because of stress.

Oh and I’ll tack on 0.5% assassination risk, for 5.3% total.

What about through a temporary medical emergency?

(apparently colonoscopies are the biggest risk to US national security??)

Hard to find data, I’ll just say a serious medical incident that doesn’t cause death and lasts 48 hours, would be maybe one third as likely as a lethal one? I expect strokes and heart attacks to kill quickly for the most part, and not drag into a 48+ hour surgery + coma period, that seems rare.

Add an extra 2%.

For a grand total of

1 - .98 * .947 * .97 = .1
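The grand total treats the three routes as independent, which is an assumption baked into the product above. A minimal sketch of that combination (the helper name `p_any` is mine):

```python
def p_any(*probs):
    """Probability that at least one of several independent events happens."""
    p_none = 1.0
    for p in probs:
        p_none *= 1.0 - p
    return 1.0 - p_none

# Inputs taken from the notes above.
p_medical = 0.02       # temporary medical emergency lasting 48+ hours
p_death = 0.053        # actuarial risk plus assassination
p_impeachment = 0.03   # removal through impeachment

p_total = p_any(p_medical, p_death, p_impeachment)
print(f"{p_total:.3f}")
```

If the routes were positively correlated (say, bad health making both death and a temporary handover more likely), the true total would be a bit lower than this.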

3. What percent of the top 5 human average score will the best bot score in ACX 2026?

Before discussion: N(110%, .17²)

After discussion: no update

How do I even write this down? Went down a rabbit hole on this one, follow-up post coming, if time allows.

For now, I took the variance of one bot across tournaments (the “same bot” from the background info, σ=.17) and extrapolated the trend between the benchmarking tournaments 2024Q3 (99.4%) and 2025Q2 (119%) two quarters forward. I added this increase (13%) to the ratio between the top bot in ACX2025 and the mean of the top five forecasters [1]. This gives me a normal distribution with μ=110%, σ=17%.

It is important to multiply by 100.

4. Will GTA VI be released during 2026?

Before discussion: 45%

After discussion: 57%

It’s hard to know how much of the game is finished, but they have been able to record hi-fi trailers since May 25 (probably doesn’t mean much?).

Internally, they’ve just done RTO and one of the OG devs left in 2020.

If there’s another delay, they’ll miss 2026. After the most recent delay announcement on Nov 7, stock prices dropped 8%, recovering towards the end of December.

There have been two delays so far, let’s say there are at most 4. The chance of 3 or 4 delays is 2/3 if it’s equal ….. 45%

This is the worst prediction I’ve ever done.

5. Will Benjamin Netanyahu cease to be Prime Minister of Israel during 2026?

Before discussion: 41.2%

After discussion: 33%

The election is coming up, so basically will he survive it?

I find the polls hard to interpret, but Bennett polling seems tricky, with a few outliers pulling it down. Given polling reliability, I’ll give 65% for Likud reelection, and 63% for that being Netanyahu in power (he could get ousted if he does something outrageous, though I am not sure what that would be).

Risk of death: 3.29% (natural) + 0.5% assassination = 3.79%

Risk of accidentally getting detained for war crimes for more than 14 days in Switzerland or something: 0.5%

Total risk of losing power: 41.2%

12% coalition winning if lose, 15% of coalitions getting their act together this year

p in power at EOY if not ill = 8% + .92 * .85 = 86.2%

total p in power = 86.2 - 3.79 - 0.5 = 81.91%

answer:

6. Will Keir Starmer cease to be Prime Minister of the UK during 2026?

Before discussion: 65%

After discussion: 53%

The last 10 PMs have lasted 2055 days on average, if you exclude Liz Truss (lol). Starmer has been in office for 552, so assuming uniform resignation odds over remaining days, we get 365 / (2055 - 552) = 24%

According to YouGov polling, Starmer’s disapproval is at the level where Sunak’s was when he resigned.

Betting markets say 60% for him stopping, there are two specific potential candidates lined up.

7. Will an AI-created song chart in the top 20 of the Billboard Hot 100 before 2027?

Before discussion: 36%

After discussion: 44%

We’ve had the first AI artist, Xania Monet, sign a big deal.

The songs enter weekly. Let’s say we’ll have 5 AI songs competing for the top 20, from the top 50, that would give it 92%.

I think controversy around this will help, more than anything. There’s still a capabilities gap, but a talented artist could already make good music with AI in a way that would resolve as yes. Question is only if they’d admit to it. Controversy would help a newcomer but harm the reputation of an incumbent.

Annulling the question on a ban cuts down the biggest tail risk.

There are about 20000 songs released each year, approximately 14 new distinct songs / week (728/year) make it into the Hot 100. Let’s say all 14 songs get approximately 1.2 shots at getting into the top 20 per week.

So if we say we’ll release 50 physical AI albums in 2026, for 500 songs total. That’s 728/20000 * 500=18 entries into the Hot 100, each with an 8.5% chance of getting into the top 20. That’s 81%.

I think it’s more like 2 albums will be actually good, for 5 entries into the Hot 100. So more like 36%.

I think I’m sticking to 36%.
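All three scenarios above are the same "at least one success in n independent tries" calculation. A minimal sketch, assuming each entry is an independent shot at the top 20 (the helper name is mine):

```python
def p_at_least_one(p_each, n):
    """Chance that at least one of n independent tries succeeds."""
    return 1.0 - (1.0 - p_each) ** n

# Five AI songs in the top 50, each with a 20/50 chance of making the top 20.
top50 = p_at_least_one(20 / 50, 5)

# Share of released songs that reach the Hot 100, applied to 500 AI songs.
hot100_rate = 728 / 20000
entries = hot100_rate * 500          # about 18 entries

optimistic = p_at_least_one(0.085, 18)   # 50 albums; lands near 80%
realistic = p_at_least_one(0.085, 5)     # 2 good albums, 5 entries

print(f"{top50:.2f} {entries:.1f} {optimistic:.2f} {realistic:.2f}")
```

The independence assumption is doing a lot of work here: if AI songs succeed or fail together (for example because of a shared backlash), the optimistic number shrinks quickly.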

8. How many days will the US government be shutdown in 2026?

Before discussion: days = N(0, 0) * .95 + N(3, 2²) * .03 + N(35, 20²) * 0.02

After discussion: no update

More continuous variables. Let’s split it out into fake cases, with base rates since 1990:

  • No shutdown: 0 (24: 68%)
  • Small shutdown: 1-5 (7: 20%)
  • Big shutdown: 20-50 (4: 11%)

Also, there’ve been an average of 8.1 days of shutdown per year, since 2016.

Claude made me this

| Fiscal Year | Completion Date | Days on CR | Days Proper Approp. | Days No Funding | Total | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| 2004 | Jan 23, 2004 | 114 | 252 | 0 | 366 | Omnibus (P.L. 108-199) |
| 2005 | Dec 08, 2004† | 68 | 297 | 0 | 365 | |
| 2006 | Dec 30, 2005† | 90 | 275 | 0 | 365 | |
| 2007 | Full-year CR | **365 | *0 | 0 | 365 | P.L. 110-5 (9 of 12 bills) |
| 2008 | Dec 26, 2007† | 86 | 280 | 0 | 366 | |
| 2009 | Mar 11, 2009† | 161 | 204 | 0 | 365 | |
| 2010 | Dec 16, 2009† | 76 | 289 | 0 | 365 | |
| 2011 | Apr 15, 2011 | 196 | 169 | 0 | 365 | P.L. 112-10 (11 of 12 bills via CR) |
| 2012 | Dec 23, 2011† | 83 | 283 | 0 | 366 | |
| 2013 | Mar 26, 2013 | 176 | 189 | 0 | 365 | P.L. 113-6 (7 of 12 bills via full-year CR) |
| 2014 | Jan 17, 2014 | 108 | 241 | 16 | 365 | Oct 1-16 shutdown |
| 2015 | Mar 13, 2015† | 163 | 202 | 0 | 365 | |
| 2016 | Dec 18, 2015 | 78 | 288 | 0 | 366 | Omnibus (P.L. 114-113) |
| 2017 | May 05, 2017 | 216 | 149 | 0 | 365 | |
| 2018 | Mar 23, 2018 | 173 | 189 | 3 | 365 | Brief Jan/Feb shutdowns |
| 2019 | Feb 15, 2019 | 137 | 193 | 35 | 365 | Dec 22, 2018 - Jan 25, 2019 shutdown |
| 2020 | Dec 20, 2019 | 80 | 286 | 0 | 366 | |
| 2021 | Dec 27, 2020 | 87 | 278 | 0 | 365 | |
| 2022 | Mar 15, 2022 | 165 | 200 | 0 | 365 | |
| 2023 | Dec 29, 2022 | 89 | 276 | 0 | 365 | Omnibus (P.L. 117-328) |
| 2024 | Mar 23, 2024 | 174 | 192 | 0 | 366 | Two minibus packages (P.L. 118-42, 118-47) |
| 2025 | Mar 15, 2025 | 165 | 200 | 0 | 365 | P.L. 119-4 (all 12 bills via full-year CR) |

The trends in days on CR and days on “proper appropriation” (Congress debated spending and passed a bill instead of just temporarily using last year’s budget) are both flat. If the general level of funding instability were increasing, I would expect rising and declining trends, respectively.

So I’ll stick close to base rate … and now I’ve actually looked at the fucking news and apparently it’s all over and done with. So finally:

days = N(0, 0) * .95 + N(3, 2²) * .03 + N(35, 20²) * 0.02

I’ve actually put in a smoother distribution and had to make concessions to what the UI would allow me to enter.
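To get a feel for what this mixture looks like, here is a minimal sampling sketch using only the standard library. Two things are my assumptions rather than part of the forecast: N(0, 0) is treated as a point mass at zero, and negative draws from the two normal components are clipped to zero days.

```python
import random
import statistics

def sample_shutdown_days(rng):
    """Draw one value from the three-component mixture above."""
    u = rng.random()
    if u < 0.95:
        return 0.0                          # no shutdown: point mass at zero
    if u < 0.98:
        return max(0.0, rng.gauss(3, 2))    # small shutdown, clipped at zero (assumption)
    return max(0.0, rng.gauss(35, 20))      # big shutdown, clipped at zero (assumption)

rng = random.Random(0)
samples = [sample_shutdown_days(rng) for _ in range(100_000)]

share_zero = sum(s == 0.0 for s in samples) / len(samples)
mean_days = statistics.fmean(samples)
print(f"share with zero days: {share_zero:.3f}, mean days: {mean_days:.2f}")
```

Summaries like these are what then has to be matched by hand in the Metaculus UI.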

9. Will Nvidia’s stock price close below $100 on any day in 2026?

Before discussion: 4%

After discussion: no update

Monte Carlo sim shows 4%.
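The notes don't record the model behind that number, so the following is only a sketch of what such a simulation could look like: a geometric Brownian motion over daily closes, counting paths that ever close below the barrier. The start price, drift and volatility are placeholders I made up for illustration, not the inputs that produced the 4%.

```python
import math
import random

def p_close_below(start, barrier, annual_drift, annual_vol,
                  days=252, n_paths=5_000, seed=0):
    """Share of simulated paths with at least one daily close below `barrier`."""
    rng = random.Random(seed)
    dt = 1 / days
    step_mean = (annual_drift - 0.5 * annual_vol ** 2) * dt
    step_sd = annual_vol * math.sqrt(dt)
    log_barrier = math.log(barrier)
    hits = 0
    for _ in range(n_paths):
        log_price = math.log(start)
        for _day in range(days):
            log_price += rng.gauss(step_mean, step_sd)
            if log_price < log_barrier:
                hits += 1
                break
    return hits / n_paths

# Placeholder inputs (hypothetical), NOT the ones behind the 4% above.
estimate = p_close_below(start=180.0, barrier=100.0,
                         annual_drift=0.10, annual_vol=0.45)
print(f"{estimate:.1%}")
```

The answer is very sensitive to the volatility input, so the interesting modelling choice is where that number comes from (historical returns, implied volatility, or something fatter-tailed than a normal).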

10. How many of these 15 top US executive branch officials will be out before 2027?

Before discussion: binom(n=15, p=.2)

After discussion: binom(n=15, p=.17)

I … really don’t care. They’ve survived one year, so let’s look at Y2-4 average as a base rate (20%) & a trump rate (%), then interpolate, then try entering a simple binomial model.

The data implies that the Trump effect mostly took place in y1 of his presidency, other than that his turnover rates line up with the others. So p(survive) = .8 and I predict binom(n=15, p=.2)
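For reference, a minimal sketch of what binom(n=15, p=.2) implies, assuming all 15 officials leave independently with the same probability (which is the simplification the binomial model makes):

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k departures out of n."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 15, 0.2
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]

expected = sum(k * q for k, q in enumerate(pmf))   # equals n * p
mode = max(range(n + 1), key=pmf.__getitem__)      # most likely count

for k, q in enumerate(pmf[:8]):
    print(f"{k:2d} out: {q:.3f}")
```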

11. Will the composition of the US Supreme Court change in 2026?

Before discussion: 30%

After discussion: 32.5%

Risk of anyone dying from actuarial tables: .13.

The linked calls to step down mostly look like noise, I’ll stick to base rates.

Based on the linked spreadsheet, I construct the CDF of p_retirement(age).

The probability of any of them retiring is 0.33

I feel that I’m double-counting.

I’ll stick to the CDF using end of service for any reason, which comes out at 30% for their ages.

12. What will be the price of Bitcoin at the end of 2026?

I just did the Monte Carlo here, I’d post a chart but I’m running into issues that I don’t feel like solving.

13. What will be Donald Trump’s net approval on December 31, 2026?

Before discussion: N(-12, (2x12.3)²)

After discussion: N(-11.5, (2x10)²)

If I just cast his current approval (-12) forward, and adjust for std, I get N(-12, 97²), which is way too wide.

Incumbent advantage has passed, I’ll just guess 2x the current polling uncertainty (σ=2x12.3) and cast it forward.

14. Will Saudi Arabia and Israel agree to normalise diplomatic relations during 2026?

Before discussion: 4.3%

After discussion: 4.5%

Base rate:

The State of Israel and the Kingdom of Saudi Arabia have never had formal diplomatic relations. In 1947, Saudi Arabia voted against the United Nations Partition Plan for Palestine, and currently does not recognize Israeli sovereignty.

~ Wikipedia

No recognition for 78 years, my favourite silly rule wants to assign a base rate of 1/79=1.3%.

Khalid bin Bandar Al Saud, the Saudi ambassador to the United Kingdom, said in a 9 January 2024 BBC interview that Saudi Arabia was still interested in peace and normalized relations with Israel following the war, on the condition of the creation of a Palestinian state.

~ Wikipedia

This looks pretty bad, I think even if Netanyahu goes, it will take a while to rebuild trust. If he doesn’t go, I basically don’t see this happening.

So 8% if he goes, 1.8% if he doesn’t (up from 1.3% because of the recent thawing).

After discussion:

So 9% if he goes, 2.3% if he doesn’t (up from 1.3% because of the recent thawing).
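The headline numbers come from weighting the two conditionals by the chance that Netanyahu goes. I'm assuming here that the weights are the forecasts from question 5 (41.2% before discussion, 33% after); that link is my reading of the notes, and the function name is mine.

```python
def total_probability(p_goes, p_if_goes, p_if_stays):
    """Law of total probability over the two Netanyahu scenarios."""
    return p_goes * p_if_goes + (1 - p_goes) * p_if_stays

before = total_probability(0.412, 0.08, 0.018)
after = total_probability(0.33, 0.09, 0.023)

print(f"before: {before:.2%}, after: {after:.2%}")
```

Both results land within a tenth of a percentage point of the forecasts above.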

15. How will the Supreme Court rule on Trump’s tariffs in 2026?

Before discussion: no decision .05, unlawful .37, lawful .095, mixed .456, other .029

After discussion: no decision 0.05, unlawful .2, lawful .17, mixed .56, other .029

In event of any other ruling before January 1, 2027 in which the Court does not issue a merits holding on legality of any challenged measure (e.g., dismissal as improvidently granted, lack of standing, mootness, GVR, vacate-and-remand without a merits holding, a plurality with no majority judgment on legality) the question will resolve as Other ruling. ~ Resolution Criteria

Need to get a law degree, brb.

These are two questions, pretending to be one.

  1. Will the Supreme Court rule on tariffs? Since the case is already being heard by SCOTUS, it will most likely finish in 2026; cases rarely stretch beyond a year at that stage. Maybe it gets re-argued? 5%.
  2. What will the ruling be? Assuming they do rule, I expect 80% unlawful, 20% lawful, directionally, based on the McConnell/Claybourn brief. Now I need to understand Mixed and Other. Looking at last year’s prediction, Other is negligible, call it 3% of this conditional bc I did not actually get a law degree. There are broadly two classes of measures: “reciprocal” and “trafficking”.

    I think for the trafficking tariffs, the brief’s argument is weaker, call it 50/50.

    So roughly: unlawful 39, lawful 10, mixed 48, other 3.

    Multiplying by .95: 37, 9.5, 45.6, 2.85

    Review: All tariffs were ruled unlawful in the lower courts, begging the question why SCOTUS is taking this up? Probs to declare trafficking lawful. So changing that prob to 65 lawful, 35 unlawful.

    .26 unlawful, .12 lawful, .59 mixed, .03 other. Multiplying by .95: .247, .114, .56, .028
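A minimal sketch of how the first set of numbers falls out, assuming the rulings on the two classes of measures are independent (my assumption; the notes don't say so explicitly). "Unlawful" means both classes are struck down, "lawful" means both survive, and everything else is "mixed". The function and parameter names are mine.

```python
def tariff_outcomes(p_recip_unlawful, p_traff_unlawful, p_other, p_no_decision):
    """Split probability mass across the five resolution options."""
    unlawful = p_recip_unlawful * p_traff_unlawful
    lawful = (1 - p_recip_unlawful) * (1 - p_traff_unlawful)
    mixed = 1 - unlawful - lawful
    # Mass left over after "no decision" and "other" are carved out.
    scale = (1 - p_other) * (1 - p_no_decision)
    return {
        "no decision": p_no_decision,
        "unlawful": unlawful * scale,
        "lawful": lawful * scale,
        "mixed": mixed * scale,
        "other": p_other * (1 - p_no_decision),
    }

first = tariff_outcomes(0.80, 0.50, p_other=0.03, p_no_decision=0.05)
for name, prob in first.items():
    print(f"{name:12s} {prob:.3f}")
```

The unrounded values differ slightly from the ones above because the notes round to whole percentages before multiplying by .95.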

16. Will the United States experience negative GDP growth during Q1, Q2, or Q3 2026?

Before discussion: 30%

After discussion: no update

Guessing the base rate per quarter, which is 13%.

Actually, they are correlated, so I’ll look at per-year data, which gets me 30% (not the roughly 34% that three independent quarters at 13% would imply).

17. Will the winner of the 2026 FIFA World Cup be a country that has never won before?

Before discussion: 29%

After discussion: 18%

The base rate for a new winner since 2000 is 30%, since 1970 it’s 28%.

BUG: 18%

18. Will the WHO declare a Public Health Emergency of International Concern in 2026?

Before discussion: 41%

After discussion: no update

I’m again going to guess the base rate here.

Looking at the H5N1 plots doesn’t change my mind. Existing cases seem contained. Future ones will either be contained or not. Already baked in.

19. Will the S&P 500 close above 7,500 at the end of 2026?

Before discussion: 80%

After discussion: 54%

More Monte Carlo.

After discussion: Bug, also bumping up slightly bc AI.

20. What will be the highest score achieved on ARC-AGI-2 before 2027?

Before discussion: 5% below .7, median .9

After discussion: no update

I am struggling to find max performance vs time data. That seems to be the sort of thing I would show on a website meant to demonstrate that the current AI paradigm can never be Truly Intelligent™. Denial?

These people to the rescue.

For ARC1 it took about a year to go from 50% to saturation. I’ll guess 5% below .7 with median .9.

21. Will there be a bilateral ceasefire in the Russo-Ukrainian conflict before 2027?

Before discussion: 35%

After discussion: 35%

Like 35%? Going up slightly, but Russia has made no direct moves clearly showing intention. So only if Ukraine breaks, and I think Europe will keep them at the brink of collapse through support.

22. Will China attack or blockade Taiwan during 2026?

Before discussion: 8.5%

After discussion: no update

Trump seems to be undermining the US’s position of strategic ambiguity. Also, aggressive drills over weapons deal, but also the weapons deal itself & the fact that no invasion followed. I still think hybrid warfare is more likely than direct military intervention, maybe going up to 8.5%?

any sinister schemes to obstruct China’s reunification are doomed to fail ~ Lin Jian in a news briefing, as reported by Al Jazeera

This guy missed the part of media training titled “How not to sound evil”.

23. Will an AI model reach a 3 hour time horizon with 80% reliability during 2026?

Before discussion: 15%

After discussion: 37%

Simple regression on log METR data shows 9.4%

I’ll bump up to 15 for alg breakthroughs and error bars.

Discussion: Mixture model of three regressions, 20% all time, 60% since 2023, 20% since 2024

24. Will OpenAI file for an IPO during 2026?

Before discussion: 65%

After discussion: 62.5%

Obviously not, Sam Altman wants world domination. 3%

Only if catastrophic funding issues, which don’t look likely unless The Bubble™ Pops™.

Actually, they’re planning to, according to Reuters. So like 65%. Could be delayed: if Altman doesn’t feel like doing it, he’ll be less motivated to get it done. Could also be some mysterious ploy.

25. Will SpaceX successfully refuel a Starship in orbit during 2026?

Before discussion: 42%

After discussion: 38%

Let’s say 73% of success because they’ve already done an internal tank-to-tank transfer so the fluid mechanics have been derisked a bit, so the big uncertainty is the tricky alignment?

But also, two rocket launches must go smoothly, let’s say 95% for each. And nothing can get postponed. Since the date isn’t fixed yet, let’s call that 65%.

Total: .42

Discussion:

NASA says unlikely, Musk says “50/50” -> it happening becomes 57%, total forecast 38%

26. Will TikTok US be banned or sold during 2026?

Before discussion: Sold/Banned/Neither 75/5/20

After discussion: Sold/Banned/Neither 78/7.5/14.5

Seems like a deal has been reached. 85% it goes through, otherwise 10% ban, 5% no action.

I think that joint venture deal would resolve as “Sold”, but I’ll leave 10% to resolve as “Neither” because of weird quirks

Sold/Banned/Neither 75/10/15

However, this Metaculus market seems to think this will resolve as owned by an adversary.

MSN reaches a new high of online journalism:

Oracle, Silver Lake, and MGX will each hold an equal 45% stake in the joint venture.

CNBC to the rescue:

The U.S. joint venture will be 50% held by a consortium of new investors, including Oracle, Silver Lake and MGX, with 15% each. Just over 30% will be held by affiliates of certain existing investors of ByteDance, and almost 20% will be retained by ByteDance, the memo said.

I expect this will be annulled because “controlled” isn’t defined well enough. But under my interpretation, this will count as Sold.

Adjusting to 75/5/20, but really this is just lots of resolution lawyering uncertainty.

27. Will there be a ceasefire in the Sudanese Civil War during 2026?

Before discussion: 12%

After discussion: 8%

Again, it’s been going for 3 years, so anchoring at 33%.

If the question was about “peace in Sudan”, we’d have to go much lower than that because of the number of groups involved. But the question is only about RSF/SAF.

The failed ceasefire gets me down to 15%.

There are US mediation efforts which the SAF have rejected afaict. I don’t think it benefits them.

28. What will the average global surface air temperature be in 2026 relative to the pre-industrial baseline?

Before discussion: N(1.39, 0.11²)

After discussion: N(1.41, 0.11²)

Eyeballing figure 10.4 here, we go up by 0.03°C per year.

But in the end, OLS on the linked dataset is enough, I get N(1.39, 0.11²).

Discuss:

El Niño seasonality, I’ll bump it up to N(1.41, 0.11²)

29. Will the US, UK or EU approve a gene editing therapy for a new condition during 2026?

Before discussion: 7%

After discussion: 2%

According to a quick Perplexity check, nothing is pending for the UK or EU. I expect those applications would follow FDA approval.

For FDA, the only therapy I can find that’s reasonably close is Intellia’s NTLA-2001, which would happen 2027 earliest.

So nothing seems likely, call it 7% for emergency authorizations & model error.

Discussion:

NTLA paused for safety concerns

30. How many of the negotiating chapters required to join the EU will Montenegro have closed at the end of 2026?

Before discussion: mostly uniform between 16-22, with a bit less outside

After discussion: no update

This inspires confidence:

The goal of finalizing negotiations with the EU by the end of 2026 could be realistic if the parliament continues with constructive decision-making practices ~ Vijesti

But tbh since this is a question about the pmf of the number of chapters, I’m not sure I can do much better than mostly uniform between 16-22, with a bit less outside?

I’m really struggling to enter my true views into Metaculus.

31. Will the FDA approve a psilocybin treatment during 2026?

Before discussion: 62%

After discussion: 73%

Perplexity: Usona Institute’s uAspire (NCT06308653) has a phase 3 clinical trial completing in April 26.

Compass Pathways’ COMP360 phase 3 trial will complete in “mid-2026”.

Usona could be fast-tracked, Compass is already fast-tracked. So review for both could complete in 6 months.

Under normal conditions, that’s 80% completed review for Usona, 20% completed review for Compass. Say 71% approval chance, docking 9% given presumed drugs bias.

Crunching numbers gets me 62%. Cool!

discussion:

up to 73% because my maths is suss (maths says 76% but I’ll dock a few points out of fear)

32. Will the EU require mandatory age verification on social media or AI before 2027?

Before discussion: 9%

After discussion: 11%

There’s this age verification blueprint thing, so they’re working on something.

They passed a resolution, but that won’t necessarily be followed by passing a binding law. There might be back-and-forth with privacy advocates.

I don’t see it, 9%.

33. Will restrictions on the use of the Traditional Latin Mass in the Catholic Church be loosened during 2026?

Before discussion: 38%

After discussion: 32%

Leo is politicking, his big “the Tory mass is ok” thing has already happened, good chance he’ll move on to other issues, since it’s not about the TLM for anyone anyway. Or he won’t.

38%

Discussion:

Leo “does not wish to abrogate” the existing restrictions

34. Will the U.S. enact an AI safety federal statute or executive order in 2026?

Before discussion: 67%

After discussion: 52%

Probably, but it could be use-based rather than capability-based. Trump doesn’t like regulation, but there’s some work being done with his blessing.

Thoughts about this year’s contest

Many questions’ Background Info contained all the information needed to form a good base rate. For example, this question about the US Supreme Court linked a spreadsheet which calculated the base rate for this question. I really like this move, as it saves us from doing a lot of predictable, tedious work.

Speaking of predictable and tedious: AI tools are on their way to outperforming human forecasters; in fact, I predicted that this will happen in this contest. But I also found it very helpful to delegate some of the rote tasks of forecasting to them:

Asking Claude to look through some actuarial tables for me.

Another difference in this year’s contest was having more questions asking for a continuous variable. I think that makes the contest more interesting, but I have found that the amount of work involved in those questions is much larger. I have no intuition about standard deviations, meaning that I had to build quantitative models even for the simpler questions. Arguably a skill issue, but combined with submissions closing almost two weeks earlier than the year before, I found myself scrambling a bit this year. I’m not sure I could have taken part if I didn’t happen to be between jobs at the start of 2026.

One last thing about continuous variables: it was tricky to enter my models’ outputs into Metaculus, and I had to write a lot of code like this

import numpy as np

values = my_fancy_model(...)  # array of simulated outcomes
(
    np.median(values),                      # median
    np.quantile(values, [.25, .75]),        # lower and upper quartile
    (values > 200000).sum() / len(values),  # share of samples above 200000
)

and then spend a good few minutes on wiggling the CDF sliders in the web UI until the numbers approximated mine. It would be nice if there was a more standardized format I could paste into the UI, e.g. the 0.1, 0.2, … 0.9th quantile of my distribution.
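To make the suggestion concrete, here is a minimal sketch of the kind of decile summary I have in mind, written with the standard library only. The paste format itself is hypothetical (Metaculus does not accept this as far as I know), and the lognormal samples are just a stand-in for a real model's output.

```python
import random
import statistics

def decile_summary(values):
    """Map quantile levels 0.1 ... 0.9 to the matching sample quantiles."""
    # method="inclusive" interpolates linearly between the observed samples.
    cuts = statistics.quantiles(values, n=10, method="inclusive")
    return {round(0.1 * (i + 1), 1): q for i, q in enumerate(cuts)}

# Stand-in samples (hypothetical), replacing the output of a real model.
rng = random.Random(0)
values = [rng.lognormvariate(11.5, 0.5) for _ in range(10_000)]

summary = decile_summary(values)
for level, q in summary.items():
    print(f"{level:.1f}\t{q:,.0f}")
```

Nine numbers like these pin down the shape of a distribution far better than a median and two quartiles, and they would be easy to paste in one go.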

Overall though, a great contest as always, and I look forward to tracking the news over the course of the year to see how I fared.


  1. As of me writing this, two questions are still unresolved, and one of those five is me! Update 2026-03-22: :(