I’ve just returned from LWCW2025. One thing that kept coming up was people’s AI timelines. I decided to finally write down an explicit model.

About half an afternoon’s worth of thinking went into this, and it should be treated with the appropriate scepticism. I didn’t do much research, so this is a detailed, explicit prior, no more, no less. These are my personal notes, lightly edited. I’ve added clarifications as footnotes just so that I can get something out quickly.

I try to answer two questions: When will all white-collar jobs be automated? And: How likely is it we all die?1

If you would like to construct a model like this yourself (and I recommend that you do), I would suggest skipping this post until you have done so, to avoid anchoring yourself too early. You can skip to the last section for a brief discussion of how such a model can be useful if you’re doing AI safety work.

How long until all white-collar jobs are automated?

For that to happen, we need

  • Capabilities: The models need to be good enough to do the work.
  • Compute: There need to be enough computers to run all that inference.
  • Adoption: Organisations need to actually deploy the automation.

Capabilities

Two possible scenarios here. Either Scaling Is All We Need™ and we just need to throw more training at it, or we need a Paradigm Shift™, consisting of one or more algorithmic breakthroughs.

  1. No new paradigm needed (p=20%)
    • Can we just follow the METR time-horizon trend until we hit 95% success at 1-month projects? That would take 4-6 years, at which point AIs are very good ICs (individual contributors).
    • Either: management2 is easy, and AI at this point can basically do projects of arbitrary length (p=18%)
    • Or: management is hard, we’re bottlenecked by data, and every white-collar job is management for a while (p=82%). If we’re in this world, we’ll need to collect activity-specific data and fine-tune on that, say 4-8 years.
    • If we don’t need a new paradigm, we’re 4-14 years away, median 10.
  2. Yes new paradigm (p=80%)
    • How often do big algorithmic breakthroughs occur? Let’s count them3. We’ve had the following big breakthroughs in AI recently:
      • LSTM (1997)
      • attention (2014)
      • RLHF (2020)
    • We get 3 transformative breakthroughs per 28 years, for a roughly 10% yearly rate. If I’ve missed ~2, that’s an 18% yearly rate. This puts us 3.5-6.5 × x years away, where x is the number of breakthroughs still needed.
    • I’ll put 50% on one breakthrough, 25% on two & 25% on three.
    • I’ll allow 2 years for infra changes (e.g. with online learners, the entire frontier-model backend would have to be re-architected) and incremental improvements: median 10 years away (4-16); the arithmetic is sketched below.
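
To make the scenario-2 arithmetic explicit, here is a rough sketch in Python. The 3.5-6.5-years-per-breakthrough range, the breakthrough-count weights, and the 2-year infrastructure allowance are taken as given from the list above; the code just multiplies them out.

```python
# Rough reconstruction of the scenario-2 arithmetic.
observed_breakthroughs = 3            # LSTM, attention, RLHF
window_years = 2025 - 1997            # = 28
low_rate = observed_breakthroughs / window_years          # ~10.7%/year
high_rate = (observed_breakthroughs + 2) / window_years   # ~17.9%/year, if I missed ~2

years_per_breakthrough = (3.5 + 6.5) / 2      # midpoint of the stated range
p_breakthroughs_needed = {1: 0.50, 2: 0.25, 3: 0.25}
infra_years = 2

expected_timeline = infra_years + sum(
    p * k * years_per_breakthrough for k, p in p_breakthroughs_needed.items()
)

print(f"yearly breakthrough rate: {low_rate:.1%} to {high_rate:.1%}")
print(f"scenario-2 point estimate: ~{expected_timeline:.1f} years")
# -> ~10.8 years, in line with the stated median of 10 (range 4-16)
```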

Compute

I’m assuming compute will be sufficient well before capabilities are. We’re already building compute like crazy, and there will potentially be arms-race dynamics causing us to build more than we actually need.

Adoption

Who knows; probably quite a while. Five years in more conservative industries wouldn’t be unreasonable, with much more on the tails. I expect that established organisations will resist this, because people won’t want to automate themselves away. They will be less competitive than an equivalent company that’s more automated, but they will have all the other advantages of an incumbent.

Overall

Under this model, you’re looking at 4-16 (median=10) years for all white-collar work to be automatable, and after that you can probably still hang on for a few years due to institutional inertia. So perhaps 12 years median at a tech startup?
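
For concreteness, here is a minimal Monte Carlo sketch combining the two capability scenarios with an adoption lag. Only the endpoints and medians come from the numbers above; the triangular distribution shapes and the flat 2-year lag at a startup are ad-hoc assumptions made purely to have a distribution at all.

```python
import random

N = 100_000
samples = []
for _ in range(N):
    if random.random() < 0.20:                      # no new paradigm needed
        capability = random.triangular(4, 14, 10)   # (low, high, mode)
    else:                                           # new paradigm needed
        capability = random.triangular(4, 16, 10)
    startup_lag = 2                                 # assumed institutional inertia
    samples.append(capability + startup_lag)

samples.sort()
print(f"median: {samples[N // 2]:.1f} years")       # ~12, matching the guess above
```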

Common sense applies here:

  • If your role has an aspect of maintaining long-term professional relationships, you’re probably fine as long as your counterparty sticks around.
  • If your role has a physical component to it, you’ll keep it until robotics is solved, which I suspect follows a different curve.
  • If your role is very regulated or you’re a part of a union or other professional association that protects your interests, that helps too.

How likely is an existential catastrophe in the 5 years after AI is at the any-(white-collar)-job level?

There are two main categories of risk here: risk from misaligned AI, and risk from humans using aligned AI to do human things.

AI vs humans

If this were a human, and we were worried about them killing a large number of people, we’d be looking at means, motive, and opportunity.

If we’re at the “any white collar job” mark, I’ll say there are means. I’ll model that as 100% to keep the maths easy.

Motive is present by default for any goal-directed optimizer, because of instrumental convergence. Let’s grant 8% that that’s somehow compatible with continued, non-torturous human existence.

If current LLMs had more capabilities, they would do some damage, then get distracted. It wouldn’t be catastrophic; the world is too resilient.

The RL agents are currently too sociopathic to be useful, disabling their own tests and all that. Commercial pressure is towards calming them down. But on the other hand, goal achievers are good at… achieving goals, so let’s say 60% that we end up with hypercapable RL agents deployed in production.

So that’s 60% for motive, or more specifically, for intent to do things that would lead to existential catastrophe. Remember, this is just the case where these systems are misaligned with their operators.

The model will succeed if it gets past our defences. We can model those as layered (perhaps half-hearted) defence in depth, with each layer listed as (p_attempted / p_success):

  • design models that are inherently safe4 (80%/6%)
  • catch dangerous models before releasing them (100%/10%)
  • don’t give models access to dangerous affordances (2%/40%)
  • supervise models to detect dangerous behaviour at runtime (80%/30%)
  • something like Pause.AI puts an end to everything (80%/10%)

If we multiply all of this out, each individual attempt has a 59% chance of success (gulp). Assuming each major lab has to implement and maintain its own perfect countermeasures in perpetuity, we need 3 defence stacks like this to all hold (not counting Meta, because they don’t actually care about AGI). If you allow that, the chance of catastrophe is 1 - 0.41^3 ≈ 93%, given a hypercapable RL model.

This gives us a 56% chance of doom from misaligned AI (60% motive × 93% chance the defences fail).
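
Spelled out in code (layer names paraphrased from the list above):

```python
# Each layer is (p_attempted, p_success); a misaligned model slips past
# a layer unless the layer is both attempted and successful.
layers = {
    "inherently safe design":          (0.80, 0.06),
    "pre-release checks":              (1.00, 0.10),
    "withhold dangerous affordances":  (0.02, 0.40),
    "runtime supervision":             (0.80, 0.30),
    "a Pause.AI-style stop":           (0.80, 0.10),
}

p_slip_past_all = 1.0
for p_attempted, p_success in layers.values():
    p_slip_past_all *= 1 - p_attempted * p_success
print(f"per-attempt success for the model: {p_slip_past_all:.0%}")               # ~59%

n_labs = 3   # independent defence stacks that all have to hold
p_catastrophe_given_rl = 1 - (1 - p_slip_past_all) ** n_labs
print(f"catastrophe given hypercapable RL agents: {p_catastrophe_given_rl:.0%}")  # ~93%

p_rl_agents = 0.60   # the 'motive' estimate from above
print(f"doom from misaligned AI: {p_rl_agents * p_catastrophe_given_rl:.0%}")     # ~56%
```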

I want to push this down to something like 45%, because one-shotting anything, including existential catastrophe, is very hard, and I think after a serious near-miss we’ll do what it takes to make another attempt much harder. I would adjust further down if I were certain of that.

Humans + AI vs humans

There isn’t much risk from superpower conflict that isn’t already covered by misalignment. Superpowers want to win, not annihilate everything. MAD is a counterexample, but it’s already possible with nukes.

The biggest source of risk here is small unilateralists: terrorists and similar. Maybe 1.5% per group and 6 motivated groups on a 10-year timeline? That’s about 9% per decade, so 0.9%/year once AI is at white-collar level.

I mostly trust safety measures here (plus Meta’s willingness to protect its own IP once its models get big enough). Sophisticated terrorists are very few, and they would mostly need good execution, which is a filter; and terrorism is actually one of the issues national security agencies are paying attention to.

Using this model

Useful life advice? Look at this neat maths!

I realized while writing this that if you were planning to work on de-risking AI and weighing between different approaches, having an explicit quantitative model could be helpful (although probably you should spend more than half an afternoon on it).

For each approach, try to figure out how much your work shifts the probability of catastrophe (defined in a way that you care about) downwards5. Then work on the thing that moves the number the most.

For example, suppose there are five factors $p_i$ you consider important, the probability of success is the product of those five factors, and you think you could have a roughly equal percentage-point impact on any of them. Then you are essentially looking for

$$\arg\max_i \frac{\partial}{\partial p_i} \prod_{j=1}^{5} p_j = \arg\min_i p_i.$$

We’ve re-derived the neglectedness criterion for evaluating philanthropic cause areas.
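
A toy version in code, with made-up factor names and values purely for illustration:

```python
import math

# If success is the product of several factors and you can move any one
# of them by the same amount, the marginal impact of improving p_i is the
# product of all the *other* factors -- which is largest exactly when
# p_i itself is smallest.
p = {"design": 0.95, "evals": 0.90, "affordances": 0.99,
     "monitoring": 0.70, "governance": 0.85}

def marginal_impact(factors, i):
    """d/dp_i of the product of all factors = product of the others."""
    return math.prod(v for k, v in factors.items() if k != i)

best = max(p, key=lambda i: marginal_impact(p, i))
print(best)   # "monitoring" -- the factor with the smallest p_i
```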


  1. Assuming we don’t discover any new physics, 100%. Next question. (But yes, I’ll make this specific enough to forecast on later, don’t worry). 

  2. Management of work, that is, at the scale of a corporation 

  3. I know that this seems insane, but back-of-the-envelope estimation like that can be shockingly effective. Also, notice that my model is not actually very sensitive to the exact rate of breakthroughs. 

  4. I think this is the same as solving the hard alignment problem. 

  5. This assumes that you’re a good person.