New Paradigms Won't Save You
...
One popular objection to AI concerns is to declare that LLMs can never be AGI. You need a “new paradigm”. Therefore, AGI is so far in the future that it’s not worth worrying about.
A common counterargument is to claim that no, LLMs can become AGI. But even without that counterargument, I think the “therefore” fails on its own terms. The key question is: how much of a new paradigm do we need?
The landmark discoveries on the road to modern LLMs are something like:
1950s: Neural networks
1967: Multi-layer perceptron
2010: Modern deep learning
2017: Transformer, LLM
2022: RLHF, chatbots
2024: Chain of thought / test-time compute
We can think of this as an “evolutionary tree”, where a given LLM (let’s say Claude Opus 4.7) shares a recent “common ancestor” with all other chatbots, and only a very distant “common ancestor” with everything else descended from the multi-layer perceptron. If AGI needs a “new paradigm”, what common ancestor can we expect AGI and LLMs to share?
AGI will very likely use neural networks, because the human brain is a neural network and qualifies as an AGI. It will probably use deep learning, because although deep learning isn’t exactly analogous to the brain, it seems like a pretty reasonable way to emulate the brain’s learning algorithms on computer hardware.
Skeptics like Yann LeCun and Gary Marcus usually pinpoint LLMs/transformers as the step where we went wrong. LeCun thinks that the first AGIs may be within the deep learning paradigm (but not LLMs); Marcus thinks that they’ll combine insights from deep learning with something else.
How soon should we expect a new paradigm as revolutionary as LLMs/transformers? Since we got LLMs/transformers nine years ago, Lindy’s Law suggests nine more years. How soon should we expect a new paradigm as revolutionary as deep learning? By the same logic, sixteen years from now.
Lindy’s Law has a heavy tail, which means we can’t simply halve these to find our 25th percentile estimate. Our 25th percentile estimate for the next advance as exciting as LLMs should be three years from now (a third of nine, not half); for deep learning, it’s about five years (a third of sixteen).
So even if you think AGI will require a further paradigm shift as big as the invention of the LLM or as deep learning itself, you should assign a 25% chance that it will be developed in the next 3-5 years. Which is about as long as the LLM-only crowd think things will take! Needing a new paradigm isn’t an excuse for relegating the risk of AGI to some vague indefinite future. It could still be the late 2020s or early 2030s!
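The three- and five-year figures can be checked with a short calculation. This is a minimal sketch, assuming Lindy’s Law in its common “delta-t” form, where something that has gone T years without being superseded has probability T / (T + t) of going at least t more. The post doesn’t spell out which form it uses, and the function name `lindy_quantile` is made up for illustration.

```python
def lindy_quantile(age_years: float, q: float) -> float:
    """Years from now by which the next shift has arrived with probability q.

    Assumption (not stated in the post): the survival curve is
    P(no shift within t more years) = age / (age + t).
    Solving t / (age + t) = q for t gives age * q / (1 - q).
    """
    if not 0 <= q < 1:
        raise ValueError("q must be in [0, 1)")
    return age_years * q / (1 - q)


# Ages taken from the post: transformers are 9 years old, deep learning 16.
for name, age in [("LLMs/transformers", 9), ("deep learning", 16)]:
    median = lindy_quantile(age, 0.50)
    p25 = lindy_quantile(age, 0.25)
    print(f"{name}: median {median:.1f} years, 25th percentile {p25:.1f} years")
```

Under this assumption the median equals the current age and the 25th percentile is a third of it, which matches the three and roughly five years quoted above. The 75th percentile comes out at three times the age, which is the heavy tail at work.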
(Might we expect that low-hanging-fruit effects make the next paradigm harder to find than the last one? In practice, fields get more researchers as time goes on, and that effect usually causes time-between-advances to be approximately constant. And in fact, the number of AI researchers has grown at an unprecedented pace for a scientific field, and growth will enter an even faster regime once AIs themselves can contribute. Overall these make me think things will go even faster than Lindy’s Law predicts - but I think Lindy’s Law is a useful upper bound.)
(Would there still be a long time between the invention of the new paradigm and the point where it could be used to maximum effect? It took five years to get from the invention of the transformer to ChatGPT, the first commercially successful transformer-based project. But most of that time was spent scaling up, and we’ve already scaled up. If we invent a new paradigm in 2030, then any frontier lab willing to bet on it can quickly provide it with levels of compute sufficient to train human-brain-sized models.)
This is my attempt to talk to the new-paradigm-wanters in their own language, but I think there’s also a subtler point that undermines this worldview. In the past, new paradigms have proven useful in allowing scaling to continue after an old paradigm passed the regime where it could efficiently convert scale to results. LLMs still seem to be able to convert scale to results; while this continues, new paradigms won’t be necessary, and frontier labs won’t risk pursuing them. If scaling ever hits a wall, there will be a few months of confusion as frontier labs look over the various new-paradigm proposals that they already have lying around, and throw them at the wall to see what breaks through. Then scaling will continue from wherever it left off.
The best way to forecast future AI progress is to extrapolate from current LLM scaling. This should work if LLMs scale all the way to AGI. But it may also work even if they don’t. First, because we might get the new paradigm so soon that it won’t be a significant source of delay. And second, because the most likely place for a new paradigm to pick up is wherever LLMs stop working, with progress continuing at the same rate.
