Just because something exhibits exponential growth at one point in time doesn’t mean it’s capable of sustaining exponential growth.
Their Covid example is a great counterargument to their own point, in that Covid isn’t still growing exponentially.
Where the AI skeptics (or even just pragmatists, like myself) chime in is to say: “yeah, AI will improve, but LLMs are a limited technology that cannot fully bridge the gap between what they’re producing now and what the ‘hypists’ claim they’ll be able to do in the future.”
People like Sam Altman know ChatGPT is a million miles away from AGI. But their primary goal is to make money. So they have to convince VCs that their technology has a longer period of exponential growth than what it actually will have.
The argument is not that it will keep growing exponentially forever (obviously that is physically impossible), rather that:
- given a sustained history of growth along a very predictable trajectory, the highest-likelihood short-term scenario is continued growth along the same trajectory. Sample a random point on an s-curve and look slightly to the right: what’s the most common direction the curve continues? (see the quick numerical sketch after these bullets)
- exponential progress is very hard to visualize: it may appear to hardly make any progress while far away from human capabilities, then move from just below human level to far above it very quickly
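A quick numerical sketch of the first bullet (a logistic curve with made-up parameters, purely illustrative): at almost every randomly sampled point, the curve keeps rising just to the right, at roughly the rate it was already rising.

```python
import numpy as np

# Sample random points on an s-curve and check what it does "slightly to the right".
def f(t, k=1.0):                     # logistic / s-curve, arbitrary steepness k
    return 1.0 / (1.0 + np.exp(-k * t))

rng = np.random.default_rng(0)
t = rng.uniform(-10, 10, 100_000)    # random points along the curve
dt, eps = 0.5, 1e-3                  # "slightly to the right", finite-difference step

slope_now  = (f(t + eps) - f(t)) / eps
slope_next = (f(t + dt + eps) - f(t + dt)) / eps

print(f"still increasing to the right: {np.mean(slope_next > 0):.0%}")
print(f"growth rate within 2x of now:  {np.mean(np.abs(np.log(slope_next / slope_now)) < np.log(2)):.0%}")
```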
So it's an argument impossible to counter because it's based on a hypothesis that is impossible to falsify: it predicts that there will either be a bit of progress, or a lot of progress, soon. Well, duh.
My point is that the limits of LLMs will be hit long before they start to take on human capabilities.
The problem isn’t that exponential growth is hard to visualise. The problem is that LLMs, as advanced and useful a technique as they are, aren’t suited for AGI and thus will never get us even remotely to the stage of AGI.
The human-like capabilities are really just smoke and mirrors.
It’s like when people anthropomorphise their car; “she’s being temperamental today”. Except we know the car is not intelligent and it’s just a mechanical problem. Whereas it’s in the AI tech firms’ best interest to upsell the human-like characteristics of LLMs because that’s how they get VC money. And as we know, building and running models isn’t cheap.
My problem with takes like this is that they presume a level of understanding of intelligence in general that we simply do not have. We do not understand consciousness at all, much less consciousness that exhibits human intelligence. How are we to know what the exact conditions are that result in human-like intelligence? You’re assuming that there isn’t some emergent phenomenon that LLMs could very well achieve, but have not yet.
I'm not making a philosophical argument about what human-like intelligence is. I'm saying LLMs have many weaknesses that make them incapable of performing basic functions that humans take for granted, like counting and recall.
Ostensibly, AGI might use LLMs in parts of its subsystems. But the technology behind LLMs doesn't adapt to all of the problems that AGI would need to solve.
It's a little like how the human brain isn't just one homogeneous grey lump. There are different parts of the brain that specialize in different aspects of cognitive processing.
LLMs might work for language processing, but that doesn't mean they would work for maths reasoning -- and in fact we already know they don't.
This is why we need tools / MCPs. We need ways of turning problems LLMs cannot solve into standalone programs that LLMs can cheat with by asking them for the answers.
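A minimal sketch of that "cheat by calling a tool" pattern; `ask_llm` is a hypothetical stand-in for whatever chat-completion call you actually use, and the only real point is that the arithmetic is done by ordinary code, not by the model:

```python
import ast, operator

SAFE_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.Div: operator.truediv}

def calc(expr: str) -> float:
    """Deterministically evaluate a basic arithmetic expression (the 'tool')."""
    def walk(node):
        if isinstance(node, ast.Expression): return walk(node.body)
        if isinstance(node, ast.Constant):   return node.value
        if isinstance(node, ast.BinOp):      return SAFE_OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))

def answer(question: str, ask_llm) -> str:
    # 1. The model is asked to emit an expression instead of guessing the number.
    expr = ask_llm(f"Rewrite as one arithmetic expression, output nothing else: {question}")
    # 2. The host runs the deterministic tool.
    result = calc(expr)
    # 3. The model phrases the final answer around the trusted result.
    return ask_llm(f"Question: {question}\nComputed result: {result}\nAnswer in one sentence.")
```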
AI services are/will be going hybrid. Just like we have seen in search, with thousands of dedicated subsystems handling niches behind the single unified UI element or API call.
>the limits of LLMs will be hit long before they start to take on human capabilities.
Why do you think this? The rest of the comment is just rephrasing this point ("llms isn't suited for AGI"), but you don't seem to provide any argument.
The problem with LLMs is that they’re, at their core, token prediction models. Tokens, typically text, are given numeric values and can then be used to predict what tokens should follow.
This makes them extremely good at things like working with source code and other sources of text where relationships are defined via semantics.
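For concreteness, here’s that mechanism in miniature; a toy bigram count table stands in for the (vastly larger) learned model, and everything here is invented for illustration:

```python
import numpy as np

vocab = ["the", "cat", "sat", "on", "mat", "."]
tok = {w: i for i, w in enumerate(vocab)}             # token -> numeric id

counts = np.ones((len(vocab), len(vocab)))            # pretend-trained co-occurrence counts
for a, b in [("the", "cat"), ("cat", "sat"), ("sat", "on"), ("on", "the"), ("mat", ".")]:
    counts[tok[a], tok[b]] += 5

def next_token(prev: str) -> str:
    logits = np.log(counts[tok[prev]])
    probs = np.exp(logits) / np.exp(logits).sum()     # softmax over the vocabulary
    return vocab[int(np.argmax(probs))]               # greedy: pick the likeliest next token

word, out = "the", ["the"]
for _ in range(5):
    word = next_token(word)
    out.append(word)
print(" ".join(out))   # "the cat sat on the cat" -- plausible-looking, purely statistical
```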
The problem with this is that it makes them very poor at dealing with:
1. Limited datasets. Smaller models are shown to be less powerful. So LLMs often need to ingest significantly more information than a human would learn in their entire lifetime, just to approximate what that human might produce in any specific subject.
2. Learning new content. Here we have to rely on non-AI tooling like MCPs. This works really well under the current models because we can say “scrape these software development references” (etc) to keep the model up to date. But there’s no independence behind those actions. An MCP only works because it injects into the prompt how to use that MCP and why you should use it. Whereas if you look at humans, even babies know how to investigate and learn independently. Our ability to self-learn is one of the core principles of human intelligence.
3. Remembering past content that resides outside of the original model training. I think this is actually a solvable problem in LLMs, but their current behaviour is to bundle all the prior interactions into the next prompt (see the sketch after this list). In reality, the LLM hasn’t really remembered anything; you’re just reminding it about everything with each exchange. So each subsequent prompt gets longer and thus more fallible. It also means that context is always volatile. Basically it’s just a hack that only works because context sizes have grown exponentially. But if we want AGI then there needs to be a persistent way of retaining that context. There are some workarounds here, but they depend on tools.
4. Any operation that isn’t semantic-driven. Things like maths, for example. LLMs have to call a tool (like MCPs) to perform calculations. But that requires having a non-AI function return a result rather than the AI reasoning about maths. So it’s another hack. And there are a lot of domains that fall into this kind of category where complex tokenisation is simply not enough. This, I think, is going to be the biggest hurdle for LLMs.
5. Anything related to the physical world. We’ve all seen examples of computer vision models drawing too many fingers on a hand or leaving disembodied objects floating. The solutions here are to define what a hand should look like. But without an AI having access to a physical three-dimensional world to explore, it’s all just guessing what things might look like. This is particularly hard for LLMs because they’re language models, not 3D coordinate systems.
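To make point 3 concrete, here’s roughly what every chat client does today; `ask_llm` is again a hypothetical stand-in for a real chat-completion call:

```python
def chat(ask_llm):
    history = []                                   # lives on the client, not in the model
    while True:
        user_msg = input("> ")
        history.append({"role": "user", "content": user_msg})
        reply = ask_llm(history)                   # the ENTIRE conversation is resent as the prompt
        history.append({"role": "assistant", "content": reply})
        print(reply)                               # each turn, the prompt is longer than the last
```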
There’s also the question about whether holding vector databases of token weights is the same thing as “reasoning”, but I’ll leave that argument for the philosophers.
I think a theoretical AGI might use LLMs as part of its subsystems. But it would need to leverage AI throughout, handling topics that are more than just token relationships, which LLMs cannot do.
There is no particular reason why AI has to stick to language models though. Indeed, if you want human-like thinking you pretty much have to go beyond language, as we do other stuff too, if you see what I mean. A recent example: "Google DeepMind unveils its first “thinking” robotics AI" https://arstechnica.com/google/2025/09/google-deepmind-unvei...
> There is no particular reason why AI has to stick to language models though.
There’s no reason at all. But that’s not the technology that’s in the consumer space, growing exponentially, gaining all the current hype.
So at this point in time, it’s just a theoretical future that will happen inevitably but we don’t know when. It could be next year. It could be 10 years. It could be 100 years or more.
My prediction is that current AI tech plateaus long before any AGI-capable technology emerges.
That's a rather poor choice for an example considering Gemini Robotics-ER is built on a tuned version of Gemini, which is itself an LLM. And while the action model is impressive, the actual "reasoning" here is still being handled by an LLM.
From the paper [0]:
> Gemini Robotics 1.5 model family. Both Gemini Robotics 1.5 and Gemini Robotics-ER 1.5 inherit Gemini’s multimodal world knowledge.
> Agentic System Architecture. The full agentic system consists of an orchestrator and an action model that are implemented by the VLM and the VLA, respectively:
> • Orchestrator: The orchestrator processes user input and environmental feedback and controls the overall task flow. It breaks complex tasks into simpler steps that can be executed by the VLA, and it performs success detection to decide when to switch to the next step. To accomplish a user-specified task, it can leverage digital tools to access external information or perform additional reasoning steps. We use GR-ER 1.5 as the orchestrator.
> • Action model: The action model translates instructions issued by the orchestrator into low-level robot actions. It is made available to the orchestrator as a specialized tool and receives instructions via open-vocabulary natural language. The action model is implemented by the GR 1.5 model.
AI researchers have been trying to discover workable architectures for decades, and LLMs are the best we've got so far. There is no reason to believe that this exponential growth on test scores would or even could transfer to other architectures. In fact, the core advantage that LLMs have here is that they can be trained on vast, vast amounts of text scraped from the internet and taken from pirated books. Other model architectures that don't involve next-token-prediction cannot be trained using that same bottomless data source, and trying to learn quickly from real-world experiences is still a problem we haven't solved.
That feels like you're moving the goal posts a bit.
Exponential growth over the short term is very uninteresting. Exponential growth is exciting when it can compound.
E.g. if I offered you an investment opportunity at 500% per year compounded daily - that's amazing. If the fine print is that the rate will only last for the very near term (say a week), then it would be worse than a savings account.
Well, growth has been on this exponential already for 5+ years (for the METR eval), and we are at the point where models are very close to matching human expert capabilities in many domains - only one or two more years of growth would put us well beyond that point.
Personally I think we'll see way more growth than that, but to see profound impacts on our economy you only need to believe the much more conservative assumption of a little extra growth along the same trend.
> we are at the point where models are very close to matching human expert capabilities in many domains
That's a bold claim. I don't think it matches most people's experiences.
If that was really true people wouldn't be talking about exponential growth. You don't need exponential growth if you are already almost at your destination.
What I’ve seen is that LLMs are very good at simulating an extremely well read junior.
Models know all the tricks but not when to use them.
And because of that, you continually have to hand-hold them.
Working with an LLM is really closer to pair programming than it is handing a piece of work to an expert.
The stuff I’ve seen in computer vision is far more impressive in terms of putting people out of a job. But even there, it’s still highly specific models left to churn away at tasks that are ostensibly just long and laborious. Which so much of VFX is.
> we are at the point where models are very close to matching human expert capabilities in many domains
This is not true because experts in these domains don't make the same routine errors LLMs do. You may point to broad benchmarks to prove your point, but actual experts in the benchmarked fields can point to numerous examples of purportedly "expert" LLMs making things up in a way no expert would ever.
Expertise is supposed to mean something -- it's supposed to describe both a level of competency and trustworthiness. Until they can be trusted, calling LLMs experts in anything degrades the meaning of expertise.
The most common part of the S-curve by far is the flat bit before and the flat bit after. We just don't graph it because it's boring. Besides which there is no reason at all to assume that this process will follow that shape. Seems like guesswork backed up by hand waving.
Very much handwaving. The question is not meaningful at all without knowing the parameters of the S-curve. It's like saying "I flipped a coin and saw heads. What's the most likely next flip?"
> Just because something exhibits an exponential growth at one point in time, that doesn’t mean that a particular subject is capable of sustaining exponential growth.
Which is pretty ironic given the title of the post
>People notice that while AI can now write programs, design websites, etc, it still often makes mistakes or goes in a wrong direction, and then they somehow jump to the conclusion that AI will never be able to do these tasks at human levels, or will only have a minor impact. When just a few years ago, having AI do these things was complete science fiction!
Both things can be true, since they're orthogonal.
Having AI do these things was complete fiction 10 years ago. And after 5 years of LLM AI, people do start to see serious limits and stunted growth with the current LLM approaches, while also seeing that nobody has proposed another serious contender to that approach.
Similarly, going to the moon was science fiction 100 years ago. And yet, we're now not only not on Mars, but 50+ years without a new manned moon landing. Same for airplanes. Science fiction in 1900. Mostly stale innovation-wise for the last 30 years.
A lot of curves can fit an exponential line plot, without the progress going forward being exponential.
We would have 1-trillion-transistor CPUs by now if we'd kept following Moore's "exponential curve"
I agree with all your points, just wanted to say that transistor count is probably a counterexample. We have been keeping up with Moore's Law more or less[1], and the M3 Max, a 2023 consumer-grade CPU, has ~100B transistors, "just" one order of magnitude away from your 1T. I think that shows we haven't stagnated much in transistor density and the progress is just staggering!
That one order of magnitude is about 7 years behind the Moore's Law. We're still progressing but it's slower, more expensive and we hit way more walls than before.
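A rough check of that "about 7 years" figure, assuming the classic Moore's-law cadence of one doubling every ~2 years:

```python
import math

doublings = math.log2(1e12 / 1e11)   # ~100B today -> 1T target: about 3.3 doublings
years = doublings * 2                # about 6.6 years at ~2 years per doubling
print(f"{doublings:.1f} doublings ≈ {years:.1f} years")
```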
Except it’s not been five years, it’s been at most three, since approximately no one was using LLMs prior to ChatGPT’s release, which was just under three years ago. We did have Copilot a year before that, but it was quite rudimentary.
And really, we’ve had even less than that. The first large scale reasoning model was o1, which was released 12 months ago. More useful coding agents are even newer than that. This narrative that we’ve been using these tools for many years and are now hitting a wall doesn’t match my experience at all. AI-assisted coding is way better than it was a year ago, let alone five.
>Except it’s not been five years, it’s been at most three,
Why would it be "at most" 3? We had GPT-3 commercially available as a private beta API in 2020. It's only the mass public that got 3.5 three years ago.
But those who'd do the noticing, as per my argument, aren't just Joe Public (who could be oblivious), but people starting already in 2020, including people working in the space who had worked with LLM and LLM-like architectures 2-3 years before 2020.
> Given consistent trends of exponential performance improvements over many years and across many industries, it would be extremely surprising if these improvements suddenly stopped.
I'm sure people were saying that about commercial airline speeds in the 1970's too.
But a lot of technologies turn out to be S-shaped, not purely exponential, because there are limiting factors.
With LLMs at the moment, the limiting factors might turn out to be training data, cost, or inherent limits of the transformer approach and the fact that LLMs fundamentally cannot learn outside of their context window. Or a combination of all of these.
The tricky thing about S curves is, you never know where you are on them until the slowdown actually happens. Are we still only in the beginning of the growth part? Or the middle where improvement is linear rather than exponential? And then the growth starts slowing...
> a lot of technologies turn out to be S-shaped, not purely exponential, because there are limiting factors.
Yes of course it’s not going to increase exponentially forever.
The point is, why predict that the growth rate is going to slow exactly now? What evidence are you going to look at?
It’s possible to make informed predictions (eg “Moore’s law can’t get you further than 1nm with silicon due to fundamental physical limits”). But most commenters aren’t basing their predictions in anything as rigorous as that.
And note, there are good reasons to predict a speedup, too; as models get more intelligent, they will be able to accelerate the R&D process. So quality per-researcher is now proportional to the exponential intelligence curve, AND quantity of researchers scales with number of GPUs (rather than population growth which is much slower).
NOTE IN ADVANCE: I'm generalizing, naturally, because talking about specifics would require an essay and I'm trying to write a comment.
Why predict that the growth rate is going to slow now? Simple. Because current models have already been trained on pretty much the entire meaningful part of the Internet. Where are they going to get more data?
The exponential growth part of the curve was largely based on being able to fit more and more training data into the models. Now that all the meaningful training data has been fed in, further growth will come from one of two things: generating training data from one LLM to feed into another one (dangerous, highly likely to lead to "down the rabbit hole forever" hallucinations, and weeding those out is a LOT of work and will therefore contribute to slower growth), or else finding better ways to tweak the models to make better use of the available training data (which will produce growth, but much slower than what "Hey, we can slurp up the entire Internet now!" was producing in terms of rate of growth).
And yes, there is more training data available because the Internet is not static: the Internet of 2025 has more meaningful, human-generated content than the Internet of 2024. But it also has a lot more AI-generated content, which will lead into the rabbit-hole problem where one AI's hallucinations get baked into the next one's training, so the extra data that can be harvested from the 2025 Internet is almost certainly going to produce slower growth in meaningful results (as opposed to hallucinated results).
This is a great question, but note that folks were freaking out about this a year or so ago and we seem to be doing fine.
We seem to be making progress with some combination of synthetic training datasets on coding/math tasks, textbooks authored by paid experts, and new tokens (plus preference signals) generated by users of the LLM systems.
It wouldn’t surprise me if coding/math turned out to have a dense-enough loss-landscape to produce enough synthetic data to get to AGI - though I wouldn’t bet on this as a highly likely outcome.
I have been wanting to read/do some more rigorous analysis here though.
This sort of analysis would count as the kind of rigorous prediction that I’m asking for above.
I am extremely confident that AGI, if it is achievable at all (which is a different argument and one I'm not getting into right now), requires a world model / fact model / whatever terminology you prefer, and is therefore not achievable by models that simply chain words together without having any kind of understanding baked into the model. In other words, LLMs cannot lead to AGI.
I disagree that generic LLMs plus CoT/reasoning/tool calling (ie the current stack) cannot in principle implement a world model.
I believe LLMs are doing some sort of world modeling and likely are mostly lacking a medium-/long-term memory system in which to store it.
(I wouldn’t be surprised if one or two more architectural overhauls end up occurring before AGI, I also wouldn’t be surprised if these occurred seamlessly with our current trajectory of progress)
Isn’t the memory the pre-trained weights that let it do anything at all? Or do you mean they should be capable of refining them in real-time (learning).
The human brain has many systems that adapt on multiple time-frames which could loosely be called “memory”.
But here I’m specifically interested in real-time updates to medium/long term memory, and the episodic/consciously accessible systems that are used in human reasoning/intelligence.
Eg if I’m working on a big task I can think through previous solutions I learned, remember the salient/surprising lessons, recall recent conversations that may indirectly affect requirements, etc. The brain is clearly doing an associative compression and indexing operation atop the raw memory traces. I feel the current LLM “memory” implementations are very weak compared to what the human brain does.
I suppose there is a sense in which you could say the weights “remember” the training data, but it’s read-only and I think this lack of real-time updating is a crucial gap.
To expand on my hunch about scaffolding - it may be that you can construct an MCP module that can let the LLM retrieve or ruminate on associative memories in such a way as to allow the LLM to not make the same mistake twice and be steerable on a longer timeframe.
I think the best argument against my hunch is that human brains have systems which update the synaptic weights themselves over a timeframe of days-to-months, and so if neural plasticity is the optimal solution here then we may not be able to efficiently solve the problem with “application layer” memory plugins.
But again, there is a lot of solution-space to explore; maybe some LoRA-like algorithm can allow an LLM instance to efficiently update its own weights at test-time, and persist those deltas for efficient inference, thus implementing the required neural plasticity algorithms?
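A minimal sketch of that LoRA-like idea, with made-up sizes: the base weights stay frozen and only a small low-rank delta would be updated and persisted per instance.

```python
import numpy as np

d, r = 1024, 8                        # hidden size and (tiny) adapter rank, illustrative
W = np.random.randn(d, d) * 0.02      # frozen pretrained weight matrix
A = np.random.randn(d, r) * 0.01      # trainable low-rank factors...
B = np.zeros((r, d))                  # ...initialised so the delta starts at zero

def forward(x):
    # Effective weights are W + A @ B; only A and B (2*d*r numbers instead of d*d)
    # would need to be updated at test time and stored per instance.
    return x @ (W + A @ B)

y = forward(np.random.randn(1, d))
print(y.shape, A.size + B.size, W.size)   # (1, 1024), 16384 trainable vs 1048576 frozen
```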
Curiously, humans don't seem to require reading the entire internet in order to perform at human level on a wide variety of tasks... Nature suggests that there's a lot of headroom in algorithms for learning on existing sources. Indeed, we had models trained on the whole internet a couple years ago, now, yet model quality has continued to improve.
Meanwhile, on the hardware side, transistor counts in GPUs are in the tens of billions and still increasing steadily.
This is a time horizon thing though. Over the course of future human history AI development might look exponential but that doesn’t mean there won’t be significant plateaus. We don’t even fully understand how the human brain works so whilst the fact it does exist strongly suggests it’s replicable (and humans do it naturally) that doesn’t make it practical in any time horizon that matters to us now. Nor does there seem to be fast movement in that direction since everyone is largely working on the same underlying architecture that isn’t similar to the brain.
Alternative argument: there is no need for more training data, just better algorithms. Throwing more tokens at the problem doesn't solve the fact that training LLMs using supervised learning is a poor way to integrate knowledge. We have however seen promising results coming out of reinforcement learning and self-play. Which means that Anthropic and OpenAI's bet on scale is likely a dead end, but we may yet see capability improvements coming from other labs, without the need for greater data collection.
Better algorithms is one of the things I meant by "better ways to tweak the models to make better use of the available training data". But that produces slower growth than the jaw-droppingly rapid growth you can get by slurping pretty much the whole Internet. That produced the sharp part of the S curve, but that part is behind us now, which is why I assert we're approaching the slower-growth part at the top of the curve.
> The point is, why predict that the growth rate is going to slow exactly now? What evidence are you going to look at?
Why predict that the (absolute) growth rate is going to keep accelerating past exactly now?
Exponential growth always assumes a constant relative growth rate, which works in the fiction of economics, but is otherwise far from an inevitability. People like to point to Moore's law ad nauseam, but other things like "the human population" or "single-core performance" keep accelerating until they start cooling off.
> And note, there are good reasons to predict a speedup, too; as models get more intelligent, they will be able to accelerate the R&D process.
And if heaven forbid, R&D ever turns out to start taking more work for the same marginal returns on "ability to accelerate the process", then you no longer have an exponential curve. Or for that matter, even if some parts can be accelerated to an amazing extent, other parts may get strung up on Amdahl's law.
It's fine to predict continued growth, and it's even fine to predict that a true inflection point won't come any time soon, but exponential growth is something else entirely.
> Why predict that the (absolute) growth rate is going to keep accelerating past exactly now?
By following this logic you should have predicted Moore’s law would halt every year for the last five decades. I hope you see why this is a flawed argument. You prove too much.
But I will answer your “why”: plenty of exponential curves exist in reality, and empirically, they can last for a long time. This is just how technology works; some exponential process kicks off, then eventually is rate-limited, then if we are lucky another S-curve stacks on top of it, and the process repeats for a while.
Reality has inertia. My hunch is you should apply some heuristic like “the longer a curve has existed, the longer you should bet it will persist”. So I wouldn’t bet on exponential growth in AI capabilities for the next 10 years, but I would consider it very foolish to use pure induction to bet on growth stopping within 1 year.
And to be clear, I think these heuristics are weak and should be trumped by actual physical models of rate-limiters where available.
> By following this logic you should have predicted Moore’s law would halt every year for the last five decades. I hope you see why this is a flawed argument. You prove too much.
I do think it's continually amazing that Moore's law has continued in some capacity for decades. But before trumpeting the age of exponential growth, I'd love to see plenty of examples that aren't named "Moore's law": as it stands, one easy hypothesis is that "ability to cram transistors into mass-produced boards" lends itself particularly well to newly-discovered strategies.
> So I wouldn’t bet on exponential growth in AI capabilities for the next 10 years, but I would consider it very foolish to use pure induction to bet on growth stopping within 1 year.
Great, we both agree that it's foolish to bet on growth stopping within 1 year. What I'm saying that "growth doesn't stop" ≠ "growth is exponential".
A theory of "inertia" could just as well support linear growth: it's only because we stare at relative growth rates that we treat exponential growth as a "constant" that will continue in the absence of explicit barriers.
Solar panel cost per watt has been dropping exponentially for decades as well...
Partly these are matters of economies of scale - reduction in production costs at scale - and partly it's a matter of increasing human attention leading to steady improvements as the technology itself becomes more ubiquitous.
This is where I’d really like to be able to point to our respective Manifold predictions on the subject; we could circle back in a year’s time and review who was in fact correct. I wager internet points it will be me :)
I think progress per dollar spent has actually slowed dramatically over the last three years. The models are better, but AI spending has increased by several orders of magnitude during the same time, from hundreds of millions to hundreds of billions. You can only paper over the lack of fundamental progress by spending on more compute for so long. And even if you manage to keep up the current capex, there certainly isn't enough capital in the world to accelerate spending for very long.
> why predict that the growth rate is going to slow exactly now?
why predict that it will continue? Nobody ever actually makes an argument that growth is likely to continue, they just extrapolate from existing trends and make a guess, with no consideration of the underlying mechanics.
Oh, go on then, I'll give a reason: this bubble is inflated primarily by venture capital, and is not profitable. The venture capital is starting to run out, and there is no convincing evidence that the businesses will become profitable.
Indeed you can't be sure. But on the other hand a bunch of the commentariat has been claiming (with no evidence) that we're at the midpoint of the sigmoid for the last three years. They were wrong. And then you had the AI frontier lab insiders who predicted an accelerating pace of progress for the last three years. They were right. Now, the frontier labs rarely (never?) provide evidence either, but they do have about a year of visibility into the pipeline, unlike anyone outside.
So at least my heuristic is to wait until a frontier lab starts warning about diminishing returns and slowdowns before calling the midpoint or multiple labs start winding down capex. The first component might have misaligned incentives, but if we're in a realistic danger of hitting a wall in the next year, the capex spending would not be accelerating the way it is.
Capex requirements might be on a different curve than model improvements.
E.g. you might need to accelerate spending to get sub-linear growth in model output.
If valuations depend on hitting the curves described in the article, you might see accelerating capex at precisely the time improvements are dropping off.
I don’t think frontier labs are going to be a trustworthy canary. If Anthropic says they’re reaching the limit and OpenAI holds the line that AGI is imminent, talent and funding will flee Anthropic for OpenAI. There’s a strong incentive to keep your mouth shut if things aren’t going well.
I think you nailed it. The capex is desperation in the hopes of maintaining the curve. I have heard actual AI researchers say progress is slowing, just not from the big companies directly.
> Indeed you can't be sure. But on the other hand a bunch of the commentariat has been claiming (with no evidence) that we're at the midpoint of the sigmoid for the last three years.
I haven’t followed things closely, but I’ve seen more statements that we may be near the midpoint of a sigmoid than that we are at it.
> They were wrong. And then you had the AI frontier lab insiders who predicted an accelerating pace of progress for the last three years. They were right.
I know it’s an unfair question because we don’t have an objective way to measure speed of progress in this regard, but do you have evidence for models not only getting better, but getting better faster? (Remember: even at the midpoint of a sigmoid, there still is significant growth)
I thought the original article included the strongest objective data point on this: recent progress on the METR long task benchmark isn't just on the historical "task length doubling every 7 months" best fit, but is trending above it.
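For scale, the arithmetic behind staying on that trend (ignoring the fit's error bars):

```python
doubling_months = 7
doublings = 2 * 12 / doubling_months          # about 3.4 doublings in two more years
print(f"~{2 ** doublings:.0f}x longer task horizons after two years on-trend")
```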
A year ago, would you have thought that a pure LLM with no tools could get a gold medal level score in the 2025 IMO finals? I would have thought that was crazy talk. Given the rates of progress over the previous few years, maybe 2027 would have been a realistic target.
> I thought the original article included the strongest objective data point on this: recent progress on the METR long task benchmark isn't just on the historical "task length doubling every 7 months" best fit, but is trending above it.
There is selection bias in that paper. For example, they chose to measure “AI performance in terms of the length of tasks the system can complete (as measured by how long the tasks take humans)”, but didn’t include calculation tasks in the set of tasks, and that’s a field in which machines have been able to reliably do tasks for years that humans would take centuries or more to perform, but at which modern LLM-based AIs are worse than, say, Python.
I think leaving out such tasks is at least somewhat defensible, but I have to wonder whether they also leave out other tasks at which LLMs do not get better as rapidly.
Maybe it is a matter of posing different questions, with the article being discussed being more interested in “(When) can we (ever) expect LLMs to do jobs that now require humans to do?” than in “(How fast) do LLMs get smarter over time?”
Or are the model authors, i.e. the blog author with a vested interest, getting better at optimizing for the test while real-world performance isn't increasing as fast?
There are a few other limitations, in particular how much energy, hardware and funding we (as a society) can afford to throw at the problem, as well as the societal impact.
AI development is currently given a free pass on these points, but it's very unclear how long that will last. Regardless of scientific and technological potential, I believe that we'll hit some form of limit soon.
There's a Mulla Nasrudin joke that's sort of relevant here:
Nasrudin is on a flight, when suddenly the pilot comes on the intercom, saying, "Passengers, we apologize, but we have experienced an engine burn-out. The plane can still fly on the remaining three engines, but we'll be delayed in our arrival by two hours."
Nasrudin speaks up "let's not worry, what's 2 hours really"
A few minutes later, the airplane shakes, and passengers see smoke coming out of another engine. Again, the intercom crackles to life.
"This is your captain speaking. Apologies, but due to a second engine burn-out, we'll be delayed by another two hours."
The passengers are agitated, but the Mulla once again tries to remain calm.
Suddenly, the third engine catches fire. Again, the pilot comes on the intercom and says, "I know you're all scared, but this is a very advanced aircraft, and it can safely fly on only a single engine. But we will be delayed by yet another two hours."
At this, Nasrudin shouts, "This is ridiculous! If one more engine goes, we'll be stuck up here all day"
> I'm sure people were saying that about commercial airline speeds in the 1970's too.
Or CPU frequencies in the 1990's. Also we spent quite a few decades at the end of the 19th century thinking that physics was finished.
I'm not sure that explaining it as an "S curve" is really the right metaphor either, though.
You get the "exponential" growth effect when there's a specific technology invented that "just needs to be applied", and the application tricks tend to fall out quickly. For sure generative AI is on that curve right now, with everyone big enough to afford a datacenter training models like there's no tomorrow and feeding a community of a million startups trying to deploy those models.
But nothing about this is modeled correctly as an "exponential", except in the somewhat trivial sense of "the community of innovators grows like a disease as everyone hops on board". Sure, the petri dish ends up saturated pretty quickly and growth levels off, but that's not really saying much about the problem.
Progress in information systems cannot be compared to progress in physical systems.
For starters, physical systems compete for limited resources and labor.
For another, progress in software vastly reduces the cost of improved designs. Whereas progress in physical systems can enable but still increase the cost of improved designs.
Finally, the underlying substrate of software is digital hardware, which has been improving in both capabilities and economics exponentially for almost 100 years.
Looking at information systems as far back as the first coordination of differentiating cells to human civilization is one of exponential improvement. Very slow, slow, fast, very fast. (Can even take this further, to first metabolic cycles, cells, multi-purpose genes, modular development genes, etc. Life is the reproduction of physical systems via information systems.)
Same with human technological information systems, from cave painting, writing, printing, telegraph, phone, internet, etc.
It would be VERY surprising if AI somehow managed to fall off the exponential information system growth path. Not industry level surprising, but "everything we know about how useful information compounds" level surprising.
> Looking at information systems as far back as the first coordination of differentiating cells to human civilization is one of exponential improvement.
Under what metric? Most of the things you mention don't have numerical values to plot on a curve. It's a vibe exponential, at best.
Life and humans have become better and better at extracting available resources and energy, but there's a clear limit to that (100%) and the distribution of these things in the universe is a given, not something we control. You don't run information systems off empty space.
Life has been on Earth about 3.5-3.8 billion years.
Break that into 0.5-0.8, 1 billion, 1 billion, 1 billion "quarters", and you will find exponential increases in evolution's rate of change and production of diversity across them by many, many objective measures.
Now break up the last 1 billion into 100 million year segments. Again exponential.
Then break up the last 100 million into segments. Again.
Then the last 10 million years into segments, and watch humans progress.
The last million, in 100k year segments, watch modern humans appear.
the last 10k years into segments, watch agriculture, civilizations, technology, writing ...
The last 1000 years, incredible aggregation of technology, math, and the appearance of formal science
last 100 years, gets crazy. Information systems appear in labs, then become ubiquitous.
last 10 years, major changes, AI starts having mainstream impact
last 1 year - even the basic improvements to AI models in the last 12 months are an unprecedented level of change, per time, looking back.
I am not sure how any of this could appear "vibe", given any historical and situational awareness.
This progression is universally recognized. Aside from creationists and similar contingents.
The progression is much less clear when you don't view it anthropocentrically. For instance, we see an explosion in intelligible information: information that is formatted in human language or human-made formats. But this is concomitant with a crash in natural spaces and biodiversity, and nothing we make is as information-rich as natural environments, so from a global perspective, what we have is actually an information crash. Or hell, take something like agriculture. Cultured environments are far, far simpler than wild ones. Again: an information crash.
I'm not saying anything about the future, mind you. Just that if we manage to stop sniffing our own farts for a damn second and look at it from the outside, current human civilization is a regression on several metrics. We didn't achieve dominion over nature by being more subtle or complex than it. We achieved that by smashing nature with a metaphorical club and building upon its ruins. Sure, it's impressive. But it's also brutish. Intelligence requires intelligible environments to function, and that is almost invariably done at the expense of complexity and diversity. Do not confuse success for sophistication.
> last 1 year - even the basic improvements to AI models in the last 12 months are an unprecedented level of change, per time, looking back.
Are they? What changed, exactly? What improvements in, say, standards of living? In the rate of resource exploitation? In energy efficiency? What delta in our dominion over Earth? I'll tell you what I think: I think we're making tremendous progress in simulating aspects of humanity that don't matter nearly as much as we think they do. The Internet, smartphones, AI, speak to our brains in an incredible way. Almost like it was by design. However, they matter far more to humans within humanity than they do in the relationship of humanity with the rest of the universe. Unlike, say, agriculture or coal, which positively defaced the planet. Could we leverage AI to unlock fusion energy or other things that actually matter, just so we can cook the rest of the Earth with it? Perhaps! But let's not count our chickens before they hatch. As of right now, in the grand scheme of things, AI doesn't matter. Except, of course, in the currency of vibes.
I am curious when you think we will run out of atoms to make information systems.
How many billions of years you think that might take.
Of all the things to be limited by, that doesn't seem like a near term issue. Just an asteroid or two alone will provide resources beyond our dreams. And space travel is improving at a very rapid rate.
In the meantime, in terms of efficiency of using Earth's atoms for information processing, there is still a lot of space at the "bottom", as Feynman said. Our crude systems are limited today by their power waste. Small energy-efficient systems, and more efficient heat shedding, will enable full 3D chips ("cubes"?) and vastly higher density of packing those.
The known limit on information processing for physical systems, per gram, is astronomical:
• Bremermann’s limit: ~10^47 operations per second, per gram.
Other interesting limits:
• Margolus–Levitin bound: limit on the speed of quantum state evolution.
• Landauer’s principle: thermodynamic cost of erasing (overwriting) one bit.
• Bekenstein bound: maximum storage by volume.
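To put a number on one of those (Landauer's principle), a quick back-of-envelope at room temperature:

```python
import math

k = 1.380649e-23                     # Boltzmann constant, J/K
T = 300                              # roughly room temperature, K
e_bit = k * T * math.log(2)          # minimum energy to erase one bit
print(f"{e_bit:.2e} J/bit -> ~{1 / e_bit:.1e} bit erasures per joule")
```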
Life will go through many many singularities before we get anywhere near hard limits.
By physical systems, I meant systems whose purpose is to do physical work. Mechanical things. Gears. Struts.
Computer hardware is an information system. You are correct that it is has a physical component. But its power comes from its organization (information) not its mass, weight, etc.
Transistors get more powerful, not less, when made from less matter.
Information systems move from substrate to more efficient substrate. They are not their substrate.
They still depend on physical resources and labor. They’re made by people and machines. There’s never been more resources going into information systems than right now, and AI accelerated that greatly. Think of all the server farms being built next to power plants.
> The amount of both matter and labor per quantity of computing power is dropping exponentially. Right?
Right. The problem is the demand is increasing exponentially.
It’s not like when computers got 1000x more powerful we were able to get by with 1/1000x of them. Quite the opposite (or inverse, to be more precise).
Just to go back to my original point, I think drawing a comparison that physical systems compete for physical resources and implying information systems don’t is misleading at best. It’s especially obvious right now with all the competition for compute going on.
>[..] to first metabolic cycles, cells, multi-purpose genes, modular development genes, etc.
One example is when cells discovered energy production using mitochondria. Mitochondria add new capabilities to the cell, with (almost) no downsides like weight, temperature sensitivity, or pressure sensitivity. It's almost 100% upside.
If someone had tried to predict the future number of mitochondria-enabled cells from the first one, they could have been off by a factor of 10^20.
I've been writing a story for the last 20 days with that exact plot; I have to get my stuff together and finish it.
That's fallacious reasoning: you are extrapolating from survivorship bias. A lot of technologies, genes, and species have failed along the way. You are also subjectively framing progression as improvement, which is problematic as well if you are speaking about general trends. Evolution selects for adaptation, not innovation. We use the theory of evolution to explain the emergence of complexity, but that's not the sole direction, and there are many examples where species evolved towards simplicity (again).
Resource expense alone could be the end of AI. You may look up historic island populations, where technological demands (e.g. timber) usually led to extinction by resource exhaustion and consequent ecosystem collapse (e.g. deforestation leading to soil erosion).
Doesn't answer the core fallacy. Historical "technological progress" can't be used as argument for any particular technology. Right now, if we are talking about AI, we're talking about specific technologies, which may just as well fail and remain inconsequential in the grand scheme of things, like most technologies, most things really, did in the past. Even more so since we don't understand much anything in either human or artificial cognition. Again and again, we've been wrong about predicting the limits and challenges in computation.
You see, your argument is just bad. You are merely guessing like everyone else.
Information technology does not operate by the rules of any other technology. It is a technology of math and organization, not particular materials.
The unique value of information technology is that it compounds the value of other information and technology, including its own, and lowers the bar for its own further progress.
And we know with absolute certainty we have barely scratched the computing capacity of matter. Bremermann’s limit : 10^47 operations per second, per gram. See my other comment for other relevant limits.
Do you also expect a wall in mathematics?
And yes, an unbroken historical record of 4.5 billion years of information systems becoming more sophisticated, with an exponential speed increase over time, is in fact a very strong argument. Changes that took a billion years initially now happen in very short times in today's evolution, and essentially instantly in technological time. The path is long, with significant acceleration milestones at whatever scale of time you want to look at.
Your argument, on the other hand, is indistinguishable from cynical AI opinions going back decades. It could be made any time. Zero new insight. Zero predictive capacity.
Substantive negative arguments about AI progress have been made. See "Perceptrons" by Marvin Minsky and Seymour Papert, for an example of what a solid negative argument looks like. It delivered insights. It made some sense at the time.
> Your argument, on the other hand, is indistinguishable from cynical AI opinions going back decades. It could be made any time. Zero new insight. Zero predictive capacity.
> Historical "technological progress" can't be used as argument for any particular technology.
Historical for billions of years of natural information system evolution. Metabolic, RNA, DNA, protein networks, epigenetic, intracellular, intercellular, active membrane, nerve precursors, peptides, hormonal, neural, ganglion, nerve nets, brains.
Thousands of years of human information systems. Hundreds of years of technological information systems. Decades of digital information systems. Now, in just the last few years, progress year to year is unlike any seen before.
Significant innovations being reported virtually every day.
Yes, track records carry weight. Especially with no good reason to expect a break, and every tangible reason to believe nothing is slowing down, right up to today.
"Past is not a predictor of future behavior" is about asset gains relative to asset prices in markets where predictable gains have had their profitability removed by the predictive pricing of others. A highly specific feedback situation making predicting asset gains less predictable even when companies do maintain strong predictable trends in fundamentals.
It is a narrow specific second order effect.
It is the worst possible argument for anything outside of those special conditions.
Every single thing you have ever learned was predicated on the past having strong predictive qualities.
You should understand what an argument means, before throwing it into contexts where its preconditions don't exist.
> Right now, if we are talking about AI, we're talking about specific technologies, which may just as well fail and remain inconsequential in the grand scheme of things, like most technologies, most things really, did in the past. Even more so since we don't understand much anything in either human or artificial cognition. Again and again, we've been wrong about predicting the limits and challenges in computation.
> Your argument [...] is indistinguishable from cynical AI opinions going back decades. It could be made any time. Zero new insight. Zero predictive capacity.
If I need to be clearer, nobody could know when you wrote that by reading it. It isn't an argument it's a free floating opinion. And you have not made it more relevant today, than it would have been all the decades up till now, through all the technological transitions up until now. Your opinion was equally "applicable", and no less wrong.
This is what "Zero new insight. Zero predictive capacity" refers to.
> Substantive negative arguments about AI progress have been made. See "Perceptrons" by Marvin Minsky and Seymour Papert, for an example of what a solid negative argument looks like. It delivered insights. It made some sense at the time.
> But a lot of technologies turn out to be S-shaped, not purely exponential, because there are limiting factors.
I'd argue all of them. Any true exponential eventually gets to a point where no computer can even store its numerical value. It's a physically absurd curve.
The narrative quietly assumes that this exponential curve can in fact continue since it will be the harbinger of the technological singularity. Seems more than a bit eschatological, but who knows.
If we suppose this tech rapture does happen, all bets are off; in that sense it's probably better to assume the curve is sigmoidal, since the alternative is literally beyond human comprehension.
Barring fully reversible processes as the basis for technology, you still quickly run into energy and cooling constraints. Even with that, you'd have time or energy density constraints. Unlimited exponentials are clearly unphysical.
Yes, this is an accurate description, and also completely irrelevant to the issue at hand.
At the stage of development we are today, no one cares how fast it takes for the exponent to go from eating our galaxy to eating the whole universe, or whether it'll break some energy density constraint before it and leave a gaping zero-point energy hole where our local cluster used to be.
It'll stop eventually. What we care about is whether it stops before it breaks everything for us, here on Earth. And that's not at all a given. Fundamental limits are irrelevant to us - it's like worrying that putting too many socks in a drawer will eventually make them collapse into a black hole. The limits that are relevant to us are much lower, set by technological, social and economic factors. It's much harder to say where those limits lie.
Sure, but it reminds us that we are dealing with an S-curve, so we need to ask where the inflection point is. i.e. what are the relevant constraints, and can they reasonably sustain exponential growth for a while still? At least as an outsider, it's not obvious to me whether we won't e.g. run into bandwidth or efficiency constraints that make scaling to larger models infeasible without reimagining the sorts of processors we're using. Perhaps we'll need to shift to analog computers or something to break through cooling problems, and if the machine cannot find the designs for the new paradigm it needs, it can't make those exponential self-improvements (until it matches its current performance within the new paradigm, it gets no benefit from design improvements it makes).
My experience is that "AI can write programs" is only true for the smallest tasks, and anything slightly nontrivial will leave it incapable of even getting started. It doesn't "often makes mistakes or goes in a wrong direction". I've never seen it go anywhere near the right direction for a nontrivial task.
That doesn't mean it won't have a large impact; as an autocomplete these things can be quite useful today. But when we have a more honest look at what it can do now, it's less obvious that we'll hit some kind of singularity before hitting a constraint.
I am getting the sense that the 2nd derivative of the curve is already hitting negative territory. Models get updated, and I don't feel I'm getting better answers from the LLMs.
On the application front though, it feels that the advancements from a couple of years ago are just beginning to trickle down to product space. I used to do some video editing as a hobby. Recently I picked it up again, and was blown away by how much AI has chipped away the repetitive stuff, and even made attempts at the more creative aspects of production, with mixed but promising results.
One example is auto-generating subtitles -- elements of this task, e.g. speech-to-text with time coding, have been around for a while (OpenAI Whisper and others), but they have only recently been integrated into video editors and become easy to use for non-coders.
other examples: depth map (estimating object distance from the camera; this is useful when you want to blur the background), auto-generating masks with object tracking.
>> it would be extremely surprising if these improvements suddenly stopped.
> But a lot of technologies turn out to be S-shaped, not purely exponential, because there are limiting factors.
An S-curve is exactly the opposite of "suddenly" stopping.
It is possible for us to get a sudden stop, due to limiting factors.
For a hypothetical: if Moore's Law had continued until we hit atomic resolution instead of the slowdown as we got close to it, that would have been an example of a sudden stop: can't get transistors smaller than atoms, but yet it would have been possible (with arbitrarily large investments that we didn't have) to halve transistor sizes every 18 months until suddenly we can't.
Now I think about it, the speed of commercial airlines is also an example of a sudden stop: we had to solve sonic booms first before even considering a Concorde replacement.
And, maybe I'm missing something, but to me it seems obvious that flat top part of the S curve is going to be somewhere below human ability... because, as you say, of the training data. How on earth could we train an LLM to be smarter than us, when 100% of the material we use to teach it how to think, is human-style thinking?
Maybe if we do a good job, only a little bit below human ability -- and what an accomplishment that would still be!
But still -- that's a far cry from the ideas espoused in articles like this, where AI is just one or two years away from overtaking us.
The standard way to do this is Reinforcement Learning: we do not teach the model how to do the task, we let it discover the _how_ for itself and only grade it based on how well it did, then reinforce the attempts where it did well. This way the model can learn wildly superhuman performance, e.g. it's what we used to train AlphaGo and AlphaZero.
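A toy version of that "grade and reinforce" loop (REINFORCE-style, with an invented reward): the model is never shown how to do the task, only scored, yet the policy drifts toward the best action.

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.zeros(10)                         # the "policy": preferences over 10 actions
lr = 0.1

for step in range(2000):
    probs = np.exp(logits) / np.exp(logits).sum()
    action = rng.choice(10, p=probs)          # the model attempts the task
    reward = action / 9.0                     # grading only: higher output, higher reward
    baseline = probs @ (np.arange(10) / 9.0)  # expected reward, to reduce variance
    grad = -probs                             # gradient of log pi(action) w.r.t. logits...
    grad[action] += 1.0                       # ...is onehot(action) - probs
    logits += lr * (reward - baseline) * grad # reinforce attempts that scored well

print(np.argmax(logits))                      # converges to 9, the highest-reward action
```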
The cost of the next number in a GPT (3>4>5) seems to come down to 2 things:
1) $$$
2) data
The second (data) also isn't cheap, as it seems we've already gotten through all the 'cheap' data out there. So much so that synthetic data (fart huffing) is a big thing now. People tell me it's real and useful and passes the glenn-horf theore... blah blah blah.
So it really more so comes down to just:
1) $$$^2 (but really pick any exponent)
In that, I'm not sure this thing is a true sigmoid curve (see: biology all the time). I think it's more a logarithmic cost here. In that, it never really goes away, but it gets really expensive to carry out for large N.
[To be clear, lots of great shit happens out there in large N. An AI god still may lurk in the long slow slope of $N, the cure for boredom too, or knowing why we yawn, etc.]
Yes. It's true that we don't know, with any certainty, (1) whether we are hitting limits to growth intrinsic to current hardware and software, (2) whether we will need new hardware or software breakthroughs to continue improving models, and (3) what the timing of any necessary breakthroughs will be, because innovation doesn't happen on a predictable schedule. There are unknown unknowns.[a]
However, there's no doubt that at a global scale, we're sure trying to maintain current rates of improvement in AI. I mean, the scale and breadth of global investment dedicated to improving AI, presently, is truly unprecedented. Whether all this investment is driven by FOMO or by foresight is irrelevant. The underlying assumption in all cases is the same: We will figure out, somehow, how to overcome all known and unknown challenges along the way. I have no idea what the odds of success may be, but they're not zero. We sure live in interesting times!
"I'm sure people were saying that about commercial airline speeds in the 1970's too."
But there are others that keep going also. Moore's law is still going (mostly, slowing), and made it past a few pinch points where people thought it was the end.
The point is that, over the decades, many people said Moore's law was at an end, and then it wasn't; there was some breakthrough that kept it going. Maybe a new one will happen.
The thing with AI is, maybe the S curve flattens out, after all the jobs are gone.
Everyone is hoping the S curve flattens out somewhere just below human level, but what if it flattens out just beyond human level? We're still screwed.
Each specific technology can be S-shaped, but advancements in achieving goals can still maintain an exponential curve. e.g. Moore's law is dead with the end of Dennard scaling, but computation improvements still happen with parallelism.
Meta's Behemoth shows that scaling the number of parameters has diminishing returns, but we still have many different ways to continue advancements. Those who point at one thing and say "see" aren't really seeing. Of course there are limits, like energy, but with nuclear energy or photon-based computing we're nowhere near those limits.
Ironically, given that it probably mistakes a sigmoid curve for an exponential curve, "Failing to understand the exponential, again" is an extremely apt name for this blog post.
Infectious diseases rarely see actual exponential growth for logistical reasons. It's a pretty unrealistic model that ignores that the disease actually needs to find additional hosts to spread, the local availability of which starts to go down from the first victim.
If you assume the availability of hosts is local to the perimeter of the infected hosts, then the relative growth is limited to 2/R, where R is the distance from patient 0 in 2 dimensions. That's because the area of the circle defines how many hosts are already ill, but the interaction can only happen on the perimeter of the circle.
The disease is obviously also limited by the total amount of hosts, but I assume there's also the "bottom" limit - i.e. the resource consumption of already-infected hosts.
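A quick numerical sketch of that 2/R claim (idealized disc geometry; the density and spread speed are made-up numbers): infections scale with the area, new infections with the circumference, so the relative growth falls off like 2/R rather than staying constant as a true exponential would.

    import math

    density = 1000.0   # assumed hosts per unit area (made-up)
    speed = 1.0        # assumed radial spread per time step (made-up)
    R = 1.0
    for step in range(5):
        infected = density * math.pi * R ** 2          # everyone inside the disc
        new_cases = density * 2 * math.pi * R * speed  # only the rim can infect outward
        print(f"R={R:.0f}: relative growth {new_cases / infected:.2f} (= 2/R = {2 / R:.2f})")
        R += speed
    # Growth is quadratic in time, not exponential: the relative rate keeps shrinking.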
It also depends on how panicked people are. Covid was never going to spread like ebola, for instance: it was worse. Bad enough to harm and kill people, but not bad enough to scare them into self-enforced isolation and voluntary compliance with public health measures.
Back on the subject of AI, I think the flat part of the curve has always been in sight. Transformers can achieve human performance in some, even many respects, but they're like children who have to spend a million years in grade school to learn their multiplication tables. We will have to figure out why that is the case and how to improve upon it drastically before this stuff really starts to pay off. I'm sure we will but we'll be on a completely different S-shaped curve at that point.
Yes, the model the S curve comes out of is extremely simplified. Looking at covid curves we could just as well have said it was parabolic, but that’s much less worrisome.
It's obvious, but the problem was that enough people would die along the way for it to be worth worrying about. Similarly, if the current AI is able to replace 99% of devs in 5-10 years (or, even worse, most white collar jobs) and flattens out there without becoming a godlike AGI, it will still have enormous implications for the economy.
> But a lot of technologies turn out to be S-shaped, not purely exponential, because there are limiting factors.
S curves are exponential before they start tapering off though. It's hard to predict how long that could continue, so there's an argument to be made that we should remain optimistic and milk that while we can, lest pessimism cut off investment too early.
> I'm sure people were saying that about commercial airline speeds in the 1970's too.
They'd be wrong, of course - for not realizing demand is a limiting factor here. Airline speeds plateaued not because we couldn't make planes go faster anymore, but because no one wanted them to go faster.
This is partly an economic and partly a social factor - transit times are bucketed by what they enable people to do. It makes little difference if going from London to New York takes 8 hours instead of 12 - it’s still in the “multi-day business trip” bucket (even 6 hours goes into that bucket, once you add airport overhead). Now, if you could drop that to 3 hours, like Concorde did[0], that finally moves it into the “hop over for a meet, fly back the same day” bucket, and then business customers start paying attention[1].
For various technical, legal and social reasons, we didn't manage to cross that chasm before the money for R&D dried up. Still, the trend continued anyway - in military aviation and, later, in supersonic missiles.
With AI, the demand is extreme and only growing, and it shows no sign of being structured into classes with large thresholds between them - in fact, models are improving faster than we're able to put them to any use; even if we suddenly hit a limit now and couldn't train even better models anymore, we have decades of improvements to extract just from learning how to properly apply the models we have. But there's no sign we're about to hit a wall with training any time soon.
Airline speeds are inherently a bad example for the argument you're making, but in general, I don't think pointing out S-curves is all that useful. As you correctly observe:
> But a lot of technologies turn out to be S-shaped, not purely exponential, because there are limiting factors.
But, what happens when one technology - or rather, one metric of that technology - stops improving? Something else starts - another metric of that technology, or something built on top of it, or something that was enabled by it. The exponent is S-curves on top of S-curves, all the way down, but how long that exponent is depends on what you consider in scope. So, a matter of accounting. So yeah, AI progress can flatten tomorrow or continue exponentially for the next couple years - depending on how narrowly you define "AI progress".
[1] - This is why Elon Musk wasn’t immediately laughed out of the room after proposing using Starship for moving people and cargo across the Earth, back in 2017. Hopping between cities on an ICBM sounds borderline absurd for many reasons, but it also promised cutting flight time to less than one hour between any two points on Earth, which put it in a completely new bucket, even more interesting for businesses.
Yes, though "far" isn't so large as to be inconceivable: the city of Starbase is only 2.75 km from the Starship launch tower.
That kind of distance may or may not be OK for a whole bunch of other reasons, many of which I'm not even qualified to guess at the nature of, but the noise at least isn't an absolute issue for reasonable-scale civil infrastructure, given isolated development in many places.
There’s a key way to think about a process that looks exponential and might or might not flatten out into an S curve: reasoning about fundamental limits. For COVID it would obviously flatten out because there are finite humans, and it did when the disease had in fact infected most humans on the planet. For commercial airlines you could reason about the speed of sound or escape velocity and see there is again a natural upper limit- although which of those two would dominate would have very different real world implications.
For computational intelligence, we have one clear example of an upper limit in a biological human brain. It only consumes about 25W and has much more intelligence than today’s LLMs in important ways. Maybe that’s the wrong limit? But Moore’s law has been holding for a very long time. And smart physicists like Feynman in his seminal lecture predicting nanotechnology in 1959 called “there’s plenty of room at the bottom” have been arguing that we are extremely far from running into any fundamental physical limits on the complexity of manufactured objects. The ability to manufacture them we presume is limited by ingenuity, which jokes aside shows no signs of running out.
Training data is a fine argument to consider. Especially since they are training on “the whole internet”, sorta. The key breakthrough of transformers wasn’t in fact autoregressive token processing or attention or anything like that. It was that they can learn from (memorize / interpolate between / generalize) arbitrary quantities of training data. Before that, every kind of ML model hit scaling limits pretty fast. Resnets got CNNs to millions of parameters but they still became quite difficult to train. Transformers train reliably on every size of data set we have ever tried, with no end in sight. The attention mechanism shortens the gradient path for extremely large numbers of parameters, completely changing the rules of what’s possible with large networks. But what about the data to feed them?
There are two possible counter arguments there. One is that humans don’t need exabytes of examples to learn the world. You might reasonably conclude from this that NNs have some fundamental difference vs people and that some hard barrier of ML science innovation lies in the way. Smart scientists like Yann LeCun would agree with you there. I can see the other side of that argument too - that once a system is capable of reasoning and learning it doesn’t need exhaustive examples to learn to generalize. I would argue that RL reasoning systems like GRPO or GSPO do exactly this - they let the system try lots of ways to approach a difficult problem until they figure out something that works. And then they cleverly find a gradient towards whatever technique had relative advantage. They don’t need infinite examples of the right answer. They just need a well chosen curriculum of difficult problems to think about for a long time. (Sounds a lot like school.) Sometimes it takes a very long time. But if you can set it up correctly it’s fairly automatic and isn’t limited by training data.
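A sketch of the "relative advantage" step, loosely GRPO-flavoured and heavily simplified (the rewards are invented, and the actual policy update, KL penalty, etc. are omitted):

    import numpy as np

    # For one hard problem, sample a group of attempts and grade each one.
    # In practice the grades come from a verifier; these are made up.
    rewards = np.array([0.0, 0.0, 1.0, 0.2, 0.0, 1.0, 0.0, 0.1])

    # Group-relative advantage: how much better each attempt did than its peers.
    advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    print(advantages.round(2))

    # Attempts with positive advantage get their log-probs pushed up, negative
    # ones pushed down -- no reference answers needed, just a grader and a
    # well-chosen curriculum of hard problems.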
The other argument is what the Silicon Valley types call “self play” - the goal of having an LLM learn from itself or its peers through repeated games or thought experiments. This is how Alpha Go was trained, and big tech has been aggressively pursuing analogs for LLMs. This has not been a runaway success yet. But in the area of coding agents, arguably where AI is having the biggest economic impact right now, self play techniques are an important part of building both the training and evaluation sets. Important public benchmarks here start from human curated examples and algorithmically enhance them to much larger sizes and levels of complexity. I think I might have read about similar tricks in math problems but I’m not sure. Regardless it seems very likely that this has a way to overcome any fundamental limit on availability of training data as well, based on human ingenuity instead.
Also, if the top of the S curve is high enough, it doesn’t matter that it’s not truly exponential. The interesting stuff will happen before it flattens out. E.g. COVID. Consider the y axis “human jobs replaced by AI” instead of “smartness” and yes it’s obviously an S curve.
> For computational intelligence, we have one clear example of an upper limit in a biological human brain. It only consumes about 25W and has much more intelligence than today’s LLMs in important ways. Maybe that’s the wrong limit?
It's a good reference point, but I see no reason for it to be an upper limit - by the very nature of how biological evolution works, human brains are close to the worst possible brains advanced enough to start a technological revolution. We're the first brain on Earth that crossed that threshold, and in evolutionary timescales, all that followed - all human history - happened in an instant. Evolution didn't have time yet to iterate on our brain design.
As they say, every exponential is a sigmoid in disguise. I think the exponential phase of growth for LLM architectures is drawing to a close, and fundamentally new architectures will be necessary for meaningful advances.
I'm also not convinced by the graphs in this article. OpenAI is notoriously deceptive with their graphs, and as Gary Marcus has already noted, that METR study comes with a lot of caveats: [https://garymarcus.substack.com/p/the-latest-ai-scaling-grap...]
Exponential curves don't last for long fortunately, or the universe would have turned into a quark soup. The example of COVID is especially ironic, considering it stopped being a real concern within 3 years of its advent despite the exponential growth in the early years.
Those who understand exponentials should also try to understand stock and flow.
Reminds me a bit of the "ultraviolet catastrophe".
> The ultraviolet catastrophe, also called the Rayleigh–Jeans catastrophe, was the prediction of late 19th century and early 20th century classical physics that an ideal black body at thermal equilibrium would emit an unbounded quantity of energy as wavelength decreased into the ultraviolet range.
[...]
> The phrase refers to the fact that the empirically derived Rayleigh–Jeans law, which accurately predicted experimental results at large wavelengths, failed to do so for short wavelengths.
Right. Nobody believed that the intensity would go to infinity. What they believed was that the theory was incomplete, but they didn't know how or why. And the solution required inventing a completely new theory.
Exponentials exist in their environment. Didn't Covid stop because we ran out of people to infect? Of course it can't keep going exponentially, because there isn't an exponential supply of people to infect.
What is this limit on AI? Is it technology, energy, something else? All these things can be overcome, to keep the exponential going.
And of course, systems also break under exponentials. Maybe AI is stopped by the world economy collapsing. AI advancement would be stopped, but that is cold comfort to the humans.
Data. Think of our LLMs like bacteria in a Petri dish. When first introduced, they achieve exponential growth by rapidly consuming the dish's growth medium. Once the medium is consumed, growth slows and then stops.
The corpus of information on the Internet, produced over several decades, is the LLM's growth medium. And we're not producing new growth medium at an exponential rate.
> What is this limit on AI? It is technology, energy, something. All these things can be over-come, to keep the exponential going.
That's kind of begging the question. Obviously, if all the limitations on AI can be overcome, growth would be exponential. Even the biggest AI skeptic would agree. The question is, will it?
Long COVID is still a thing, the nAbs immunity is pretty paltry because the virus keeps changing its immunity profile so much. T-cells help but also damage the host because of how COVID overstimulates them. A big reason people aren't dying like they used to is because of the government's strategy of constant infection which boosts immunity regularly* while damaging people each time, that plus how Omicron changed SARS-CoV-2's cell entry mechanism to avoid cell-cell fusion (syncytia) that caused huge over-reaction in lung tissue.
It's possible to understand both exponential and limiting behavior at the same time. I work in an office full of scientists. Our team scrammed the workplace on March 10, 2020.
To the scientists, it was intuitively obvious that the curve could not surpass 100% of the population. An exponential curve with no turning point is almost always seen as a sure sign that something is wrong with your model. But we didn't have a clue as to the actual limit, and any putative limit below 100% would need a justification, which we didn't have, or some dramatic change to the fundamental conditions, which we couldn't guess.
The typical practice is to watch the curve for any sign of a departure from exponential behavior, and then say: "I told you so." ;-)
The first change may have been social isolation. In fact that was pretty much the only arrow in our quivers. The second change was the vaccine, which changed both the infection rate and the mortality rate, dramatically.
I'm curious as to whether the consensus is that the observed behaviour of COVID waves was ever fully and satisfactorily explained - they tend to grow exponentially but then seemingly saturate at a much lower point than a naïve look at the curve might suggest.
To those interested in numbers it was explained early - even on TV. Anyone interested saw that it was going like a seasonal flue wave. Numbers were following strict mathematics. My area was early - the numbers peaked right before people started to go crazy - the rest was censorship - There was a lot of fakery going on by using very soft numbers. Very often they used reporting date instead of infection date.. and some numbers were delayed 9 months... So most curves out there were seriously flawed. But if you were really interested you could see real epidemiological curves - but you had to do real work to find the numbers. Strict mathematics of a seasonal virus was something people didn't want to see - and this is still the consensus...
Well, the shapes look very seasonal... Do you know something about epidemiological curves?!
The wave 2020 in Europe was often smaller than 2018. And the data was perfectly seasonal. If you know people working in nursing homes and hospitals, you can ask them what happened later in 2021...
I heard a lot of stories - from first hand... They parked old ladies in the cold in front of open windows for fresh air - until they were blue... They vaccinated old people right into an ongoing wave and of course they had more problems caused from a wrongly trained vulnerable immune system - sane doctors don't vaccinate into an ongoing wave. What was going on in hospitals and nursing homes was a crime for money. Just ask the people that were there. A combat medic I know that now works in a hospital called 2021 a crime.
And still - solid Epidemiological data - wherever you could find it - was still perfectly seasonal. You could see some perfect mathematical curves. Just very high because they actively killed people. Even pupils in school spent all day in front of open windows in the cold... To remain healthy... How stupid is that...
Not all places are equal, but I've taken a look at German all cause mortality. 2020 was not special. In 2021 it started rising synchronous with vaccinations.
This repeatedly confuses correlation and causation. The shape is seasonal - of what relevance is the shape? Why shouldn't we expect there to be a seasonal component of an airborne virus?
Similarly, to say "deaths increased when vaccines happened" is the most clear illustration. Why did the vaccines exist? Could that be related to the mortality increase? You can see charts here for Switzerland, US, UK: https://science.feedback.org/review/misleading-instagram-pos...
The shape is relevant if you want to evaluate measures.
If you can get your hands on some good data you'll find perfect mathematical seasonal functions. This is a serious criterion to exclude any measures from having any influence on the curve. It was just the seasonal thing happening. The data proves that measures were all useless - you could have worn any fancy hat for government measures instead. There are no trend changes in seasonal data you can corrolate to measures. The only trend changes you can find are in the reporting data. There's a decrease in reporting delay before a measure and there's a lot of reporting delay after the measure. Accidentally or intentionally reporting delay tried to make government measure look good.
For vaccines I know 3 cases where people died and 2 who have serious health problems after vaccines. There is a reason, why there's no good official data on vaccine efficency - and why all placebo groups were killed as soon as possible.
Why did vaccines exists? The answer is simpler: Because of Money!
It would probably be hard to do. The really huge factor may be easier to study, since we know where and when every vaccine dose was administered. The behavioral factors are likely to be harder to measure, and would have been masked by the larger effect of vaccination. We don't really know the extent of social isolation over geography, demographics, time, etc..
There are human behavioural factors, yes, but I was kinda wondering about the virus itself: the R number seemed to fluctuate quite a bit, with waves peaking fast and early and then receding equally quickly. I know there were some ideas around asymptomatic spread and superspreaders (both people with highly connected social graphs and people shedding far more active virus than the median). I just wondered whether anyone had built a model that was considered to have accurately reproduced the observed behaviour of positive tests and symptomatic cases, and the way waves would seemingly saturate after infecting a few % of the population.
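One standard toy ingredient in models of that kind is behavioural feedback on the contact rate; a minimal SIR sketch with that feedback (all parameter values invented, purely illustrative) stalls after infecting only a few percent, far short of what a fixed-R model would predict:

    # Toy SIR with behavioural feedback: contact rate drops as current infections rise.
    N = 1_000_000
    S, I, R = N - 100, 100, 0
    beta0, gamma = 0.35, 0.1   # assumed base contact and recovery rates
    k = 2000.0                 # assumed strength of the behavioural response
    peak = 0.0
    for day in range(300):
        beta = beta0 / (1 + k * I / N)   # people pull back when cases surge
        new_inf = beta * S * I / N
        new_rec = gamma * I
        S, I, R = S - new_inf, I + new_inf - new_rec, R + new_rec
        peak = max(peak, I)
    print(f"peak infected: {100 * peak / N:.2f}%, total ever infected: {100 * R / N:.1f}%")
    # The wave levels off around a fraction of a percent actively infected and a few
    # percent ever infected, driven by behaviour rather than by running out of hosts.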
> By the end of 2027, models will frequently outperform experts on many tasks.
In passing the quizzes.
> Models will be able to autonomously work for full days (8 working hours) by mid-2026.
Who will carry responsibility for the consequences of these models' errors? What tools will be available to that responsible _person_?
--
Techno-optimists will be optimistic. Techno-pessimists will be pessimistic.
The processes we're discussing have their own limiting factors which no one mentions. Why not discuss what exactly makes the graph go up and what holds it back from going exponential? Why not discuss the inherent limitations of the LLM architecture, or the legal perspective on AI agency?
Thus we're discussing the results of AI models passing tests and people's perceptions of other people's opinions.
You don't actually need to have a "responsible person"; you can just have an AI do stuff. It might make a mistake; the only difference between that and an employee is that you can't punish an AI. If you're any good at management and not a psychopath, the ability to have someone to punish for mistakes isn't actually important
The importance of having a human be responsible is about alignment. We have a fundamental belief that human beings are comprehensible and have goals that are not completely opaque. That is not true of any piece of software. In the case of deterministic software, you can’t argue with a bug. It doesn’t matter how many times you tell it that no, that’s not what either the company or the user intended, the result will be the same.
With an AI, the problem is more subtle. The AI may absolutely be able to understand what you’re saying, and may not care at all, because its goals are not your goals, and you can’t tell what its goals are. Having a human be responsible bypasses that. The point is not to punish the AI, the point is to have a hope to stop it from doing things that are harmful.
I will worry when I see startups competing on products with companies 10x, 100x, or 1000x their size. Like a small team producing a Photoshop replacement. So far I haven't seen anything like that. Big companies don't seem to be launching new products faster either, or fixing some of their products that have been broken for a long time (MS Teams...)
AI obviously makes some easy things much faster, maybe helps with boilerplate, we still have to see this translate into real productivity.
I think the real turning point is when there isn’t the need for something like photoshop. Creatives that I speak to yearn for the day when they can stop paying the adobe tax.
There will always be an adobe tax so to speak. Creatives want high quality and reliable tools to be able to produce high quality things.
I could imagine a world where a small team + AI creates an open source tool that is better than current day Photoshop. However if that small team has that power, so does adobe, and what we perceive as "good" or "high quality" will shift.
Exponential curves happen when a quantity's growth rate is a linear function of its own value. In practice they're all going to be logistic, but you can ignore that as long as you're far away from the cap of whatever factor limits growth.
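That's the logistic equation, dN/dt = rN(1 - N/K). A tiny numeric sketch (r and K chosen arbitrarily) shows why the early portion is indistinguishable from pure exponential growth:

    # Logistic growth vs. pure exponential with the same rate. Parameters arbitrary.
    r, K = 0.5, 1_000_000.0
    n_log, n_exp = 1.0, 1.0
    for t in range(1, 41):
        n_log += r * n_log * (1 - n_log / K)   # growth rate linear in N, with a cap
        n_exp += r * n_exp                     # growth rate linear in N, no cap
        if t % 10 == 0:
            print(f"t={t:2d}  logistic={n_log:12.0f}  exponential={n_exp:12.0f}")
    # The two track each other closely until N gets within sight of K, then diverge hard.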
So what are the things that could cause "AI growth" (for some suitable definition of it) to have a growth rate that depends on the current level of AI?
The plausible ones I see are:
- growing AI capabilities spur additional AI capex
- AI could be used to develop better AIs
The first one rings true, but is most definitely hitting the limit since US capex into the sector definitely cannot grow 100-fold (and probably cannot grow 4-fold either).
The second one is, to my knowledge, not really a thing.
So unless AI can start improving itself or there is a self-feeding mechanism that I have missed, we're near the logistic fun phase.
It's interesting that he brings up the example of "exponential" growth in the case of COVID infections even though it was actually logistic growth[1] that saturates once resources get exhausted. What makes AI different?
This reminds me -- very tenuously -- of how the shorthand for very good performance in the Python community is "like C". In the C community, we know that programs have different performance depending on algorithms chosen..
> In the C community, we know that programs have different performance depending on algorithms chosen..
Yes. Only the C community knows this. What a silly remark.
Regarding the "Python community" remark, benchmarks against C and Fortran go back decades now. It's not just a Python thing. C people push it a lot, too.
Nah, that part is ok. Human competence, wherever you set the bar, takes decades to really change, and these things have visible changes every year or so.
The problem with all of the article's metrics is that they are all absolutely bullshit. It just throws in claims like AI being able to write full programs by itself 50% of the time, and moves on as if that had any resemblance to what happens in the real world.
A lot of this post relies on the recent open ai result they call GDPval (link below). They note some limitations (lack of iteration in the tasks and others) which are key complaints and possibly fundamental limitations of current models.
But more interesting is the 50% win rate stat that represents expert human performance in the paper.
That seems absurdly low; most employees don’t have a mere 50% success rate on self-contained tasks that take ~1 day of work. That means at least one of a few things could be true:
1. The tasks aren’t defined in a way that makes real world sense
2. The tasks require iteration, which wasn’t tested, for real world success (as many tasks do)
I think that, while interesting and a very worthy research avenue, this paper is only the first in a still early area of understanding how AI will interact with the real world, and it’s hard to project well from this one paper.
That's not 50% success rate at completing the task, that's the win rate of a head-to-head comparison of an algorithm and an expert. 50% means the expert and the algorithm each "win" half the time.
For the METR rating (first half of the article), it is indeed 50% success rate at completing the task. The win rate only applies to the GDPval rating (second half of the article).
You'd think that boosters for a technology whose very foundations rely on the sigmoid and tanh functions used as neuron activation functions would intuitively get this...
When people want a smooth function so they can do calculus they often use something like gelu or the swish function rather than relu. And the swish function involves a sigmoid. https://en.wikipedia.org/wiki/Swish_function
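For reference, these are the standard definitions (nothing specific to any particular model): swish is just x times a sigmoid, so it is smooth everywhere while staying relu-shaped.

    import math

    def relu(x):
        return max(0.0, x)

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def swish(x, beta=1.0):
        # Smooth relu-like curve: x * sigmoid(beta * x)
        return x * sigmoid(beta * x)

    for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
        print(f"x={x:+.1f}  relu={relu(x):+.3f}  swish={swish(x):+.3f}")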
Julian Schrittwieser (author of this post) has been in AI for a long time, he was in the core team who worked on AlphaGo, AlphaZero and MuZero at DeepMind, you can see him in the AlphaGo movie. While it doesn't make his opinion automatically true, I think it makes it worth considering, especially since he's a technical person, not a CEO trying to raise money
"extrapolating an exponential" seems dubious, but I think the point is more that there is no clear sign of slowing down in models capabilities from the benchmarks, so we can still expect improvements
Benchmarks are notoriously easy to fake. Also he doesn’t need to be a CEO trying to raise money in order to have an incentive here to push this agenda / narrative. He has a huge stock grant from Anthropic that will go to $0 when the bubble pops
"Models will be able to autonomously work for full days (8 working hours)" does not make them equivalent to a human employee. My employees go home and come back retaining context from the previous day; they get smarter every month. With Claude Code I have to reset the context between bite-sized tasks.
To replace humans in my workplace, LLMs need some equivalent of neuroplasticity. Maybe it's possible, but it would require some sort of shift in the approach that may or may not be coming.
Maybe when we get updating models. Right now, they are trained, and released, and we are using that static model with a context window. At some point when we have enough processing to have models that are always updating, then that would be plastic. I'm supposing.
I am flabbergasted by the naivety around predicting the future. While we have hints and suggestions, our predictions are best expressed as ranges of possibilities with varying weights. The hyperbolic among us like to pretend that predictions come in the form of precise lines of predetermined direction and curve; how foolish!
Predicting exponential growth is exceptionally difficult. Asymptotes are ordinary, and they often are not obvious until circumstances make them appear (in other words, they are commonly unpredictable).
(I do agree with the author regarding the potential of LLMs remaining underestimated by much of the public; however, I cannot get behind such abysmal reasoning.)
> I am flabbergasted by the naivety around predicting the future. While we have hints and suggestions, our predictions are best expressed as ranges of possibilities with varying weights. The hyperbolic among us like to pretend that predictions come in the form of precise lines of predetermined direction and curve; how foolish!
I don't see why the latter is any more foolish than the former.
It’s a matter of correctness and utility. You can improve your odds of correctness (and thus usefulness) by adjusting the scope of your projection.
This applies not only to predicting the future. Consider measuring something: you carefully choose your level of precision for practical reasons. Consider goal setting: you leave abundant room for variation because your goal is not expressed in hyper narrow terms, but you don’t leave it so loose that you don’t know what steps to take.
When expressed in sufficiently narrow terms, no one will ever predict anything. When expressed in sufficiently broad terms, everyone can predict everything. So the point is to modulate the scope until attaining utility.
> When just a few years ago, having AI do these things was complete science fiction!
This is only because these projects became consumer-facing fairly recently. There was a lot of incremental progress in the academic language model space leading up to this. It wasn't as sudden as this makes it sound.
The deeper issue is that this future-looking analysis goes no deeper than drawing a line connecting a few points. COVID is a really interesting comparison, because in epidemiology the exponential model comes from us understanding disease transmission. It is also not actually exponential, as the population becomes saturated the transmission rate slows (it is worth noting that unbounded exponential growth doesn't really seem to exist in nature). Drawing an exponential line like this doesn't really add anything interesting. When you do a regression you need to pick the model that best represents your system.
This is made even worse because this uses benchmarks and coming up with good benchmarks is actually an important part of the AI problem. AI is really good at improving things we can measure so it makes total sense that it will crush any benchmark we throw at it eventually, but there will always be some difference between benchmarks and reality. I would argue that as you are trying to benchmark more subtle things it becomes much harder to make a benchmark. This is just a conjecture on my end but if something like this is possible it means you need to rule it out when modeling AI progress.
There are also economic incentives to always declare percent increases in progress at a regular schedule.
Will AI ever get this advanced? Maybe, maybe even as fast as the author says, but this just isn't a compelling case for it.
Aside from the S-versus-exp issue, this area is one of these things where there's a kind of disconnect between my personal professional experience with LLMs and the criteria measures he's talking about. LLMs to me have this kind of superficially impressive feel, where they seem impressive in their capabilities, but where, when they fail, they fail dramatically, in a way humans never would, and they never get anywhere near what's necessary to actually be helpful in finishing tasks, beyond being some kind of gestalt template or prototype.
I feel as if there needs to be a lot more scrutiny on the types of evaluation tasks being provided — whether they are actually representative of real-world demands, or if they are making them easy to look good, and also more focus on the types of failures. Looking through some of the evaluation tasks he links to I'm more familiar with, they seem kind of basic? So not achieving parity with human performance is more significant than it seems. I also wonder, in some kind of maxmin sense, whether we need to start focusing more on worst-case failure performance rather than best-case goal performance.
LLMs are really amazing in some sense, and maybe this essay makes some points that are important to keep in mind as possibilities, but my general impression after reading it is it's kind of missing the core substance of AI bubble claims at the moment.
Wow, an exponential trendline, I guess billions of years of evolution can just give up and go home cause we have rigged the game my friends. At this rate we will create an AI which can do a task 10 years long! And then soon after that 100 years long! And that's that. Humans will be kept as pets because that's all we will be good for QED
The 50% success rate is the problem. It means you can’t reliably automate tasks unattended. That seems to be where it becomes non-exponential. It’s like having cars that go twice as far as the last year but will only get you to your destination 50% of the time.
> It’s like having cars that go twice as far as the last year but will only get you to your destination 50% of the time
Nice analogy. All human progress is based on tight-abstractions describing a well-defined machine model. Leaky abstractions with an undefined machine are useful too but only as recommendations or for communication. It is harder to build on top of them. Precisely why programming in english is a non-starter - or - just using english in math/science instead of formalism.
I think the author of this blog is not a heavy user of AI in real life. If you are, you know there are things AI is very good at, and things AI is bad at. AI may see exponential improvements in some aspects, but not in other aspects. In the end, those "laggard" aspects of AI will put a ceiling on its real-world performance.
I use AI in my coding for many hours each day. AI is great. But AI will not replace me in 2026 or in 2027. I have to admit I can't make projections many years in the future, because the pace of progress in AI is indeed breathtaking. But, while I am really bullish on AI, I am skeptical of claims that AI will be able to fully replace a human any time soon.
I am an amateur programmer and tried to port a python 2.7 library to python 3 with GPT5 a few weeks ago.
After a few tries, I realized both myself and the model missed that a large part of the library is based on another library that was never ported to 3 either.
That doesn't stop GPT5 from trying to write the code as best it can with a library that doesn't exist for python 3.
That is the part we have made absolutely no progress on.
Of course, it can do a much better react crud app than in Sept 2023.
In one sense, LLMs are so amazing and impressive and quite fugazi in another sense.
>they somehow jump to the conclusion that AI will never be able to do these tasks at human level
I don’t see that, I mostly see AI criticism that it’s not up to the hype, today. I think most people know it will approach human ability, we just don’t believe the hype that it will be here tomorrow.
I’ve lived through enough AI winter in the past to know that the problem is hard, progress is real and steady, but we could see a big contraction in AI spending in a few years if the bets don’t pay off well in the near term.
The money going into AI right now is huge, but it carries real risks because people want returns on that investment soon, not down the road eventually.
> Instead, even a relatively conservative extrapolation of these trends suggests that 2026 will be a pivotal year for the widespread integration of AI into the economy
Integration into the economy takes time and investment. Unfortunately, AI applications don't have an easy adoption curve - except for the chatbot. Every other use case requires an expensive and risky integration into an existing workflow.
> By the end of 2027, models will frequently outperform experts on many tasks
Fixed tasks like tests - maybe. But the real world is not a fixed model. It requires constant learning through feedback.
Many of the "people don't understand Exponential functions" posts are ultimately about people not understanding logistic functions. Because most things in reality that seemingly grow exponentially will eventually, unevitably taper off at some point when the cost for continued growth gets so high, accelerated growth can't be supported anymore.
Viruses can only infect so many people, for example. For the growth to be truly exponential you would need infinitely many people.
> Again we can observe a similar trend, with the latest GPT-5 already astonishingly close to human performance:
Yes but only if you measure "performance" as "better than the other option more than 50% of the time" which is a terrible way to measure performance, especially for bullshitting AI.
Imagine comparing chocolate brands. One is tastier than the other one 60% of the time. Clear winner right? Yeah except it's also deadly poisonous 5% of the time. Still tastier on average though!
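A two-line version of that objection (the utility numbers are invented): pairwise win rate ignores the size of the losses, so a small rate of catastrophic failure can make the "winner" far worse in expectation.

    # Made-up utilities showing why head-to-head win rate can mislead.
    p_win, p_poison = 0.60, 0.05            # A is tastier 60% of the time, poisonous 5%
    u_tastier, u_poison = 1.0, -1000.0      # invented utility values
    expected_a = p_win * u_tastier + p_poison * u_poison
    expected_b = (1 - p_win) * u_tastier    # B: less tasty, never poisonous
    print(expected_a, expected_b)           # -49.4 vs 0.4 -- B is the sane choice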
> Instead, even a relatively conservative extrapolation of these trends suggests that 2026 will be a pivotal year for the widespread integration of AI into the economy:
> Models will be able to autonomously work for full days (8 working hours) by mid-2026.
> At least one model will match the performance of human experts across many industries before the end of 2026.
> By the end of 2027, models will frequently outperform experts on many tasks.
First commandment of tech hype: the pivotal, groundbreaking singularity is always just 1-2 years away.
I mean seriously, why is that? Even when people like OP try to be principled and use seemingly objective evaluation data, they find that the BIG big thing is 1-2 years away.
Self driving cars? 1-2 years away.
AR glasses replacing phones? 1-2 years away.
All of us living our life in the metaverse? 1-2 years away.
Again, I have to commend OP on putting in the work with the serious graphs, but there’s something more at play here.
Is it purely a matter of data cherry-picking? Is it the unknown unknowns leading to the data-driven approaches being completely blind to their medium/long term limitations?
Many people seem to assert that "constant relative growth in capabilities/sales/whatever" is a totally reasonable (or even obvious or inevitable) prior assumption, and then point to "OMG relative growth produces an exponential curve!" as the rest of their argument. And at least the AI 2027 people tried to one-up that by asserting an increasing relative growth rate to produce a superexponential curve.
I'd be a fool to say that we'll ever hit a hard plateau in AI capabilities, but I'll have a hard time believing any projected exponential-growth-to-infinity until I see it with my own eyes.
Self-driving cars have existed for at least a year now. It only took a decade of “1 year away”, but it exists now, and will likely require another decade of scaling up the hardware.
I think AGI is going to follow a similar trend: a decade of being “1 year away”. Meanwhile, unlike self-driving, the industry is preemptively solving the scaling-up of hardware concurrently.
Because I need to specify an amount of time short enough that big investors will hand over a lot of money, long enough that I can extract a big chunk of it for myself before it all comes crashing down.
A couple of years is probably a bit tight, really, but I'm competing for that cash with other people so the timeframe we make up is going to about the lowest we think we can get away with.
I feel like there should be some take away from the fact that we have to come up with new and interesting metrics like “Length of a Task That Can Be Automated” in order to point out that exponential growth is still happening. Fwiw, it does seem like a good metric, but it also feels like you can often find some metric that’s improving exponentially even when the base function is leveling out.
It's the only benchmark I know of with a well-behaved scale. Benchmarks with for example a score from 0-100% get saturated quite quickly, and further improvements on the metric are literally impossible. And even excluding saturation, they just behave very oddly at the extremes. To use them to show long term exponential growth you need to chain together benchmarks, which is hard to make look credible.
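A toy illustration of that point (the underlying "capability" series and its mapping onto a 0-100% score are both invented): a bounded score pins against its ceiling within a few doublings, while a task-length-style metric keeps registering the same steady doubling.

    # Invented underlying capability that doubles every 7 "months".
    for month in range(0, 43, 7):
        capability = 2 ** (month / 7)
        bounded_score = 100 * (1 - 0.5 ** capability)   # some 0-100% benchmark
        task_minutes = 5 * capability                   # unbounded task-length metric
        print(f"month {month:2d}: benchmark {bounded_score:6.2f}%  task length {task_minutes:6.0f} min")
    # The percentage saturates and stops discriminating; the task-length metric stays
    # well-behaved on a log scale for as long as the underlying trend holds.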
The sentiment of the comments here seems rather pessimistic. A perspective that balances both sides might be that the rate of mass adoption of some technology often lags behind the frontier capabilities, so I wouldn’t expect AI to take over a majority of those jobs in GPDval in a couple of years, but it’ll probably happen eventually.
There are still fundamental limitations in both the model and products using the model that restrict what AI is capable of, so it’s simultaneously true that AI can do cutting edge work in certain domains for hours while vastly underperforming in other domains for very small tasks. The trajectory of improvement of AI capabilities is also an unknown, where it’s easy to overestimate exponential trends due to unexpected issues arising but also easy to underestimate future innovations.
I don’t see the trajectory slowing down just yet with more compute and larger models being used, and I can imagine AI agents will increasingly give their data to further improve larger models.
This doesn't feel at all credible because we're already well into the sigmoid part of the curve. I thought the gpt5 thing made it pretty obvious to everyone.
I'm bullish on AI, I don't think we've even begun to understand the product implications, but the "large language models are in context learners" phase has for now basically played out.
AI company employee whose livelihood depends on people continuing to pump money into AI writes a blog post trying to convince people to keep pumping more money into AI. Seems solid.
The "exponential" metric/study they include is pretty atrocious. Measuring AI capability by how long humans would take to do the task. By that definition existing computers are already super AGI - how long would it take humans to sort a list of a million numbers? Computers can do it in a fraction of a second. I guess that proves they're already AGI, right? You could probably fit an exponential curve to that as well, before LLMs even existed.
> Given consistent trends of exponential performance improvements over many years and across many industries, it would be extremely surprising if these improvements suddenly stopped.
The difference between exponential and sigmoid is often a surprise to the believers, indeed.
Somewhat missed by many comments proclaiming that it’s sigmoidal is that sigmoid curves exhibit significant growth after it stops looking exponential. Unless you think things have already hit a dramatic wall you should probably assume further growth.
We should probably expect compute to get cheaper at the same time, so that’s performance increases with lowering costs. Even after things flatline for performance you would expect lowering costs of inference.
Without specific evidence it’s also unlikely you randomly pick the point on a sigmoid where things change.
To the people who claim that we’re running out of data, I would just say: the world is largely undigitized. The Internet digitized a bunch of words but not even a tiny fraction of all that humans express every day. Same goes for sound in general. CCTV captures a lot of images, far more than social media, but it is poorly processed and also just a fraction of the photons bouncing off objects on earth. The data part of this equation has room to grow.
There’s no exponential improvement in go or chess agents, or car driving agents. Even tiny mouse racing.
If there is, it would be such nice low hanging fruit.
Maybe all of that happens all at once.
I’d just be honest and say most of it is completely fuzzy tinkering disguised as intellectual activity (yes, some of it is actual intellectual activity and yes we should continue tinkering)
There are rare individuals that spent decades building up good intuition and even that does not help much.
This extrapolates based on a good set of data points to predict when AI will reach significant milestones like being able to “work on tasks for a full 8 hours” (estimates by 2026). Which is ok - but it bears keeping https://xkcd.com/605/ in mind when doing extrapolation.
On top of other criticism here, I'd like to add that the article optimistically assumes that actors are completely honest with their benchmarks when billions of dollars and national security are at stake.
I'm only an "expert" in computer science and software engineering, and can say that
- none of the widely available LLMs can produce answers at the level of a first-year CS student;
- students using LLMs can easily be distinguished by being wrong in all the ways a human would otherwise never be.
So to me it's not really the question of whether CS-related benchmarks are false, it's a question of how exactly did this BS even fly.
Obviously in other disciplines LLMs show similar lack of performance, but I can't call myself an "expert" there, and someone might argue I tend to use wrong prompts.
Until we see a website where we can put in an intermediate problem and get a working solution, "benchmarks show that our AI solves problems at gold medalist level" will still be obvious BS.
So the author is in a clear conflict of interest with the contents of the blog because he's an employee of Anthropic. But regarding this "blog", showing the graph where OpenAI compares "frontier" models and shows gpt-4o vs o3-high is just disingenuous, o1 vs o3 would have been a closer fight between "frontier" models. Also today I learned that there are people paid to benchmark AI models in terms of how close they are to "human" level, apparently even "expert" level whatever that means. I'm not a LLM hater by any means, but I can confidently say that they aren't experts in any fields.
117 comments so far, and the word economics does not appear.
Any technology which produces more results for more inputs but does not get more efficient at larger scale runs into a money problem if it does not get hit by a physics problem first.
It is quite possible that we have already hit the money problem.
Even if computational power evolves exponentially, we need to evaluate the utility of the additional computation. And if the utility happens to increase logarithmically with computation spent, it's possible that in the end we will observe just a linear increase in utility.
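A tiny worked version of that point (illustrative functional forms only): exponentially growing compute pushed through a logarithmic utility function comes out as merely linear growth in utility.

    import math

    compute = 1.0
    for year in range(6):
        utility = math.log2(compute)   # assumed: utility ~ log of compute
        print(f"year {year}: compute x{compute:5.0f}, utility {utility:.0f} units")
        compute *= 2                   # assumed: compute doubles every year
    # Compute explodes, but the utility column just counts up by one each year.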
I don't think I have ever seen a page on HN where so many people missed the main point.
The phenomenon of people having trouble understanding the implications of exponential progress is really well known. Well known, I think, by many people here.
And yet an alarming number of comments here interpret small pauses as serious trend breakers. False assumptions that we are anywhere near the limits of computing power relative to fundamental physics limits. Etc.
Recent progress, which is unprecedented in speed looking backward, is dismissed because people have acclimatized to change so quickly.
The title of the article "Failing to Understand the Exponential, Again" is far more apt than I could have imagined, on HN.
See my other comments here for specific arguments. See lots of comments here for examples of those who are skeptical of a strong inevitability here.
The "information revolution" started the first time design information was separated from the thing it could construct. I.e. the first DNA or perhaps RNA life. And it has unrelentingly accelerated from there for over 4.5 billion years.
The known physics limits of computation per gram are astronomical. We are nowhere near any hard limit. And that is before any speculation of what could be done with the components of spacetime fragments we don't understand yet. Or physics beyond that.
The information revolution has hardly begun.
With all humor, this was the last place I expected people to not understand how different information technology progresses vs. any other kind. Or to revert to linear based arguments, in an exponentially relevant situation.
If there is any S-curve for information technology in general, it won't be apparent until long after humans are a distant memory.
I'm a little surprised too. A lot of the arguments are along the lines of but LLMs aren't very good. But really LLMs are a brief phase in the information revolution you mention that will be superseded.
To me, saying we won't get AGI because LLMs aren't suitable is like saying we were not going to get powered flight because steam engines weren't suitable. Fair enough, they weren't, but they got superseded by internal combustion engines, which were. Something like that will happen.
It should be noted that the article author is an AI researcher at Anthropic and therefore benefits financially from the bubble: https://www.julian.ac/about/
> The current discourse around AI progress and a supposed “bubble” reminds me a lot of the early weeks of the Covid-19 pandemic. Long after the timing and scale of the coming global pandemic was obvious from extrapolating the exponential trends, politicians, journalists and most public commentators kept treating it as a remote possibility or a localized phenomenon.
That's not what I remember. On the contrary, I remember widespread panic. (For some reason, people thought the world was going to run out of toilet paper, which became a self-fulfilling prophesy.) Of course some people were in denial, especially some politicians, though that had everything to do with politics and nothing to do with math and science.
In any case, the public spread of infectious diseases is a relatively well understood phenomenon. I don't see the analogy with some new tech, although the public spread of hype is also a relatively well understood phenomenon.
I think the first comment on the article put it best: With COVID, researchers could be certain that exponential growth was taking place because they knew the underlying mechanisms of the growth. The virus was self-replicating, so the more people were already infected, the faster would new infections happen.
(Even this dynamic would only go on for a certain time and eventual slow down, forming an S-curve, when the virus could not find any more vulnerable persons to continue the rate of spread. The critical question was of course if this would happen because everyone was vaccinated or isolated enough to prevent infection - or because everyone was already infected or dead)
With AI, there is no such underlying mechanism. There is the dream of the "self-improving AI" where either humans can make use of the current-generation AI to develop the next-generation AI in a fraction of the time - or where the AI simply creates the next generation on its own.
If this dream were reality, it could be genuine exponential growth, but from all I know, it isn't. Coding agents speed up a number of bespoke programming tasks, but they do not exponentially speed up development of new AI models. Yes, we can now quickly generate large corpora of synthetic training data and use them for distillation. We couldn't do that before - but a large part of the training data discussion is about the observation that synthetic data can not replace real data, so data collection remains a bottleneck.
There is one point where a feedback loop does happen, and this is with the hype curve: Initial models produced extremely impressive results compared to everything we had before - this caused an enormous hype and unlocked investments that allowed more resources for the development of the next model - which then delivered even better results. But it's obvious that this kind of feedback loop will eventually end when no more additional capital is available and diminishing returns set in.
Then we will once again be in the upper part of the S-curve.
> - Models will be able to autonomously work for full days (8 working hours) by mid-2026.
> - At least one model will match the performance of human experts across many industries before the end of 2026.
> - By the end of 2027, models will frequently outperform experts on many tasks.
I’ve seen a lot of people make predictions like this and it will be interesting to see how this turns out. But my question is, what should happen to a person’s credibility if their prediction turns out to be wrong? Should the person lose credibility for future predictions and we no longer take them seriously? Or is that way too harsh? Should there be reputational consequences for making bad predictions? I guess this more of a general question, not strictly AI-related.
> Should the person lose credibility for future predictions and we no longer take them seriously
If this were the case, almost every sell-side analyst would have been blacklisted by now. It's more about entertainment than facts - sort of like astrology.
Another 'number go up' analyst. Yes, models are objectively better at tasks. Please include the fact that hundreds of billions of dollars are being poured into making them better. You could even call it a technology race. Once the money avalanche runs its course, I and many others expect 'the exponential' to be followed by an implosion or correction in growth. Data and training is not what LLMs crave. Piles of cash is what LLMs crave.
IMO this approach ultimately asks the wrong question. Every exponential trend in history has eventually flattened out. Every. single. one. Two rabbits would create a population with a mass greater than the Earth in a couple of years if that trend continues indefinitely. The left hand side of a sigmoid curve looks exactly like exponential growth to the naked eye... until it nears the inflection point at t=0. The two curves can't be distinguished when you only have noisy data from t<0.
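A quick way to see that (synthetic data, invented parameters): generate points from a logistic curve well before its inflection, add noise, and a pure exponential fits them essentially perfectly.

    import numpy as np

    rng = np.random.default_rng(1)
    K, r = 1.0, 1.0
    t = np.linspace(-8, -3, 30)                 # samples only from before the inflection
    y = K / (1 + np.exp(-r * t))                # true curve is logistic
    y *= np.exp(rng.normal(0, 0.05, t.size))    # a little measurement noise

    # Best-fit pure exponential via linear regression on log(y).
    slope, intercept = np.polyfit(t, np.log(y), 1)
    y_exp = np.exp(intercept + slope * t)

    r2 = 1 - np.sum((y - y_exp) ** 2) / np.sum((y - y.mean()) ** 2)
    print(f"R^2 of an exponential fit to logistic data: {r2:.4f}")
    # Essentially a perfect fit -- the data alone can't tell you a ceiling exists.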
A better question is, "When will the curve flatten out?" and that can only be addressed by looking outside the dataset for which constraints will eventually make growth impossible. For example, for Moore's law, we could examine as the quantum limits on how small a single transistor can be. You have to analyze the context, not just do the line fitting exercise.
The only really interesting question in the long term is if it will level off at a level near, below, or above human intelligence. It doesn't matter much if that takes five years or fifty. Simply looking at lines that are currently going up and extending them off the right side of the page doesn't really get us any closer to answering that. We have to look at the fundamental constraints of our understanding and algorithms, independent of hardware. For example, hallucinations may be unsolvable with the current approach and require a genuine paradigm shift to solve, and paradigm shifts don't show up on trend lines, more or less by definition.
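To make the sigmoid-vs-exponential point above concrete, here is a minimal sketch (my own, with made-up numbers): fit both an exponential and a logistic curve to noisy data sampled only from the left half of a sigmoid, and note that the fits are about equally good while their extrapolations differ wildly.

    import numpy as np
    from scipy.optimize import curve_fit

    rng = np.random.default_rng(0)

    def exponential(t, a, r):
        return a * np.exp(r * t)

    def logistic(t, k, r, t0):
        return k / (1.0 + np.exp(-r * (t - t0)))

    # "True" process: a logistic curve, observed only well before its inflection at t = 0.
    t = np.linspace(-6.0, -2.0, 40)
    y_obs = logistic(t, 100.0, 1.0, 0.0) * (1.0 + 0.05 * rng.standard_normal(t.size))

    exp_p, _ = curve_fit(exponential, t, y_obs, p0=[100.0, 1.0])
    log_p, _ = curve_fit(logistic, t, y_obs, p0=[100.0, 1.0, 0.0], maxfev=10000)

    rmse = lambda f, p: float(np.sqrt(np.mean((f(t, *p) - y_obs) ** 2)))
    print("fit error, exponential:", rmse(exponential, exp_p))  # both on the order of the noise
    print("fit error, logistic:   ", rmse(logistic, log_p))

    # Indistinguishable on the data we have, but they extrapolate very differently:
    print("prediction at t=+4, exponential:", exponential(4.0, *exp_p))  # keeps climbing
    print("prediction at t=+4, logistic:   ", logistic(4.0, *log_p))     # levels off near its fitted ceiling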
I am constantly astonished that articles like this even pass the smell test. It is not rational to predict exponential growth just because you've seen exponential growth before! Incidentally, that is not what people did during COVID; they predicted exponential growth for reasons. Specific, articulable reasons, that consisted of more than just "look, line go up. line go up more?".
Incidentally, the benchmarks quoted are extremely dubious. They do not even really make sense. "The length of tasks AI can do is doubling every 7 months". Seriously, what does that mean? If the AI suddenly took double the time to answer the same question, that would not be progress. Indeed, that isn't what they did, they just... picked some times at random? You might counter that these are actually human completion times, but then why are we comparing such distinct and unrelated tasks as "count words in a passage" (trivial, any child can do) and "train adversarially robust image model" (expert-level task, could take anywhere between an hour and never-complete).
Honestly, the most hilarious line in the article is probably this one:
> You might object that this plot looks like it might be levelling off, but this is probably mostly an artefact of GPT-5 being very consumer-focused.
This is a plot with three points in it! You might as well be looking at tea leaves!
> but then why are we comparing such distinct and unrelated tasks as ...
Because a few years ago the LLMs could only do trivial tasks that a child could do, and now they're able to do complex research and software development tasks.
If you just have the trivial tasks, the benchmark is saturated within a year. If you just have the very complex tasks, the benchmark has no sensitivity at all for years (just everything scoring a 0) and then abruptly becomes useful for a brief moment.
This seems pretty obvious, and I can't figure out what your actual concern is. You're just implying it is a flawed design without pointing out anything concrete.
The key word is "unrelated"! Being able to count the number of words in a paragraph and being able to train an image classifier are so different as to be unrelated for all practical purposes. The assumption underlying this kind of a "benchmark" is that all tasks have a certain attribute called complexity which is a numerical value we can use to discriminate tasks, presumably so that if you can complete tasks up to a certain "complexity" then you can complete all other tasks of lower complexity. No such attribute exists! I am sure there are "4 hour" tasks an LLM can do and "5 second" tasks that no LLM can do.
The underlying frustration here is that there is so much latitude possible in choosing which tasks to test, which ones to present, and how to quantify "success" that the metrics given are completely meaningless, and do not help anyone to make a prediction. I would bet my entire life savings that by the time the hype bubble bursts, we will still have 10 brainless articles per day coming out saying AGI is round the corner.
> The length of tasks AI can do is doubling every 7 months
The claim is "At time t0, an AI can solve a task that would take a human 2 minutes. At time t0+dt, they can solve 4-minutes tasks. At time t0+2dt, it's 8 minutes" and so on.
I still find these claims extremely dubious, just wanted to clarify.
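For concreteness, here is a toy rendering of that claim as a curve; the 2-minute starting point and the 7-month doubling period are just the numbers from the claim above, not a fit to any data.

    def task_horizon_minutes(months_elapsed, initial_minutes=2.0, doubling_months=7.0):
        # horizon(t) = initial * 2 ** (t / doubling period)
        return initial_minutes * 2 ** (months_elapsed / doubling_months)

    for m in range(0, 43, 7):
        print(f"month {m:2d}: ~{task_horizon_minutes(m):.0f} human-minutes")
    # month 0: ~2, month 7: ~4, month 14: ~8, ..., month 42: ~128 human-minutes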
Yes, I get that, I did allow for it in my original comment. I remain convinced this is a gibberish metric - there is probably no such thing as "a task that would take a human 2 minutes", and certainly no such thing as "an AI that can do every task that would take a human 2 minutes".
"It’s Difficult to Make Predictions, Especially About the Future" - Yogi Berra. It's funny because it's true.
So if you want to try to do this difficult task, because say there's billions of dollars and millions of people's livelihoods on the line, how do you do it? Gather a bunch of data, and see if there's some trend? Then maybe it makes sense to extrapolate. Seems pretty reasonable to me. Definitely passes the sniff test. Not sure why you think "line go up more" is such a stupid concept.
Isn't one of those scalers getting money from NVIDIA to buy NVIDIA cards which they use as collateral to borrow more money to buy more NVIDIA cards which NVIDIA put up as revenue which bumps the stock price which they invest into OpenAI which invests into Oracle which buys more NVIDIA cards?
It's not a Ponzi scheme, and I don't have a crystal ball to determine where supply and demand will best meet, but a lot seems to be riding on the promise of future demand selling at a premium.
I'm not yet ready to believe this is the thing that permanently breaks supply and demand. More compute demand is likely, but the whole state-of-the-art stack (resale users, providers, and suppliers) will all get hit with more competition.
The exponential progress argument is frequently also misconstrued as a
>"we will get there by monotonously doing more of what we did previously"
Take the article's metric of independent SWE working time. This is a rather new metric for measuring AI capability; it's also a good metric, since it is directly measurable in a quantified way, unlike nebulous goal points such as "AGI/ASI".
It also doesn't necessarily predict any upheaval, which I also think is a good trait of a metric: we know the model will be better when it hits 8 or 16 hours, but we can skip the hype and prophecies of civilizational transformation that are attached to terminology like "AGI/ASI".
Now the caveat is that a SWE-time metric is useful at the moment because it's an intra-day timescale. But if we push this number to the point of comparing 48-hour vs 54-hour SWE-time models, we can easily end up chasing abstractions that have little to no explanatory power as to how good the AI really is, what counts as a proper incremental improvement, and what is just a benchmark number that may or may not be artificial.
The same can be said of math-olympiad scores and many of the existing AI benchmarks.
In the past there existed a concept of narrow AI. We could take task A, make a narrow AI become good at it. But we would expect a different application to be needed for task B.
Now we have generalist AI, and we take the generalist AI and make it become good at task A because that is the flavor of the month metric, but maybe that doesn't translate for improving task B, which someone will come around to improving when that becomes flavor of the month.
The conclusion? There's probably no good singular metric to get stuck on and say
"this is it, this graph is the one, watch it go exponential and bring forth God"
We will instead skip, hop and jump between task-or-category specific metrics that are deemed significant at the moment and arms-race style pump them up until their relevance fades.
It's funny because the author doesn't realize that this sentence at the beginning undermines his entire argument:
> Or they see two consecutive model releases and don’t notice much difference in their conversations, and they conclude that AI is plateauing and scaling is over.
The reason why we now fail to notice the difference between consecutive models is that the progress isn't in fact exponential. Humans tend to have a logarithmic perception, which means we only appreciate progress when it is exponential (for instance, you'd be very happy to get a $500 raise if you are living on minimum wage, but you wouldn't even call that “a raise” on an SV engineer's salary).
AI models have been improving a ton for the past three years, in many directions, but the rate of progress is definitely not exponential. It's not emergent either, as the focus is now being specifically directed at solving specific problems (both riddles and real world problems) thanks to trillions of tokens of high quality synthetic data.
On topics that aren't explicitly being worked on, progress has been minimal or even negative (for instance many people still use the 1-year-old Mistral Nemo for creative writing because the more recent models have all been STEMmaxxed).
This guy isn’t even wrong. Sure these models are getting faster, but they are barely getting better at actual reasoning, if at all. Who cares if a model can give me a bullshit answer in five minutes instead of ten? It’s still bullshit.
Seems like the right place to ask with ML enthusiasts gathered in one place discussing curves and the things that bend them: what's the thing with potential to obsolete transformers and diffusion models? Is it something old that people noticed once LLMs blew up? Something new? Something in-between?
> The evaluation tasks are sourced from experienced industry professionals (avg. 14 years' experience), 30 tasks per occupation for a total of 1320 tasks. Grading is performed by blinded comparison of human and model-generated solutions, allowing for both clear preferences and ties.
It's important to carefully scrutinize the tasks to understand whether they actually reflect tasks that are unique to industry professionals. I just looked quickly at the nursing ones (my wife is a nurse) and half of them were creating presentations, drafting reports, and the like, which is the primary strength of LLMs but a very small portion of nursing duties.
The computer programming tests are more straightforward. I'd take the other ones with a grain of salt for now.
Ah, employee of an AI company is telling us the technology he's working on and is directly financially interested in hyping will... grow forever and be amazing and exponential and take over the world. And everyone who doesn't believe this employee of AI company hyping AI is WRONG about basics of math.
I absolutely would NOT ever expect such a blog post.
That is a complete strawman - you made up forever growth and then argued against it. The OP is saying that in the short term, it makes more sense to assume exponential growth continues instead of thinking it will flatten out any moment now.
Just because something exhibits an exponential growth at one point in time, that doesn’t mean that a particular subject is capable of sustaining exponential growth.
Their Covid example is a great counter argument to their point in that covid isn’t still growing exponentially.
Where the AI skeptics (or even just pragmatists, like myself) chime in is saying “yeah AI will improve. But LLMs are a limited technology that cannot fully bridge the gap between what they’re producing now, and what the “hypists” claim they’ll be able to do in the future.”
People like Sam Altman know ChatGPT is a million miles away from AGI. But their primary goal is to make money. So they have to convince VCs that their technology has a longer period of exponential growth than what it actually will have.
Author here.
The argument is not that it will keep growing exponentially forever (obviously that is physically impossible), rather that:
- given a sustained history of growth along a very predictable trajectory, the highest likelihood short term scenario is continued growth along the same trajectory. Sample a random point on an s-curve and look slightly to the right, what’s the most common direction the curve continues?
- exponential progress is very hard to visualize and see, it may appear to hardly make any progress while far away from human capabilities, then move from just below to far above human very quickly
So it's an argument impossible to counter because it's based on a hypothesis that is impossible to falsify: it predicts that there will either be a bit of progress, or a lot of progress, soon. Well, duh.
My point is that the limits of LLMs will be hit long before we they start to take on human capabilities.
The problem isn’t that exponential growth is hard to visualise. The problem is that LLMs, as advanced and useful a technique as it is, isn’t suited for AGI and thus will never get us even remotely to the stage of AGI.
The human like capabilities are really just smoke and mirrors.
It’s like when people anthropomorphisise their car; “she’s being temperamental today”. Except we know the car is not intelligence and it’s just a mechanical problem. Whereas it’s in the AI tech firms best interest to upsell the human-like characteristics of LLMs because that’s how they get VC money. And as we know, building and running models isn’t cheap.
My problem with takes like this is it presumes a level of understanding of intelligence in general that we simply do not have. We do not understand consciousness at all, much less consciousness that exhibits human intelligence. How are we to know what the exact conditions are that result in human-like intelligence? You’re assuming that there isn’t some emergent phenomenon that LLMs could very well achieve, but have not yet.
I'm not making a philosophical argument about what human-like intelligence is. I'm saying LLMs have many weaknesses that make in incapable of performing basic functions that humans take for granted. Like count and recall.
I go into much more detail here: https://news.ycombinator.com/item?id=45422808
Ostensibly, AGI might use LLMs in parts of it's subsystems. But the technology behind LLMs doesn't adapt to all of the problems that AGI would need to solve.
It's a little like how the human brain isn't just one homogeneous grey lump. There's different parts of the brain that specialize on different parts of cognitive processing.
LLMs might work for language processing, but that doesn't mean it would work for maths reasoning -- and in fact we already know it doesn't.
This is why we need tools / MCPs. We need ways of turning problems LLM cannot solve into standalone programs that LLMs can cheat and ask the answers for.
AI services are/will be going hybrid. Just like we have seen in search, with thousands of dedicated subsystems handling niches behind the single unified UI element or API call.
“Hybrid” is just another way of saying “AI isn’t good enough to work independently”. Which is the crux of my point.
>the limits of LLMs will be hit long before we they start to take on human capabilities.
Why do you think this? The rest of the comment is just rephrasing this point ("llms isn't suited for AGI"), but you don't seem to provide any argument.
Fair point.
Basically AGI describes human-like capabilities.
The problem with LLMs is that they’re, at their core, token prediction models. Tokens, typically text, are given a numeric value and can then be used to predict what tokens should follow.
This makes them extremely good at things like working with source code and other sources of text where relationships are defined via semantics.
The problem with this is that it makes them very poor at dealing with:
1. Limited datasets. Smaller models are shown to be less powerful, so LLMs often need to ingest significantly more information than a human would learn in their entire lifetime, just to approximate what that human might produce on any specific subject.
2. Learning new content. Here we have to rely on non-AI tooling like MCPs. This works really well with the current models because we can say “scrape these software development references” (etc) to keep the model up to date. But there’s no independence behind those actions. An MCP only works because the prompt includes how to use that MCP and why you should use it. Whereas if you look at humans, even babies know how to investigate and learn independently. Our ability to self-learn is one of the core principles of human intelligence.
3. Remembering past content that resides outside of the original model training. I think this is actually a solvable problem for LLMs, but their current behaviour is to bundle all the prior interactions into the next prompt. In reality, the LLM hasn’t really remembered anything; you’re just reminding it about everything with each exchange. So each subsequent prompt gets longer and thus more fallible. It also means that context is always volatile. Basically it’s just a hack that only works because context sizes have grown exponentially. But if we want AGI then there needs to be a persistent way of retaining that context. There are some workarounds here, but they depend on tools.
4. Any operation that isn’t semantics-driven. Things like maths, for example. LLMs have to call a tool (like MCPs) to perform calculations. But that requires having a non-AI function return a result rather than the AI reasoning about the maths itself (a minimal sketch of this pattern follows after this list). So it’s another hack. And there are a lot of domains that fall into this kind of category, where complex tokenisation is simply not enough. This, I think, is going to be the biggest hurdle for LLMs.
5. Anything related to the physical world. We’ve all seen examples of computer vision models drawing too many fingers on a hand or having disembodied objects floating. The solutions here are to define what a hand should look like. But without an AI having access to a physical three-dimensional world to explore, it’s all just guessing what things might look like. This is particularly hard for LLMs because they’re language models, not 3D coordinate systems.
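Here is the minimal sketch promised in point 4: the model doesn't do the arithmetic itself, it emits a structured tool call and ordinary code returns the answer. The dispatch format and names are hypothetical, not any particular vendor's or MCP implementation's API.

    import json

    def calculator(expression):
        # Stand-in for a real, sandboxed math tool; a production tool would not use eval.
        return eval(expression, {"__builtins__": {}}, {})

    TOOLS = {"calculator": calculator}

    # Pretend the model emitted this instead of trying to "reason out" the product token by token:
    model_output = json.dumps({"tool": "calculator", "arguments": {"expression": "1234 * 5678"}})

    call = json.loads(model_output)
    result = TOOLS[call["tool"]](**call["arguments"])
    print(result)  # 7006652 -- computed by ordinary code, then fed back into the next prompt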
There’s also the question about whether holding vector databases of token weights is the same thing as “reasoning”, but I’ll leave that argument for the philosophers.
I think a theoretical AGI might use LLMs as part of its subsystems. But it needs to leverage AI throughout, and it needs to handle topics that are more than just token relationships, which LLMs cannot do.
There is no particular reason why AI has to stick to language models though. Indeed if you want human like thinking you pretty much have to go beyond language as we do other stuff too if you see what I mean. A recent example: "Google DeepMind unveils its first “thinking” robotics AI" https://arstechnica.com/google/2025/09/google-deepmind-unvei...
> There is no particular reason why AI has to stick to language models though.
There’s no reason at all. But that’s not the technology that’s in the consumer space, growing exponentially, gaining all the current hype.
So at this point in time, it’s just a theoretical future that will happen inevitably but we don’t know when. It could be next year. It could be 10 years. It could be 100 years or more.
My prediction is that current AI tech plateaus long before any AGI-capable technology emerges.
Yeah, quite possible.
That's a rather poor choice for an example considering Gemini Robotics-ER is built on a tuned version of Gemini, which is itself an LLM. And while the action model is impressive, the actual "reasoning" here is still being handled by an LLM.
From the paper [0]:
> Gemini Robotics 1.5 model family. Both Gemini Robotics 1.5 and Gemini Robotics-ER 1.5 inherit Gemini’s multimodal world knowledge.
> Agentic System Architecture. The full agentic system consists of an orchestrator and an action model that are implemented by the VLM and the VLA, respectively:
> • Orchestrator: The orchestrator processes user input and environmental feedback and controls the overall task flow. It breaks complex tasks into simpler steps that can be executed by the VLA, and it performs success detection to decide when to switch to the next step. To accomplish a user-specified task, it can leverage digital tools to access external information or perform additional reasoning steps. We use GR-ER 1.5 as the orchestrator.
> • Action model: The action model translates instructions issued by the orchestrator into low-level robot actions. It is made available to the orchestrator as a specialized tool and receives instructions via open-vocabulary natural language. The action model is implemented by the GR 1.5 model.
AI researchers have been trying to discover workable architectures for decades, and LLMs are the best we've got so far. There is no reason to believe that this exponential growth on test scores would or even could transfer to other architectures. In fact, the core advantage that LLMs have here is that they can be trained on vast, vast amounts of text scraped from the internet and taken from pirated books. Other model architectures that don't involve next-token-prediction cannot be trained using that same bottomless data source, and trying to learn quickly from real-world experiences is still a problem we haven't solved.
[0] https://storage.googleapis.com/deepmind-media/gemini-robotic...
That feels like you're moving the goal posts a bit.
Exponential growth over the short term is very uninteresting. Exponential growth is exciting when it can compound.
E.g. if I offered you an investment opportunity at 500% per year compounded daily - that's amazing. If the fine print is that the rate will only last for the very near term (say a week), then it would be worse than a savings account.
Well, growth has been on this exponential already for 5+ years (for the METR eval), and we are at the point where models are very close to matching human expert capabilities in many domains - only one or two more years of growth would put us well beyond that point.
Personally I think we'll see way more growth than that, but to see profound impacts on our economy you only need to believe the much more conservative assumption of a little extra growth along the same trend.
> we are at the point where models are very close to matching human expert capabilities in many domains
That's a bold claim. I don't think it matches most people's experiences.
If that was really true people wouldn't be talking about exponential growth. You don't need exponential growth if you are already almost at your destination.
Which domains?
What I’ve seen is that LLMs are very good at simulating an extremely well read junior.
Models know all the tricks but not when to use them.
And because of that, you continually have to hand-hold them.
Working with an LLM is really closer to pair programming than it is handing a piece of work to an expert.
The stuff I’ve seen in computer vision is far more impressive in terms of putting people out of a job. But even there, it’s still highly specific models left to churn away at tasks that are ostensibly just long and laborious tasks. Which so much of VFX is.
> we are at the point where models are very close to matching human expert capabilities in many domains
This is not true because experts in these domains don't make the same routine errors LLMs do. You may point to broad benchmarks to prove your point, but actual experts in the benchmarked fields can point to numerous examples of purportedly "expert" LLMs making things up in a way no expert would ever.
Expertise is supposed to mean something -- it's supposed to describe both a level of competency and trustworthiness. Until they can be trusted, calling LLMs experts in anything degrades the meaning of expertise.
The most common part of the S-curve by far is the flat bit before and the flat bit after. We just don't graph it because it's boring. Besides which there is no reason at all to assume that this process will follow that shape. Seems like guesswork backed up by hand waving.
Very much handwaving. The question is not meaningful at all without knowing the parameters of the S-curve. It's like saying "I flipped a coin and saw heads. What's the most likely next flip?"
> Just because something exhibits an exponential growth at one point in time, that doesn’t mean that a particular subject is capable of sustaining exponential growth.
Which is pretty ironic given the title of the post
>People notice that while AI can now write programs, design websites, etc, it still often makes mistakes or goes in a wrong direction, and then they somehow jump to the conclusion that AI will never be able to do these tasks at human levels, or will only have a minor impact. When just a few years ago, having AI do these things was complete science fiction!
Both things can be true, since they're orthogonal.
Having AI do these things was complete fiction 10 years ago. And after 5 years of LLM AI, people do start to see serious limits and stunted growth with the current LLM approaches, while also seeing that nobody has proposed another serious contender to that approach.
Similarly, going to the moon was science fiction 100 years ago. And yet, we're now not only not on Mars, but 50+ years without a new manned moon landing. Same for airplanes. Science fiction in 1900. Mostly stale innovation-wise for the last 30 years.
A lot of curves can fit an exponential line plot, without the progress going forward being exponential.
We would have 1 trillion transistor CPUs by now, following Moore's "exponential curve".
I agree with all your points, just wanted to say that transistor count is probably a counter-example. We have been keeping up with Moore's Law more or less [1], and the M3 Max, a 2023 consumer-grade CPU, has ~100B transistors, "just" one order of magnitude away from your 1T. I think that shows we haven't stagnated much in transistor density and the progress is just staggering!
[1] https://en.m.wikipedia.org/wiki/Transistor_count
That one order of magnitude is about 7 years behind Moore's Law. We're still progressing, but it's slower, more expensive, and we hit way more walls than before.
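For what it's worth, a quick check of that "about 7 years" figure, assuming the classic two-year doubling period (an assumption, not a measurement):

    import math

    doubling_period_years = 2.0   # classic Moore's-law cadence (assumed)
    shortfall_factor = 10.0       # one order of magnitude: ~100B vs 1T transistors
    lag_years = doubling_period_years * math.log2(shortfall_factor)
    print(lag_years)  # ~6.6 years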
Except it’s not been five years, it’s been at most three, since approximately no one was using LLMs prior to ChatGPT’s release, which was just under three years ago. We did have Copilot a year before that, but it was quite rudimentary.
And really, we’ve had even less than that. The first large scale reasoning model was o1, which was released 12 months ago. More useful coding agents are even newer than that. This narrative that we’ve been using these tools for many years and are now hitting a wall doesn’t match my experience at all. AI-assisted coding is way better than it was a year ago, let alone five.
>Except it’s not been five years, it’s been at most three,
Why would it be "at most" 3? We had Chat GPT commercially available as a private beta API in 2020. It's only the mass public that got 3.5 three years ago.
But those who'd do the noticing, as per my argument, are not just Joe Public (who could be oblivious), but people already using it starting in 2020, and include people working in the space, who worked with LLM and LLM-like architectures 2-3 years before 2020.
No, we didn’t. We had the GPT-3 API available in 2020, and approximately no one was using it.
Seems like you missed most of my comment
I stopped once it became clear that the first half was both wrong and irrelevant.
> We would have 1 trillion transistor cpus following Moore's "exponential curve"
Cerebras wafer scale chip has 4 trillion transistors.
https://www.cerebras.ai/chip
> Cerebras wafer scale chip has 4 trillion transistors.
It is also, notably, _wafer-scale_. The metric is not just "number of transistors", but in fact "number of transistors per cm2"
> Given consistent trends of exponential performance improvements over many years and across many industries, it would be extremely surprising if these improvements suddenly stopped.
I'm sure people were saying that about commercial airline speeds in the 1970's too.
But a lot of technologies turn out to be S-shaped, not purely exponential, because there are limiting factors.
With LLMs at the moment, the limiting factors might turn out to be training data, cost, or inherent limits of the transformer approach and the fact that LLMs fundamentally cannot learn outside of their context window. Or a combination of all of these.
The tricky thing about S curves is, you never know where you are on them until the slowdown actually happens. Are we still only in the beginning of the growth part? Or the middle where improvement is linear rather than exponential? And then the growth starts slowing...
> a lot of technologies turn out to be S-shaped, not purely exponential, because there are limiting factors.
Yes of course it’s not going to increase exponentially forever.
The point is, why predict that the growth rate is going to slow exactly now? What evidence are you going to look at?
It’s possible to make informed predictions (eg “Moore’s law can’t get you further than 1nm with silicon due to fundamental physical limits”). But most commenters aren’t basing their predictions in anything as rigorous as that.
And note, there are good reasons to predict a speedup, too; as models get more intelligent, they will be able to accelerate the R&D process. So quality per-researcher is now proportional to the exponential intelligence curve, AND quantity of researchers scales with number of GPUs (rather than population growth which is much slower).
Yeah exactly!
It’s likely that it will slow down at some point, but the highest likelihood scenario for the near future is that scaling will continue.
NOTE IN ADVANCE: I'm generalizing, naturally, because talking about specifics would require an essay and I'm trying to write a comment.
Why predict that the growth rate is going to slow now? Simple. Because current models have already been trained on pretty much the entire meaningful part of the Internet. Where are they going to get more data?
The exponential growth part of the curve was largely based on being able to fit more and more training data into the models. Now that all the meaningful training data has been fed in, further growth will come from one of two things: generating training data from one LLM to feed into another one (dangerous, highly likely to lead to "down the rabbit hole forever" hallucinations, and weeding those out is a LOT of work and will therefore contribute to slower growth), or else finding better ways to tweak the models to make better use of the available training data (which will produce growth, but much slower than what "Hey, we can slurp up the entire Internet now!" was producing in terms of rate of growth).
And yes, there is more training data available because the Internet is not static: the Internet of 2025 has more meaningful, human-generated content than the Internet of 2024. But it also has a lot more AI-generated content, which will lead into the rabbit-hole problem where one AI's hallucinations get baked into the next one's training, so the extra data that can be harvested from the 2025 Internet is almost certainly going to produce slower growth in meaningful results (as opposed to hallucinated results).
> Where are they going to get more data?
This is a great question, but note that folks were freaking out about this a year or so ago and we seem to be doing fine.
We seem to be making progress with some combination of synthetic training datasets on coding/math tasks, textbooks authored by paid experts, and new tokens (plus preference signals) generated by users of the LLM systems.
It wouldn’t surprise me if coding/math turned out to have a dense-enough loss-landscape to produce enough synthetic data to get to AGI - though I wouldn’t bet on this as a highly likely outcome.
I have been wanting to read/do some more rigorous analysis here though.
This sort of analysis would count as the kind of rigorous prediction that I’m asking for above.
E2A: initial exploration on this: https://chatgpt.com/share/68d96124-a6f4-8006-8a87-bfa7ee4ea3...
Gives some relevant papers such as
https://arxiv.org/html/2211.04325v2#:~:text=3.1%20AI
I am extremely confident that AGI, if it is achievable at all (which is a different argument and one I'm not getting into right now), requires a world model / fact model / whatever terminology you prefer, and is therefore not achievable by models that simply chain words together without having any kind of understanding baked into the model. In other words, LLMs cannot lead to AGI.
Agreed, it surely does require a world-model.
I disagree that generic LLMs plus CoT/reasoning/tool calling (ie the current stack) cannot in principle implement a world model.
I believe LLMs are doing some sort of world modeling and likely are mostly lacking a medium-/long-term memory system in which to store it.
(I wouldn’t be surprised if one or two more architectural overhauls end up occurring before AGI, I also wouldn’t be surprised if these occurred seamlessly with our current trajectory of progress)
Isn’t the memory the pre-trained weights that let it do anything at all? Or do you mean they should be capable of refining them in real-time (learning).
The human brain has many systems that adapt on multiple time-frames which could loosely be called “memory”.
But here I’m specifically interested in real-time updates to medium/long term memory, and the episodic/consciously accessible systems that are used in human reasoning/intelligence.
Eg if I’m working on a big task I can think through previous solutions I learned, remember the salient/surprising lessons, recall recent conversations that may indirectly affect requirements, etc. The brain is clearly doing an associative compression and indexing operation atop the raw memory traces. I feel the current LLM “memory” implementations are very weak compared to what the human brain does.
I suppose there is a sense in which you could say the weights “remember” the training data, but it’s read-only and I think this lack of real-time updating is a crucial gap.
To expand on my hunch about scaffolding - it may be that you can construct an MCP module that can let the LLM retrieve or ruminate on associative memories in such a way as to allow the LLM to not make the same mistake twice and be steerable on a longer timeframe.
I think the best argument against my hunch is that human brains have systems which update the synaptic weights themselves over a timeframe of days-to-months, and so if neural plasticity is the optimal solution here then we may not be able to efficiently solve the problem with “application layer” memory plugins.
But again, there is a lot of solution-space to explore; maybe some LoRA-like algorithm can allow an LLM instance to efficiently update its own weights at test-time, and persist those deltas for efficient inference, thus implementing the required neural plasticity algorithms?
Ah, so you don't know anything about how they work. Thanks for the clarification.
Curiously, humans don't seem to require reading the entire internet in order to perform at human level on a wide variety of tasks... Nature suggests that there's a lot of headroom in algorithms for learning on existing sources. Indeed, we had models trained on the whole internet a couple years ago, now, yet model quality has continued to improve.
Meanwhile, on the hardware side, transistor counts in GPUs are in the tens of billions and still increasing steadily.
This is a time horizon thing though. Over the course of future human history AI development might look exponential but that doesn’t mean there won’t be significant plateaus. We don’t even fully understand how the human brain works so whilst the fact it does exist strongly suggests it’s replicable (and humans do it naturally) that doesn’t make it practical in any time horizon that matters to us now. Nor does there seem to be fast movement in that direction since everyone is largely working on the same underlying architecture that isn’t similar to the brain.
Alternative argument: there is no need for more training data, just better algorithms. Throwing more tokens at the problem doesn't solve the fact that training LLMs using supervised learning is a poor way to integrate knowledge. We have however seen promising results coming out of reinforcement learning and self-play. Which means that Anthropic and OpenAI's bet on scale is likely a dead end, but we may yet see capability improvements coming from other labs, without the need for greater data collection.
Better algorithms is one of the things I meant by "better ways to tweak the models to make better use of the available training data". But that produces slower growth than the jaw-droppingly rapid growth you can get by slurping pretty much the whole Internet. That produced the sharp part of the S curve, but that part is behind us now, which is why I assert we're approaching the slower-growth part at the top of the curve.
> The point is, why predict that the growth rate is going to slow exactly now? What evidence are you going to look at?
Why predict that the (absolute) growth rate is going to keep accelerating past exactly now?
Exponential growth always assumes a constant relative growth rate, which works in the fiction of economics, but is otherwise far from an inevitability. People like to point to Moore's law ad nauseam, but other things like "the human population" or "single-core performance" keep accelerating until they start cooling off.
> And note, there are good reasons to predict a speedup, too; as models get more intelligent, they will be able to accelerate the R&D process.
And if heaven forbid, R&D ever turns out to start taking more work for the same marginal returns on "ability to accelerate the process", then you no longer have an exponential curve. Or for that matter, even if some parts can be accelerated to an amazing extent, other parts may get strung up on Amdahl's law.
It's fine to predict continued growth, and it's even fine to predict that a true inflection point won't come any time soon, but exponential growth is something else entirely.
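To make that distinction concrete, a toy numerical sketch (my own made-up numbers): "exponential" specifically means the relative growth rate (dN/dt)/N stays constant, whereas a logistic curve reproduces the same early data while its relative rate decays toward zero as it approaches its ceiling.

    import math

    def exponential(t, n0=1.0, r=0.5):
        return n0 * math.exp(r * t)

    def logistic(t, k=1000.0, n0=1.0, r=0.5):
        return k / (1.0 + (k / n0 - 1.0) * math.exp(-r * t))

    for t in [1, 5, 10, 15, 20]:
        e, l = exponential(t), logistic(t)
        rel_rate = 0.5 * (1.0 - l / 1000.0)   # logistic's relative growth rate, r * (1 - N/K)
        print(t, round(e, 1), round(l, 1), round(rel_rate, 3))
    # Early on the two curves match; later the logistic's relative rate collapses toward zero.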
> Why predict that the (absolute) growth rate is going to keep accelerating past exactly now?
By following this logic you should have predicted Moore’s law would halt every year for the last five decades. I hope you see why this is a flawed argument. You prove too much.
But I will answer your “why”: plenty of exponential curves exist in reality, and empirically, they can last for a long time. This is just how technology works; some exponential process kicks off, then eventually is rate-limited, then if we are lucky another S-curve stacks on top of it, and the process repeats for a while.
Reality has inertia. My hunch is you should apply some heuristic like “the longer a curve has existed, the longer you should bet it will persist”. So I wouldn’t bet on exponential growth in AI capabilities for the next 10 years, but I would consider it very foolish to use pure induction to bet on growth stopping within 1 year.
And to be clear, I think these heuristics are weak and should be trumped by actual physical models of rate-limiters where available.
> By following this logic you should have predicted Moore’s law would halt every year for the last five decades. I hope you see why this is a flawed argument. You prove too much.
I do think it's continually amazing that Moore's law has continued in some capacity for decades. But before trumpeting the age of exponential growth, I'd love to see plenty of examples that aren't named "Moore's law": as it stands, one easy hypothesis is that "ability to cram transistors into mass-produced boards" lends itself particularly well to newly-discovered strategies.
> So I wouldn’t bet on exponential growth in AI capabilities for the next 10 years, but I would consider it very foolish to use pure induction to bet on growth stopping within 1 year.
Great, we both agree that it's foolish to bet on growth stopping within 1 year. What I'm saying that "growth doesn't stop" ≠ "growth is exponential".
A theory of "inertia" could just as well support linear growth: it's only because we stare at relative growth rates that we treat exponential growth as a "constant" that will continue in the absence of explicit barriers.
Solar panel cost per watt has been dropping exponentially for decades as well...
Partly these are matters of economies of scale - reduction in production costs at scale - and partly it's a matter of increasing human attention leading to steady improvements as the technology itself becomes more ubiquitous.
Sorry, to be clear I was making the stronger claim:
I would consider it very foolish to use pure induction to bet on _exponential_ growth stopping within 1 year.
I think you can easily find plenty of other long-lasting exponential curves. A good starting point would be:
https://en.m.wikipedia.org/wiki/Progress_studies
With perhaps the optimistic case as
https://en.m.wikipedia.org/wiki/Accelerating_change
This is where I’d really like to be able to point to our respective Manifold predictions on the subject; we could circle back in a year’s time and review who was in fact correct. I wager internet points it will be me :)
Concretely, https://manifold.markets/JoshYou/best-ai-time-horizon-by-aug...
I think progress per dollar spent has actually slowed dramatically over the last three years. The models are better, but AI spending has increased by several orders of magnitude during the same time, from hundreds of millions to hundreds of billions. You can only paper over the lack of fundamental progress by spending on more compute for so long. And even if you manage to keep up the current capex, there certainly isn't enough capital in the world to accelerate spending for very long.
It has already been trained on all the data. The other obvious next step is to increase context window, but that's apparently very hard/costly.
I don’t think this is true. See https://arxiv.org/html/2211.04325v2 for example.
Yes, nobody knows the future of AI, but sometimes people use curve fitting to try convince themselves or others that they know what’s going to happen.
> why predict that the growth rate is going to slow exactly now?
why predict that it will continue? Nobody ever actually makes an argument that growth is likely to continue, they just extrapolate from existing trends and make a guess, with no consideration of the underlying mechanics.
Oh, go on then, I'll give a reason: this bubble is inflated primarily by venture capital, and is not profitable. The venture capital is starting to run out, and there is no convincing evidence that the businesses will become profitable.
Indeed you can't be sure. But on the other hand a bunch of the commentariat has been claiming (with no evidence) that we're at the midpoint of the sigmoid for the last three years. They were wrong. And then you had the AI frontier lab insiders who predicted an accelerating pace of progress for the last three years. They were right. Now, the frontier labs rarely (never?) provide evidence either, but they do have about a year of visibility into the pipeline, unlike anyone outside.
So at least my heuristic is to wait until a frontier lab starts warning about diminishing returns and slowdowns before calling the midpoint or multiple labs start winding down capex. The first component might have misaligned incentives, but if we're in a realistic danger of hitting a wall in the next year, the capex spending would not be accelerating the way it is.
Capex requirements might be on a different curve than model improvements.
E.g. you might need to accelerate spending to get sub-linear growth in model output.
If valuations depend on hitting the curves described in the article, you might see accelerating capex at precisely the time improvements are dropping off.
I don’t think frontier labs are going to be a trustworthy canary. If Anthropic says they’re reaching the limit and OpenAI holds the line that AGI is imminent, talent and funding will flee Anthropic for OpenAI. There’s a strong incentive to keep your mouth shut if things aren’t going well.
I think you nailed it. The capex is desperation in the hopes of maintaining the curve. I have heard actual AI researchers say progress is slowing, just not from the big companies directly.
> Indeed you can't be sure. But on the other hand a bunch of the commentariat has been claiming (with no evidence) that we're at the midpoint of the sigmoid for the last three years.
I haven’t followed things closely, but I’ve seen more statements that we may be near the midpoint of a sigmoid than that we are at it.
> They were wrong. And then you had the AI frontier lab insiders who predicted an accelerating pace of progress for the last three years. They were right.
I know it’s an unfair question because we don’t have an objective way to measure speed of progress in this regard, but do you have evidence for models not only getting better, but getting better faster? (Remember: even at the midpoint of a sigmoid, there still is significant growth)
I thought the original article included the strongest objective data point on this: recent progress on the METR long task benchmark isn't just on the historical "task length doubling every 7 months" best fit, but is trending above it.
A year ago, would you have thought that a pure LLM with no tools could get a gold medal level score in the 2025 IMO finals? I would have thought that was crazy talk. Given the rates of progress over the previous few years, maybe 2027 would have been a realistic target.
> I thought the original article included the strongest objective data point on this: recent progress on the METR long task benchmark isn't just on the historical "task length doubling every 7 months" best fit, but is trending above it.
There is selection bias in that paper. For example, they chose to measure “AI performance in terms of the length of tasks the system can complete (as measured by how long the tasks take humans)”, but didn’t include calculation tasks in the set of tasks, and that’s a field in which machines have been able to reliably do tasks for years that humans would take centuries or more to perform, but at which modern LLM-based AIs are worse than, say, Python.
I think leaving out such tasks is at least somewhat defensible, but I have to wonder whether there are other tasks at which LLMs do not become better as rapidly that they also left out.
Maybe it is a matter of posing different questions, with the article being discussed being more interested in “(When) can we (ever) expect LLMs to do jobs that now require humans to do?” than in “(How fast) do LLMs get smarter over time?”
Or are the model authors, i.e. the blog author with a vested interest, getting better at optimizing for the test while real-world performance isn't increasing as fast?
> And then you had the AI frontier lab insiders who predicted an accelerating pace of progress for the last three years.
Progress has most definitely not been happening at an _accelerating_ pace.
There are a few other limitations, in particular how much energy, hardware and funding we (as a society) can afford to throw at the problem, as well as the societal impact.
AI development is currently given a free pass on these points, but it's very unclear how long that will last. Regardless of scientific and technological potential, I believe that we'll hit some form of limit soon.
Luckily both middle eastern religious dictatorships and countries like China are throwing way too many resources at it ...
So we can rest assured the well-being of a country's people will not allowed to be a drag on AI progress.
There's a Mulla Nasrudin joke that's sort of relevant here:
Nasrudin is on a flight, when suddenly the pilot comes on the intercom, saying, "Passengers, we apologize, but we have experienced an engine burn-out. The plane can still fly on the remaining three engines, but we'll be delayed in our arrival by two hours."
Nasrudin speaks up "let's not worry, what's 2 hours really"
A few minutes later, the airplane shakes, and passengers see smoke coming out of another engine. Again, the intercom crackles to life.
"This is your captain speaking. Apologies, but due to a second engine burn-out, we'll be delayed by another two hours."
The passengers are agitated, but the Mulla once again tries to remain calm.
Suddenly, the third engine catches fire. Again, the pilot comes on the intercom and says, "I know you're all scared, but this is a very advanced aircraft, and it can safely fly on only a single engine. But we will be delayed by yet another two hours."
At this, Nasrudin shouts, "This is ridiculous! If one more engine goes, we'll be stuck up here all day"
> I'm sure people were saying that about commercial airline speeds in the 1970's too.
Or CPU frequencies in the 1990's. Also we spent quite a few decades at the end of the 19th century thinking that physics was finished.
I'm not sure that explaining it as an "S curve" is really the right metaphor either, though.
You get the "exponential" growth effect when there's a specific technology invented that "just needs to be applied", and the application tricks tend to fall out quickly. For sure generative AI is on that curve right now, with everyone big enough to afford a datacenter training models like there's no tomorrow and feeding a community of a million startups trying to deploy those models.
But nothing about this is modeled correctly as an "exponential", except in the somewhat trivial sense of "the community of innovators grows like a disease as everyone hops on board". Sure, the petri dish ends up saturated pretty quickly and growth levels off, but that's not really saying much about the problem.
Progress in information systems cannot be compared to progress in physical systems.
For starters, physical systems compete for limited resources and labor.
For another, progress in software vastly reduces the cost of improved designs. Whereas progress in physical systems can enable but still increase the cost of improved designs.
Finally, the underlying substrate of software is digital hardware, which has been improving in both capabilities and economics exponentially for almost 100 years.
Looking at information systems as far back as the first coordination of differentiating cells to human civilization is one of exponential improvement. Very slow, slow, fast, very fast. (Can even take this further, to first metabolic cycles, cells, multi-purpose genes, modular development genes, etc. Life is the reproduction of physical systems via information systems.)
Same with human technological information systems, from cave painting, writing, printing, telegraph, phone, internet, etc.
It would be VERY surprising if AI somehow managed to fall off the exponential information system growth path. Not industry level surprising, but "everything we know about how useful information compounds" level surprising.
> Looking at information systems as far back as the first coordination of differentiating cells to human civilization is one of exponential improvement.
Under what metric? Most of the things you mention don't have numerical values to plot on a curve. It's a vibe exponential, at best.
Life and humans have become better and better at extracting available resources and energy, but there's a clear limit to that (100%) and the distribution of these things in the universe is a given, not something we control. You don't run information systems off empty space.
> It's a vibe exponential, at best.
I am a little stunned you think so.
Life has been on Earth about 3.5-3.8 billion years.
Break that into 0.5-0.8, 1 billion, 1 billion, 1 billion "quarters", and you will find exponential increases in evolution's rate of change and production of diversity across them by many, many objective measures.
Now break up the last 1 billion into 100 million year segments. Again exponential.
Then break up the last 100 million into segments. Again.
Then the last 10 million years into segments, and watch humans progress.
The last million, in 100k year segments, watch modern humans appear.
the last 10k years into segments, watch agriculture, civilizations, technology, writing ...
The last 1000 years, incredible aggregation of technology, math, and the appearance of formal science
last 100 years, gets crazy. Information systems appear in labs, then become ubiquitous.
last 10 years, major changes, AI starts having mainstream impact
last 1 year - even the basic improvements to AI models in the last 12 months are an unprecedented level of change, per time, looking back.
I am not sure how any of this could appear "vibe", given any historical and situational awareness.
This progression is universally recognized. Aside from creationists and similar contingents.
The progression is much less clear when you don't view it anthropocentrically. For instance, we see an explosion in intelligible information: information that is formatted in human language or human-made formats. But this is concomitant with a crash in natural spaces and biodiversity, and nothing we make is as information-rich as natural environments, so from a global perspective, what we have is actually an information crash. Or hell, take something like agriculture. Cultured environments are far, far simpler than wild ones. Again: an information crash.
I'm not saying anything about the future, mind you. Just that if we manage to stop sniffing our own farts for a damn second and look at it from the outside, current human civilization is a regression on several metrics. We didn't achieve dominion over nature by being more subtle or complex than it. We achieved that by smashing nature with a metaphorical club and building upon its ruins. Sure, it's impressive. But it's also brutish. Intelligence requires intelligible environments to function, and that is almost invariably done at the expense of complexity and diversity. Do not confuse success for sophistication.
> last 1 year - even the basic improvements to AI models in the last 12 months are an unprecedented level of change, per time, looking back.
Are they? What changed, exactly? What improvements in, say, standards of living? In the rate of resource exploitation? In energy efficiency? What delta in our dominion over Earth? I'll tell you what I think: I think we're making tremendous progress in simulating aspects of humanity that don't matter nearly as much as we think they do. The Internet, smartphones, AI, speak to our brains in an incredible way. Almost like it was by design. However, they matter far more to humans within humanity than they do in the relationship of humanity with the rest of the universe. Unlike, say, agriculture or coal, which positively defaced the planet. Could we leverage AI to unlock fusion energy or other things that actually matter, just so we can cook the rest of the Earth with it? Perhaps! But let's not count our chickens before they hatch. As of right now, in the grand scheme of things, AI doesn't matter. Except, of course, in the currency of vibes.
I am curious when you think we will run out of atoms to make information systems.
How many billions of years do you think that might take?
Of all the things to be limited by, that doesn't seem like a near term issue. Just an asteroid or two alone will provide resources beyond our dreams. And space travel is improving at a very rapid rate.
In the meantime, in terms of efficiency of using Earth atoms for information processing, there is still a lot space at the "bottom", as Feynman said. Our crude systems are limited today by their power waste. Small energy efficient systems, and more efficient heat shedding, will enable full 3D chips ("cubes"?) and vastly higher density of packing those.
The known limit on information processing for physical systems, per gram, is astronomical:
• Bremermann’s limit : 10^47 operations per second, per gram.
Other interesting limits:
• Margolus–Levitin bound - on quantum state evolution
• Landauer’s principle - Thermodynamic cost of erasing (overwriting) one bit.
• Bekenstein bound: Maximum storage by volume.
Life will go through many many singularities before we get anywhere near hard limits.
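As a back-of-envelope sanity check of the ~10^47 figure quoted above, using the usual definition of Bremermann's limit (roughly m*c^2/h); this checks the arithmetic only and says nothing about whether such a computer is buildable.

    c = 2.998e8    # speed of light, m/s
    h = 6.626e-34  # Planck constant, J*s
    mass_kg = 1e-3 # one gram

    ops_per_second = mass_kg * c**2 / h   # Bremermann's limit for one gram
    print(f"{ops_per_second:.2e}")        # ~1.36e+47 per second per gram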
> Progress in information systems cannot be compared to progress in physical systems.
> For starters, physical systems compete for limited resources and labor.
> Finally, the underlying substrate of software is digital hardware…
See how these are related?
By physical systems, I meant systems whose purpose is to do physical work. Mechanical things. Gears. Struts.
Computer hardware is an information system. You are correct that it has a physical component. But its power comes from its organization (information), not its mass, weight, etc.
Transistors get more powerful, not less, when made from less matter.
Information systems move from substrate to more efficient substrate. They are not their substrate.
They still depend on physical resources and labor. They're made by people and machines. There have never been more resources going into information systems than right now, and AI accelerated that greatly. Think of all the server farms being built next to power plants.
Yes. Of course.
All information has a substrate at any given time.
But the amount of resources needed per unit of computation keeps dropping, because computation is not tied to any unit of matter. Nor to any particular substrate.
It is not the same as a steam engine, which can only be made so efficient.
The amount of both matter and labor per quantity of computing power is dropping exponentially. Right?
See a sibling reply on the physical limits of computation. We are several singularities away from any hard limit.
Evidence: History of industrialization vs. history of computing. Fundamental physics.
> The amount of both matter and labor per quantity of computing power is dropping exponentially. Right?
Right. The problem is the demand is increasing exponentially.
It’s not like when computers got 1000x more powerful we were able to get by with 1/1000x of them. Quite the opposite (or inverse, to be more precise).
Just to go back to my original point, I think drawing a comparison that physical systems compete for physical resources and implying information systems don’t is misleading at best. It’s especially obvious right now with all the competition for compute going on.
>[..] to first metabolic cycles, cells, multi-purpose genes, modular development genes, etc.
One example is when cells discovered energy production using mitochondria. Mitochondria add new capabilities to the cell, with (almost) no downsides like weight, temperature sensitivity, or pressure sensitivity. It's almost 100% upside.
If someone had tried to predict the future number of mitochondria-enabled cells from the first one, they could have been off by a factor of 10^20.
I've been writing a story for the last 20 days with that exact plot; I have to get my act together and finish it.
That's fallacious reasoning, you are extrapolating from survivorship bias. A lot of technologies, genes, or species have failed along the way. You are also subjectively attributing progression as improvements, which is problematic as well, if you speak about general trends. Evolution selects for adaptation not innovation. We use the theory of evolution to explain the emergence of complexity, but that's not the sole direction and there are many examples where species evolved towards simplicity (again).
Resource expense alone could be the end of AI. You may look up historic island populations, where technological demands (e.g. timber) usually led to extinction by resource exhaustion and consequent ecosystem collapse (e.g. deforestation leading to soil erosion).
See replies to sibling comments.
Doesn't answer the core fallacy. Historical "technological progress" can't be used as an argument for any particular technology. Right now, if we are talking about AI, we're talking about specific technologies, which may just as well fail and remain inconsequential in the grand scheme of things, like most technologies, most things really, did in the past. Even more so since we don't understand much of anything in either human or artificial cognition. Again and again, we've been wrong about predicting the limits and challenges in computation.
You see, your argument is just bad. You are merely guessing like everyone else.
My arguments are very strong.
Information technology does not operate by the rules of any other technology. It is a technology of math and organization, not particular materials.
The unique value of information technology is that it compounds the value of other information and technology, including its own, and lowers the bar for its own further progress.
And we know with absolute certainty we have barely scratched the computing capacity of matter. Bremermann's limit: ~10^47 operations per second, per gram. See my other comment for other relevant limits.
Do you also expect a wall in mathematics?
And yes, an unbroken historical record of 4.5 billion years of information systems becoming more sophisticated, with an exponential speed increase over time, is in fact a very strong argument. Changes that initially took a billion years now happen in very short times in today's evolution, and essentially instantly in technological time. The path is long, with significant acceleration milestones at whatever scale of time you want to look at.
Your argument, on the other hand, is indistinguishable from cynical AI opinions going back decades. It could be made any time. Zero new insight. Zero predictive capacity.
Substantive negative arguments about AI progress have been made. See "Perceptrons" by Marvin Minsky and Seymour Papert for an example of what a solid negative argument looks like. It delivered insights. It made some sense at the time.
> Your argument, on the other hand, is indistinguishable from cynical AI opinions going back decades. It could be made any time. Zero new insight. Zero predictive capacity.
Pointing out logical fallacies?
Lol.
> Historical "technological progress" can't be used as argument for any particular technology.
Historical for billions of years of natural information system evolution. Metabolic, RNA, DNA, protein networks, epigenetic, intracellular, intercellular, active membrane, nerve precursors, peptides, hormonal, neural, ganglion, nerve nets, brains.
Thousands of years of human information systems. Hundreds of years of technological information systems. Decades of digital information systems. Now, in just the last few years, progress year to year is unlike any seen before.
Significant innovations are being reported virtually every day.
Yes, track records carry weight. Especially with no good reason to expect a break, and every tangible reason to believe nothing is slowing down, right up to today.
"Past is not a predictor of future behavior" is about asset gains relative to asset prices in markets where predictable gains have had their profitability removed by the predictive pricing of others. A highly specific feedback situation making predicting asset gains less predictable even when companies do maintain strong predictable trends in fundamentals.
It is a narrow specific second order effect.
It is the worst possible argument for anything outside of those special conditions.
Every single thing you have ever learned was predicated on the past having strong predictive qualities.
You should understand what an argument means, before throwing it into contexts where its preconditions don't exist.
> Right now, if we are talking about AI, we're talking about specific technologies, which may just as well fail and remain inconsequential in the grand scheme of things, like most technologies, most things really, did in the past. Even more so since we don't understand much anything in either human or artificial cognition. Again and again, we've been wrong about predicting the limits and challenges in computation.
> Your argument [...] is indistinguishable from cynical AI opinions going back decades. It could be made any time. Zero new insight. Zero predictive capacity.
If I need to be clearer: nobody could tell when you wrote that just by reading it. It isn't an argument, it's a free-floating opinion. And you have not made it more relevant today than it would have been in all the decades up till now, through all the technological transitions along the way. Your opinion was equally "applicable" then, and no less wrong.
This is what "Zero new insight. Zero predictive capacity" refers to.
> Substantive negative arguments about AI progress have been made. See "Perceptrons" by Marvin Minsky and Seymour Papert for an example of what a solid negative argument looks like. It delivered insights. It made some sense at the time.
Here you go:
https://en.wikipedia.org/wiki/Perceptrons_(book)
> But a lot of technologies turn out to be S-shaped, not purely exponential, because there are limiting factors.
I'd argue all of them. Any true exponential eventually gets to a point where no computer can even store its numerical value. It's a physically absurd curve.
The narrative quietly assumes that this exponential curve can in fact continue since it will be the harbinger of the technological singularity. Seems more than a bit eschatological, but who knows.
If we suppose this tech rapture does happen, all bets are off; in that sense it's probably better to assume the curve is sigmoidal, since the alternative is literally beyond human comprehension.
Barring fully reversible processes as the basis for technology, you still quickly run into energy and cooling constraints. Even with that, you'd have time or energy density constraints. Unlimited exponentials are clearly unphysical.
Yes, this is an accurate description, and also completely irrelevant to the issue at hand.
At the stage of development we are today, no one cares how fast it takes for the exponent to go from eating our galaxy to eating the whole universe, or whether it'll break some energy density constraint before it and leave a gaping zero-point energy hole where our local cluster used to be.
It'll stop eventually. What we care about is whether it stops before it breaks everything for us, here on Earth. And that's not at all a given. Fundamental limits are irrelevant to us - it's like worrying that putting too many socks in a drawer will eventually make them collapse into a black hole. The limits that are relevant to us are much lower, set by technological, social and economic factors. It's much harder to say where those limits lie.
Sure, but it reminds us that we are dealing with an S-curve, so we need to ask where the inflection point is. i.e. what are the relevant constraints, and can they reasonably sustain exponential growth for a while still? At least as an outsider, it's not obvious to me whether we won't e.g. run into bandwidth or efficiency constraints that make scaling to larger models infeasible without reimagining the sorts of processors we're using. Perhaps we'll need to shift to analog computers or something to break through cooling problems, and if the machine cannot find the designs for the new paradigm it needs, it can't make those exponential self-improvements (until it matches its current performance within the new paradigm, it gets no benefit from design improvements it makes).
My experience is that "AI can write programs" is only true for the smallest tasks, and anything slightly nontrivial will leave it incapable of even getting started. It doesn't "often makes mistakes or goes in a wrong direction". I've never seen it go anywhere near the right direction for a nontrivial task.
That doesn't mean it won't have a large impact; as an autocomplete these things can be quite useful today. But when we have a more honest look at what it can do now, it's less obvious that we'll hit some kind of singularity before hitting a constraint.
I think the technological singularity has generally been a bit of a metaphor rather than a mathematical singularity.
Some exponentials are slow enough that it takes decades or centuries, though.
You clearly haven’t played my idle game.
I am getting the sense that the 2nd derivative of the curve is already hitting negative territory. Models get updated, and I don't feel I'm getting better answers from the LLMs.
On the application front though, it feels that the advancements from a couple of years ago are just beginning to trickle down to product space. I used to do some video editing as a hobby. Recently I picked it up again, and was blown away by how much AI has chipped away the repetitive stuff, and even made attempts at the more creative aspects of production, with mixed but promising results.
What are some examples of tasks you no longer have to do?
One example is auto-generating subtitles -- elements of this task, e.g. speech-to-text with time coding, have been around for a while (OpenAI Whisper and others), but they have only recently been integrated into video editors and become easy to use for non-coders. Other examples: depth maps (estimating object distance from the camera; this is useful when you want to blur the background), and auto-generating masks with object tracking.
>I'm sure people were saying that about commercial airline speeds in the 1970's too.
Also elegantly formulated by: https://idlewords.com/talks/web_design_first_100_years.htm
>> it would be extremely surprising if these improvements suddenly stopped.
> But a lot of technologies turn out to be S-shaped, not purely exponential, because there are limiting factors.
An S-curve is exactly the opposite of "suddenly" stopping.
It is possible for us to get a sudden stop, due to limiting factors.
For a hypothetical: if Moore's Law had continued until we hit atomic resolution instead of the slowdown as we got close to it, that would have been an example of a sudden stop: can't get transistors smaller than atoms, but yet it would have been possible (with arbitrarily large investments that we didn't have) to halve transistor sizes every 18 months until suddenly we can't.
Now I think about it, the speed of commercial airlines is also an example of a sudden stop: we had to solve sonic booms first before even considering a Concorde replacement.
Agreed!
And, maybe I'm missing something, but to me it seems obvious that the flat top part of the S curve is going to be somewhere below human ability... because, as you say, of the training data. How on earth could we train an LLM to be smarter than us, when 100% of the material we use to teach it how to think is human-style thinking?
Maybe if we do a good job, only a little bit below human ability -- and what an accomplishment that would still be!
But still -- that's a far cry from the ideas espoused in articles like this, where AI is just one or two years away from overtaking us.
Author here.
The standard way to do this is Reinforcement Learning: we do not teach the model how to do the task, we let it discover the _how_ for itself and only grade it based on how well it did, then reinforce the attempts where it did well. This way the model can learn wildly superhuman performance, e.g. it's what we used to train AlphaGo and AlphaZero.
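To make that loop concrete, here is a minimal toy sketch of the "sample attempts, grade the outcome, reinforce what worked" idea - a plain REINFORCE update on a made-up 5-option task. The grader and the numbers are purely illustrative; this is not how AlphaZero itself was implemented.

```python
import numpy as np

rng = np.random.default_rng(0)

# The "model" is a softmax over 5 candidate strategies. Only strategy 3 solves
# the task, but we never tell it that directly -- we only grade outcomes.
logits = np.zeros(5)

def grade(strategy):            # hypothetical grader: 1.0 if the attempt succeeded
    return 1.0 if strategy == 3 else 0.0

for step in range(2000):
    probs = np.exp(logits) / np.exp(logits).sum()
    s = rng.choice(5, p=probs)          # the model picks its own approach
    reward = grade(s)                   # we only score the result
    baseline = 0.2                      # crude baseline to reduce variance
    grad = -probs
    grad[s] += 1.0                      # d log pi(s) / d logits
    logits += 0.1 * (reward - baseline) * grad   # REINFORCE update

print(np.round(np.exp(logits) / np.exp(logits).sum(), 3))
# after training, nearly all probability mass sits on the winning strategy
```

The point is only that the training signal is an outcome score rather than a demonstration of the correct procedure, which is how performance can exceed anything a human example could directly teach.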
The cost of the next number in a GPT (3>4>5) seems to be in 2 ways:
1) $$$
2) data
The second (data) also isn't cheap. It seems we've already gotten through all the 'cheap' data out there. So much so that synthetic data (fart huffing) is a big thing now. People say it's real and useful and passes the glenn-horf theore... blah blah blah.
So it really more so comes down to just:
1) $$$^2 (but really pick any exponent)
In that, I'm not sure this thing is a true sigmoid curve (see: biology all the time). I think it's more a logarithmic cost here. In that, it never really goes away, but it gets really expensive to carry out for large N.
[To be clear, lots of great shit happens out there in large N. An AI god still may lurk in the long slow slope of $N, the cure for boredom too, or knowing why we yawn, etc.]
Yes. It's true that we don't know, with any certainty, (1) whether we are hitting limits to growth intrinsic to current hardware and software, (2) whether we will need new hardware or software breakthroughs to continue improving models, and (3) what the timing of any necessary breakthroughs will be, because innovation doesn't happen on a predictable schedule. There are unknown unknowns.[a]
However, there's no doubt that at a global scale, we're sure trying to maintain current rates of improvement in AI. I mean, the scale and breadth of global investment dedicated to improving AI, presently, is truly unprecedented. Whether all this investment is driven by FOMO or by foresight, is irrelevant. The underlying assumption in all cases is the same: We will figure out, somehow, how to overcome all known and unknown challenges along the way. I have no idea what the odds of success may be, but they're not zero. We sure live in interesting times!
---
[a] https://en.wikipedia.org/wiki/There_are_unknown_unknowns
I hope the crash won't be unprecedented as well...
I hope so too. Capital spending on AI appears to be holding up the entire economy:
https://am.jpmorgan.com/us/en/asset-management/adv/insights/...
It never ceases to amaze me how people consistently mistake the initial phase of a sigmoid curve for an exponential function.
"I'm sure people were saying that about commercial airline speeds in the 1970's too."
But there are others that keep going also. Moore's law is still going (mostly, slowing), and made it past a few pinch points where people thought it was the end.
The point is that, over the decades, many people said Moore's law was at an end, and then it wasn't; there was some breakthrough that kept it going. Maybe a new one will happen.
The thing with AI is, maybe the S curve flattens out after all the jobs are gone.
Everyone is hoping the S curve flattens out somewhere just below human level, but what if it flattens out just beyond human level? We're still screwed.
Each specific technology can be S-shaped, but advancements in achieving goals can still maintain an exponential curve. e.g. Moore's law is dead with the end of Dennard scaling, but computation improvements still happen with parallelism.
Meta's Behemoth shows that scaling the number of parameters has diminishing returns, but we still have many different ways to continue advancements. Those who point at one thing and say "see" aren't really seeing. Of course there are limits, like energy, but with nuclear energy or photon-based computing we're nowhere near those limits.
Ironically, given that it probably mistakes a sigmoid curve for an exponential curve, "Failing to understand the exponential, again" is an extremely apt name for this blog post.
Yes, the exponential is only an approximation of the first part of an S curve. And this author claims that he understands the exponential better than others…
The author is an Anthropic employee.
If the money dries up because investors lose faith in the exponential continuing, then his future looks much dimmer.
That is even true for Covid, for obvious reasons: Covid runs out of people it can infect at some point.
Infectious diseases rarely see actual exponential growth, for logistical reasons. It's a pretty unrealistic model that ignores that the disease actually needs to find additional hosts to spread, the local availability of which starts to go down from the first victim.
If you assume the availability of hosts is local to the perimeter of the infected region, then the relative growth is limited to 2/R, where R is the distance from patient 0 in 2 dimensions. That's because the area of the circle defines how many hosts are already ill, but new infections can only happen on the perimeter of the circle.
The disease is obviously also limited by the total number of hosts, but I assume there's also a "bottom" limit - i.e. the resource consumption of already-infected hosts.
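For what it's worth, here is the geometry behind that 2/R figure, in an idealized uniform 2-D population of density ρ where the infected front advances at a constant speed v:

```latex
N(R) = \rho\,\pi R^{2}, \qquad
\frac{dN}{dt} \propto \rho\,2\pi R\,v
\quad\Longrightarrow\quad
\frac{1}{N}\frac{dN}{dt} \propto \frac{2\pi R}{\pi R^{2}} = \frac{2}{R}.
```

So under pure spatial spread, cumulative infections grow roughly quadratically in time rather than exponentially; real outbreaks land somewhere in between because long-distance travel short-circuits the geometry.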
It also depends on how panicked people are. Covid was never going to spread like ebola, for instance: it was worse. Bad enough to harm and kill people, but not bad enough to scare them into self-enforced isolation and voluntary compliance with public health measures.
Back on the subject of AI, I think the flat part of the curve has always been in sight. Transformers can achieve human performance in some, even many respects, but they're like children who have to spend a million years in grade school to learn their multiplication tables. We will have to figure out why that is the case and how to improve upon it drastically before this stuff really starts to pay off. I'm sure we will but we'll be on a completely different S-shaped curve at that point.
Yes, the model the S curve comes out of is extremely simplified. Looking at the Covid curves we could just as well have said growth was parabolic, but that's much less worrisome.
It's obvious, but the problem was that enough people would die in the process for people to be worried. Similarly, if the current AI will be able to replace 99% of devs in 5-10 years (or even worse, most white collar jobs) and flatten out there without becoming a godlike AGI, it will still have enormous implications for the economy.
> But a lot of technologies turn out to be S-shaped, not purely exponential, because there are limiting factors.
S curves are exponential before they start tapering off though. It's hard to predict how long that could continue, so there's an argument to be made that we should remain optimistic and milk that while we can, lest pessimism cut off investment too early.
> I'm sure people were saying that about commercial airline speeds in the 1970's too.
They'd be wrong, of course - for not realizing demand is a limiting factor here. Airline speeds plateaued not because we couldn't make planes go faster anymore, but because no one wanted them to go faster.
This is partially economical and partially social factor - transit times are bucketed by what they enable people to do. It makes little difference if going from London to New York takes 8 hours instead of 12 - it's still in the "multi-day business trip" bucket (even 6 hours goes into that bucket, once you add airport overhead). Now, if you could drop that to 3 hours, like Concorde did[0], that finally moves it into "hop over for a meet, fly back the same day" bucket, and then business customers start paying attention[1].
For various technical, legal and social reasons, we didn't manage to cross that chasm before money for R&D dried out. Still, the trend continued anyway - in military aviation and, later, in supersonic missiles.
With AI, the demand is extreme and only growing, and it shows no sign of being structured into classes with large thresholds between them - in fact, models are improving faster than we're able to put them to any use; even if we suddenly hit a limit now and couldn't train even better models anymore, we have decades of improvements to extract just from learning how to properly apply the models we have. But there's no sign we're about to hit a wall with training any time soon.
Airline speeds are inherently a bad example for the argument you're making, but in general, I don't think pointing out S-curves is all that useful. As you correctly observe:
> But a lot of technologies turn out to be S-shaped, not purely exponential, because there are limiting factors.
But, what happens when one technology - or rather, one metric of that technology - stops improving? Something else starts - another metric of that technology, or something built on top of it, or something that was enabled by it. The exponent is S-curves on top of S-curves, all the way down, but how long that exponent is depends on what you consider in scope. So, a matter of accounting. So yeah, AI progress can flatten tomorrow or continue exponentially for the next couple years - depending on how narrowly you define "AI progress".
Ergo, not all that useful.
--
[0] - https://simpleflying.com/concorde-fastest-transatlantic-cros...
[1] - This is why Elon Musk wasn't immediately laughed out of the room after proposing using Starship for moving people and cargo across the Earth, back in 2017. Hopping between cities on an ICBM sounds borderline absurd for many reasons, but it also promised cutting flight time to less than one hour between any two points on Earth, which put it a completely new bucket, even more interesting for businesses.
Starship produces deadly noise in a large radius around it; whatever spaceport you're going to build is going to be far away from civilization.
Yes, though "far" isn't so large as to be inconceivable: the city of Starbase is only 2.75 km from the Starship launch tower.
That kind of distance may or may not be OK for a whole bunch of other reasons, many of which I'm not even qualified to guess at the nature of, but the noise at least isn't an absolute issue for isolated development of reasonable-scale civil infrastructure in many places.
> I'm sure people were saying that about commercial airline speeds in the 1970's too.
They were also saying that about CPU clock speeds.
There’s a key way to think about a process that looks exponential and might or might not flatten out into an S curve: reasoning about fundamental limits. For COVID it would obviously flatten out because there are finite humans, and it did when the disease had in fact infected most humans on the planet. For commercial airlines you could reason about the speed of sound or escape velocity and see there is again a natural upper limit- although which of those two would dominate would have very different real world implications.
For computational intelligence, we have one clear example of an upper limit in a biological human brain. It only consumes about 25W and has much more intelligence than today’s LLMs in important ways. Maybe that’s the wrong limit? But Moore’s law has been holding for a very long time. And smart physicists like Feynman in his seminal lecture predicting nanotechnology in 1959 called “there’s plenty of room at the bottom” have been arguing that we are extremely far from running into any fundamental physical limits on the complexity of manufactured objects. The ability to manufacture them we presume is limited by ingenuity, which jokes aside shows no signs of running out.
Training data is a fine argument to consider. Especially since they are training on "the whole internet", sort of. The key breakthrough of transformers wasn't in fact autoregressive token processing or attention or anything like that. It was that they can learn from (memorize / interpolate between / generalize) arbitrary quantities of training data. Before that, every kind of ML model hit scaling limits pretty fast. ResNets got CNNs to millions of parameters, but they still became quite difficult to train. Transformers train reliably on every size of data set we have ever tried, with no end in sight. The attention mechanism shortens the gradient path for extremely large numbers of parameters, completely changing the rules of what's possible with large networks. But what about the data to feed them?
There are two possible counter arguments there. One is that humans don’t need exabytes of examples to learn the world. You might reasonably conclude from this that NNs have some fundamental difference vs people and that some hard barrier of ML science innovation lies in the way. Smart scientists like Yann LeCun would agree with you there. I can see the other side of that argument too - that once a system is capable of reasoning and learning it doesn’t need exhaustive examples to learn to generalize. I would argue that RL reasoning systems like GRPO or GSPO do exactly this - they let the system try lots of ways to approach a difficult problem until they figure out something that works. And then they cleverly find a gradient towards whatever technique had relative advantage. They don’t need infinite examples of the right answer. They just need a well chosen curriculum of difficult problems to think about for a long time. (Sounds a lot like school.) Sometimes it takes a very long time. But if you can set it up correctly it’s fairly automatic and isn’t limited by training data.
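For the curious, the core of the GRPO-style trick mentioned above is scoring each attempt relative to the other attempts at the same problem, so no separate value model is needed. A minimal sketch (variable names are mine, not from any particular library):

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each attempt's reward against the
    group of attempts sampled for the same prompt."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# e.g. 6 attempts at one hard problem, graded 1 if the final answer checked out
print(group_relative_advantages([0, 1, 0, 0, 1, 0]))
# attempts with positive advantage get their token log-probabilities pushed up,
# attempts with negative advantage get pushed down
```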
The other argument is what the Silicon Valley types call “self play” - the goal of having an LLM learn from itself or its peers through repeated games or thought experiments. This is how Alpha Go was trained, and big tech has been aggressively pursuing analogs for LLMs. This has not been a runaway success yet. But in the area of coding agents, arguably where AI is having the biggest economic impact right now, self play techniques are an important part of building both the training and evaluation sets. Important public benchmarks here start from human curated examples and algorithmically enhance them to much larger sizes and levels of complexity. I think I might have read about similar tricks in math problems but I’m not sure. Regardless it seems very likely that this has a way to overcome any fundamental limit on availability of training data as well, based on human ingenuity instead.
Also, if the top of the S curve is high enough, it doesn’t matter that it’s not truly exponential. The interesting stuff will happen before it flattens out. E.g. COVID. Consider the y axis “human jobs replaced by AI” instead of “smartness” and yes it’s obviously an S curve.
> For computational intelligence, we have one clear example of an upper limit in a biological human brain. It only consumes about 25W and has much more intelligence than today’s LLMs in important ways. Maybe that’s the wrong limit?
It's a good reference point, but I see no reason for it to be an upper limit - by the very nature of how biological evolution works, human brains are close to the worst possible brains advanced enough to start a technological revolution. We're the first brain on Earth that crossed that threshold, and in evolutionary timescales, all that followed - all human history - happened in an instant. Evolution didn't have time yet to iterate on our brain design.
As they say, every exponential is a sigmoid in disguise. I think the exponential phase of growth for LLM architectures is drawing to a close, and fundamentally new architectures will be necessary for meaningful advances.
I'm also not convinced by the graphs in this article. OpenAI is notoriously deceptive with their graphs, and as Gary Marcus has already noted, that METR study comes with a lot of caveats: [https://garymarcus.substack.com/p/the-latest-ai-scaling-grap...]
What makes you believe the exponential phase will end soon?
Yes that's logistic growth basically
Exponential curves don't last long, fortunately, or the universe would have turned into a quark soup. The example of COVID is especially ironic, considering it stopped being a real concern within 3 years of its advent despite the exponential growth in the early years.
Those who understand exponentials should also try to understand stock and flow.
Reminds me a bit of the "ultraviolet catastrophe".
> The ultraviolet catastrophe, also called the Rayleigh–Jeans catastrophe, was the prediction of late 19th century and early 20th century classical physics that an ideal black body at thermal equilibrium would emit an unbounded quantity of energy as wavelength decreased into the ultraviolet range.
[...]
> The phrase refers to the fact that the empirically derived Rayleigh–Jeans law, which accurately predicted experimental results at large wavelengths, failed to do so for short wavelengths.
https://en.wikipedia.org/wiki/Ultraviolet_catastrophe
Right. Nobody believed that the intensity would go to infinity. What they believed was that the theory was incomplete, but they didn't know how or why. And the solution required inventing a completely new theory.
Exponentials exist in their environment. Didn't Covid stop because we ran out of people to infect? Of course it can't keep going exponentially, because there aren't exponentially many people to infect.
What is this limit on AI? It is technology, energy, something. All these things can be overcome, to keep the exponential going.
And of course, systems also break at the exponential. Maybe AI is stopped by the world economy collapsing. AI advancement would be stopped, but that is cold comfort to the humans.
>What is this limit on AI?
Data. Think of our LLMs like bacteria in a Petri dish. When first introduced, they achieve exponential growth by rapidly consuming the dish's growth medium. Once the medium is consumed, growth slows and then stops.
The corpus of information on the Internet, produced over several decades, is the LLM's growth medium. And we're not producing new growth medium at an exponential rate.
> What is this limit on AI?
Gulf money, for one. DoD budget would be another.
Booms are economic phenomena, not technological phenomena. When looking for a limiting factor of a boom, think about the money taps.
> What is this limit on AI? It is technology, energy, something. All these things can be over-come, to keep the exponential going.
That's kind of begging the question. Obviously if all the limitations on AI can be overcome growth would be exponential. Even the biggest ai skeptic would agree. The question is, will it?
Long COVID is still a thing; the nAbs immunity is pretty paltry because the virus keeps changing its immunity profile so much. T-cells help but also damage the host because of how COVID overstimulates them. A big reason people aren't dying like they used to is the government's strategy of constant infection, which boosts immunity regularly* while damaging people each time, plus how Omicron changed SARS-CoV-2's cell entry mechanism to avoid the cell-cell fusion (syncytia) that caused huge over-reaction in lung tissue.
If you think COVID isn't still around: https://www.cdc.gov/nwss/rv/COVID19-national-data.html
* one might call this strategy forced vaccination with a known dangerous live vaccine strain lol
It's possible to understand both exponential and limiting behavior at the same time. I work in an office full of scientists. Our team scrammed the workplace on March 10, 2020.
To the scientists, it was intuitively obvious that the curve could not surpass 100% of the population. An exponential curve with no turning point is almost always seen as a sure sign that something is wrong with your model. But we didn't have a clue as to the actual limit, and any putative limit below 100% would need a justification, which we didn't have, or some dramatic change to the fundamental conditions, which we couldn't guess.
The typical practice is to watch the curve for any sign of a departure from exponential behavior, and then say: "I told you so." ;-)
The first change may have been social isolation. In fact that was pretty much the only arrow in our quivers. The second change was the vaccine, which changed both the infection rate and the mortality rate, dramatically.
I'm curious as to whether the consensus is that the observed behaviour of COVID waves was ever fully and satisfactorily explained - they tend to grow exponentially but then seemingly saturate at a much lower point than a naïve look at the curve might suggest.
To those interested in the numbers, it was explained early - even on TV. Anyone interested saw that it was going like a seasonal flu wave. Numbers were following strict mathematics. My area was early - the numbers peaked right before people started to go crazy - the rest was censorship. There was a lot of fakery going on by using very soft numbers. Very often they used reporting date instead of infection date... and some numbers were delayed 9 months... So most curves out there were seriously flawed. But if you were really interested you could see real epidemiological curves - you just had to do real work to find the numbers. Strict mathematics of a seasonal virus was something people didn't want to see - and this is still the consensus...
This is easily disproven by looking at all-cause mortality. E.g. https://www.cdc.gov/mmwr/volumes/71/wr/figures/mm7150a3-F2.g...
Did that look like normal seasonal deaths? It's even more stark if you look specifically at the harder hit areas.
Well, the shapes look very seasonal... Do you know something about epidemiological curves?!
The 2020 wave in Europe was often smaller than the 2018 one. And the data was perfectly seasonal. If you know people working in nursing homes and hospitals, you can ask them what happened later in 2021...
I heard a lot of stories - firsthand... They parked old ladies in the cold in front of open windows for fresh air - until they were blue... They vaccinated old people right into an ongoing wave, and of course they had more problems caused by a wrongly trained vulnerable immune system - sane doctors don't vaccinate into an ongoing wave. What was going on in hospitals and nursing homes was a crime for money. Just ask the people who were there. A combat medic I know who now works in a hospital called 2021 a crime.
And still - solid epidemiological data - wherever you could find it - was still perfectly seasonal. You could see some perfect mathematical curves. Just very high, because they actively killed people. Even pupils in school spent all day in front of open windows in the cold... to remain healthy... How stupid is that...
Not all places are equal, but I've taken a look at German all-cause mortality. 2020 was not special. In 2021 it started rising synchronously with vaccinations.
This repeatedly confuses correlation and causation. The shape is seasonal - of what relevance is the shape? Why shouldn't we expect there to be a seasonal component of an airborne virus?
Do you see that the all-cause mortality rate is 50-100% higher than prior years? I'm not going to try to suss it out in German but the same pattern holds in the UK: https://assets.publishing.service.gov.uk/government/uploads/....
Similarly, to say "deaths increased when vaccines happened" is the most clear illustration. Why did the vaccines exist? Could that be related to the mortality increase? You can see charts here for Switzerland, US, UK: https://science.feedback.org/review/misleading-instagram-pos...
The shape is relevant if you want to evaluate measures.
If you can get your hands on some good data, you'll find perfect mathematical seasonal functions. This is a serious criterion for excluding any measures from having had any influence on the curve. It was just the seasonal thing happening. The data proves that measures were all useless - you could have worn any fancy hat as a government measure instead. There are no trend changes in the seasonal data that you can correlate with measures. The only trend changes you can find are in the reporting data. There's a decrease in reporting delay before a measure and a lot of reporting delay after the measure. Accidentally or intentionally, reporting delay made government measures look good.
For vaccines, I know 3 cases where people died and 2 who have serious health problems after vaccines. There is a reason why there's no good official data on vaccine efficiency - and why all placebo groups were killed as soon as possible.
Why did vaccines exists? The answer is simpler: Because of Money!
It would probably be hard to do. The really huge factor may be easier to study, since we know where and when every vaccine dose was administered. The behavioral factors are likely to be harder to measure, and would have been masked by the larger effect of vaccination. We don't really know the extent of social isolation over geography, demographics, time, etc..
There's human behavioural factors yes, but I was kinda wondering about the virus itself, the R number seemed to fluctuate quite a bit, with waves peaking fast and early and then receding equally quickly.. I know there were some ideas around asymptomatic spread and superspreaders (both people with highly connected social graphs, and people shedding far more active virus than the median), I just wondered whether anyone had built a model that was considered to have accurately reproduced the observed behaviour of number of positive tests and symptomatic cases, and the way waves would seemingly saturate after infecting a few % of the population.
> By the end of 2027, models will frequently outperform experts on many tasks.
In passing the quizzes.
> Models will be able to autonomously work for full days (8 working hours) by mid-2026.
Who will carry responsibility for the consequences of these models' errors? What tools will be available to that responsible _person_?
--
Techno-optimists will be optimistic. Techno-pessimists will be pessimistic.
The processes we're discussing have their own limiting factors, which no one mentions. Why not mention what exactly makes the graph go up, and what holds it back from going exponential? Why not discuss the inherent limitations of the LLM architecture? Or the legal perspective on AI agency?
Instead we're discussing the results of AI models passing tests, and people's perceptions of other people's opinions.
You don't actually need to have a "responsible person"; you can just have an AI do stuff. It might make a mistake; the only difference between that and an employee is that you can't punish an AI. If you're any good at management and not a psychopath, the ability to have someone to punish for mistakes isn't actually important
The importance of having a human be responsible is about alignment. We have a fundamental belief that human beings are comprehensible and have goals that are not completely opaque. That is not true of any piece of software. In the case of deterministic software, you can’t argue with a bug. It doesn’t matter how many times you tell it that no, that’s not what either the company or the user intended, the result will be the same.
With an AI, the problem is more subtle. The AI may absolutely be able to understand what you’re saying, and may not care at all, because its goals are not your goals, and you can’t tell what its goals are. Having a human be responsible bypasses that. The point is not to punish the AI, the point is to have a hope to stop it from doing things that are harmful.
I will worry when I see startups competing on products with companies 10x, 100x, or 1000x their size. Like a small team producing a Photoshop replacement. So far I haven't seen anything like that. Big companies don't seem to be launching new products faster either, or fixing some of their products that have been broken for a long time (MS Teams...)
AI obviously makes some easy things much faster, maybe helps with boilerplate, we still have to see this translate into real productivity.
I think the real turning point is when there isn’t the need for something like photoshop. Creatives that I speak to yearn for the day when they can stop paying the adobe tax.
There will always be an adobe tax so to speak. Creatives want high quality and reliable tools to be able to produce high quality things.
I could imagine a world where a small team + AI creates an open source tool that is better than current day Photoshop. However if that small team has that power, so does adobe, and what we perceive as "good" or "high quality" will shift.
If they don’t like it, they can stop now. It may have consequences, however.
Exponential curves happen when a quantity's growth rate is a linear function of its own value. In practice they're all going to be logistic, but you can ignore that as long as you're far away from the cap of whatever factor limits growth.
So what are the things that could cause "AI growth" (for some suitable definition of it) to feed back on the current level of AI? The plausible ones I see are:
- growing AI capabilities spur additional AI capex
- AI could be used to develop better AIs
The first one rings true, but is most definitely hitting the limit since US capex into the sector definitely cannot grow 100-fold (and probably cannot grow 4-fold either).
The second one is, to my knowledge, not really a thing.
So unless AI can start improving itself or there is a self-feeding mechanism that I have missed, we're near the logistic fun phase.
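A minimal numerical sketch of that point, with made-up numbers: the same 5%-per-step growth rate, with and without an assumed carrying capacity K, looks identical early on, which is exactly why the "logistic fun phase" is hard to date in advance.

```python
r, K = 0.05, 1000.0          # growth rate per step, assumed carrying capacity
x_exp = x_log = 1.0
for t in range(1, 301):
    x_exp += r * x_exp                      # dx/dt = r*x          (exponential)
    x_log += r * x_log * (1 - x_log / K)    # dx/dt = r*x*(1-x/K)  (logistic)
    if t in (50, 150, 300):
        print(t, round(x_exp), round(x_log))
# early on the two curves track each other; by t=300 one is ~2e6, the other ~1000
```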
It's interesting that he brings up the example of "exponential" growth in the case of COVID infections even though it was actually logistic growth[1] that saturates once resources get exhausted. What makes AI different?
[1] https://en.wikipedia.org/wiki/Logistic_function#Modeling_ear...
> Again we can observe a similar trend, with the latest GPT-5 already astonishingly close to human performance:
I have issues with "human performance" as a single data point in times where education keeps improving in some countries and degrading in others.
How far away are we from saying "better than X percent of humans"?
This reminds me -- very tenuously -- of how the shorthand for very good performance in the Python community is "like C". In the C community, we know that programs have different performance depending on algorithms chosen..
> In the C community, we know that programs have different performance depending on algorithms chosen..
Yes. Only the C community knows this. What a silly remark.
Regarding the "Python community" remark, benchmarks against C and Fortran go back decades now. It's not just a Python thing. C people push it a lot, too.
Nah, that part is OK. Wherever you set the human bar, human competence takes decades to really change, while these models show visible changes every year or so.
The problem with all of the article's metrics is that they are absolute bullshit. It just throws in claims like "AI can write full programs by itself 50% of the time" and moves on as if that had any resemblance to what happens in the real world.
A lot of this post relies on the recent open ai result they call GDPval (link below). They note some limitations (lack of iteration in the tasks and others) which are key complaints and possibly fundamental limitations of current models.
But more interesting is the 50% win rate stat that represents expert human performance in the paper.
That seems absurdly low; most employees don't have a mere 50% success rate on self-contained tasks that take ~1 day of work. That means at least one of a few things could be true:
1. The tasks aren’t defined in a way that makes real world sense
2. The tasks require iteration, which wasn’t tested, for real world success (as many tasks do)
I think that, while interesting and a very worthy research avenue, this paper is only the first in a still-early area of understanding how AI will interact with the real world, and it's hard to project well from this one paper.
https://cdn.openai.com/pdf/d5eb7428-c4e9-4a33-bd86-86dd4bcf1...
That's not 50% success rate at completing the task, that's the win rate of a head-to-head comparison of an algorithm and an expert. 50% means the expert and the algorithm each "win" half the time.
For the METR rating (first half of the article), it is indeed 50% success rate at completing the task. The win rate only applies to the GDPval rating (second half of the article).
You'd think that boosters for a technology whose very foundations rely on the sigmoid and tanh functions used as neuron activation functions would intuitively get this...
It's all relu these days
When people want a smooth function so they can do calculus they often use something like gelu or the swish function rather than relu. And the swish function involves a sigmoid. https://en.wikipedia.org/wiki/Swish_function
The gated variants of these functions have been dominant for a few years.
Most LLMs use GeGLU or SwiGLU.
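For readers who haven't followed the activation-function churn, here is a small sketch of the functions being name-dropped in this subthread (NumPy, with the common tanh approximation for GELU; the SwiGLU call uses toy identity weights just to show the shape of the operation):

```python
import numpy as np

def sigmoid(x):            # the classic squashing nonlinearity
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):               # rectified linear unit
    return np.maximum(0.0, x)

def gelu(x):               # common tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def swish(x, beta=1.0):    # swish / SiLU: x * sigmoid(beta * x)
    return x * sigmoid(beta * x)

def swiglu(x, W, V):       # gated variant: swish(x @ W) gated elementwise by x @ V
    return swish(x @ W) * (x @ V)

x = np.linspace(-3, 3, 7)
print(relu(x), gelu(x), swish(x), sep="\n")
print(swiglu(x.reshape(1, -1), np.eye(7), np.eye(7)))  # toy weights for illustration
```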
“Guy who personally benefits from AI hype says we aren’t in a bubble” - don’t we have enough of these already ??
(I'm the one who posted the URL, not the author of the post.)
Julian Schrittwieser (author of this post) has been in AI for a long time, he was in the core team who worked on AlphaGo, AlphaZero and MuZero at DeepMind, you can see him in the AlphaGo movie. While it doesn't make his opinion automatically true, I think it makes it worth considering, especially since he's a technical person, not a CEO trying to raise money
"extrapolating an exponential" seems dubious, but I think the point is more that there is no clear sign of slowing down in models capabilities from the benchmarks, so we can still expect improvements
Benchmarks are notoriously easy to fake. Also he doesn’t need to be a CEO trying to raise money in order to have an incentive here to push this agenda / narrative. He has a huge stock grant from Anthropic that will go to $0 when the bubble pops
"Models will be able to autonomously work for full days (8 working hours)" does not make them equivalent to a human employee. My employees go home and come back retaining context from the previous day; they get smarter every month. With Claude Code I have to reset the context between bite-sized tasks.
To replace humans in my workplace, LLMs need some equivalent of neuroplasticity. Maybe it's possible, but it would require some sort of shift in the approach that may or may not be coming.
Maybe when we get updating models. Right now, they are trained, and released, and we are using that static model with a context window. At some point when we have enough processing to have models that are always updating, then that would be plastic. I'm supposing.
I am flabbergasted by the naivety around predicting the future. While we have hints and suggestions, our predictions are best expressed as ranges of possibilities with varying weights. The hyperbolic among us like to pretend that predictions come in the form of precise lines of predetermined direction and curve; how foolish!
Predicting exponential growth is exceptionally difficult. Asymptotes are ordinary, and they often are not obvious until circumstances make them appear (in other words, they are commonly unpredictable).
(I do agree with the author regarding the potential of LLMs remaining underestimated by much of the public; however, I cannot hang around such abysmal reasoning.)
> I am flabbergasted by the naivety around predicting the future. While we have hints and suggestions, our predictions are best expressed as ranges of possibilities with varying weights. The hyperbolic among us like to pretend that predictions come in the form of precise lines of predetermined direction and curve; how foolish!
I don't see why the latter is any more foolish than the former.
It’s a matter of correctness and utility. You can improve your odds of correctness (and thus usefulness) by adjusting the scope of your projection.
This applies not only to predicting the future. Consider measuring something: you carefully choose your level of precision for practical reasons. Consider goal setting: you leave abundant room for variation because your goal is not expressed in hyper narrow terms, but you don’t leave it so loose that you don’t know what steps to take.
When expressed in sufficiently narrow terms, no one will ever predict anything. When expressed in sufficiently broad terms, everyone can predict everything. So the point is to modulate the scope until attaining utility.
> When just a few years ago, having AI do these things was complete science fiction!
This is only because these projects only became consumer facing fairly recently. There was a lot of incremental progress in the academic language model space leading up to this. It wasn't as sudden as this makes it sound.
The deeper issue is that this future-looking analysis goes no deeper than drawing a line connecting a few points. COVID is a really interesting comparison, because in epidemiology the exponential model comes from our understanding of disease transmission. It is also not actually exponential: as the population becomes saturated, the transmission rate slows (it is worth noting that unbounded exponential growth doesn't really seem to exist in nature). Drawing an exponential line like this doesn't really add anything interesting. When you do a regression you need to pick the model that best represents your system.
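That understanding is usually captured by something like an SIR model; a tiny sketch with assumed parameters (not fit to any real outbreak) shows why the early phase looks exponential and then rolls over on its own:

```python
# Susceptible-Infected-Recovered toy model: new infections scale with S*I/N,
# so growth is ~exponential only while S is still close to N.
N, beta, gamma = 1_000_000, 0.3, 0.1     # population, contact rate, recovery rate
S, I, R = N - 1.0, 1.0, 0.0
for day in range(1, 181):
    new_inf = beta * S * I / N
    new_rec = gamma * I
    S, I, R = S - new_inf, I + new_inf - new_rec, R + new_rec
    if day % 30 == 0:
        print(day, round(I), round(N - S))   # currently infected, cumulative cases
```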
This is made even worse because this uses benchmarks and coming up with good benchmarks is actually an important part of the AI problem. AI is really good at improving things we can measure so it makes total sense that it will crush any benchmark we throw at it eventually, but there will always be some difference between benchmarks and reality. I would argue that as you are trying to benchmark more subtle things it becomes much harder to make a benchmark. This is just a conjecture on my end but if something like this is possible it means you need to rule it out when modeling AI progress.
There are also economic incentives to always declare percent increases in progress at a regular schedule.
Will AI ever get this advanced? Maybe, maybe even as fast as the author says, but this just isn't a compelling case for it.
Aside from the S-versus-exp issue, this area is one of these things where there's a kind of disconnect between my personal professional experience with LLMs and the criteria measures he's talking about. LLMs to me have this kind of superficially impressive feel where it seems impressive in its capabilities, but where, when it fails, it fails dramatically, in a way humans never would, and it never gets anywhere near what's necessary to actually be helpful on finishing tasks, beyond being some kind of gestalt template or prototype.
I feel as if there needs to be a lot more scrutiny on the types of evaluation tasks being provided — whether they are actually representative of real-world demands, or if they are making them easy to look good, and also more focus on the types of failures. Looking through some of the evaluation tasks he links to I'm more familiar with, they seem kind of basic? So not achieving parity with human performance is more significant than it seems. I also wonder, in some kind of maxmin sense, whether we need to start focusing more on worst-case failure performance rather than best-case goal performance.
LLMs are really amazing in some sense, and maybe this essay makes some points that are important to keep in mind as possibilities, but my general impression after reading it is it's kind of missing the core substance of AI bubble claims at the moment.
Wow, an exponential trendline, I guess billions of years of evolution can just give up and go home cause we have rigged the game my friends. At this rate we will create an AI which can do a task 10 years long! And then soon after that 100 years long! And that's that. Humans will be kept as pets because that's all we will be good for QED
Failing to understand the sigmoid, again
The 50% success rate is the problem. It means you can’t reliably automate tasks unattended. That seems to be where it becomes non-exponential. It’s like having cars that go twice as far as the last year but will only get you to your destination 50% of the time.
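To put rough numbers on that (assuming, generously, that steps fail independently): chaining unattended steps at 50% per-step reliability collapses almost immediately.

```python
# probability that an n-step unattended pipeline finishes with no failed step
for p in (0.5, 0.9, 0.99):
    print(p, [round(p ** n, 3) for n in (1, 5, 10, 20)])
# 0.5  -> [0.5, 0.031, 0.001, 0.0]
# 0.9  -> [0.9, 0.59, 0.349, 0.122]
# 0.99 -> [0.99, 0.951, 0.904, 0.818]
```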
> It’s like having cars that go twice as far as the last year but will only get you to your destination 50% of the time
Nice analogy. All human progress is based on tight abstractions describing a well-defined machine model. Leaky abstractions over an undefined machine are useful too, but only as recommendations or for communication. It is harder to build on top of them. Precisely why programming in English is a non-starter - or just using English in math/science instead of formalism.
I think the author of this blog is not a heavy user of AI in real life. If you are, you know there are things AI is very good at, and things AI is bad at. AI may see exponential improvements in some aspects, but not in others. In the end, those "laggard" aspects of AI will put a ceiling on its real-world performance.
I use AI in my coding for many hours each day. AI is great. But AI will not replace me in 2026 or in 2027. I have to admit I can't make projections many years in the future, because the pace of progress in AI is indeed breathtaking. But, while I am really bullish on AI, I am skeptical of claims that AI will be able to fully replace a human any time soon.
How much better is AI-assisted coding than it was in September 2023?
This all depends on how you define "better".
I am an amateur programmer and tried to port a python 2.7 library to python 3 with GPT5 a few weeks ago.
After a few tries, I realized both myself and the model missed that a large part of the library is based on another library that was never ported to 3 either.
That doesn't stop GPT5 from trying to write the code as best it can with a library that doesn't exist for python 3.
That is the part we have made absolutely no progress on.
Of course, it can do a much better react crud app than in Sept 2023.
In one sense, LLMs are so amazing and impressive and quite fugazi in another sense.
The author is an AI researcher at Anthropic: https://www.julian.ac/about/
He likely has substantial experience using AI in real life (particularly when it comes to coding).
>they somehow jump to the conclusion that AI will never be able to do these tasks at human level
I don't see that; I mostly see criticism that AI isn't up to the hype today. I think most people know it will approach human ability; we just don't believe the hype that it will be here tomorrow.
I’ve lived through enough AI winter in the past to know that the problem is hard, progress is real and steady, but we could see a big contraction in AI spending in a few years if the bets don’t pay off well in the near term.
The money going into AI right now is huge, but it carries real risks because people want returns on that investment soon, not down the road eventually.
> Instead, even a relatively conservative extrapolation of these trends suggests that 2026 will be a pivotal year for the widespread integration of AI into the economy
Integration into the economy takes time and investment. Unfortunately, AI applications don't have an easy adoption curve - except for the chatbot. Every other use case requires an expensive and risky integration into an existing workflow.
> By the end of 2027, models will frequently outperform experts on many tasks
Fixed tasks like tests - maybe. But the real world is not a fixed model. It requires constant learning through feedback.
And today, COVID has infected 5000 quadrillion people!
Many of the "people don't understand exponential functions" posts are ultimately about people not understanding logistic functions. Most things in reality that seemingly grow exponentially will eventually, inevitably taper off when the cost of continued growth gets so high that accelerated growth can't be supported anymore.
Viruses can only infect so many people, for example. For the growth to stay truly exponential, you would need an infinite supply of people.
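As a small illustration (made-up numbers, not COVID data): the early stretch of a logistic curve is numerically almost identical to a pure exponential, which is exactly why the two get confused.

    import math

    # Logistic curve with carrying capacity K vs. a pure exponential, same rate r.
    K, r = 1_000_000, 1.0
    logistic = lambda t: K / (1 + (K - 1) * math.exp(-r * t))   # starts at 1
    exponential = lambda t: math.exp(r * t)                      # also starts at 1

    for t in range(0, 10, 2):
        print(t, round(exponential(t), 1), round(logistic(t), 1))
    # The two columns track each other closely while the curve is far below K;
    # only as it approaches K does the logistic taper off while the exponential
    # keeps climbing.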
> Again we can observe a similar trend, with the latest GPT-5 already astonishingly close to human performance:
Yes, but only if you measure "performance" as "better than the other option more than 50% of the time", which is a terrible way to measure performance, especially for a bullshitting AI.
Imagine comparing chocolate brands. One is tastier than the other one 60% of the time. Clear winner right? Yeah except it's also deadly poisonous 5% of the time. Still tastier on average though!
Failing to Understand Sigmoid functions, again?
> Instead, even a relatively conservative extrapolation of these trends suggests that 2026 will be a pivotal year for the widespread integration of AI into the economy:
> Models will be able to autonomously work for full days (8 working hours) by mid-2026. At least one model will match the performance of human experts across many industries before the end of 2026.
> By the end of 2027, models will frequently outperform experts on many tasks.
First commandment of tech hype: the pivotal, groundbreaking singularity is always just 1-2 years away.
I mean seriously, why is that? Even when people like OP try to be principled and use seemingly objective evaluation data, they find that the BIG big thing is 1-2 years away.
Self driving cars? 1-2 years away.
AR glasses replacing phones? 1-2 years away.
All of us living our life in the metaverse? 1-2 years away.
Again, I have to commend OP on putting in the work with the serious graphs, but there’s something more at play here.
Is it purely a matter of data cherry-picking? Is it the unknown unknowns leaving the data-driven approaches completely blind to their medium/long-term limitations?
Many people seem to assert that "constant relative growth in capabilities/sales/whatever" is a totally reasonable (or even obvious or inevitable) prior assumption, and then point to "OMG relative growth produces an exponential curve!" as the rest of their argument. And at least the AI 2027 people tried to one-up that by asserting an increasing relative growth rate to produce a superexponential curve.
I'd be a fool to say that we'll ever hit a hard plateau in AI capabilities, but I'll have a hard time believing any projected exponential-growth-to-infinity until I see it with my own eyes.
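For what it's worth, here's a tiny sketch of what "increasing relative growth rate" buys (illustrative numbers only, not the actual AI 2027 model): if each doubling takes, say, 10% less time than the previous one, the total time to any number of doublings converges to a finite limit, which is the "superexponential" shape.

    # Illustrative only: capability doubles, and each doubling takes 10% less time.
    t, capability, doubling_time = 0.0, 1.0, 7.0   # months
    for _ in range(40):
        t += doubling_time
        capability *= 2
        doubling_time *= 0.9
    print(round(t, 1))   # ~69 months, converging toward 7 / (1 - 0.9) = 70
    # A constant 7-month doubling time (plain exponential) would need 280 months.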
Self-driving cars have existed for at least a year now. It only took a decade of being "1 year away", but it exists now, and will likely require another decade of scaling up the hardware.
I think AGI is going to follow a similar trend: a decade of being "1 year away". Meanwhile, unlike with self-driving, the industry is preemptively solving the hardware scale-up concurrently.
Because I need to specify an amount of time short enough that big investors will hand over a lot of money, long enough that I can extract a big chunk of it for myself before it all comes crashing down.
A couple of years is probably a bit tight, really, but I'm competing for that cash with other people, so the timeframe we make up is going to be about the lowest we think we can get away with.
Where in nature/reality do we actually see exponential trends continued long? It seems like they typically encounter a governing effect quite quickly.
The issue is that "quite quickly" can still be really slow on human timescales. Moore's law has been brought up multiple times.
I feel like there should be some takeaway from the fact that we have to come up with new and interesting metrics like "Length of a Task That Can Be Automated" in order to show that exponential growth is still happening. FWIW, it does seem like a good metric, but it also feels like you can often find some metric that's improving exponentially even when the underlying capability is leveling out.
From Nassim Taleb
"Unless you have confidence in the ruler’s reliability, if you use a ruler to measure a table you may also be using the table to measure the ruler."
Seems like that is exactly what we are doing.
It's the only benchmark I know of with a well-behaved scale. Benchmarks with, for example, a score from 0-100% get saturated quite quickly, and further improvements on the metric are literally impossible. And even excluding saturation, they just behave very oddly at the extremes. To use them to show long-term exponential growth you need to chain benchmarks together, which is hard to make look credible.
The sentiment of the comments here seems rather pessimistic. A perspective that balances both sides might be that mass adoption of a technology often lags behind frontier capabilities, so I wouldn't expect AI to take over a majority of those jobs in GDPval in a couple of years, but it'll probably happen eventually.
There are still fundamental limitations in both the model and products using the model that restrict what AI is capable of, so it’s simultaneously true that AI can do cutting edge work in certain domains for hours while vastly underperforming in other domains for very small tasks. The trajectory of improvement of AI capabilities is also an unknown, where it’s easy to overestimate exponential trends due to unexpected issues arising but also easy to underestimate future innovations.
I don’t see the trajectory slowing down just yet, with more compute and larger models being used, and I can imagine AI agents will increasingly generate data that further improves larger models.
This doesn't feel at all credible, because we're already well into the sigmoid part of the curve. I thought the GPT-5 release made that pretty obvious to everyone.
I'm bullish on AI - I don't think we've even begun to understand the product implications - but the "large language models are in-context learners" phase has, for now, basically played out.
AI company employee whose livelihood depends on people continuing to pump money into AI writes a blog post trying to convince people to keep pumping more money into AI. Seems solid.
The "exponential" metric/study they include is pretty atrocious. Measuring AI capability by how long humans would take to do the task. By that definition existing computers are already super AGI - how long would it take humans to sort a list of a million numbers? Computers can do it in a fraction of a second. I guess that proves they're already AGI, right? You could probably fit an exponential curve to that as well, before LLMs even existed.
> Given consistent trends of exponential performance improvements over many years and across many industries, it would be extremely surprising if these improvements suddenly stopped.
The difference between exponential and sigmoid is often a surprise to the believers, indeed.
The model (of the world) is not the world.
Just because the model fits so far does not mean it will continue to fit.
These takes (both bears and bulls) are all misguided.
AI agents' performance depends heavily on the context / data / environment provided, and how that fits into the overall business process.
Thus, "agent performance" itself will be very unevenly distributed.
I'm less concerned about "parity with industry expert" and more concerned about "Error/hallucination rate compared to industry expert".
Without some guarantee of correctness, just posting the # of wins seems vacuous.
Somewhat missed by the many comments proclaiming that it's sigmoidal is that sigmoid curves exhibit significant growth after they stop looking exponential. Unless you think things have already hit a dramatic wall, you should probably assume further growth.
We should probably expect compute to get cheaper at the same time, so that's performance increasing alongside falling costs. Even after performance flatlines, you would expect inference costs to keep dropping.
Without specific evidence, it's also unlikely that you've happened to pick the exact point on the sigmoid where things change.
To the people who claim that we’re running out of data, I would just say: the world is largely undigitized. The Internet digitized a bunch of words but not even a tiny fraction of all that humans express every day. Same goes for sound in general. CCTV captures a lot of images, far more than social media, but it is poorly processed and also just a fraction of the photons bouncing off objects on earth. The data part of this equation has room to grow.
"data" in the abstract is not useful, it has to contain useful stuff in it.
There’s no exponential improvement in Go or chess agents, or car-driving agents. Not even in tiny mouse racing.
If there is, it would be such nice low hanging fruit.
Maybe all of that happens all at once.
I’d just be honest and say most of it is completely fuzzy tinkering disguised as intellectual activity (yes, some of it is actual intellectual activity and yes we should continue tinkering)
There are rare individuals that spent decades building up good intuition and even that does not help much.
>> "Train adversarially robust image model".
Should be easy to check a couple years down the line.
This extrapolates from a good set of data points to predict when AI will reach significant milestones, like being able to “work on tasks for a full 8 hours” (estimated by 2026). Which is OK - but it bears keeping https://xkcd.com/605/ in mind when doing extrapolation.
Well, the article is not off to a good start. COVID-19 is modeled by an SIR dynamical system, which is, at times, approximately exponential.
At times.
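A minimal SIR sketch (textbook equations, made-up parameters) makes the "at times" explicit: infections grow roughly like exp((beta - gamma) * t) only while nearly everyone is still susceptible, then the same dynamics bend the curve over.

    # Standard SIR model, forward-Euler integration, illustrative parameters.
    N, beta, gamma, dt = 1_000_000, 0.3, 0.1, 0.1
    S, I, R = N - 1.0, 1.0, 0.0
    for step in range(int(200 / dt)):
        new_inf = beta * S * I / N * dt
        new_rec = gamma * I * dt
        S, I, R = S - new_inf, I + new_inf - new_rec, R + new_rec
        if step % int(20 / dt) == 0:
            print(round(step * dt), round(I))
    # While S is close to N, infections grow roughly exponentially; once S is
    # depleted, I peaks and declines. The exponential is a phase of the system,
    # not its behavior.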
On top of other criticism here, I'd like to add that the article optimistically assumes that actors are completely honest with their benchmarks when billions of dollars and national security are at stake.
I'm only an "expert" in computer science and software engineering, and can say that - neither of widely available LLMs can produce answers at the level of first year CS student; - students using LLMs can easily be distingished by being wrong in all the ways a human would otherwise never be.
So to me it's not really the question of whether CS-related benchmarks are false, it's a question of how exactly did this BS even fly.
Obviously LLMs show a similar lack of performance in other disciplines, but I can't call myself an "expert" there, and someone might argue I tend to use the wrong prompts.
Until we see a website where we can put in an intermediate problem and get a working solution, "benchmarks show that our AI solves problems at gold-medalist level" will still be obvious BS.
So the author is in a clear conflict of interest with the contents of the blog, because he's an employee of Anthropic. But regarding this "blog": showing the graph where OpenAI compares "frontier" models and pits gpt-4o against o3-high is just disingenuous; o1 vs o3 would have been a closer fight between "frontier" models. Also, today I learned that there are people paid to benchmark AI models in terms of how close they are to "human" level, apparently even "expert" level, whatever that means. I'm not an LLM hater by any means, but I can confidently say that they aren't experts in any field.
117 comments so far, and the word economics does not appear.
Any technology which produces more results for more inputs but does not get more efficient at larger scale runs into a money problem if it does not get hit by a physics problem first.
It is quite possible that we have already hit the money problem.
Even if computational power evolves exponentially, we need to evaluate the utility of the additional computation. And if that utility happens to increase logarithmically with compute spend, it's possible that in the end we will observe just a linear increase in utility.
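A toy version of that point, with numbers I made up: if compute grows as C(t) = C0 * e^(k*t) and utility is roughly logarithmic in compute, then U(t) = log(C0) + k*t, i.e. exponential input buys only linear gains.

    import math

    # Toy model: exponential compute, logarithmic utility.
    C0, k = 1.0, 0.5
    compute = lambda t: C0 * math.exp(k * t)
    utility = lambda t: math.log(compute(t))    # = log(C0) + k*t

    for t in range(5):
        print(t, round(compute(t), 1), round(utility(t), 2))
    # Compute explodes (1.0, 1.6, 2.7, 4.5, 7.4) while utility climbs by a flat
    # 0.5 per step -- exactly the linear increase described above.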
I didn't plot it, but I had the impression the Aider benchmark success rates for SOTA over time were a hockey curve.
Like the improvements between 60 and 70 felt much faster than those between 80 and 90.
I don't think I have ever seen a page on HN where so many people missed the main point.
The phenomenon of people having trouble understanding the implications of exponential progress is really well known. Well known, I think, by many people here.
And yet an alarming number of comments here interpret small pauses as serious trend breakers. False assumptions that we are anywhere near the limits of computing power relative to fundamental physics limits. Etc.
Recent progress, which is unprecedented in speed looking backward, is dismissed because people have acclimatized to change so quickly.
The title of the article "Failing to Understand the Exponential, Again" is far more apt than I could have imagined, on HN.
See my other comments here for specific arguments. See lots of comments here for examples of those who are skeptical of a strong inevitability here.
The "information revolution" started the first time design information was separated from the thing it could construct. I.e. the first DNA or perhaps RNA life. And it has unrelentingly accelerated from there for over 4.5 billion years.
The known physics limits of computation per gram are astronomical. We are nowhere near any hard limit. And that is before any speculation of what could be done with the components of spacetime fragments we don't understand yet. Or physics beyond that.
The information revolution has hardly begun.
With all humor, this was the last place I expected people to not understand how different information technology progresses vs. any other kind. Or to revert to linear based arguments, in an exponentially relevant situation.
If there is any S-curve for information technology in general, it won't be apparent until long after humans are a distant memory.
I'm a little surprised too. A lot of the arguments are along the lines of "but LLMs aren't very good". But LLMs are really just a brief phase in the information revolution you mention, and will be superseded.
To me, saying we won't get AGI because LLMs aren't suitable is like saying we were never going to get powered flight because steam engines weren't suitable. Fair enough, they weren't, but they gave way to internal combustion engines, which were. Something like that will happen.
It should be noted that the article author is an AI researcher at Anthropic and therefore benefits financially from the bubble: https://www.julian.ac/about/
> The current discourse around AI progress and a supposed “bubble” reminds me a lot of the early weeks of the Covid-19 pandemic. Long after the timing and scale of the coming global pandemic was obvious from extrapolating the exponential trends, politicians, journalists and most public commentators kept treating it as a remote possibility or a localized phenomenon.
That's not what I remember. On the contrary, I remember widespread panic. (For some reason, people thought the world was going to run out of toilet paper, which became a self-fulfilling prophecy.) Of course some people were in denial, especially some politicians, though that had everything to do with politics and nothing to do with math and science.
In any case, the public spread of infectious diseases is a relatively well understood phenomenon. I don't see the analogy with some new tech, although the public spread of hype is also a relatively well understood phenomenon.
OP failing to understand S-curves again...
I think the first comment on the article put it best: with COVID, researchers could be certain that exponential growth was taking place because they knew the underlying mechanism of the growth. The virus was self-replicating, so the more people were already infected, the faster new infections would happen.
(Even this dynamic would only go on for a certain time and eventually slow down, forming an S-curve, once the virus could not find any more vulnerable people to sustain the rate of spread. The critical question was of course whether this would happen because everyone was vaccinated or isolated enough to prevent infection - or because everyone was already infected or dead.)
With AI, there is no such underlying mechanism. There is the dream of the "self-improving AI" where either humans can make use of the current-generation AI to develop the next-generation AI in a fraction of the time - or where the AI simply creates the next generation on its own.
If this dream were reality, it could be genuine exponential growth, but from all I know, it isn't. Coding agents speed up a number of bespoke programming tasks, but they do not exponentially speed up development of new AI models. Yes, we can now quickly generate large corpora of synthetic training data and use them for distillation. We couldn't do that before - but a large part of the training data discussion is about the observation that synthetic data can not replace real data, so data collection remains a bottleneck.
There is one point where a feedback loop does happen, and that is with the hype curve: initial models produced extremely impressive results compared to everything we had before - this caused enormous hype and unlocked investments that provided more resources for the development of the next model - which then delivered even better results. But it's obvious that this kind of feedback loop will end when no more additional capital is available and diminishing returns set in.
Then we will once again be in the upper part of the S-curve.
> - Models will be able to autonomously work for full days (8 working hours) by mid-2026.
> - At least one model will match the performance of human experts across many industries before the end of 2026.
> - By the end of 2027, models will frequently outperform experts on many tasks.
I’ve seen a lot of people make predictions like this, and it will be interesting to see how they turn out. But my question is: what should happen to a person’s credibility if their prediction turns out to be wrong? Should the person lose credibility for future predictions and we no longer take them seriously? Or is that way too harsh? Should there be reputational consequences for making bad predictions? I guess this is more of a general question, not strictly AI-related.
> Should the person lose credibility for future predictions and we no longer take them seriously
If this were the case, almost every sell-side analyst would have been blacklisted by now. It's more about entertainment than facts - sort of like astrology.
All of those never-ending exponential graphs about Covid were wrong though.
Another 'number go up' analyst. Yes, models are objectively better at tasks. Please include the fact that hundreds of billions of dollars are being poured into making them better. You could even call it a technology race. Once the money avalanche runs its course, I and many others expect 'the exponential' to be followed by an implosion or correction in growth. Data and training is not what LLMs crave. Piles of cash is what LLMs crave.
Good article, the METR metric is very interesting. See also Leopold Aschenbrenner's work in the same vein:
https://situational-awareness.ai/from-gpt-4-to-agi/
IMO this approach ultimately asks the wrong question. Every exponential trend in history has eventually flattened out. Every. single. one. Two rabbits would create a population with a mass greater than the Earth's within a couple of years if the trend continued indefinitely. The left-hand side of a sigmoid curve looks exactly like exponential growth to the naked eye... until it nears the inflection point at t=0. The two curves can't be distinguished when you only have noisy data from t<0.
A better question is, "When will the curve flatten out?", and that can only be addressed by looking outside the dataset for the constraints that will eventually make growth impossible. For Moore's law, for example, we could examine the quantum limits on how small a single transistor can be. You have to analyze the context, not just do the line-fitting exercise.
The only really interesting question in the long term is if it will level off at a level near, below, or above human intelligence. It doesn't matter much if that takes five years or fifty. Simply looking at lines that are currently going up and extending them off the right side of the page doesn't really get us any closer to answering that. We have to look at the fundamental constraints of our understanding and algorithms, independent of hardware. For example, hallucinations may be unsolvable with the current approach and require a genuine paradigm shift to solve, and paradigm shifts don't show up on trend lines, more or less by definition.
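One way to see the "can't be distinguished" point above (a quick sketch with synthetic data and made-up noise, not anything from the post): generate the early stretch of a logistic curve, add noise, and a straight-line fit to log(y) recovers a clean "exponential" with tiny residuals.

    import numpy as np

    rng = np.random.default_rng(0)

    # Early portion of a logistic curve whose carrying capacity K is far above the data.
    K, r = 1e6, 0.5
    t = np.linspace(0, 12, 25)
    y = K / (1 + (K - 1) * np.exp(-r * t))
    y *= rng.normal(1.0, 0.05, size=t.shape)     # 5% multiplicative noise

    # Fit a pure exponential by regressing log(y) on t.
    slope, intercept = np.polyfit(t, np.log(y), 1)
    residuals = np.log(y) - (slope * t + intercept)
    print(round(slope, 3), round(float(np.std(residuals)), 3))
    # The fit recovers a growth rate close to r with tiny residuals: on this
    # stretch of data, "exponential" and "early logistic" are the same curve.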
There are no exponentials in nature. Everything is finite.
I am constantly astonished that articles like this even pass the smell test. It is not rational to predict exponential growth just because you've seen exponential growth before! Incidentally, that is not what people did during COVID; they predicted exponential growth for reasons. Specific, articulable reasons, that consisted of more than just "look, line go up. line go up more?".
Incidentally, the benchmarks quoted are extremely dubious. They do not even really make sense. "The length of tasks AI can do is doubling every 7 months". Seriously, what does that mean? If the AI suddenly took double the time to answer the same question, that would not be progress. Indeed, that isn't what they did; they just... picked some times at random? You might counter that these are actually human completion times, but then why are we comparing such distinct and unrelated tasks as "count words in a passage" (trivial, any child can do it) and "train adversarially robust image model" (an expert-level task that could take anywhere from an hour to never completing)?
Honestly, the most hilarious line in the article is probably this one:
> You might object that this plot looks like it might be levelling off, but this is probably mostly an artefact of GPT-5 being very consumer-focused.
This is a plot with three points in it! You might as well be looking at tea leaves!
> but then why are we comparing such distinct and unrelated tasks as ...
Because a few years ago the LLMs could only do trivial tasks that a child could do, and now they're able to do complex research and software development tasks.
If you just have the trivial tasks, the benchmark is saturated within a year. If you just have the very complex tasks, the benchmark has no sensitivity at all for years (everything just scores a 0) and then abruptly becomes useful for a brief moment.
This seems pretty obvious, and I can't figure out what your actual concern is. You're just implying it is a flawed design without pointing out anything concrete.
The key word is "unrelated"! Being able to count the number of words in a paragraph and being able to train an image classifier are so different as to be unrelated for all practical purposes. The assumption underlying this kind of a "benchmark" is that all tasks have a certain attribute called complexity which is a numerical value we can use to discriminate tasks, presumably so that if you can complete tasks up to a certain "complexity" then you can complete all other tasks of lower complexity. No such attribute exists! I am sure there are "4 hour" tasks an LLM can do and "5 second" tasks that no LLM can do.
The underlying frustration here is that there is so much latitude possible in choosing which tasks to test, which ones to present, and how to quantify "success" that the metrics given are completely meaningless, and do not help anyone to make a prediction. I would bet my entire life savings that by the time the hype bubble bursts, we will still have 10 brainless articles per day coming out saying AGI is round the corner.
Well put, the metric is cherry picked to further the narrative.
> The length of tasks AI can do is doubling every 7 months
The claim is: "At time t0, an AI can solve a task that would take a human 2 minutes. At time t0+dt, it can solve 4-minute tasks. At time t0+2dt, it's 8 minutes," and so on.
I still find these claims extremely dubious, just wanted to clarify.
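For concreteness, this is the arithmetic behind such claims (my own sketch; the ~2-hour current horizon and the 7-month doubling are assumptions for illustration, not measurements): the task horizon is modeled as T(t) = T0 * 2^(t / 7 months), so an 8-hour working day is about two doublings away from a 2-hour horizon.

    # Sketch of the extrapolation, with an assumed ~2-hour horizon today.
    T0_minutes, doubling_months = 120, 7
    for months in range(0, 29, 7):
        print(months, "months:", T0_minutes * 2 ** (months / doubling_months), "min")
    # 120 -> 240 -> 480 (an 8-hour day) -> 960 -> 1920 minutes. Whether the trend
    # actually holds that long is, of course, the entire question.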
Yes, I get that, I did allow for it in my original comment. I remain convinced this is a gibberish metric - there is probably no such thing as "a task that would take a human 2 minutes", and certainly no such thing as "an AI that can do every task that would take a human 2 minutes".
"It’s Difficult to Make Predictions, Especially About the Future" - Yogi Berra. It's funny because it's true.
So if you want to try to do this difficult task, because say there's billions of dollars and millions of people's livelihoods on the line, how do you do it? Gather a bunch of data, and see if there's some trend? Then maybe it makes sense to extrapolate. Seems pretty reasonable to me. Definitely passes the sniff test. Not sure why you think "line go up more" is such a stupid concept.
It's a stupid concept because it's behind every ponzi scheme.
In Ponzi schemes the numbers are generally faked.
Isn't one of those scalers getting money from NVIDIA to buy NVIDIA cards which they use as collateral to borrow more money to buy more NVIDIA cards which NVIDIA put up as revenue which bumps the stock price which they invest into OpenAI which invests into Oracle which buys more NVIDIA cards?
It's not a Ponzi scheme, and I don't have a crystal ball to determine where supply and demand will best meet, but a lot seems to be riding on the promise of future demand selling at a premium.
I'm not yet ready to believe this is the thing that permanently breaks supply and demand. More compute demand is likely, but everyone at the state of the art - resale users, providers, and suppliers - will get hit with more competition.
The numbers are fake, but the returns are real, until they aren't. If all you go off is past performance then you will fall for any scam.
Pure Cope from a partner at Anthropic. However, I _do_ agree AI is comparable to COVID, but not in the way our author intends.
The exponential progress argument is frequently also misconstrued as a
>"we will get there by monotonously doing more of what we did previously"
Take the article's metric of independent SWE working time. It's a rather new metric for measuring AI capabilities, and it's also a good one: it is directly measurable in a quantified way, unlike nebulous goalposts such as "AGI/ASI".
It also doesn't necessarily predict any upheaval, which I also think is a good trait of a metric: we know the model is better when it hits 8 or 16 hours, but we can skip the hype and the prophecies of civilizational transformation that get attached to terminology like "AGI/ASI".
Now, the caveat is that an SWE-time metric is useful at the moment because it sits on an intra-day timescale. If we push the number to the point of comparing 48-hour vs 54-hour SWE-time models, we can easily end up chasing abstractions with little to no explanatory power as to how good the AI really is - no clear sense of what counts as a proper incremental improvement versus a benchmark number that may or may not be artificial.
The same can be said of math-olympiad scores and many of the existing AI benchmarks.
In the past there existed a concept of narrow AI. We could take task A, make a narrow AI become good at it. But we would expect a different application to be needed for task B.
Now we have generalist AI, and we take the generalist AI and make it become good at task A because that is the flavor of the month metric, but maybe that doesn't translate for improving task B, which someone will come around to improving when that becomes flavor of the month.
The conclusion? There's probably no good singular metric to get stuck on and say
"this is it, this graph is the one, watch it go exponential and bring forth God"
We will instead skip, hop and jump between task-or-category specific metrics that are deemed significant at the moment and arms-race style pump them up until their relevance fades.
It's funny because the author doesn't realize that this sentence at the beginning undermines his entire argument:
> Or they see two consecutive model releases and don’t notice much difference in their conversations, and they conclude that AI is plateauing and scaling is over.
The reason we now fail to notice the difference between consecutive models is that the progress isn't in fact exponential. Humans tend to have logarithmic perception, which means we only appreciate progress when it is exponential (for instance, you'd be very happy to get a $500 raise if you were living on minimum wage, but you wouldn't even call that “a raise” on an SV engineer's salary).
AI models have been improving a ton for the past three years, in many directions, but the rate of progress is definitely not exponential. It's not emergent either, as the focus is now being specifically directed at solving particular problems (both riddles and real-world problems) thanks to trillions of tokens of high-quality synthetic data.
On topics that aren't explicitly being worked on, progress has been minimal or even negative (for instance, many people still use the year-old Mistral Nemo for creative writing because the more recent models have all been STEMmaxxed).
This guy isn’t even wrong. Sure, these models are getting faster, but they are barely getting better at actual reasoning, if at all. Who cares if a model can give me a bullshit answer in five minutes instead of ten? It’s still bullshit.
Seems like the right place to ask, with ML enthusiasts gathered in one place discussing curves and the things that bend them: what's the thing with the potential to obsolete transformers and diffusion models? Is it something old that people noticed once LLMs blew up? Something new? Something in between?
> The evaluation tasks are sourced from experienced industry professionals (avg. 14 years' experience), 30 tasks per occupation for a total of 1320 tasks. Grading is performed by blinded comparison of human and model-generated solutions, allowing for both clear preferences and ties.
It's important to carefully scrutinize the tasks to understand whether they actually reflect work that is unique to these industry professionals. I just looked quickly at the nursing ones (my wife is a nurse), and half of them were creating presentations, drafting reports, and the like, which is a primary strength of LLMs but a very small portion of nursing duties.
The computer programming tests are more straightforward. I'd take the other ones with a grain of salt for now.
Measuring "how long" an AI can work for seems bizarre to me.
It's a computer program. What does it even mean that soon it "will be able to work 8-hour days"?
"Failing to Understand the Sigmoid, Again"
Failing to acknowledge we are in a bigger and more dangerous bubble, again.
If AI is so great, why have all of the AI-assisted curl HackerOne submissions been rejected? Slop is not a substitute for skill.
Ah, employee of an AI company is telling us the technology he's working on and is directly financially interested in hyping will... grow forever and be amazing and exponential and take over the world. And everyone who doesn't believe this employee of AI company hyping AI is WRONG about basics of math.
I absolutely would NOT ever expect such a blog post.
/s.
That is a complete strawman - you made up "forever growth" and then argued against it. The OP is saying that in the short term, it makes more sense to assume the exponential growth continues than to assume it will flatten out any moment now.