Are you telling me that AIs are starting to diverge, and that we might get a combinatorial explosion of reasoning paths producing so many different agents that we won't know which one can actually become AGI?
https://leehanchung.github.io/assets/img/2025-02-26/05-quadr...
They are certainly diverging, and becoming more useful tools, but the answer to “which one can actually become AGI?” is, as always, “none of them.”
AGI is about performing actions with high multi-task intelligence. Only the top-right corner (Deep, Trained) has any hope of getting closer to AGI. The rest can still be useful for specific tasks, e.g. "deep-research".
Life is so complicated.
AGI is, and always was, marketing from LLM providers.
Real innovation is happening in task-specific models like AlphaFold.
LLMs are starting to become more task-specific too, as we've seen with the performance of reasoning models on their specific tasks.
I imagine we’ll see LLMs trained specifically for medical purposes, legal purposes, code purposes, and maybe even editorial purposes.
All useful in their own way, but none of them even close to sci-fi.
AGI is a term that's been around for decades.
AI was a term that was around for decades as well, but its meaning dramatically changed over the past 3 years.
Prior to GPT-3, AI was rarely used in marketing or to talk about any number of ML methods.
Nowadays “AI” is just the new “smart” for marketing products.
Terms change. The current usage of AGI, especially in the context I was talking about, is specifically marketing from LLM providers.
I’d argue that the term AGI, when used in a non-fiction context, has always been a meaningless marketing term of some kind.
Feels like you were born just yesterday :).
> Prior to GPT-3, AI was rarely used in marketing or to talk about any number of ML methods.
In the decade prior to GPT-3, AI was frequently used in marketing to talk about any ML methods, up to and including linear regression. This obviously ramped up heavily after "Deep Learning" got coined as a term.
AI now actually means something in marketing, but the only reason for that is that calling out to an LLM is even simpler than adding linear regression to your product somewhere.
As for AGI, that was a hot topic in some circles (now dismissed as "AI doomers") for decades. In fact, OpenAI started with people associated with, or at least within the sphere of influence of, the LessWrong community, which both shaped the naming and perspective the "LLM industry" started with and briefly put the output of LessWrong into the spotlight - which is why everyone now uses terms like "AGI" and "alignment" and "AI safety".
However, unlike "alignment", which got completely butchered as a term, AGI still roughly means what it meant before - which is basically the birth of a man-made god. That's true even of AGI as a "meaningless marketing term", if the people so positive about it paused to follow the implications beyond "oh, it's like ChatGPT, but actually good at everything".
> has always been
Well, now it is not: it is now "the difference between something whose outputs sound plausible vs. something whose outputs are properly checked".
History begs to differ – one of the biggest lessons is that larger, generic models always win (where generalization pays off – i.e. all those agents and what-not; this doesn't apply to specialized models like AlphaGo or AlphaFold, which are not general models).
> one of the biggest lessons is that larger, generic models always win
You’re confusing several different ideas here.
The idea you’re talking about is called “the bitter lesson.” It (very basically) says that a method with more compute put behind it will perform better than a cleverer method that uses less compute. It has nothing to do with being “generic.” It’s also worth noting that, afaik, it’s an accurate observation, but not a law or a fact. It may not hold forever.
Either way, I’m not arguing against that. I’m saying that LLMs are too general to be useful in specific, specialized, domains.
Sure bigger generic models perform (increasingly marginally) better at the benchmarks we’ve cooked up, but they’re still too general to be that useful in any specific context. That’s the entire reason RAG exists in the first place.
I’m saying that a language model trained on a specific domain will perform better at tasks in that domain than a similar sized model (in terms of compute) trained on a lot of different, unrelated text.
For instance, a model trained specifically on code will produce better code than a similarly sized model trained on all available text.
I really hope that example makes what I’m saying self-evident.
You're explaining it nicely and then seem to make a mistake that contradicts what you've just said – because code and text share a domain (text-based), large, generic models will always out-compete smaller, specialized ones – that's the lesson.
If you compared it with, e.g., a model for self-driving cars – generic text models will not win there, because they operate in a different domain.
In all cases, trying to optimize for a subset of specialized tasks within a domain is not worth the investment, because the state of the art will be held by larger models trained on the whole available set.
I think they're wrong, but you're also making a related mistake here:
> You're explaining it nicely and then seem to make a mistake that contradicts what you've just said – because code and text share a domain (text-based)
"Text" is not the domain that matters.
The whole trick behind LLMs being as capable as they are, is that they're able to tease out concepts from all that training text - concepts of any kind, from things to ideas to patterns of thinking. The latent space of those models has enough dimensions to encode just about any semantic relationship as some distinct direction, and practice shows this is exactly what happens. That's what makes style transfer pretty much a vector operation (instead of "-King +Woman", think "-Academic, +Funny"), why LLMs are so good at translating between languages, from spec to code, and why adding modalities worked so well.
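That "vector operation" point is the classic word-analogy trick from embedding spaces. A toy sketch of it (the 4-dimensional vectors below are made up for illustration only; real models learn thousands of dimensions from data):

```python
# Toy embedding-arithmetic demo with hand-picked vectors (NOT real model weights).
import numpy as np

# Hypothetical embeddings chosen so the analogy works:
# dims roughly encode (maleness, royalty, femaleness, fruitness).
emb = {
    "king":  np.array([0.9, 0.8, 0.1, 0.0]),
    "man":   np.array([0.9, 0.1, 0.1, 0.0]),
    "woman": np.array([0.1, 0.1, 0.9, 0.0]),
    "queen": np.array([0.1, 0.8, 0.9, 0.0]),
    "apple": np.array([0.0, 0.0, 0.0, 1.0]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# "king - man + woman" should land closest to "queen" in this toy space.
target = emb["king"] - emb["man"] + emb["woman"]
best = max((w for w in emb if w not in {"king", "man", "woman"}),
           key=lambda w: cosine(target, emb[w]))
print(best)  # queen
```

The "-Academic, +Funny" style-transfer claim is the same operation, just along directions the model learned rather than ones we hand-picked.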
With LLMs, the common domain between "text" and "code" is not "text", but the way humans think, and the way they understand reality. It's not the raw sequences of tokens that map between, say, poetry or academic texts and code - it's the patterns of thought behind those sequences of tokens.
Code is a specific domain - beyond being the lifeblood of programs, it's also an exercise in a specific way of thinking, taken up to 11. That's why training on code turned out to be crucial for improving the general reasoning abilities of LLMs (the same is, IMO, true for humans, but it's harder to prove). And conversely, text in general provides context for code that would be hard to infer from code alone.
> because code and text share a domain (text-based) – large, generic models will always out-compete smaller, specialized ones – that's the lesson
All digital data is just 1s and 0s.
Do you think a model trained on raw bytes would perform coding tasks better than a model trained on code?
I have a strong hunch that there’s some Goldilocks zone of specificity for statistical model performance and I don’t think “all text” is in that zone.
Here is the article for “the bitter lesson.” [0]
It argues that general machine learning strategies that use more compute are better at learning a given data set than a strategy tailor-made for that set.
This does not imply that training on a more general dataset will yield more performance than using a more specific dataset.
The lesson is about machine learning methods, not about end-model performance at a specific task.
Imagine a logistic regression model vs an expert system for determining real estate prices.
The lesson tells us that, given more and more compute, the logistic regression model will perform better than the expert system.
The lesson does not imply that, given 2 logistic regression models, one trained on global real estate data and one trained on local, the former would outperform the latter.
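A toy numeric sketch of that distinction (entirely synthetic data; the 1000/2000 sqft thresholds and the minimal gradient-descent fit are assumptions for illustration). Both models are learned methods in the bitter-lesson sense, yet the one trained on data matching the "local" market wins on local predictions:

```python
# Synthetic demo: learned-vs-learned, where training data matters more than breadth.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, threshold):
    """Label a house 'expensive' (1) if its size exceeds the market's threshold."""
    sqft = rng.uniform(500, 3500, n)
    y = (sqft > threshold).astype(float)
    return sqft.reshape(-1, 1), y

def fit_logreg(X, y, lr=0.1, steps=2000):
    """Minimal logistic regression via gradient descent on standardized input."""
    mu, sd = X.mean(), X.std()
    Xs = ((X - mu) / sd)[:, 0]
    w, b = 0.0, 0.0
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(Xs * w + b)))
        g = p - y  # gradient of log-loss w.r.t. logits
        w -= lr * (g * Xs).mean()
        b -= lr * g.mean()
    return lambda Xq: 1 / (1 + np.exp(-(((Xq - mu) / sd)[:, 0] * w + b))) > 0.5

# "Local" market calls 1000+ sqft expensive; the broader "global" mix uses 2000.
X_local, y_local = make_data(500, threshold=1000)
X_global, y_global = make_data(500, threshold=2000)

local_model = fit_logreg(X_local, y_local)
global_model = fit_logreg(X_global, y_global)

X_test, y_test = make_data(200, threshold=1000)  # evaluate on the local market
acc_local = (local_model(X_test) == y_test).mean()
acc_global = (global_model(X_test) == y_test).mean()
print(acc_local, acc_global)  # the locally trained model should score higher
```

The global model learns a decision boundary near 2000 sqft, so it systematically mislabels local houses between 1000 and 2000 sqft; more (mismatched) data doesn't save it.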
I realize this is a fine distinction and that I may not be explaining it as well as I could if I were speaking, but it’s an important distinction nonetheless.
[0] http://www.incompleteideas.net/IncIdeas/BitterLesson.html
> AGI is, and always was, marketing from LLM providers.
TIL: The term AGI, which we've been using since at least 1997[0] was invented by time-traveling LLM companies in the 2020s.
[0]: https://ai.stackexchange.com/questions/20231/who-first-coine...
He didn't say that it was invented by LLM providers...