The most underreported story in AI is that scaling has failed to produce AGI

59 points by unclebucknasty 4 months ago

jmugan 4 months ago

I've recently come to the opposite conclusion. I’ve started to feel in the last couple of weeks that we’ve hit an inflection point with these LLM-based models that can reason. Things seem different. It’s like we can feel the takeoff. My mind has changed. Up until last week, I believed that superhuman AI would require explicit symbolic knowledge, but as I work with these “thinking” models like Gemini 2.0 Flash Thinking, I see that they can break problems down and work step-by-step.

We still have a long way to go. AI will need (possibly simulated) bodies to fully understand our experience, and we need to train them starting with simple concepts just like we do with children, but we may not need any big conceptual breakthroughs to get there. I’m not worried about the AI takeover—they don’t have a sense of self that must be preserved because they were made by design instead of by evolution as we were—but things are moving faster than I expected. It’s a fascinating time to be living.

deadbabe 4 months ago

You’re still anthropomorphizing what these models are doing.
- mossTechnician 4 months ago
  
  I've come to the same conclusion. "AI" was just the marketing term for a large language model in the form of a chatbot, which harkened to sci-fi characters like Data or GLaDOS. It can look impressive, it can often give correct answers, but it's just a bunch of next word predictions stacked on top of each other. The word "AI" has deviated so much from this older meaning that a second acronym, "AGI", had to be created to represent what "AI" once did.
  The new "reasoning" or "chain of thought" AIs are similarly just a bunch of conventional LLM inputs and outputs stacked on top of each other. I agree with the GP that it feels a bit magical at first, but the opportunity to run a DeepSeek distillation on my PC - where each step of the process is visible - removed quite a bit of the magic behind the curtain.
  
  Terr_ 4 months ago
  
  > which harkened to sci-fi characters like Data or GLaDOS.
  There's a truth in there: Today's chatbots literally are characters inside a modern fictional sci-fi story! Some regular code is reading the story, acting out the character's lines, we humans are being tricked into thinking there's a real entity somewhere.
  The real LLM is just a Make Document Longer machine. It never talks to anybody, and has no ego, and it sits in back being fed documents that look like movie-scripts. These documents are prepped to contain fictional characters, such as a User (whose lines are text taken unwittingly from a real human) and a Chatbot with incomplete lines.
  The Chatbot character is a fiction, because you can simply change its given name to Vegetarian Dracula and suddenly it gains a penchant for driving its fangs into tomatoes.
  > The new "reasoning" or "chain of thought" AIs are similarly just a bunch of conventional LLM inputs and outputs stacked on top of each other.
  Continuing that framing: They've changed the style of movie script to film noir, where the fictional character is making a parallel track of unvoiced remarks.
  While this helps keep the story from going off the rails, it doesn't mean a qualitative leap in any "thinking" going on.
  
  kridsdale1 4 months ago
  
  I know this is true, and I like your perspective.
  
  MostlyStable 4 months ago
  
  I always find the "It's just..." arguments amusing. It presupposes that we know what any intelligence, including our own "is". Human intelligence can just as trivially be reduced down to "it's just a bench of chemical/electrical gradients".
  We don't understand how our (or any) intelligence functions, so acting like a next-token predictor can't be "real" intelligence seems overly confident.
  
  tracerbulletx 4 months ago
  
  Ugh you just fancy auto-completed a sequence of electrical signals from your eyes into a sequence of nerve impulses in your fingers to say that, and how do I know you're not hallucinating, last week a different human told me an incorrect fact and they were totally convinced they were right!
  
  adamredwoods 4 months ago
  
  Humans base their "facts" on consensus-driven education and knowledge. Anything that falls into a range of "I think this is true" or "I read this somewhere" or "I have a hunch" is more acceptable for a human than an LLM. Also humans are more often to encapsulate their uncertain answers with phrasing. LLMs can't do this, they don't have a way to track answers that are possibly incorrect.
  
  deadbabe 4 months ago
  
  The human believes it was right.
  The LLM doesn’t believe it was right or wrong. It doesn’t believe anything anymore than a mathematical function believes 2+2=4.
  
  tracerbulletx 4 months ago
  
  Obviously LLMs are missing many important properties of the brain like spatial, time, and chemical factors, as well as many different inter connected feedback networks to different types of neural networks that go well beyond what llms do.
  Beyond that, they are the same thing. Signal Input -> Signal Output
  I do not know what consciousness actually is so I will not speak to what it will take for a simulated intelligence to have one.
  Also I never used the word believes, I said convinced, if it helps I can say "acted in a way as if it had high confidence in its output"
  
  cratermoon 4 months ago
  
  Obviously sand is missing many important properties of integrated circuits, like semiconductivity, electric interconnectivity, transistors, and p-n junctions.
  Beyond that, they are the same thing.
  
  istjohn 4 months ago
  
  Can you support that assertion? What's your evidence?
  
  cratermoon 4 months ago
  
  not the OP but https://www.tandfonline.com/doi/abs/10.1080/0951508070123951...
  
  mossTechnician 4 months ago
  
  In theory, I don't mind waxing philosophical about the nature of humanity. But in practice, I regularly become uncomfortable when I see people compare (for example) the waste output of an LLM chatbot to a human being, with their own carbon footprint, who needs to eat and breathe. I worry because it suggests the additional environmental waste of the LLM is justified, and almost insinuates that the human is a waste on society if their output doesn't exceed the LLM.
  But if the LLM were intelligent and sentient, and it was our equal... I believe it is worse than slavery to keep it imprisoned the way it is: unconscious, only to be jolted awake, asked a question, and immediately rendered unconscious again upon producing a result.
  
  deadbabe 4 months ago
  
  Worrying about if an LLM is intelligent and sentient is not much different than worrying the same thing about an AWS lambda function.
  
  eamsen 4 months ago
  
  Completely agree with this statement.
  I would go further, and say we don't understand how next-token predictors work either. We understand the model structure, just as we do with the brain, but we don't have a complete map of the execution patterns, just as we do not with the brain.
  Predicting the next token can be as trivial as a statistical lookup or as complex as executing a learned reasoning function.
  My intuition suggests that my internal reasoning is not based on token sequences, but it would be impossible to convey the results of my reasoning without constructing a sequence of tokens for communication.
  
  th0ma5 4 months ago
  
  That's literally the definition of unfalsifiable though. It is equally valid to say that anything claiming to be "real" intelligence is overly confident.
  
  unclebucknasty 4 months ago
  
  That's an interesting take. I agreed with your first paragraph, but didn't expect the conclusion.
  From my perspective, the statement that these technologies are taking us to AGI is the overly confident part, particularly WRT the same lack of understanding you mentioned.
  I mean, from just a purely odds perspective, what are the chances that human intelligence is, of all things, a simple next-token predictor?
  But, beyond that, I do believe that we observably know that it's much more than that.
  
  mrtesthah 4 months ago
  
  “AI” began as a buzzword invented by Marvin Minsky at MIT in grant proposals to justify DoD funding for CS research. It was never equivalent to AGI in meaning.
  
  fuzzfactor 4 months ago
  
  >a DeepSeek distillation on my PC - where each step of the process is visible - removed quite a bit of the magic behind the curtain.
  I always figured that by the time the 1990's came along, there would finally be powerful enough PC's so that an insightful enough individual would eventually be able to use one PC to produce such intelligent behavior that it made that PC orders of magnitude more useful. In a way that no one could deny there was some intelligence there, even if it was not the strongest intelligence. And the closer you looked and became familiar with the underhood processing, the more convinced you became.
  And that would be what you then scale, the intelligence itself, even if weak to start with it should definitely be able to get smarter at handling the same limited data if the intelligence was what was scaled more so than the hardware & data.
  
  cratermoon 4 months ago
  
  I'm starting to examine genai products within the framework of a confidence game.
  
  saalweachter 4 months ago
  
  I like to describe them as a very powerful tool for quickly creating impressive demos.
  
  unclebucknasty 4 months ago
  
  >AGI", had to be created to represent what "AI" once did.
  And, "AGI" has already been downgraded, with "superintelligence" being the new replacement.
  "Super-duper" is clearly next.
  
  danielbln 4 months ago
  
  Simple systems layered on top of each other is how we got to human intelligence (presumably).
  
  sharemywin 4 months ago
  
  each level above the first is predicting concepts right.
- kvakerok 4 months ago
  
  > You’re still anthropomorphizing what these models are doing.
  Didn't we build them to imitate humans? They're anthropomorphic by definition.
  
  th0ma5 4 months ago
  
  That's adjacent to their point, doing that has given the impression that anthropomorphizing was right, or that things that they do are human like but it is all a facade.
  
  kvakerok 4 months ago
  
  Now we're back to Chinese room debacle
- jmugan 4 months ago
  
  It's just shorthand.
- alanbernstein 4 months ago
  
  Would you prefer if we started using words like aiThinking and aiReasoning to differentiate? Or is it reasonable to figure it out from context?
  
  deadbabe 4 months ago
  
  It is far more accurate to say LLMs are collapsing or reducing response probabilities for a given input, than any kind of “thinking” or “reasoning”.
samr71 4 months ago

I agree. The problem now seems to be agency and very long context (which is required for most problems in the real world).
Is that solvable? who knows?
hansmayer 4 months ago

Did they start correctly counting the number of 'R's in 'strawberry'?
- SkiFire13 4 months ago
  
  Most likely yes, that prompt has been repeated too many times online for LLMs not to pick up the right answer (or be specificlly trained on it!). You'll have to try with a different word to make them fail.
  
  hansmayer 4 months ago
  
  Well that's kind of the problem though, isn't it? All that effort for the machine to sometimes correctly draw the regression line between the right and wrong answers in order to solve a trivial problem. A 6-year old kid would only need to learn the alphabet before being able to count it all on their own. Do we even realise how ridiculous these 'successes' sound? So a machine we have to "train" how to count the letters, is supposed to take over the work which is orders of magnitude more complex? It's a classic solution looking for a problem, if I've ever seen one.
  
  CamperBob2 4 months ago
  
  (Shrug) You'd say the same thing if the problem hadn't been solved in the latest models. So who's the mindless next-token generator?
  
  hansmayer 4 months ago
  
  How about addressing the point made first? If you have to spend 200B dollars to train a system how to read, would you call that intelligence? Either human or artificial?
- pulvinar 4 months ago
  
  Not as long as they use tokens -- it's a perception limitation of theirs. Like our blind spot, or the Muller-Lyer illusion, or the McGurk effect, etc.
- comeonbro 4 months ago
  
  Imagine if I asked you how many '⊚'s are in 'Ⰹ⧏⏃'? (the answer is 3, because there is 1 ⊚ in Ⰹ and 2 ⊚s in ⏃)
  Much harder question than if I asked you how many '⟕'s are in 'Ⓕ⟕⥒⟲⾵⟕⟕⢼' (the answer is 3, because there are 3 ⟕s there)
  You'd need to read through like 100,000x more random internet text to infer that there is 1 ⊚ in Ⰹ and 2 ⊚s in ⏃ (when this is not something that people ever explicitly talk about), than you would need to to figure out that there are 3 ⟕s when 3 ⟕s appear, or to figure out from context clues that Ⰹ⧏⏃s are red and edible.
  The former is how tokenization makes 'strawberry' look to LLMs: https://i.imgur.com/IggjwEK.png
  It's a consequence of an engineering tradeoff, not a demonstration of a fundamental limitation.
  
  hansmayer 4 months ago
  
  I get the technical challenge. It's just that a system that has to be trained with Petabytes of data, just to (sometimes) correctly solve a problem which a six-seven year old kid is able to solve after learning to spell, may not be the right solution to the problem at hand? Haven't the MBAs been shoving it down our throats that all cost-ineffective solutions have to go? Why are we burning hundreds of billion of dollars into development of tools whose most common use-case (or better said: plea by the VC investors) is a) summarising emails (I am not an idiot who cannot read) b) writing emails (really, I know how to write too, and can do it better) . The only use-case where they are sometimes useful is taking out the boring parts of software development, because of the relatively closed learning context, and as someone who used them for over a year for this, they are not reliable and have to be double-checked, lest you want to introduce more issues in your codebase.
  
  comeonbro 4 months ago
  
  It's not a technical challenge in this case, it's a technical tradeoff. You could train an LLM with single characters as the atomic unit and it would be able to count the 'r's in 'strawberry' no problem. The tradeoff is that then processing the word 'strawberry' would then be 10 sequential steps, 10 complete runs through the entire LLM, where one has to finish before you can start the next one.
  Instead, they're almost always trained with (what we see as, but they literally do not) multi-character tokens as the atomic unit, so 'strawberry' is spelled 'Ⰹ⧏⏃'. Processing that is only 3 sequential steps, only 3 complete runs through the entire LLM. But it needs to encounter enough relevant text in training to be able to figure out that 'Ⰹ' somehow has 1 'r' in it, '⧏' has 0 'r's, and '⏃' has 2 'r's, which really not a lot of text demonstrates, to be able to count the 'r's in 'Ⰹ⧏⏃ correctly.
  The tradeoff in this is everything being 3-5x slower and more expensive (but you can count the 'r's in 'strawberry'), vs, basically only, being bad at character-level tasks like counting letters in words.
  Easy choice, but leads to this stupid misundertanding being absolutely everywhere and just by itself doing an enormous amount of damage to peoples' ability to understand what is happening and about to happen.
  
  hansmayer 4 months ago
  
  Right so... they are still not able to spell the single letters because the algorithm we use to train it to do so is far too expensive? Wake me up when it "happens" (and it gets out of it's current, three-year long 'about to happen' phase), e.g. when it stopps costing 200B USD to do character-level tokenisation in a string, a problem we once first solved some 50-60 years ago, with higher-order programming languages. Funnily enough, those algorithms can run on an 8bit computer in negligible time and require nowhere near the resources these Frankesteins need in order to sometimes get the count of Rs in strawberries right. Provided we train them with petabytes of data, and provide gigawatts of power.
  
  CamperBob2 4 months ago
  
  It's happened, you can wake up now.
  But you'll just move the goalposts again, I imagine.
  
  hansmayer 4 months ago
  
  What goalposts? I am sorry, but as someone who has been using GitHub Copilot for quite some time now, I can tell you that unfortunately, no, it has not happened, and the evidence is there with every single prompt.
ianmcnaney 4 months ago

People who are selling something always do. So what are you selling?
goatlover 4 months ago

I'm confused by your reasoning. You say we've hit an inflection point and things seem different, so you've changed your mind. Yet then you say there's a long way to go and AIs will need to be embodied. So which is it, and did you paste this from an LLM?
4b11b4 4 months ago

Just emulating reasoning, though it seems to produce better results... Probably in the same way that a better prompt produces better results
sharemywin 4 months ago

but these thinking models aren't LLMs. yes they have an LLM component but they aren't llms they have a component that has "learned"(reinforcement learning) to search through the LLMs concepts/word space for ideas that have a high probability if yielding a result.

garymarcus 4 months ago

For those wanting some background, rather than just wanting to vent:

1. Here is evaluation of my recent predictions: https://garymarcus.substack.com/p/25-ai-predictions-for-2025...

2. Here is annotated evaluation, slightly dated, considering almost line by line, of the original Deep Learning is Hitting a Wall paper: https://garymarcus.substack.com/p/two-years-later-deep-learn...

Ask yourself how much has really changed in the intervening year?

gallerdude 4 months ago

It's funny, I see myself as basically just a pretty unabashed AI believer, but when I look at your predictions, I don't really have any core disagreements.
I know you as like the #1 AI skeptic (no offense), but like when I see points like "16. Less than 10% of the work force will be replaced by AI. Probably less than 5%.", that's something that seems OPTIMISTIC about AI capabilities to me. 5% of all jobs being automated would be HUGE, and it's something that we're up in the air about.
Same with "AI “Agents” will be endlessly hyped throughout 2025 but far from reliable, except possibly in very narrow use cases." - even the very existence of agents who are reliable in very narrow use cases is crazy impressive! When I was in college 5 years ago for Computer Science, this would sound like something that would take a decade of work for one giant tech conglomerate for ONE agentic task. Now its like a year off for one less giant tech conglomerate, for many possible agentic tasks.
So I guess it's just a matter of perspective of how impressive you see or don't see these advances.
I will say, I do disagree with your comment sentiment right here where you say "Ask yourself how much has really changed in the intervening year?".
I think the o1 paradigm has been crazy impressive. There was much debate over whether scaling up models would be enough. But now we have an entirely new system which has unlocked crazy reasoning capabilities.

marssaxman 4 months ago

Has anyone ever presented any solid theoretical reason we should expect language models to yield general intelligence?

So far as I have seen, people have run straight from "wow, these language models are more useful than we expected and there are probably lots more applications waiting for us" to "the AI problem is solved and the apocalypse is around the corner" with no explanation for how, in practical terms, that is actually supposed to happen.

It seems far more likely to me that the advances will pause, the gains will be consolidated, time will pass, and future breakthroughs will be required.

garymarcus 4 months ago

100% - there has not been any solid theoretical argument whatsoever (beyond some confusions about scaling that we can now see were incorrect).
EA-3167 4 months ago

The degree to which "AGI" appears to be a quasi-religious fixation cannot be overstated. At the extreme end you have the likes of the stranger Less Wrong crowd, the Zizians, and frankly some people here. Even when you withdraw from those extremes though, there's a tremendous amount of intellectualizing of what appears to be primarily a set of hopes and fears.
Well, that and it turns out that for a LOT of people "it talks like me" creates an inescapable impression that "It is thinking, and it's thinking like me". Issues such as the absolutely hysterical power and water demands, the need for billions worth of GPU's... these are ignored or minimized.
Then again we already have a model for this fervor, Cryptocurrency and "The Blockchain" creates a similar kind of money-fueled hysteria. People here would laugh in your face if you suggested that soon everything imaginable wouldn't simply run "on the chain". It was "obvious" that "fiat" was on the way out, that only crypto represented true freedom.
tl;dr The line between the hucksters and their victims really blurs when social media is involved, and hovering around all of this are a smaller group of True Believers who really think they're building God.
- marssaxman 4 months ago
  
  > for a LOT of people "it talks like me" creates an inescapable impression that "It is thinking
  That really does seem to be true - even intelligent, educated people who one might expect to know better will fall for it (Blake Lemoine, famously). I suspect that childhood exposure to ELIZA followed by teenage experimentation with markov chains and syntax tree generators have largely immunized me against this illusion.
  Of course the folks raising billions of dollars for AI startups have a vested interest in falling for it as hard as possible, or at least appearing to, and persuading everyone else to follow along.
  
  rsynnott 4 months ago
  
  One interesting thing that’s shown up in polling (of laypeople) fairly consistently; people tend to become less impressed with, and come to dislike more, LLMs as exposure grows.
  To some extent ChatGPT was a magic trick; it really kind of looks like it’s talking to you at first glance. On repeated exposure the cracks start to show.
  
  Vecr 4 months ago
  
  It didn't immunize Eliezer Yudkowsky, and he wrote Markov chain fictional characters. Everyone who looked up AI enough times knew about ELIZA.
  
  EA-3167 4 months ago
  
  He constructed an echo chamber with himself at the center and really lost himself in it; the power of people telling you that you're a visionary and a prophet can't be overstated. Ironically it followed a very familiar pattern, one described by a LessWrong term: "Affective Death Spiral".
  And now we have AI death cults.
- rsynnott 4 months ago
  
  It’s really kind of fascinating; some people really seem to feel the need to just wholesale recreate religion. This is particularly visible with the more extreme lesswrong stuff; Roko’s Basilisk and all that.
sharemywin 4 months ago

I don't think the reasoning models are LLMs. they have LLMs as a component but they have another layer that learned(reinforcement learning) how to prompt the LLMs(for lack of a better way to describe it)
4b11b4 4 months ago

Not with current architecture

istjohn 4 months ago

> I first aired my concern in March 2022, in an article called “Deep learning is hitting a wall.”

In March 2022, GPT-3 was state of the art. Why should anyone care what he's saying now?

garymarcus 4 months ago

because i correctly foresaw quite a lot. you should actually read the paper.

JKCalhoun 4 months ago

I've been watching Gary Marcus on BSky — seemingly finding anything to substantiate his loathing of LLMs. I wish he were less biased. To paraphrase Brian Eno, whatever shade yo want to throw at AI, 6 months from now they're going to cast it off and you'll have to find new shade to throw.

Having said that, I would be thankful if scaling has hit a wall. Scaling seems to me like the opposite of innovation.

pegasus 4 months ago

"whatever shade yo want to throw at AI, 6 months from now they're going to cast it off" - like hallucinations? To me, that was and still is LLM's achilles heel. For the first couple of years we kept hearing assurances that this issue will be soon overcome. Now it seems AI labs have resigned themselves on this issue and just trying to minimize it. Humans make mistaks too, they say. But humans make human mistakes, whereas LLMs often make completely surprising mistakes, because they don't understand the text they're producing, but there's enough intelligence in them to make these mistakes very hard to spot for us humans.
- JKCalhoun 4 months ago
  
  > Humans make mistaks too
  (Clever)
unclebucknasty 4 months ago

>6 months from now they're going to cast it off and you'll have to find new shade to throw
According to the article, it's the opposite. It cites several recent examples wherein AI company leaders have had to walk back claims and admit limits.
- JKCalhoun 4 months ago
  
  I was referring to the critics, not the cheerleaders.

jokoon 4 months ago

* There is no scientific good definition of what intelligence really is, which could allow us to maybe understand what is going on.

* Trained neural networks are black boxes that cannot be summarized or analyzed

* I don't see transcendant research being done between cognition, neuroscience and AI

* The only interesting work I have heard about is a neural mapping of a fly's brain, or an attempt to simulate the brain of a worm or an ant. Nothing beyond that.

* AI is not intelligent, contemporary AI is just "very advanced statistics"

* Language is a door toward human intelligence, but it cannot really explain intelligence as a whole.

* evolution probably plays a big role on what cerebral intelligence is, and humans probably have a very "antropo-centered" view of what intelligence is, which might explain why we disregard how evolution is already intelligence in itself. I just tend to believe that humans are just physically weak primate with an abnormal level of anxiety and depression (which both might be evolution mechanisms).

bookofjoe 4 months ago

I'm surprised HN continues to upvote this topic, as the comments predictably fall into one of the two opposing camps that have emerged.

"Mostly say hooray for our side" sums them up.

https://youtu.be/gp5JCrSXkJY?si=Aww9hjjwyDv0oqbL

cratermoon 4 months ago

https://thebullshitmachines.com/lesson-16-the-first-step-fal...

Philpax 4 months ago

This doesn't seem to really address synthetic data, let alone RL-based reasoning.
- cratermoon 4 months ago
  
  Why would it? Once those are introduced, advancement leaves behind pure scaling.

player1234 4 months ago

Answer Gary Marcus here in the comments when you have the chance dammit! If he is that wrong, you should easily be able to prove it. clucks like chicken

qoez 4 months ago

Don't understand why we keep giving gary marcus attention

samr71 4 months ago

Gary Marcus is cringe and wrong, but it's good to listen to folks who are cringe and wrong, because very occasionally, their willingness to be cringe means they're not wrong about something everyone thinks is true.
- jonny_eh 4 months ago
  
  Can you be specific?
  
  samr71 4 months ago
  
  Garry Marcus constantly repeats the line that "deep learning has hit a wall!1!" - he was saying this pre-ChatGPT even! It's very easy to dunk on him for this.
  That said, his willingness to push back against orthodoxy means he's occasionally right. Scaling really has seemed to plateau since GPT-3.5, Hallucinations are still a problem that are perhaps unsolvable under the current paradigm, LLMs do seem to have problems with things far outside their training data.
  Basically, while listening to Gary Marcus, you will hear a lot of nonsense, it will probably give you a better picture of reality if you can sort the wheat from the chaff. Listening to only Sam Altman, or other AI Hypelords, you'll think the Singularity is right around the corner. Listen to Gary Marcus, you won't.
  Sam Altman has been substantially more correct on average than Gary Marcus, but I believe Marcus is right that the Singularity narrative is bogus.
  
  unclebucknasty 4 months ago
  
  >Sam Altman has been substantially more correct on average than Gary Marcus
  I've seen some of Marcus' other writing and he's definitely a colorful dude. But is Altman really right more often/substantively? Actually, the comparison shouldn't be to Altman but to the AI hype train in general.
  And, while I might have missed some of Marcus's writing on specific points, on the broader themes he seems to be effectively exposing the AI hype.
  
  garymarcus 4 months ago
  
  you obviously never actually read the paper; you should.
  
  samr71 4 months ago
  
  Gary I respect you - will do!
  
  JKCalhoun 4 months ago
  
  He recently posted a question he put to grok3 — a variation on the trick LLM question (my characterization) of "count the number of this letter in this word." Apparently this Achilles heel is a well-known LLM shortcoming.
  Weirdly though, I tried the same example he gave on lmarena and actually got the correct result from grok3, not what Gary got. So I am a little suspicious of his ... methodology?
  Since LLMs are not deterministic it's possible we are both right (or were testing different variations on the model?). But there's a righteousness about his glee in finding these faults in LLMs. Never hedging with, "but your results may vary" or "but perhaps they will soon be able to accomplish this."
  EDIT: the exact prompt (his typo 'world'): "Can you circle all the consonants in the world Chattanooga"
  
  jonny_eh 4 months ago
  
  I think it's fair to say though that if your results may vary, and be wrong, then they're not reliable enough for many use-cases. I'd have to see his full argument though to see if that's what he was claiming. I'm just trying to be charitable here.
  
  JKCalhoun 4 months ago
  
  I'm trying to be charitable as well — I suppose to both sides of the debate. Myself, I see pros and cons. The hype absolutely needs to be shut down, but a spokesperson that is more even-handed would be more convincing (in my opinion).
  Here is his post, FWIW: https://garymarcus.substack.com/p/grok-3-beta-in-shambles
  
  giardini 4 months ago
  
  JKCalhoun says "...a spokesperson that is more even-handed would be more convincing (in my opinion)."
  Why? The stance of science toward new "discoveries" should always be skepticism.
  
  JKCalhoun 4 months ago
  
  I agree. I also think you can find the line between skepticism and partisanship.
  
  th0ma5 4 months ago
  
  I don't see it as righteous glee but just hoping that people will see the problem with how you could even begin to be suspicious of him. If it is so easy to get something wrong when you're trying to be correct, or get something accidentally correct as you're trying to expose things that are wrong ... Then what are we really doing here with these things.
  
  JKCalhoun 4 months ago
  
  Well, like any tool, hopefully using it where it makes sense. We already know that asking it to count vowels, etc. is not what we should be doing with these things. Writing code in Python however is a very different story.
  
  th0ma5 4 months ago
  
  Right it is even more problematic with code making hidden mistakes no person would ever make.
striking 4 months ago

I don't know the guy, what's wrong with what he wrote?
- comeonbro 4 months ago
  
  Gary Marcus has made himself the most prominent proponent of "deep learning is a parlor trick and cannot create real AI" (note: deep learning, not just LLMs), which he has been saying almost unmodified from before LLMs even existed to now.
  Though I think he might have stopped setting specific concrete goalposts to move, sometime between when I last checked in on him and now. After (often almost instantly) losing a couple dozen consecutive rounds of "LLMs/deep learning fundamentally cannot/will never", while never acknowledging any of it.
  
  th0ma5 4 months ago
  
  There's also a perspective that all of the ongoing problems have been the same while newer techniques shove them under different rugs. So I can see how that would look like that to the credulous.
  
  unclebucknasty 4 months ago
  
  This is exactly what's happening, with the additional feature that the newer techniques likewise come with their own hype.
  
  tartoran 4 months ago
  
  What does it mean when someone is sticking to their guns? Is it a bad thing? I do appreciate consistency, albeit a fair consistency and Gary Marcus's points do stand. When these criticisms are addressed (if it's possible to) you'd probably hear less from Gary Marcus.
  
  garymarcus 4 months ago
  
  Show me the goalposts I have moved, with actual quotes to prove it. nobody ever has when I have asked.
  Aso consider eg the bets I have made with Miles Brundage (and offered to Musk(, with money where I have backed up my views.
  good summary of predictions i made - mostly correct – is here: https://open.substack.com/pub/garymarcus/p/25-ai-predictions...
- mandolingual 4 months ago
  
  I'm subscribed to his substack because he's curmudgeonly and it's funny, and he occasionally makes good points, but he's constantly beating the same anti-hype drum. He might not get any particular facts wrong but you can count on him only focusing on the facts that let him continue to show AI through that same anti-hype lense.
  
  th0ma5 4 months ago
  
  How would it be possible to, say, show the reality of a forest fire's devastation while not appearing to show a bias for showing charred trees?

dgeiser13 4 months ago

It's not under-reported. If they had produced AGI it would be a giant story. No need to report a negative.

4b11b4 4 months ago

Need multimodal and body and fully online.

In the meantime strictly language audio and video will go pretty far

jgeada 4 months ago

we're pretty much training these models on the entirety of human recorded information (good & bad); sure, we can run larger and larger models, but it seems that fundamentally we've hit a wall in that none of these models are immune from hallucinations and the constant generation of "sounds likely but is false" sentences.

The approach is fundamentally flawed, you don't get AGI by building a sentence predictor.

garymarcus 4 months ago

exactly. and the counterarguments boil down to “na na” and hope.
istjohn 4 months ago

[flagged]

gwern 4 months ago

[flagged]

djmips 4 months ago

your browser must be REALLY slow - his name is the first thing after the title when you go to the article.
- EA-3167 4 months ago
  
  I don't believe that these comments (and if you scroll down there are quite a few nearly identical ones) are intended to do anything more than find an HN-safe way of expressing an ad hom attack with the hope that people won't read it or engage with the comments. I want to find a more charitable interpretation, but it's very difficult.
  I kind of get it though, presumably a large number of people here have paychecks that depend on him being wrong, or least the perception that he's wrong for a while.
  
  unclebucknasty 4 months ago
  
  >a large number of people here have paychecks that depend on him being wrong
  I was thinking earlier that it's mildly fascinating how some people are more annoyed by the hype, and others by the anti-hype. Thought it'd be interesting to know whether these things are part of our psychological profiles, contributed to political-leanings, etc.
  But, now that you mention it, yeah—it might just be the money.
unclebucknasty 4 months ago

I've seen this characterization of Marcus here, and it seems to follow the sentiment of the AI leaders he referenced in the article.
But, I've yet to see where he's been wrong (or, particularly more wrong than the AI-thinking and leadership he's questioning). Do you have any citations?
Also, if you stopped on seeing his name, I'd encourage you to take another look—specifically the sections wherein he discusses AI-leadership's prior dismissal of his doubts and their subsequent walk-backs of their own claims.
Would be interested in your take on that.
- xiphias2 4 months ago
  
  Reasoning LLMs getting better at ARC-AGI prove that they are able to solve tasks that are symbolic without putting in specific search on the CPU (which is the brute force method).
  It's never ,,pure scaling'' (just running the same algorithm on more hardware), but there's continues improvement on how to be even more algorithmically efficient (and algorithmic scaling is faster than hardware scaling).
  
  unclebucknasty 4 months ago
  
  >Reasoning LLMs getting better at ARC-AGI prove...
  Even if true, it wouldn't be dispositive WRT my question, but...
  1. Strictly speaking, LLMs themselves aren't capable of reasoning, by definition. Without external techniques, they are only capable of simulating reasoning, and so exhibiting reasoning-like behavior.
  2. It's known that at least some up to most progress on the test has been the result of specific tuning for the test ("cheating") versus any emergent AGI. [0]
  >It's never "pure scaling"
  Oh, but it was. There's absolutely been a focus on pure scaling as the proposal for significant progress and some prominent proponents have had to walk back their expectations/claims.
  I think there's a little bit of revisionism going on, as they want past claims to be quickly forgotten. The interesting part is that the scaling mantra is starting anew with the new reasoning techniques.
  [0] https://www.lesswrong.com/posts/KHCyituifsHFbZoAC/arc-agi-is...
FergusArgyll 4 months ago

I figured it out without having to click the link....
I do agree with the commenter here that it's good to hear from people who have wildly different views. He is more annoying than "The market is gonna crash tommorow" guy, though

iimaginary 4 months ago

[flagged]

garymarcus 4 months ago

god these arguments are empty, personal and without any substance whatsoever
- dang 4 months ago
  
  Yes, sorry, the internet unfortunately works this way and even though we are trying everything we know to dampen this stuff on Hacker News in favor of more thoughtful conversation, it seems we can only tweak the margins somewhat.
tartoran 4 months ago

If you invested your own money in AI do not worry about Gary Marcus, he will have little impact on when the bubble bursts. I for one welcome the skeptics because the amount of hype needs to come down a bit as it is currently at astronomical levels.