ian_j_butler 8 hours ago

Hah, since it's open in another tab: Talk Isn’t Always Cheap: Understanding Failure Modes in Multi-Agent Debate @ https://arxiv.org/html/2509.05396v2

To actually engage more with the substance of TFA: it's very refreshing to see someone bringing numbers. To me this shows we (still) need something somewhere between the numbers and the anecdata. It's annoying to hear claims of "1000x productivity", or claims of negative/neutral productivity, without any extra context whatsoever. So you brought data? Great! But still no context. Boring CRUD? Complex UI? A rewrite/port of legacy code? What industry and language, how many human collaborators, and what baseline for SLOC?

We need to get rigorous about this stuff and actually aim for a decent framework which can answer "Are LLMs a value add for this project? How much value? How much cost?"

Such a thing might be information-theoretic, complexity-based, or built on counting integration touchpoints / info boundaries / sources of ground truth. We could even try to implement that framework with LLMs, and probably should! But the default answer of "Yes, definitely, always useful, just tokenmaxx it since LLMs are the future" is (still) only marketing, not engineering.

jdpigeon 5 hours ago

This effect is likely to be exacerbated in teams that are going through layoffs this year: even fewer people to clear the in-flight queue of partially completed tasks.

vgordon 7 hours ago

The activity-vs-outcome distinction feels like the most interesting part of this discussion. Curious what metrics people have found most useful for separating the two.

sublinear 8 hours ago

More tasks get "done" while rework is sky high and overall throughput to production drops.

First, I'd like to thank all the people working on testing and doing the lord's work.

Anyway, this isn't even a pattern unique to LLM use. We've all seen this exact same thing when more devs are added to a project that's running late, when teams are siloed, when work is outsourced to contractors, etc.

  • northstar702 6 hours ago

    Definitely not unique. That said, given how "easy" it is to scale LLM output compared to human output, could this pattern get messier in the LLM era? The whole tokenmaxxing motion assumed more output equals more outcomes. What do you think?

    • sublinear 6 hours ago

      Already covered in the post.

      > So this is inherent to the technology. No amount of tokenmaxxing is going to change it. LLM development even breaks common and well accepted quality norms in software development - like backwards compatibility. You literally can’t (and wouldn’t want!) an LLM to do the same thing in the same way twice. But this means, LLMs - on their own - are not a solid foundation to build a revolution on. They never can be.

      • northstar702 6 hours ago

        Fair. I suppose the question behind my question is: what else do we need? Thanks.

        • sublinear 4 hours ago

          Yeah, I have a different take. We will end up scaling human output by hiring more devs as wages for entry- to mid-level roles continue to stagnate.

          I think LLMs make finding information and learning way more accessible. Even with realistic expectations, they are already a revolution for literacy, education, and search. LLMs are a massive achievement that would be celebrated appropriately if only the public hadn't been introduced to them during global political and economic crises that encourage grifters and authoritarians to muddy the waters.

          To be clear, hiring at scale is a sorely needed step. Software is just like any other form of writing. It carries weight and needs cultural and community context to work. We have needed way more technical literacy for decades to make this digital always-on world work. I think most would at least agree that LLMs bridge knowledge gaps. They're a net good thing despite the current abuse by bad actors.

  • conception 3 hours ago

    I think the real bitter lesson is nine women will never make a baby in a month.
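The "add more devs to a late project" pattern and the "nine women" line both echo Brooks's law from The Mythical Man-Month. Part of Brooks's argument is that coordination cost grows faster than headcount, because pairwise communication paths among n people scale as n(n-1)/2. A minimal sketch of that arithmetic; the function name and team sizes are illustrative, and it models nothing about LLMs specifically:

```python
def communication_channels(team_size: int) -> int:
    """Pairwise communication paths among n people: n(n-1)/2 (Brooks)."""
    if team_size < 0:
        raise ValueError("team_size must be non-negative")
    return team_size * (team_size - 1) // 2

# Illustrative team sizes: headcount grows linearly, channels grow quadratically.
for n in (3, 5, 10, 20):
    print(f"{n} people -> {communication_channels(n)} channels")
```

Going from 10 to 20 people doubles headcount but more than quadruples the number of pairs that may need to stay in sync (45 to 190).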

gtirloni 9 hours ago

People are still figuring things out, there's a lot of wasted tokens, etc.

This is like complaining that a student isn't as productive as a senior engineer.

I think we as an industry haven't even graduated to junior level when it comes to figuring out how to use AI to improve things.

  • ozlikethewizard 8 hours ago

    This is discussed in the article, and I think the author makes pretty reasonable arguments for why, by nature, we will not see the reliability of LLM usage improve. They also discuss what I agree is the more effective way to use an LLM: as a feedback and refinement tool, not a decision maker.

    • gtirloni 4 hours ago

      > This is not a limitation that can be overcome by LLMs. Their generative value is in their unreliability. If you turn temperature down to zero, you get a deterministic machine - but you also break every meaningful application I know of in production.

      This is not a reasonable argument. Setting the temperature down to zero does NOT give you a deterministic machine. And I have never seen that break any application in production; quite the contrary.
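To make the temperature point concrete, here is a minimal sketch, assuming "temperature 0" is implemented as greedy (argmax) decoding, which is a common convention rather than any particular provider's API. The function names and logits are hypothetical. It shows two things: lowering the temperature pushes the softmax toward the argmax, and floating-point addition is not associative, so if the same logit is reduced in a different order (one commonly cited cause of nondeterminism in batched inference servers), a near-tie can flip the supposedly deterministic greedy choice.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        # T=0 is usually special-cased as greedy decoding rather than computed.
        raise ValueError("temperature must be > 0")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(logits):
    """'Temperature 0' as greedy decoding: index of the largest logit (ties go to the first)."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Floating-point addition is not associative: same inputs, different reduction order.
logit_order_a = (0.1 + 0.2) + 0.3   # 0.6000000000000001
logit_order_b = 0.1 + (0.2 + 0.3)   # 0.6

# Hypothetical near-tie between token 0 and token 1, computed two ways.
run_1 = [0.6, logit_order_a]
run_2 = [0.6, logit_order_b]

print(greedy_pick(run_1))  # 1
print(greedy_pick(run_2))  # 0
print(softmax_with_temperature([2.0, 1.0], 1.0))
print(softmax_with_temperature([2.0, 1.0], 0.01))
```

Real inference stacks are far more complicated; this only illustrates why "temperature 0" and "bit-for-bit reproducible" are not the same thing.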

deadbabe 8 hours ago

I'm curious, LLMs have been around for a while now...

How many of you would say you need LLMs now for work? Not that you want it because it's nice to have, but rather you would literally not be able to do your job at all because you don't have an LLM to use?

If your company said "We're not paying for LLMs anymore.", would you begrudgingly pay for or host your own LLM that complies with company policies, or just go back to writing everything by hand?

I feel like companies could definitely just push the cost of LLMs back onto the engineers themselves (much like how people have to pay for their own gas to go to work), and engineers would have no choice but to either buy their own subscriptions or be very good at writing code by hand just to stay competitive.

This kind of shift is coming, partly because the cost of LLMs is unsustainable for companies, but also because it sounds like the kind of diabolical idea some upper management thinks they can get away with, as peer pressure will naturally do its thing. Paying for your own token usage is a small price to pay for job security, isn't it?

  • Cyan488 8 hours ago

    I'm an embedded systems developer. I have almost fully "outsourced" the Python code for the front-end PC software that interacts with my firmware.

    I deliberately continue to write all my firmware by hand, and will occasionally consult AI for review. I never use AI to write prose for me.

    Python is better represented in the training data, writing bench software was a bit boring, and I get to spend more time where I have (and continue to build) domain knowledge.

    Agentic Opus is a nice to have and I get to explore the frontier tech, but if (or when) it's taken away, a self hosted coding model would be fine - I'd just have to dust off my Python skills and it would take longer.

  • bigstrat2003 4 hours ago

    > If your company said "We're not paying for LLMs anymore.", would you begrudgingly pay for or host your own LLM that complies with company policies, or just go back to writing everything by hand?

    I already write code myself, so perhaps I'm not your target audience. But I strongly believe that anyone who pays for his own LLM is being a fool. You should never pay for tools for work. If you do, you're letting the company take advantage of you instead of making it pay for the needs of its own business. It's a bad deal; don't go down that road.

    • deadbabe 4 hours ago

      But you don’t need an LLM.

6stringmerc 8 hours ago

“It’s hard for me to put into words how bad this is.”

AHAHAHAHA! I genuinely laughed out loud; it filled the room.

This is the “citation needed” rebuttal most REASONABLE people in the software industry have been looking for, and the sample size is only going to get bigger! Does anyone really think - not believe, logically conclude based on evidence at hand - that there will be any contrary outcomes at scale? Honestly, this is going to put a lot of software pros into a “sabbatical” where cleaning up the mess should be billed like that old joke at auto mechanics’ shops:

$150 an hour to fix it

$500 an hour if you tried to fix it before bringing it here

Seriously laughing out loud. I needed this. Hahaha

Edit: no wonder he’s so astute - came up in construction. In that industry cost versus benefit and safety concerns are often a matter of life or death. Real consequences. A lot more educational than staring at a glowing box for most of one’s career. Disclosure: I did risk management for construction projects, like the Alabama MB plant.