bcherny 1 week ago

Thanks for the feedback IDs. I read all 5 transcripts.

On the model behavior: your sessions were sending effort=high on every request (confirmed in telemetry), so this isn't the effort default. The data points at adaptive thinking under-allocating reasoning on certain turns: the specific turns where it fabricated (Stripe API version, git SHA suffix, apt package list) had zero reasoning emitted, while the turns with deep reasoning were correct. We're investigating with the model team. Interim workaround: CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1 forces a fixed reasoning budget instead of letting the model decide per turn.

nayroclade 1 week ago

Hey bcherny, I'm confused as to what's happening here. The linked issue was closed, with you seeming to imply there's no actual problem, people are just misunderstanding the hidden reasoning summaries and the change to the default effort level.

But here you seem to be saying there is a bug, with adaptive reasoning under-allocating. Is this a separate issue from the linked one? If not, wouldn't it help to respond to the linked issue acknowledging a model issue and telling people to disable adaptive reasoning for now? Not everyone is going to be reading comments on HN.

  • unsupp0rted 1 week ago

    It's better PR to close issues and tell users they're holding it wrong, and meanwhile quietly fix the issue in the background. Also possibly safer for legal reasons.

    • liamsfr 4 days ago

      Isn’t that what they just did here? Close Stella’s issue, cross-post to HN, then completely sidestep an observation users are making, and attack the analyst of the transcripts with a straw man blaming… thinking summaries….

  • kenmacd 1 week ago

    There's a 5 hour difference between the replies, and new data that came in, so the posts aren't really in conflict.

    Also, it doesn't sound like they know "there's a model issue", so reopening it now would be premature. Maybe they just read it wrong; better to let a few others verify first, then reopen.

diavelguru 1 week ago

Love this. Responding to users. Detailed info on what's being investigated. Action being taken (at least it seems so).

  • jojobas 1 week ago

    Surely you realize it's AI responding? (not sure if /s)

  • gilrain 1 week ago

    And all hidden in the comments of a niche forum, while the actual issue is closed and whitewashed? You got played.

allisdust 1 week ago

I cannot provide the session IDs, but I have tried the above flag and can confirm it makes a huge difference. You should treat this as a bug and make this the default behavior. Clearly the adaptive thinking is making the model plain stupid and useless. It is time you guys take this seriously and stop messing with the performance with every damn release.

JamesSwift 1 week ago

Just set that flag and I'm already getting similarly poor results. New one: 93b9f545-716c-4335-b216-bf0c758dff7c

  • JamesSwift 1 week ago

    And another where Claude gets into a long cycle of "wait that's not right.. hold on... actually..." correcting itself in its train of thought. It found the answer eventually but wasted a lot of cycles getting there (reporting because this is a regression in my experience vs. a couple of weeks ago): 28e1a9a2-b88c-4a8d-880f-92db0e46ffe8

    • JamesSwift 1 week ago

      Another 1395b7d6-f2f1-4e24-a815-73852bcdeed2

      It fails to answer my initial question and tells me what I need to do to check. Then it hallucinates the answer without researching anything, then it reaches an inaccurate conclusion, and only when I prompt it further does it finally reach a (maybe) correct answer.

      I have a few more I haven't submitted, but I think it's safe to say that disabling adaptive thinking isn't the answer here.

onoesworkacct 1 week ago

This kind of thing is harder for regular end users to understand since the change that removed reasoning details.

mangatmodi 1 week ago

I am curious. Are you able to see our session text based on the session ID? That was a big no at some of the tier-1 places I've worked. No employee could see user texts.

  • rkangel 1 week ago

    IIRC for Enterprise, using /feedback or /bug is an exception to the "we promise not to use your data" agreement.

tomaskafka 1 week ago

My guess is there isn't enough hardware, so Anthropic is trying to limit how much soup the buffet serves. Did I guess right? And I would absolutely bet the enterprise accounts with millions in spend get priority, while retail will be the first to get throttled.

gilrain 1 week ago

> The data points at adaptive thinking under-allocating reasoning on certain turns

Will you reopen the issue you incorrectly closed, then…? Or are you just playacting concern?