> This all started because I was contractually obligated to write unit tests for a codebase with hundreds of thousands of lines of code and go from 0% to 80%+ coverage in the next few weeks - seems like something Claude should do.
Is your client ok with this? Are the tests any good?
If left to its own devices, Claude will resort to writing passable-looking BS tests pretty quickly when the going gets tough, e.g. when it has to interface with stateful real-world systems and its tests struggle to pass.
This is my experience as well. If you want it to write good tests, you have to take a much more involved approach: first make it establish what needs testing in each module, write each test one at a time, and make it prove the test can actually fail by introducing a bug into the source code and confirming the test catches it, fixing the test if it doesn't; rinse and repeat. I haven't done this much because it's very expensive in terms of time and premium tokens... right now, I just write most tests myself so at least I have faith in the verification suite.
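The "prove the test can break" step described above is essentially manual mutation testing: plant a deliberate bug and check that the test fails. A minimal sketch in Python (all function names here are hypothetical illustrations, not any real tool's API):

```python
# Manual mutation testing: a test only has value if it fails
# when the code under test is wrong.

def make_test(add_impl):
    """Build a test closed over a particular implementation."""
    def test():
        assert add_impl(2, 3) == 5
    return test

def correct_add(a, b):
    return a + b

def buggy_add(a, b):
    return a + b + 1  # deliberately planted off-by-one "mutant"

def survives(test):
    """Run the test; True if it passes, False if it fails."""
    try:
        test()
        return True
    except AssertionError:
        return False

# The test must pass on the real code...
assert survives(make_test(correct_add))
# ...and must fail on the mutant, proving it actually verifies something.
assert not survives(make_test(buggy_add))
```

Tools like mutmut automate this loop, but doing it by hand on a handful of planted bugs is often enough to catch tests that merely look plausible.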
Claude is like an intern; someone has to code review and approve before final delivery, imho.
While I agree with the sentiment, you are being too generous. Claude is like a new intern every 15-30 minutes, or however long it takes to fill the context window. "Oh, I know what the issue is, let me delete this code for now," and it proceeds to nuke the fix it spent the last context window making. No intern can be that malicious.
Honestly, this is crazy. I have to laugh…
Lol, I've done exactly the same thing for the past 3 weeks with a Python script. The tests are okay-ish, but it's very exhausting spending hours just reviewing, tho.
https://github.com/DeprecatedLuke/claude-loop seems relevant
https://github.com/anthropics/claude-code/tree/main/plugins/...
"The technique is named after Ralph Wiggum from The Simpsons, embodying the philosophy of persistent iteration despite setbacks."