mwigdahl 2 days ago

I was annoyed by all the subjective comparisons and outright conspiracy theories on the Reddit AI coding groups, backed by nothing more than that day's impressions.

So I put the older and newer versions of Claude Code and Codex CLI to the same task of generating a simple React web app from a specification, using a consistent prompt structure, and compared the results.

It's not a quantitative evaluation, but it is an apples-to-apples qualitative comparison that shows clear differences in agent/model performance on an equivalent multistep task using equivalent approaches.