Show HN: SurvivalIndex – which developer tools do AI agents choose?

survivalindex.org

1 point by scalefirst 12 hours ago

We've been running coding agents against standardized repos with natural-language prompts — no tool names, no hints — and measuring what they actually choose.

Early finding: Claude Code picks Custom/DIY in 12 of 20 categories. Not because it can't use the tools (BFCL scores suggest it can) but because it doesn't reach for them. That's a different failure mode than capability benchmarks measure.

We score each tool on: agent visibility, pick rate vs Custom/DIY, cross-context breadth, expert human ratings, and implementation success rate. Tools above survival=1 persist. Below it, agents synthesize around them.
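Roughly, a simplified illustration of how those components could combine (this is an illustrative sketch, not the exact published formula — the function name, the multiplicative form, and the inputs here are placeholders; see the methodology page for the real definition):

```python
def survival_score(visibility: float,
                   pick_rate: float,
                   diy_rate: float,
                   breadth: float,
                   human_rating: float,
                   success_rate: float) -> float:
    """Illustrative combination: the tool's pick rate relative to the
    Custom/DIY baseline, scaled by the other signals (all assumed to
    be normalized to [0, 1])."""
    # Pick rate vs Custom/DIY: >1 means agents reach for the tool
    # more often than they synthesize their own replacement.
    relative_pick = pick_rate / max(diy_rate, 1e-9)
    return relative_pick * visibility * breadth * human_rating * success_rate

# A tool picked exactly as often as Custom/DIY, with perfect scores on
# everything else, sits at the survival = 1 threshold:
survival_score(1.0, 0.5, 0.5, 1.0, 1.0, 1.0)  # -> 1.0
```

The point of the ratio is that the threshold is relative: a tool "survives" when agents choose it over rolling their own, not when it merely scores well in isolation.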

Methodology is at survivalindex.org/methodology. Very curious what people think of the measurement approach, especially the human coefficient variable.

scalefirst 12 hours ago

One thing I'd love input on: we use expert human ratings as a variable (H) to capture whether agent choices align with what experienced engineers would actually ship. Curious if people think this is the right signal or whether it introduces too much subjectivity.