Show HN: Open-source testing framework for AI agents with semantic validation

github.com

4 points by alessandro-a 2 days ago

Hey HN!

I built SemanticTest while working on calendar0.app (an AI calendar assistant).

While building the assistant, I noticed a lack of good AI eval frameworks that would help me test my agent.

SemanticTest uses GPT-4 as a judge to evaluate:

- Text responses (semantic meaning)

- Tool calls (correct tools, right order)

- Multi-turn conversations
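To give a rough idea of what these checks involve, here is a minimal Python sketch. It is not SemanticTest's actual API: the function names, the prompt wording, and the PASS/FAIL convention are all assumptions made for illustration. The tool-call check is a plain deterministic subsequence match rather than an LLM judgment, and the judge is passed in as a callable so any model (or a stub) can be plugged in.

```python
from typing import Callable


def tools_called_in_order(actual: list[str], expected: list[str]) -> bool:
    """Return True if `expected` appears in `actual` as an ordered subsequence.

    Hypothetical helper: extra tool calls in between are tolerated.
    """
    remaining = iter(actual)
    # `name in remaining` consumes the iterator up to the first match,
    # so each expected tool has to show up after the previous one.
    return all(name in remaining for name in expected)


def judge_text(response: str, expectation: str, judge: Callable[[str], str]) -> bool:
    """Ask a judge model whether `response` satisfies `expectation`.

    `judge` is any function mapping a prompt string to the model's reply;
    the PASS/FAIL protocol below is an assumption, not a fixed standard.
    """
    prompt = (
        "You are grading an AI assistant.\n"
        f"Expectation: {expectation}\n"
        f"Response: {response}\n"
        "Answer with PASS or FAIL."
    )
    verdict = judge(prompt)
    return verdict.strip().upper().startswith("PASS")


# Stub judge so the sketch runs without network access or an API key.
def always_pass(prompt: str) -> str:
    return "PASS"
```

In a real setup the stub would be replaced by a function that sends the prompt to the judge model and returns its reply.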

It's composable: you build tests as JSON pipelines using custom blocks.
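As a sketch of the general idea of JSON pipelines built from blocks, the Python below runs a JSON spec step by step against a registry of block functions. Everything here is hypothetical: the block names (`mock_agent`, `assert_contains`), the `pipeline`/`block`/`params` keys, and the shared context dictionary are assumptions for illustration, not SemanticTest's real schema. The assertion block is a simple substring check standing in for a semantic judge.

```python
import json

# Hypothetical registry mapping block names to functions.
BLOCKS = {}


def block(name):
    """Decorator that registers a function as a named pipeline block."""
    def register(fn):
        BLOCKS[name] = fn
        return fn
    return register


@block("mock_agent")
def mock_agent(ctx, params):
    # Stands in for calling a real agent; just echoes a canned reply.
    return {**ctx, "response": params["reply"]}


@block("assert_contains")
def assert_contains(ctx, params):
    # Case-insensitive substring check, recorded as a pass/fail result.
    passed = params["text"].lower() in ctx["response"].lower()
    return {**ctx, "results": ctx.get("results", []) + [passed]}


def run_pipeline(spec_json):
    """Run each step in order, threading a context dict through the blocks."""
    spec = json.loads(spec_json)
    ctx = {}
    for step in spec["pipeline"]:
        ctx = BLOCKS[step["block"]](ctx, step.get("params", {}))
    return ctx


SPEC = """
{
  "pipeline": [
    {"block": "mock_agent", "params": {"reply": "Your meeting is booked for 3pm."}},
    {"block": "assert_contains", "params": {"text": "booked"}}
  ]
}
"""

result = run_pipeline(SPEC)
```

Adding a custom block in this sketch means registering one more function; the JSON spec then refers to it by name.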

Would love feedback. Thank you!