Show HN: Mix – Open-source multimodal agents SDK
Why we built it:

• Claude Code: great for coding, but no video/audio support, and localhost only

• OpenAI SDK: single-model, no native multimedia tools

• Both: no integrated DevTools for debugging agent reasoning
So we built Mix as an alternative for multimodal applications:

• Native video/audio/PDF analysis tools (Gemini for vision, Claude for reasoning)

• Multi-model routing instead of single-provider lock-in

• One-command Supabase setup for cloud deployment (vs. localhost only)

• HTTP architecture that enables visual DevTools alongside agent workflows (rough client sketch below)

• Go backend: 50-80% lower memory footprint than Node.js, efficient for concurrent agent sessions. Python and TypeScript clients are available
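To make the HTTP architecture concrete, here is a minimal sketch of what driving the server from Python might look like. The endpoint paths, JSON fields, and default port are illustrative assumptions, not Mix's documented API:

    # Hypothetical sketch only: the endpoints, field names, and
    # default port below are assumptions, not Mix's documented API.
    import requests

    BASE = "http://localhost:8080"  # assumed local server address

    # Start an agent session (hypothetical endpoint)
    session = requests.post(f"{BASE}/sessions", json={"model": "claude"}).json()

    # Send a multimodal prompt that attaches a local video file;
    # per the post, vision inputs route to Gemini
    reply = requests.post(
        f"{BASE}/sessions/{session['id']}/messages",
        json={
            "text": "Summarize the key scenes in this clip",
            "attachments": ["clip.mp4"],
        },
    ).json()
    print(reply["text"])

The shape would be the same from the TypeScript client; the point is that everything goes over plain HTTP, which is what lets the visual DevTools sit alongside an agent session.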
Example use cases in the demo video: a portfolio analyzer that reads Excel files and generates charts, and a YouTube search agent that finds and edits video clips.
GitHub: https://github.com/recreate-run/mix

Demo video: https://youtu.be/IwgKt68wQSc
Would appreciate feedback, especially from folks building multimodal agents.
Do you support streaming responses with the HTTP architecture? And does the DevTools client connect via WebSocket or polling?