Show HN: Koshei AI – a voice-native AI language university (A1 to D2)


1 point by bugsbuny24 a month ago

Hi HN,

I’m building a voice-first AI language teacher in the browser using:

- Next.js
- Gemini Audio API for STT
- Gemini TTS
- Supabase

The product vision is a structured AI “language university” rather than a general chatbot.

Right now my biggest technical problem is lip sync: how would you implement a lightweight, browser-native lip sync effect for a static avatar image while TTS audio is playing?

I’ve tried:

- Three.js + VRM (too heavy / unstable for this use case)
- simple canvas mouth animation
- CSS-only pulse effects

I want:

- something realistic enough to feel alive
- low dependency weight
- web-compatible
- stable on mobile
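One lightweight approach I'd consider (not something the post already uses) is amplitude-driven lip sync: route the TTS playback through a Web Audio `AnalyserNode`, compute RMS loudness per frame, and scale a mouth layer on the avatar. No 3D, no viseme model, just a few dozen lines. A sketch, assuming the TTS audio plays through an `HTMLAudioElement` and the mouth is a positioned DOM element or sprite; the element names and tuning constants here are my own, not from the repo:

```typescript
// Pure helper: map an RMS amplitude (speech is roughly 0..0.5) to a
// mouth-openness value in [0, 1], with a noise gate and smoothing.
export function rmsToOpenness(
  rms: number,
  prev: number,
  gate = 0.02,   // ignore hiss / silence below this level
  gain = 4,      // how aggressively loudness opens the mouth
  smooth = 0.6,  // 0 = no smoothing, closer to 1 = heavier smoothing
): number {
  const raw = rms < gate ? 0 : Math.min(1, (rms - gate) * gain);
  // Exponential smoothing avoids frame-to-frame jitter.
  return prev * smooth + raw * (1 - smooth);
}

// Browser wiring (runs only in a browser; call once per audio element):
export function attachLipSync(audio: HTMLAudioElement, mouth: HTMLElement) {
  const ctx = new AudioContext();
  const src = ctx.createMediaElementSource(audio);
  const analyser = ctx.createAnalyser();
  analyser.fftSize = 512; // small window => low latency
  src.connect(analyser);
  analyser.connect(ctx.destination); // keep the audio audible
  const buf = new Float32Array(analyser.fftSize);
  let openness = 0;
  const tick = () => {
    analyser.getFloatTimeDomainData(buf);
    let sum = 0;
    for (const v of buf) sum += v * v;
    openness = rmsToOpenness(Math.sqrt(sum / buf.length), openness);
    // Scale the mouth layer vertically; swapping sprite frames by
    // openness bucket (closed / half / open) also looks convincing.
    mouth.style.transform = `scaleY(${0.2 + openness})`;
    requestAnimationFrame(tick);
  };
  tick();
}
```

This feels "alive" because the mouth tracks actual speech energy, it has zero dependencies, and the same `AudioContext` works on mobile Safari as long as it is created/resumed inside a user gesture.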

Secondary issues:

- MediaRecorder reliability on mobile Safari
- reducing transcript latency
- voice UX for guided teaching rather than free-form chat
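On the MediaRecorder point: a common source of Safari breakage is hard-coding `audio/webm`, which Safari (including iOS) does not support; it wants `audio/mp4`. Probing an ordered preference list with `MediaRecorder.isTypeSupported` sidesteps that. A sketch under that assumption; the candidate list and helper names are mine:

```typescript
// Pure helper: return the first container type the predicate accepts.
export function pickMimeType(
  candidates: string[],
  isSupported: (t: string) => boolean,
): string | undefined {
  return candidates.find(isSupported);
}

// Ordered by preference: Opus-in-WebM where available, MP4 for Safari.
export const AUDIO_CANDIDATES = [
  "audio/webm;codecs=opus", // Chrome / Firefox
  "audio/webm",
  "audio/mp4",              // Safari, incl. mobile
];

// Browser usage (runs only where MediaRecorder exists):
export function createRecorder(stream: MediaStream): MediaRecorder {
  const mimeType = pickMimeType(AUDIO_CANDIDATES, (t) =>
    MediaRecorder.isTypeSupported(t),
  );
  // With no supported candidate, omit mimeType and let the browser
  // pick its default rather than throwing on construction.
  return mimeType ? new MediaRecorder(stream, { mimeType }) : new MediaRecorder(stream);
}
```

Whatever type you end up recording, send it to the STT side explicitly (e.g. in a header or form field) so the server doesn't assume one container for all clients.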

Demo: https://koshe-al.onrender.com

Repo: https://github.com/Bugsbuny24/Koshe-Al-

Would love technical suggestions, architecture criticism, or examples of similar systems done well.