Show HN: LLPlayer – a media player with OpenAI Whisper
llplayer.com

Hello HN. I created a video player for Windows, specialized for language learning.
It can generate subtitles in real time using OpenAI Whisper, and it can also generate subtitles for online videos.
It is free and open source.
I would very much like to get feedback. Thanks!
[GitHub: https://github.com/umlx5h/LLPlayer]
I think it would also be helpful if it could pronounce or vocalize specific words you click on to learn the definition of, or even read the subtitle text aloud in another language.
Keep up the good work, cool idea.
Thanks for the feedback! It is possible to use an external dictionary tool to speak via the clipboard, but it seems difficult to support many languages that way. It would be easy to implement with the Microsoft UWP speech API, but there may be quality issues.
I will research whether there is a good-quality way to do playback locally. Thanks!
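For what it's worth, a minimal sketch of the local voice selection this might involve, assuming pyttsx3 (a Python wrapper around the Windows SAPI5 voices). The voice list and the `pick_voice` helper below are mocked/hypothetical so the matching logic is visible on its own:

```python
# Hypothetical sketch: choose a local TTS voice matching the clicked word's
# language. pyttsx3 exposes the installed SAPI5 voices on Windows; here the
# voice list is mocked as plain dicts for illustration.

def pick_voice(voices, lang_code):
    """Return the id of the first voice tagged with lang_code, else None."""
    for voice in voices:
        if any(lang_code in str(tag).lower() for tag in voice["languages"]):
            return voice["id"]
    return None

mock_voices = [
    {"id": "sapi5-en-US", "languages": ["en-US"]},
    {"id": "sapi5-ja-JP", "languages": ["ja-JP"]},
]

print(pick_voice(mock_voices, "ja"))  # sapi5-ja-JP
```

With the real library the playback itself would be roughly `engine = pyttsx3.init()`, `engine.setProperty("voice", voice_id)`, `engine.say(word)`, `engine.runAndWait()`; quality then depends entirely on which voices are installed.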
Very cool. VLC's AI subtitles are still in preview.
It's not open source, but PotPlayer supported this in December, before VLC's announcement, so I'm not the first. Incidentally, I added this feature in October.
I will borrow the good points of other players' ASR implementations.
Looks really nice, congrats. I built something similar, but for podcasts [0].
I see you are using yt-dlp for YouTube. My app originally also played YouTube videos scraped with yt-dlp, but I found that it is very hard to fetch YouTube audio "at scale", even when rotating IPs, using proxies, etc. You will eventually get blocked, so I dropped the whole thing and focused on podcasts.
But with your app the user runs it locally so that shouldn't be a problem. Good luck.
[0] https://www.langturbo.com
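For anyone curious, a rough sketch of the audio-only options a local player might hand to yt-dlp's Python API. The `audio_opts` helper is hypothetical, but the option names are the documented yt-dlp ones:

```python
# Sketch: build the options dict a local player could pass to
# yt_dlp.YoutubeDL to fetch just the audio track of a single video.

def audio_opts(out_dir):
    return {
        "format": "bestaudio/best",            # prefer an audio-only stream
        "outtmpl": f"{out_dir}/%(id)s.%(ext)s",  # cache by video id
        "noplaylist": True,                    # single video, not its playlist
        "quiet": True,
    }

# Usage (requires yt-dlp installed; runs locally, one user, one IP):
# import yt_dlp
# with yt_dlp.YoutubeDL(audio_opts("cache")) as ydl:
#     info = ydl.extract_info(url, download=True)
```

Run locally per user, as the parent comment notes, this avoids the at-scale blocking problem entirely.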
Looks very interesting, thanks for sharing!
Looks incredibly useful. What is the accuracy like?
Thanks! For translation, it currently translates one subtitle at a time, so accuracy is low because the context before and after each subtitle is lost. I'm addressing this by supporting dual subtitles; translation is assumed to be used only as a complement.
I would like to improve accuracy by preserving context, but I haven't found a good way to do this yet.
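One possible sketch of preserving context: send a window of neighbouring cues along with each subtitle and ask the translator for only the middle line back. `build_prompt` is a hypothetical helper; the actual translation call (whatever API the player uses) is left out:

```python
# Hypothetical sketch: bundle each subtitle cue with its neighbours so the
# translator sees surrounding context but returns only one line.

def build_prompt(cues, i, window=2):
    """Build a prompt for cue i with up to `window` cues on each side."""
    lo, hi = max(0, i - window), min(len(cues), i + window + 1)
    context = "\n".join(cues[lo:hi])
    return (f"Context:\n{context}\n\n"
            f"Translate only this line, consistent with the context:\n"
            f"{cues[i]}")
```

The trade-off is cost and latency: each cue's request repeats its neighbours, so a larger window means more tokens per call.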
If we are talking about the accuracy of the transcription, it is very good if you use a large model. At the least, Whisper's accuracy is far superior to YouTube's subtitle generation!
Well done!
this is a really cool idea