Show HN: LLPlayer – a media player with OpenAI Whisper
llplayer.com

Hello HN. I created a video player for Windows, specialized for language learning.
It can generate subtitles in real time using OpenAI Whisper, and it can also generate subtitles for online videos.
It is free and open source.
I would very much like to get feedback. Thanks!
[GitHub: https://github.com/umlx5h/LLPlayer]
I think it would also be helpful if it could pronounce or vocalize specific words you click on to learn the definition of, or even read the subtitle text aloud in another language.
Keep up the good work, cool idea.
Thanks for the feedback! It is possible to use an external dictionary tool to speak via the clipboard, but it seems difficult to support many languages that way. It would be easy to implement with the Microsoft UWP speech API, but there may be quality issues.
I will research whether there is a good-quality way to do playback locally. Thanks!
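For what it's worth, a minimal sketch of the local voice selection this might involve, assuming pyttsx3 (a Python wrapper around the Windows SAPI5 voices). The voice list and the `pick_voice` helper below are mocked/hypothetical so the matching logic is visible on its own:

```python
# Hypothetical sketch: choose a local TTS voice matching the clicked word's
# language. pyttsx3 exposes the installed SAPI5 voices on Windows; here the
# voice list is mocked as plain dicts for illustration.

def pick_voice(voices, lang_code):
    """Return the id of the first voice tagged with lang_code, else None."""
    for voice in voices:
        if any(lang_code in str(tag).lower() for tag in voice["languages"]):
            return voice["id"]
    return None

mock_voices = [
    {"id": "sapi5-en-US", "languages": ["en-US"]},
    {"id": "sapi5-ja-JP", "languages": ["ja-JP"]},
]

print(pick_voice(mock_voices, "ja"))  # sapi5-ja-JP
```

With the real library the playback itself would be roughly `engine = pyttsx3.init()`, `engine.setProperty("voice", voice_id)`, `engine.say(word)`, `engine.runAndWait()`; quality then depends entirely on which voices are installed.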
Very cool. VLC's AI subtitles are still in preview.
It's not open source, but PotPlayer supported this in December, before VLC's announcement, so I'm not the first. Incidentally, I added this feature in October.
I will borrow the good points of other players' ASR implementations.
Looks really nice, congrats. I built something similar, but for podcasts [0].
I see you are using yt-dlp for YouTube. My app originally also played YouTube videos scraped with yt-dlp, but I found that it is very hard to fetch YouTube audio "at scale", even when rotating IPs, using proxies, etc. You will eventually get blocked, so I dropped the whole thing and focused on podcasts.
But with your app the user runs it locally so that shouldn't be a problem. Good luck.
[0] https://www.langturbo.com
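For anyone curious, a rough sketch of the audio-only options a local player might hand to yt-dlp's Python API. The `audio_opts` helper is hypothetical, but the option names are the documented yt-dlp ones:

```python
# Sketch: build the options dict a local player could pass to
# yt_dlp.YoutubeDL to fetch just the audio track of a single video.

def audio_opts(out_dir):
    return {
        "format": "bestaudio/best",            # prefer an audio-only stream
        "outtmpl": f"{out_dir}/%(id)s.%(ext)s",  # cache by video id
        "noplaylist": True,                    # single video, not its playlist
        "quiet": True,
    }

# Usage (requires yt-dlp installed; runs locally, one user, one IP):
# import yt_dlp
# with yt_dlp.YoutubeDL(audio_opts("cache")) as ydl:
#     info = ydl.extract_info(url, download=True)
```

Run locally per user, as the parent comment notes, this avoids the at-scale blocking problem entirely.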
Looks very interesting, thanks for sharing!
Looks incredibly useful. What is the accuracy like?
Thanks! For translation, it currently translates one subtitle at a time, so accuracy is low because the context before and after each subtitle is lost. I'm addressing this by supporting dual subtitles; translation is assumed to be used only as a complement.
I would like to improve accuracy by preserving context, but I haven't found a good way to do this yet.
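One possible sketch of preserving context: send a window of neighbouring cues along with each subtitle and ask the translator for only the middle line back. `build_prompt` is a hypothetical helper; the actual translation call (whatever API the player uses) is left out:

```python
# Hypothetical sketch: bundle each subtitle cue with its neighbours so the
# translator sees surrounding context but returns only one line.

def build_prompt(cues, i, window=2):
    """Build a prompt for cue i with up to `window` cues on each side."""
    lo, hi = max(0, i - window), min(len(cues), i + window + 1)
    context = "\n".join(cues[lo:hi])
    return (f"Context:\n{context}\n\n"
            f"Translate only this line, consistent with the context:\n"
            f"{cues[i]}")
```

The trade-off is cost and latency: each cue's request repeats its neighbours, so a larger window means more tokens per call.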
If we are talking about the accuracy of the transcription, it is very good if you use a large model. At the least, Whisper's accuracy is far superior to YouTube's subtitle generation!
Well done!
this is a really cool idea