Handy – Free open source speech-to-text app

195 points by tin7in 17 hours ago

I’ve tried several, including this one, and I’ve settled on VoiceInk (local, one-time payment), and with Parakeet V3 it’s stunningly fast (near-instant) and accurate enough to talk to LLMs/code-agents, in the sense that the slight drop in accuracy relative to Whisper Turbo3 is immaterial since they can “read between the lines” anyway.

My regular cycle is to talk informally to the CLI agent and ask it to “say back to me what you understood”, and it almost always produces a nice clean and clear version. This simultaneously works as confirmation of its understanding and also as a sort of spec which likely helps keep the agent on track.

UPDATE - just tried handy with Parakeet v3, and it works really well too, so I'll use this instead of VoiceInk for a few days. I just also discovered that turning on the "debug" UI with Cmd-shift-D shows additional options like post processing and appending trailing space.

thethimble 6 hours ago

I wish one of these models was fine tuned for programming.
I want to be able to say things like "cd ~/projects" or "git push --force".
- netghost 5 hours ago
  
  I'll bet you could take a relatively tiny model and get it to translate the transcribed "git force push" or "git push dash dash force" into "git push --force".
  Likewise "cd home slash projects" into "cd ~/projects".
  Maybe with some fine tuning, maybe without.

blutoot 14 hours ago

I have dystonia which often stiffens my arms in a way that makes it impossible for me to type on a keyboard. TTS apps like SuperWhisper have proven to be very helpful for me in such situations. I am hoping to get a similar experience out of "Handy" (very apt maming from my perspective).

I do, however, wonder if there is a way all these TTS tools can get to the next level. The generated text should not be just a verbatim copy of what I just said, but depending on the context, it should elaborate. For example, if my cursor is actively inside an editor/IDE with some code, my coding-related verbal prompts should actually generate the right/desired code in that IDE.

Perhaps this is a bit of combining TTS with computer-use.

mritchie712 11 hours ago

I made something called `ultraplan`. It's is a CLI tool that records multi-modal context (audio transcription via local Whisper, screenshots, clipboard content, etc.) into a timeline that AI agents like Claude Code can consume.
I have a claude skill `/record` that runs the CLI which starts a new recording. I debug, research, etc., then say "finito" (or choose your own stopword). It outputs a markdown file with your transcribed speech interleaved with screenshots and text that you copied. You can say other keywords like "marco" and it will take a screenshot hands-free.
When the session ends, claude reads the timeline (e.g. looks at screenshots) and gets to work.
I can clean it up and push to github if anyone would get use out of it.
- mritchie712 8 hours ago
  
  https://github.com/definite-app/ultraplan
- heliostatic 8 hours ago
  
  Definitely interested in that!
  
  mritchie712 8 hours ago
  
  Added link above!
- wanderingmind 11 hours ago
  
  Sounds interesting I would love to use it if you get a chance to push to github
  
  mritchie712 8 hours ago
  
  https://github.com/definite-app/ultraplan
eddyg 12 hours ago

There’s lots of existing work on “coding by voice” long before LLMs were a thing. For example (from 2013): http://xahlee.info/emacs/emacs/using_voice_to_code.html and the associated HN discussion (“Using Voice to Code Faster than Keyboard”): https://news.ycombinator.com/item?id=6203805
There’s also more recent-ish research, like https://dl.acm.org/doi/fullHtml/10.1145/3571884.3597130
sipjca 13 hours ago

I totally agree with you and largely what you’re describing is one of the reasons I made Handy open source. I really want to see something like this and see someone go experiment with making it happen. I did hear some people playing with using some small local models (moondream, qwen) to get some more context of the computer itself
I initially had a ton of keyboard shortcuts in handy for myself when I had a broken finger and was in a cast. It let me play with the simplest form of this contextual thing, as shortcuts could effectively be mapped to certain apps with very clear uses cases
hasperdi 13 hours ago

What you said is possible by feeding the output of speech-to-text tools into an LLM. You can prompt the LLM to make sense of what you're trying to achieve and create sets of actions. With a CLI it’s trivial, you can have your verbal command translated into working shell commands. With a GUI it’s slightly more complicated because the LLM agent needs to know what you see on the screen, etc.
That CLI bit I mentioned earlier is already possible. For instance, on macOS there’s an app called MacWhisper that can send dictation output to an OpenAI‑compatible endpoint.
- sipjca 13 hours ago
  
  Handy can post process with LLMs too! It’s just currently hidden behind a debug menu as an alpha feature (ctrl/cmd+shift+d)
  
  sanex 8 hours ago
  
  I was just thinking about building something like this, looks like you beat me to the punch, I will have to try it out. I'm curious if you're able to give commands just as well as some wording you want cleaned up. I could see a model being confused between editting the command input into text to be inserted and responding to the command. Sorry if that's unclear, might be better if I just try it.

Barbing 4 hours ago

Quick thoughts re: mentioned transcribers

Superwhisper — Been using it a long time. It's paid with a lifetime subscription available. Tons of features. Language models are built right in without additional charge. Solo dev is epic; may defer upgrades to avoid occasional bugs/regressions (hey, it's complex software).

Trying each for a few minutes:

Hex — Feels the leanest (& cleanest) free options mentioned for Mac in this thread.

Fluid Voice — Offers a unique feature, a real-time view of your speech as you talk! Superwhisper has this, but only with an online model. (You can't see your entire transcript in Fluid, though. The recording window view is limited to about one sentence at a time--of course you do see everything when you complete your dictation.)

Handy — Pink and cute. I like the history window. As far as clipboard handling goes, I might note that the "don't modify clipboard" setting is more of a "restore clipboard" setting. Though it doesn't need as many permissions as Hex because it's willing to move clipboard items around a bit, if I'm not mistaken.

Note Hex seems to be upset about me installing all the others... lots of restarting in between installs all around. Each has something to offer.

---

Big shout out to Nvidia open-sourcing Parakeet--all of these apps are lightning fast.

Also I'm partial to being able to stream transcriptions to the cursor into any field, or at least view live like Fluid (or superwhisper online). I know it's complex b/c models transcribe the whole file for accuracy. (I'm OK with seeing a lower quality transcript realtime and waiting a second for the higher-quality version to paste at the end.)

kuatroka 13 hours ago

Love it. I had been searching for STT app for weeks. Every single app was either paid as a one off or had a monthly subscription. It felt a bit ridiculous having to pay when it’s all powered by such small models on the back end. So I decided to build my own. But then I found “Handy” and it’s been a really amazing partner for me. Super fast, super simple, doesn’t get in my way and it’s constantly updated. I just love it. Thanks a lot for making it! Thanks a lot

P.S. The post processing that you are talking about, wouldn’t it be awesome.

frankdilo 15 hours ago

This looks great! What’s missing for me to switch from something like Wispr Flow is the ability to provide a dictionary for commonly mistaken words (name of your company, people, code libraries).

tin7in 15 hours ago

It has something called "Custom Words" which might be what you are describing. Haven't tested this feature yet properly.
sipjca 14 hours ago

There’s a PR for this which will be pulled in soon enough, I can kick off a build of the PR if you want to download a pre release version
- sipjca 13 hours ago
  
  Okay so it's more directly text replacements
  https://github.com/cjpais/Handy/actions/runs/21025848728
  There is also LLM post processing which can do this, and the built in dictionary feature
jauntywundrkind 15 hours ago

I dig that some models have an ability to say how sure they are of words. Manually entering a bunch of special words is ok, but I want to be able to review the output and see what words the model was less sure of, so I can go find out what I might need to add.

mncharity 7 hours ago

A cautionary user experience report. The default hotkey upon download is ctrl+space. Press to begin recording, release to transcribe and insert. Key-up on the space key constitutes hotkey release. If the ctrl key is still down when the insertion lands, the transcribed text is treated as ctrl characters. The test app was emacs. (x64 linux x11, with and without xdotool)

PhilippGille 14 hours ago

Has anyone compared this with https://github.com/HeroTools/open-whispr already? From the description they seem very similar.

Handy first release was June 2025, OpenWhispr a month later. Handy has ~11k GitHub stars, OpenWhispr has ~730.

kuatroka 13 hours ago

I did have tried, but the ease of installing handy as just a macOS app is so much simpler than needing to constantly run in npm commands. I think at the time when I was checking it, which was a couple of months ago they did not have the parakeet model, which is a non-whisper model, so I had decided against it. If I remember correctly, the UI was also not the smoothest.
Handy’s ui is so clean and minimalistic that you always know what to do or where to go. Yes, it lacks in some advanced features, but honestly, I’ve been using it for two months now and I’ve never looked back or searched for any other STT app.
- ranguna 12 hours ago
  
  The OP asked if someone compared both, which usually means actually trying both and not just installing one and skimming through the other's README file. So, in summary, you didn't try both and didn't answer the OP.

Jayakumark 10 hours ago

Its great, i have been using it . Two requests though 1. iOS app 2. API option to use against meeting transcription or route audio from Mic .

blensor 9 hours ago

+1 on the meeting tranecription

holtwick 10 hours ago

FluidVoice for macOS is pretty handy as well. Open source under Apache License. https://altic.dev/fluid https://github.com/altic-dev/FluidVoice

jimmydoe 9 hours ago

Its vibe coded UI feels too complicated.

fittingopposite 2 hours ago

Is there any good android app featuring parakeet v3?

Jack5500 15 hours ago

The Parakeet V3 model is really great!

aucisson_masque 12 hours ago

It’s incredibly fast on my MacBook m1 air and more accurate that the native speech to text.

The ui is well thought out, just the right amount of setting for my usage.

Incredible !

Btw, do you know what « discharging the model » does ? It’s set to never by default, tried to check if it has an impact on ram or cpu but it doesn’t seem to do anything.

mixtureoftakes 11 hours ago

the model is permanently loaded into ram for access speed. discharging it would unload it from ram and lead to longer start times
- sipjca 11 hours ago
  
  It does unload it, and actually might be a good default for most people as the model loading does happen in the background as soon as you hit the key

peterldowns 6 hours ago

Huge fan! Parakeet v3 works great with it. I have used Monologue, Superwhisper, and Aqua, at various times in the past. But Handy is at least as good, and it's not an expensive subscription. I love that it runs locally, too. Strongly recommend!

llarsson 14 hours ago

A question because I'm not using speech-to-text, but find it intriguing (especially since it's now possible to do locally and for free).

How have your computing habits changed as a result of having this? When do you typically use this instead of typing on the keyboard?

tin7in 14 hours ago

I use it all the time with coding agents, especially if I'm running multiple terminals. It's way faster to talk than type. The only problem is that it looks awkward if there are others around.
- johnisgood 14 hours ago
  
  Interesting. I can think and type faster, but not talk. I am not much of a talker.
  
  stavros 13 hours ago
  
  Same, whenever I try to dictate something I always umm and ahhh and go back a bunch of times, and it's faster to just type. I guess it's just a matter of practice, and I'm fine when I'm talking to other people, it's only dictation I'm having trouble with.
noneofyour 14 hours ago

Part of my job is to give feedback to people using Word Comments. Using STT, it's been a breeze. The time saving really is great. Thing is, I only do this when working at home with no one around. So really only when WFH.

dumbmrblah 14 hours ago

I just set this up today. I had Whispering app set up on my Windows computer, but it really wasn't working well on my Ubuntu computer that I just set up. I found Handy randomly. It was the last app I needed to go Linux full-time. Thank you!

unutranyholas 9 hours ago

https://hex.kitlangton.com/ is good

erelong 8 hours ago

WhisperTux on linux worked ok, curious how Handy compares: https://github.com/cjams/whispertux

mnmalst 8 hours ago

This is really cool. Works out of the box and I'm typing this using handy.

Is there any way to execute commands directly on Linux?

Also a feature to edit or correct already typed text would be really great.

oybng 9 hours ago

On Windows this depends on webview2, which the installer attempts to download. No mention of this requirement in the readme. It's a shame this software isn't portable

wi5eif6E 12 hours ago

This looks and works great! A settings option to keep no recording history at all would be terrific.

walthamstow 11 hours ago

Nice. I spent most of Christmas vibe coding with Google Antigravity with one hand while holding a sleeping baby in the other. MacOS built in dictation is OK, but struggles with technical language.

qprofyeh 12 hours ago

As a Mac user, am I missing something? macOS has Dictation built-in, when you short press F5 it should start transcribing your spoken words into text in real time. It even does non-English languages.

d4rkp4ttern 10 hours ago

Besides being trash as others said, there’s a trade off with real time transcription word by word - there’s no opportunity for an AI to holistically correct/clean up the transcription
- SkyPuncher 2 hours ago
  
  But, OSX does come back and fix things.
  
  d4rkp4ttern 2 hours ago
  
  You mean, after displaying each word as it is spoken, then OSX goes back and fixes what’s been displayed? I think I’ve seen it fix one or two recent words, but I guess you’re saying it could fix the entire sentence as well. I didn’t know that
luigi23 12 hours ago

it's trash if:
- you're not a native speaker or have accent
- using airpods mic
- surroundings is noisy
- use novel words like 'claude code'
- mumble a bit

vladstudio 15 hours ago

Use it daily. Looks and works great.

miniwark 13 hours ago

Did this thing (or open-whispr) work well with other languages than english ?

wi5eif6E 12 hours ago

German also works great.
dawkins 13 hours ago

In Spanish works very well

mrroryflint 14 hours ago

On a M4 Macbook Air, there was enough lag to make it unusable for me. I hit the shortcut and start speaking but there was always a 1-2sec delay before it would actually start transcribing even if the icon was displayed.

jborichevskiy 14 hours ago

Curious if you were using AirPods or other Bluetooth headphones for this?
If so, there should be "keep microphone on" or similar setting in the config that may help with this, alternatively, I set my microphone to my MacBook mic so that my headphones aren't involved at all and there is much less latency on activation
- mrroryflint 9 hours ago
  
  Airpods Max (is that the name?) - the big ones.
kuatroka 13 hours ago

Yes, I’ve got the same situation too. I kind of learned to wait for one or two seconds before talking. I am using it with the AirPods, so maybe it’s indeed the Bluetooth thing.
sipjca 14 hours ago

What microphone are you using?
- mrroryflint 9 hours ago
  
  Airpods Max (is that the name?) - the big ones.

bn-usd-mistake 13 hours ago

Does anyone have a similar mobile application that works locally and is not too expensive? Mostly looking to transcribe voice messages sent over Signal which does not offer this OOTB

4mitkumar 11 hours ago

I have been using this one from Futo for quite some time and love it: https://keyboard.futo.org/
They also have a voice input only version if you still would like to keep your typing keyboard: https://voiceinput.futo.org/
bogtap82 13 hours ago

There is one single app I've been able to find that offers Parakeet-v3 for free locally and it's called Spokenly. They have paid cloud models available as well, but the local Parakeet-v3 implementation is totally free and is the best STT has to offer these days regardless. Super fast and accurate. I consider single-user STT basically a solved problem at this point.
- kuatroka 8 hours ago
  
  Spokenly is great too, but Handy's minimalistic and focused UI won me over.
- dumbmrblah 8 hours ago
  
  Spokenly is my go-to app on iOS for transcription as well.
nerdfax 8 hours ago

[dead]

chainmail2029 15 hours ago

There's a slightly awkward naming overlap with an existing product.

unwind 14 hours ago

Which one? I did a quick search but that didn't turn up anything so perhaps it's a partial word overlap or something.
I did find the projects "user-facing" home page [1] which was nice. I found it rather hard to find a link from that to the code on GitHub, which was surprising.
[1]: https://handy.computer/
- DomB 14 hours ago
  
  It's the German word for smart phone / mobile phone
- zavec 14 hours ago
  
  There's also a sex toy
- sReinwald 14 hours ago
  
  [dead]
ensocode 14 hours ago

This is a slightly German-centric comment.
xfeeefeee 14 hours ago

[dead]

jborichevskiy 15 hours ago

Big Handy fan!

skor 13 hours ago

This is so handy, thank you very much. Good work!!

ekjhgkejhgk 10 hours ago

Explain to me why a speech-to-text app has 50% of its code in typescript...?

beklein 9 hours ago

Not the author/contributor, but the app is built using Tauri for easy multi-platform support, so the backend logic is implemented in Rust and the frontend UI is implemented in TypeScript. I think it’s a valid choice. GitHub does not include any model _code_ in the stats; the models will be downloaded separately the first time you use them. Hope this helps.
I know many people hate sites like this, but I actually like them for these use cases. You can get a quick, LLM-generated overview of the architecture, e.g. here: https://codewiki.google/github.com/cjpais/handy

dotancohen 15 hours ago

Looks interesting. Why does it need a GUI at all?

tin7in 15 hours ago

As an alternative to Wisprflow, Superwhisper and so on. It works really well compared to the commercial competitors but with a local model.
sipjca 14 hours ago

It doesn’t! Just makes it more accessible to more people I feel. There’s a cli version for Mac which I wrote first handy-cli
unwind 14 hours ago

Ah, that was a typo: you meant "GPU" (Graphics Processing Unit, not "GUI" which of course is Graphical User Interface) since that is listed in the system requirements. Explained implicitly by an existing comment, thanks!
Barbing 15 hours ago

I hear a CLI request? Tons of CLI speech-to-text tools by the way, really glad to see this. Excellent competitors (Superwhisper, MacWhisper, etc.) are closed/paid.
kristianp 15 hours ago

So more people can use it?
satvikpendem 15 hours ago

Because local AI models run well on a GPU, better than on a CPU

laylower 12 hours ago

Is it deployed locally or does it send data to your servers?

sipjca 11 hours ago

It’s all local
- mixtureoftakes 10 hours ago
  
  Which model would be the best to use for mandarin? Are there any models on par with Parakeet that are just as fast but also understand Chinese?
  
  mixtureoftakes 9 hours ago
  
  also is there a way to make parakeet type more naturally? less capitallization, less punctuation? can this be a setting?
  this can already be done via local llm processing the text but surely there is an easier way to do this, right

Dnguyen 14 hours ago

Would be nice if the output can be piped directly into Claude Code.

blutoot 14 hours ago

Crashes on Tahoe 26.3 Betq 1 :(

sipjca 14 hours ago

Please send me a crash log!

sirjaz 9 hours ago

This is great, and I love that this is not another webapp