By Denys Medvediev

Comparison

Win+H alternative

Win+H is Windows Voice Typing — free, built in, and good for short bursts. But it runs in Microsoft's cloud, needs internet, and has no custom vocabulary. The alternative most people want is offline, with a hold-to-talk hotkey and an AI cleanup pass.

Last updated: June 2026

The best Win+H alternative is a system-wide dictation tool that runs offline, like Whisper by Remskill. Win+H (Windows Voice Typing) is free and built in, but it uses Microsoft's online speech service and needs internet. A local tool transcribes on the device, adds custom vocabulary, and uses a hold-to-talk hotkey that does not auto-stop.

Win+H is the free dictation already on your PC, and for a lot of people it is genuinely enough. You press Windows key + H, a little bar pops up wherever your cursor is, you talk, and text appears. No install, no account, no cost. I want to say that plainly up front, because the internet is full of pages pretending the built-in option is garbage so they can sell you something. It isn't garbage. It's fine.

The trouble starts when "fine for short bursts" runs into "I do this all day." Voice Typing sends your speech to Microsoft's cloud, so it needs a steady internet connection. It has no custom vocabulary, so it never learns your product names or your colleagues' surnames. And the shortcut itself collides with things — plenty of apps have already claimed Win+H or the keys around it. If you've hit any of those walls, you're not looking for a fix. You're looking for an alternative.

Here's the boring truth most of these pages skip. Win+H is a good free tool with three specific limits: it's cloud-based, so no internet means no dictation; it has no way to teach it your jargon; and it's a tap-to-toggle bar that listens until something stops it, rather than a key you hold while you talk. None of those are bugs. They're design choices, and they're the right choices for the casual user Microsoft built it for.

So the real question isn't "how do I make Win+H better." You mostly can't, because the limits are baked in. The question is "what do I run instead when those limits start hurting," and the honest answer depends on whether you want offline privacy, your own vocabulary, a hotkey that stays out of the way, or an AI pass that cleans up the run-on before it lands. I'll walk through all of it, set one up in two minutes, and tell you plainly when Win+H is still the right call.

What Win+H actually is, and who it's for

Win+H is the keyboard shortcut for Windows Voice Typing. Press the Windows logo key and H together on any Windows 11 PC, a microphone bar appears, and whatever you say gets typed into the text box your cursor is in. Microsoft's own support page is clear about what powers it: "Voice typing uses online speech recognition, which is powered by Azure Speech services." It works in roughly 40 languages, it can insert punctuation automatically if you turn that setting on, and it costs nothing because it ships with Windows.

Credit where it's due, because being fair here matters. For short, casual dictation, Win+H is genuinely good. The accuracy on clear English is solid. The setup is zero — there's no account, no download, no model to wait on. If you want to fire off a two-line Teams message or a quick search query by voice, you press one shortcut and you're done. For that job, paying for anything else would be silly, and I'll say so again later in plainer terms.

It's aimed at the person who dictates occasionally, not the person who dictates for a living. That framing explains every limitation that follows. Microsoft built a free, simple, cloud-backed feature for the average user who wants to talk instead of type now and then. It did not build a power tool, and it never pretended to. The mismatch only shows up when you try to use a casual tool for a heavy job.

Why people go looking for a Win+H alternative

Three things send people searching. First, the internet requirement. Microsoft's support documentation states plainly: "To use voice typing, you'll need to be connected to the internet." Voice Typing processes your speech in the cloud, not on your machine, so on a train, on a plane, in a dead zone, or on a locked-down work network, it simply doesn't work. For anyone dictating anything they'd rather not send to a server — a client email, a medical note, a half-formed idea about the business — the cloud round-trip is the dealbreaker, not the speed.

Second, there's no custom vocabulary. Win+H won't learn that your product is spelled "Remskill" not "rem skill," or that your colleague is "Siân" not "Shawn." Every session starts from scratch. Third, the shortcut collides. Win+H is a global shortcut, and other apps grab it or the keys near it, so the thing that's supposed to be one quick press turns into a fight over who owns the chord. An alternative fixes all three at once: it runs on the device, it takes a custom word list, and it lets you pick a hotkey nothing else touches.

That last point is worth seeing rather than reading. The alternative most people land on is a hold-to-talk hotkey: you press and hold a key, speak, and release, and the text pastes at your cursor. It stays on for exactly as long as you hold it — no auto-stop after a pause, no bar to dismiss. A small capsule shows up while you speak so you know it's listening:

The recording overlay: a small capsule that appears while you hold the hotkey, so you know it's listening.

Set up the alternative in two minutes

The alternative I'll use here is Whisper by Remskill, because it's the one that closes all three Win+H gaps: offline, custom vocabulary, your own hotkey. You need a PC running Windows 10 or newer (it runs on Mac too), a working microphone, and about two minutes. The whole local pipeline is free for any signed-in account, with no payment method asked for at sign-up. Here's the sequence.

Step 1 — Install Whisper and sign in.

Download from the download page, install, and create a free account. No card. The whole local transcription pipeline opens right away.

You'll know it worked when the app's tray icon appears and the setup wizard offers to pick a model.

Step 2 — Pick a transcription path.

The app doesn't choose for you. You get three: Cloud (OpenAI, bring your own key), Local Parakeet, or Local Whisper. To get past the cloud limit Win+H has, pick a local one; the section on local versus cloud further down covers which.

You'll know it worked when a model finishes downloading and shows as ready.

Step 3 — Set a hotkey that nothing else uses.

On Windows, the default hotkey is Ctrl+Space, held as push-to-talk. If that clashes with something you run, change it in Settings. The whole reason to leave Win+H is that you get to own this key, not fight for it.

You'll know it worked when a test recording pastes into any text field.

Step 4 — Add your custom words, then talk.

Drop your product names, surnames, and acronyms into the hotwords list so they come out spelled right. Then put your cursor anywhere, hold the hotkey, say a sentence, release.

You'll know it worked when "Remskill" comes out as Remskill and your sentence is sitting in the text box.

The real Whisper desktop app on the settings screen, with the Transcription and AI panels open.

The slow part is the model download, not the setup. Everything else is the four steps above. Once it's running, dictation stops being a feature you summon and becomes a key you hold, in any app, online or not.

Win+H versus a dedicated tool, honestly

Start with where Win+H wins, because it does win on two things and pretending otherwise would be dishonest. It's free, full stop — nothing to buy, ever. And it's already installed, so there's no download and no account. If those two are what you care about most, the comparison can end here and Win+H takes it. A dedicated tool asks you to install something and sign in; Win+H asks for nothing.

Now the rest of the table. On privacy, Win+H is cloud-only — Microsoft's docs draw the line themselves: online speech recognition "uses Microsoft cloud-based services" and "Voice data is sent to Microsoft," while device-based recognition "processes your voice locally on your device" and "No voice data is sent to Microsoft." Voice Typing uses the online path. A local alternative keeps everything on the machine. On custom vocabulary, Win+H has none; a dedicated tool takes a word list. On the hotkey, Win+H is a fixed global shortcut that listens until interrupted; a hold-to-talk tool gives you a key you choose and hold. On cleanup, Win+H does live auto-punctuation; a dedicated tool can run a full AI pass that fixes filler words and run-ons, not just commas.

And the auto-stop. Win+H is built to read a pause as "you're done" and switch itself off after a few seconds of silence, a behavior I dug into separately in "why Windows dictation keeps stopping." For short messages that's fine. For thinking out loud, where you pause to find the word, it's maddening. A hold-to-talk key sidesteps the whole thing: it's on while you hold it and off when you let go, and silence in the middle changes nothing. So the honest scoreboard: Win+H wins on free and pre-installed; the alternative wins on offline, vocabulary, hotkey control, cleanup, and not stopping on you. Pick the row that matches your day.
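To make the auto-stop difference concrete, here is a toy simulation of the two listening models. It is a sketch, not how either tool is implemented: the `Segment` type, both function names, and the 5-second timeout are assumptions chosen for illustration (Microsoft does not publish an exact figure here; the text above only says "a few seconds").

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds since dictation began
    end: float
    text: str


def toggle_with_auto_stop(segments, silence_timeout):
    """Tap-to-toggle model: listening ends once a silent gap reaches the timeout."""
    captured = []
    last_end = 0.0
    for seg in segments:
        if seg.start - last_end >= silence_timeout:
            break  # the listener switched itself off during the pause
        captured.append(seg.text)
        last_end = seg.end
    return " ".join(captured)


def hold_to_talk(segments, key_down, key_up):
    """Hold-to-talk model: everything spoken while the key is held is captured."""
    return " ".join(
        seg.text for seg in segments if seg.start >= key_down and seg.end <= key_up
    )


speech = [
    Segment(0.0, 2.0, "push the rollout to"),
    Segment(8.0, 9.0, "Thursday"),  # six-second pause to find the word
]

print(toggle_with_auto_stop(speech, silence_timeout=5.0))  # push the rollout to
print(hold_to_talk(speech, key_down=0.0, key_up=9.5))      # push the rollout to Thursday
```

The only input that matters to the hold-to-talk model is the key state, which is why a pause in the middle of a thought cannot end the recording.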

Local or cloud: which mode replaces Win+H

If the reason you're leaving Win+H is the internet requirement or the privacy, the answer is local mode. The whole appeal of an alternative is that the transcription happens on your machine, with nothing sent to a server — the opposite of the cloud round-trip that stops Voice Typing the moment your connection wobbles. If your PC is from the last few years, local handles everyday dictation without complaint, and cloud becomes an option you reach for rather than a dependency you're stuck with.

Here's how the three paths differ, because the app makes you pick and I'd rather you pick well:

  • Local Parakeet: NVIDIA's TDT engine, around 600 MB, and the fastest local option, 5 to 10 times faster than Whisper on CPU. Covers English plus 24 other European languages, 25 in total. No translate-to-English. If you dictate in English or another European language, this is the quick, fully offline pick that does what Win+H does without the cloud.
  • Local Whisper: slower than Parakeet on the same machine, but the multilingual builds cover 99 languages and can translate to English. The English-only builds are English-only, not 99. Pick this for Chinese, Japanese, Korean, or any translation work, which Parakeet can't do. Default English model is around 480 MB.
  • Cloud (OpenAI, BYOK): best accuracy and web access, using your own OpenAI key billed straight by OpenAI. Transcription runs on gpt-4o-mini-transcribe by default. It needs internet, like Win+H does, so it's the one path that leaves your machine. The Cloud surface is part of Whisper Pro.

The boring truth is that for the kind of text Win+H handles today — emails, messages, notes — either local engine is plenty, and both run fully offline. That's the single biggest practical difference from the built-in option: no connection, still works. Cloud earns its place when you want top-tier accuracy on a hard recording or you need the model to pull a fact off the web mid-sentence. If you came here to escape the internet dependency, start local and treat cloud as the escape hatch, not the default.
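If it helps to see that advice as a decision rule, here is a minimal sketch. Everything named in it is hypothetical: `pick_path`, its parameters, and above all `PARAKEET_LANGS_SAMPLE`, which is a four-language stand-in because the full list of 25 Parakeet languages isn't given here.

```python
from enum import Enum


class Path(Enum):
    LOCAL_PARAKEET = "Local Parakeet"
    LOCAL_WHISPER = "Local Whisper"
    CLOUD = "Cloud (OpenAI, BYOK)"


# Illustrative subset only, NOT the real list of Parakeet's 25 languages.
PARAKEET_LANGS_SAMPLE = frozenset({"en", "de", "fr", "es"})


def pick_path(language, needs_translation=False, prefer_cloud=False,
              online=True, parakeet_langs=PARAKEET_LANGS_SAMPLE):
    """Start local; use cloud only when it is asked for and a connection exists."""
    if prefer_cloud and online:
        return Path.CLOUD
    # Parakeet cannot translate and covers fewer languages, so Whisper is the fallback.
    if needs_translation or language not in parakeet_langs:
        return Path.LOCAL_WHISPER
    return Path.LOCAL_PARAKEET


print(pick_path("en").value)                                    # Local Parakeet
print(pick_path("ja").value)                                    # Local Whisper
print(pick_path("en", prefer_cloud=True, online=False).value)   # Local Parakeet
```

Cloud is the only branch that checks for a connection, which matches treating it as the escape hatch rather than the default.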

The AI cleanup pass Win+H doesn't do

Raw dictation comes out as a run-on. You say "okay so email the client about the remskill rollout push it to thursday and ask about the budget," and that's the unpunctuated wall any speech engine hands you. Win+H will sprinkle in commas and periods as you speak, which is real and useful. What it won't do is rewrite the mess — strip the "ums," fix the broken grammar, turn a spoken ramble into something you'd actually send.

That's the gap an AI pass fills. Say the activation phrase "Hey whisper" and the text gets enhanced before it lands: filler removed, run-ons split, your custom words spelled right because you taught them to the tool. In local mode the cleanup runs through Ollama, so it happens on your machine too; in cloud mode it's gpt-5-mini by default. Win+H gives you punctuation. This gives you a finished sentence.

Raw

okay so email the client about the remskill rollout push it to thursday and ask about the budget um before the call

Cleaned

Okay, so email the client about the Remskill rollout, push it to Thursday, and ask about the budget before the call.
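For the curious, a local cleanup pass of this kind could look roughly like the sketch below. It targets Ollama's local `/api/generate` endpoint on its default port. The prompt wording, the `llama3.2` model name, and both function names are my assumptions for illustration, not what the app actually sends.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address


def build_cleanup_request(raw_text, hotwords, model="llama3.2"):
    """Build a non-streaming request asking the model to tidy dictated text."""
    instructions = (
        "Rewrite the dictated text with punctuation and capitalization. "
        "Remove filler words such as 'um'. Split run-on sentences. "
        "Do not add or drop information. "
        "Spell these terms exactly as written: " + ", ".join(hotwords) + "."
    )
    return {"model": model, "system": instructions, "prompt": raw_text, "stream": False}


def clean(raw_text, hotwords):
    """Send the request to a running Ollama server and return the rewritten text."""
    body = json.dumps(build_cleanup_request(raw_text, hotwords)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


payload = build_cleanup_request(
    "okay so email the client about the remskill rollout um before the call",
    hotwords=["Remskill"],
)
print(json.dumps(payload, indent=2))
```

Calling `clean(...)` needs an Ollama server running with the named model already pulled. Nothing in the request leaves localhost, which is the point of doing the cleanup locally.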

The custom-vocabulary piece is the part Win+H structurally can't match. Because the alternative transcribes on your machine with your own word list, it knows "Remskill" is a product and "Siân" is a name, and it gets them right every time instead of every other time. For anyone who dictates the same proper nouns all day — a sales rep with a CRM full of surnames, a developer naming the same services — that's the difference between text you keep and text you fix. Win+H starts every session as a stranger; a tool with hotwords remembers.
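One simple way to picture what a word list buys you is a correction pass over the transcript. This is a deliberately blunt sketch of the idea, not how Whisper by Remskill applies its hotwords; `apply_hotwords` and the mapping format are hypothetical.

```python
import re


def apply_hotwords(text, hotwords):
    """Replace known mis-hearings with the preferred spelling, ignoring case."""
    for wrong, right in hotwords.items():
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
        # A function replacement keeps the preferred spelling literal.
        text = pattern.sub(lambda _match, right=right: right, text)
    return text


hotwords = {"rem skill": "Remskill", "shawn": "Siân"}
print(apply_hotwords("ask shawn about the rem skill rollout", hotwords))
# ask Siân about the Remskill rollout
```

A replacement table like this would also rewrite a real Shawn, so treat it as an illustration of the goal (the same proper nouns spelled the same way every time) rather than a recommended mechanism.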

That same speak-then-clean flow is the whole reason voice beats the keyboard for volume — you can type faster with your voice across every app, so a long message becomes a few spoken sentences instead of a paragraph you hammer out by hand.

When Win+H is all you need

I'd be lying if I told everyone to install something. For a real slice of people, Win+H is the right answer and a dedicated tool is overkill. If you dictate occasionally — a quick message, a search box, a short note — and you're nearly always online, the built-in feature costs nothing and works well. Don't download an app to do what the Windows key + H shortcut already does for free.

Specifically, stay with Win+H if you're always connected to the internet and don't care that your speech routes through Microsoft's cloud; if you dictate in short bursts rather than long passages, so the silence auto-stop never bothers you; if you never need it to spell custom names or jargon; and if the Win+H shortcut doesn't clash with anything you run. That's a genuine profile, not a strawman — it describes a lot of casual users, and for them the alternative adds friction without adding value. Free and pre-installed is a strong combination when the limits don't touch you.

The line to cross is when the limits start costing you time. Reach for an offline, system-wide tool when you dictate where there's no signal, when you want your words to stay on your machine, when you're tired of the same names coming out wrong, or when you want a hotkey you hold that never stops mid-thought. Below that bar, Win+H wins on price and zero setup, and I'll happily tell you to keep it.

And if your real complaint with Win+H is the constant cutting-out rather than the cloud, the fix-first walkthrough in "why Windows dictation keeps stopping" covers what you can actually steady before you decide to switch at all.

Win+H is the rare free tool that's actually good, which is why I spent this whole piece refusing to trash it. It does one job well: short, online, casual dictation, for nothing. The alternative is for the other job — the all-day, offline, my-own-words, hold-the-key-and-think job. I dictated most of this comparison with a hotkey I picked myself, on a plane with the wifi off, while Win+H sat there waiting for a connection it wasn't going to get. Pick the tool that matches the flight you're on.

Try the offline alternative to Win+H

Hold a hotkey you picked, talk, release. The text lands wherever your cursor is — online or off, in every app.

Free local mode for any signed-in account. No card required to start.

Denys Medvediev

I'm the one who reads our support email, most probably by dictating the replies.