By Denys Medvediev

How to use voice typing on Windows

Windows has voice typing built in. Press Windows key + H in any text box, wait for "Listening," and talk. It needs internet and a working mic. For heavy, offline, or multi-app use, a dedicated tool with one hotkey does the same job everywhere.

Last updated: June 2026

To use voice typing on Windows, place the cursor in any text box and press Windows key + H. The voice typing bar opens, shows "Listening," and types what you say. It needs a working microphone, an internet connection, and online speech recognition turned on under Settings, Privacy & security, Speech.

Most people never find out Windows can type for them. The feature ships with Windows 10 and 11, it has no setup wizard, and it lives behind a keyboard shortcut nobody mentions: Windows key + H. Put your cursor in a text box, hold the Windows key, tap H, and a small bar opens at the top of the screen and starts listening.

That's genuinely it, and for short bursts it works fine. The catch is the part Microsoft states plainly and most blog posts skip: Windows voice typing routes your speech through Microsoft's cloud, so it needs an internet connection to do anything at all. That one detail decides whether the built-in feature is enough for you or whether you'll want something else. I'll walk through the built-in feature honestly first, then tell you where it stops.

Here's the thing the how-to listicles bury. Windows voice typing is a real, free, built-in feature, and it works in any text box that takes a cursor — your browser, Word, a Slack message, the search bar. You do not install anything. The shortcut is Windows key + H, and once you know it exists, you'll use it.

So the honest answer comes in two halves. Half one: how to turn the built-in on and use it well, which is most of what people searching for this actually need. Half two: where Win+H runs out of road — no internet, long dictation, custom words it keeps mishearing — and what a dedicated tool fixes about each. I'll cover both, set up the alternative in two minutes, and tell you plainly when Win+H is already enough.

What Windows voice typing actually is

Windows voice typing is a built-in feature that lets you enter text by speaking instead of typing. It ships with Windows 10 and Windows 11, costs nothing, and works in any text box where you can place a cursor. Microsoft's own description is worth quoting because it sets the boundaries: voice typing "uses online speech recognition, which is powered by Azure Speech services." Three things follow from that single sentence.

First, it needs internet. Your speech is sent to Microsoft's servers to be turned into text, so with no connection, voice typing does nothing. Second, you need a working microphone; the laptop's built-in one is fine to start. Third, because the recognition happens in the cloud, accuracy is generally good, and it doesn't lean on your CPU. Those are the trade-offs in a nutshell: free and accurate, but online-only, with your speech leaving your machine every time.

People often confuse this with the older Windows Speech Recognition, with the newer Voice Access, or with dictation inside a single app. For everyday "I want to talk and watch words appear in whatever I'm writing," the one you want is voice typing, opened with Windows key + H. The next section is the actual how-to.

Turn it on with Windows key + H

There is no app to launch and no wizard to click through. You put your cursor where you want the words, then trigger voice typing with a keyboard shortcut. Here is the whole sequence, with the one settings detour you might need.

Click into any text box — a document, an email, a chat, the address bar — so the cursor is blinking there. Press Windows key + H. A small voice typing bar appears at the top of the screen. Wait for it to say "Listening" before you talk; if you start too early, it clips your first words. Speak normally, and the text appears where your cursor is. Press the microphone button on the bar, or the shortcut again, to stop.

If nothing happens, two things are usually the cause. Your microphone isn't set or allowed — voice typing needs a working mic. Or online speech recognition is switched off, which means the cloud half of the feature is disabled. Turn it on at Start, then Settings, then Privacy & security, then Speech, and set Online speech recognition to On. If the bar opens but never reaches "Listening," it's almost always the internet connection, since the recognition happens on Microsoft's servers rather than your machine. (If Win+H is misbehaving in a more stubborn way, I wrote a separate piece on why Win+H stops working and how to get it back.)
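If you like your troubleshooting as a checklist, the failure cases above can be sketched in a few lines of Python. This is only an illustration: the function names and parameters are made up for this example, and `ms-settings:privacy-speech` is, as far as I know, the address Windows uses for the Speech settings page, so treat it as an assumption and click through Settings by hand if it doesn't open.

```python
import os
import sys

# Assumed deep link to Settings > Privacy & security > Speech.
SPEECH_SETTINGS_URI = "ms-settings:privacy-speech"


def diagnose(bar_opens: bool, reaches_listening: bool,
             mic_ok: bool, online_speech_on: bool) -> str:
    """Map what you observe after pressing Win+H to the likely fix."""
    if not mic_ok:
        return "Set up or allow a microphone"
    if not online_speech_on:
        return "Turn on Online speech recognition"
    if bar_opens and not reaches_listening:
        return "Check the internet connection"
    if not bar_opens:
        return "See the separate Win+H troubleshooting piece"
    return "Voice typing should work"


def open_speech_settings() -> bool:
    """Open the Speech settings page; returns False when not on Windows."""
    if sys.platform != "win32":
        return False
    os.startfile(SPEECH_SETTINGS_URI)  # os.startfile exists only on Windows
    return True
```

The order of the checks mirrors the paragraph: microphone first, then the Online speech recognition switch, then the connection.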

The better way for heavy use: one hotkey everywhere

The built-in is great until you hit one of its walls — no internet on a train, a long block of dictation, or a word it mishears every single time. The fix is a system-wide tool that does the same job but runs on your own machine, holds a short tail so your last word isn't clipped, and uses one hotkey in every app. You need a Windows 10-or-newer PC, a working microphone, and an account. The whole local pipeline is free for any signed-in account, with no payment method asked for at sign-up. Here's the sequence.

Step 1 — Install Whisper and sign in.

Get the installer from the download page, run it, and create a free account. No card. The local transcription pipeline opens right away.

You'll know it worked when the app's tray icon appears and the setup wizard asks you to pick a model.

Step 2 — Pick a transcription path.

The app doesn't choose for you. You get three: Cloud (OpenAI, bring your own key), Local Parakeet, or Local Whisper. To replace Win+H's online-only behaviour with something offline, pick a local engine — more on that two sections down.

You'll know it worked when a model finishes downloading and shows as ready.

Step 3 — Confirm your hotkey.

On Windows the default is Ctrl+Space, held as push-to-talk. Change it in Settings if it clashes with something you already use. Unlike Win+H, it keeps recording for as long as you hold the key.

You'll know it worked when a test recording pastes into any text field.

Step 4 — Put your cursor anywhere and talk.

Click into a document, email, or chat box, hold the hotkey, say a sentence, release. The transcript pastes where the cursor is, in whatever app has focus.

You'll know it worked when your spoken sentence is sitting in the text box as text.

The real Whisper desktop app on the settings screen, with the Transcription and AI panels open.

The slow part is the one-time model download, not the setup. After that, the act of writing in any app stops being a typing task and becomes a talking task — and it keeps working when the Wi-Fi drops.

If you've used speech to text on Windows 11 before, this is the same idea with a hotkey that doesn't auto-stop on you.

Punctuation: commands versus automatic

Raw speech has no commas. Every dictation tool handles that in one of two ways, and Windows voice typing actually offers both. It has automatic punctuation, which adds commas and periods on its own based on how you speak, and you toggle it from the gear icon on the voice typing bar. And it has spoken commands: say "period" or "full stop," "comma," "new line," "open quotes" and "close quotes," and it inserts the mark instead of the words.
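To make the spoken-command idea concrete, here is a toy Python version. It is a sketch under my own assumptions, not how Windows does it: the phrases come from the list above, but the spacing rules and the name `apply_spoken_commands` are invented for the example.

```python
# Spoken phrase -> (mark, how it attaches). The attachment rules are assumptions.
COMMANDS = {
    "period": (".", "after"),         # sticks to the previous word
    "full stop": (".", "after"),
    "comma": (",", "after"),
    "close quotes": ('"', "after"),
    "open quotes": ('"', "before"),   # sticks to the next word
    "new line": ("\n", "break"),      # no space on either side
}


def apply_spoken_commands(transcript: str) -> str:
    """Replace spoken punctuation commands with the marks they name."""
    words = transcript.split()
    result = ""
    glue_next = True  # True when the next word needs no leading space
    i = 0
    while i < len(words):
        two = " ".join(words[i:i + 2]).lower()
        one = words[i].lower()
        if two in COMMANDS:        # try two-word commands first
            key, step = two, 2
        elif one in COMMANDS:
            key, step = one, 1
        else:
            key, step = None, 1
        if key is None:
            result += ("" if glue_next else " ") + words[i]
            glue_next = False
        else:
            mark, kind = COMMANDS[key]
            if kind == "before" and not glue_next:
                result += " "
            result += mark
            glue_next = kind != "after"
        i += step
    return result


print(apply_spoken_commands(
    "hello comma world period new line she said open quotes hi close quotes"
))
```

The toy also shows the weak spot of the approach: it has no way to tell the command "period" from the ordinary word "period," so a sentence about a trial period would come out wrong.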

The gear menu on that bar is worth a look once. Beyond automatic punctuation, it holds the profanity filter and the default-microphone choice. None of it is buried; it's one click from the bar that opens with Win+H. While you're speaking, a small indicator shows the feature is listening — the same idea every good dictation tool uses so you're never guessing whether it heard you:

A recording indicator: a small capsule that appears while you speak, so you know the tool is listening.

The limit of command-based punctuation is that it makes you narrate the formatting — "comma," "new line," "period" — which is fine for a text but tiring across a long paragraph. Automatic punctuation helps, but it still hands you a literal transcript of what you said, ums and false starts included. Cleaning that into something you'd actually keep is a separate step, and it's where a dedicated tool pulls ahead. More on that below.

Local or cloud: the choice Win+H doesn't give you

Windows voice typing made the local-or-cloud decision for you: it's cloud, full stop. Your speech goes to Microsoft's servers every time. That's fine for a grocery list and a real problem for a salary spreadsheet note or a client email you'd rather not have transcribed off-site. A dedicated tool gives you the choice the built-in skips. Here's how the three paths differ, because the app makes you pick and I'd rather you pick well:

  • Local Parakeet: NVIDIA's TDT engine, around 600 MB, and the fastest local option, 5 to 10 times faster than Whisper on CPU. Covers English plus 24 other European languages, 25 in total. No translate-to-English. If you dictate in English or another European language and want speed with nothing leaving your machine, this is the quick pick.
  • Local Whisper: slower than Parakeet on the same machine, but the multilingual builds cover 99 languages and can translate to English. The English-only builds cover English only. Pick this for Chinese, Japanese, Korean, or any translation work, which Parakeet can't do. The default English model is around 480 MB.
  • Cloud (OpenAI, BYOK): best accuracy and web access, using your own OpenAI key billed straight by OpenAI. Transcription runs on gpt-4o-mini-transcribe by default. Needs internet, like Win+H, but it's your key and your call. The Cloud surface is part of Whisper Pro.

The boring truth is that for most everyday writing, a local engine is plenty, and it's the one thing Win+H can't offer. Both local paths run fully on your machine with nothing sent to a server. Cloud earns its place when you want top-tier accuracy on a hard recording or you need the model to pull a fact off the web mid-sentence. Start local, and reach for cloud only when local leaves you wanting.
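If it helps to see the choice as a rule of thumb, here it is as a short Python sketch. Everything in it is hypothetical: the `Needs` fields and `pick_path` are names I made up to restate the three bullets above, not settings or functions from the app.

```python
from dataclasses import dataclass


@dataclass
class Needs:
    european_language: bool     # English or another language Parakeet covers
    translate_to_english: bool  # you want English out of non-English speech
    must_stay_on_device: bool   # nothing may be sent to a server
    wants_cloud_extras: bool    # top-tier accuracy or web access


def pick_path(needs: Needs) -> str:
    """Start local; reach for cloud only when it is wanted and allowed."""
    if needs.wants_cloud_extras and not needs.must_stay_on_device:
        return "Cloud (OpenAI, BYOK)"
    if needs.translate_to_english or not needs.european_language:
        return "Local Whisper"   # multilingual builds, can translate
    return "Local Parakeet"      # fastest local option


print(pick_path(Needs(True, False, True, False)))
```

When privacy and cloud extras conflict, the sketch lets privacy win, which matches the "start local" advice.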

Cleaner results without saying every comma

Both Win+H and a dedicated tool give you the same starting point: a run-on. You say "okay so move the deadline to friday tell the client and book the room for two," and that's the unpunctuated wall any speech engine hands you. Win+H can punctuate as you go or take spoken commands. Neither mode strips the "ums" or fixes a sentence you started over halfway through.

That cleanup is where an AI pass earns its keep. Say the activation phrase "Hey whisper" and the transcribed text gets enhanced before it lands: filler removed, run-ons split, capitalisation fixed. With a local model, that pass runs through Ollama on your machine; in cloud mode it's gpt-5-mini by default. You speak the messy version once and get back the version you'd actually send.

Raw

okay so move the deadline to friday tell the client and book the room for two um before lunch

Cleaned

Okay, so move the deadline to Friday, tell the client, and book the room for two before lunch.
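To show how much of that cleanup is more than find-and-replace, here is a deliberately crude, rule-based pass in Python. It is not how the app does it; the cleanup described above runs through a language model. The regex and the name `strip_fillers` are my own assumptions for illustration. It drops a few filler words, capitalises the first letter, and adds a final period, and that's all.

```python
import re

# A tiny, assumed filler list; real speech has many more.
FILLERS = re.compile(r"\b(?:um+|uh+|erm+)\b[,.]?\s*", re.IGNORECASE)


def strip_fillers(raw: str) -> str:
    """Remove filler words, tidy spaces, capitalise, end with a period."""
    text = FILLERS.sub("", raw)
    text = re.sub(r"\s+", " ", text).strip()
    if not text:
        return text
    text = text[0].upper() + text[1:]
    if text[-1] not in ".!?":
        text += "."
    return text


raw = ("okay so move the deadline to friday tell the client "
       "and book the room for two um before lunch")
print(strip_fillers(raw))
```

On the raw sentence above it loses the "um" and gains a capital and a period, but there are still no commas and "friday" stays lowercase. Splitting a run-on means understanding the sentence, which is the part that needs a model.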

There are also the words a generic engine keeps fumbling: a product name, a colleague's surname, a bit of jargon. Win+H gives you no way to teach it those. A dedicated tool lets you bias toward custom vocabulary so the words you use every day stop coming back wrong. It won't conjure formatting you didn't ask for, and anyone promising "say heading and watch it style itself" is selling a demo, not a Tuesday. Get the words down fast and clean by voice; do the layout with the keys you already know.

That same speak-then-clean flow is the whole reason people switch — you can type faster with voice across every app instead of narrating commas into a built-in bar that only works online.

When Win+H is already enough

Sometimes the free thing already on your machine is the right answer, and pretending otherwise would be dishonest. Windows voice typing is genuinely good for a large slice of what people need, and installing anything extra would be overkill.

Stick with Win+H if you're online most of the time, your dictation comes in short bursts, and you don't mind your speech passing through Microsoft's cloud. A two-line Slack reply, a search query, a quick note in a doc — press Windows key + H, wait for "Listening," talk, done. It's free, it's built in, and it punctuates on its own. For a one-line reminder, I'm not going to tell you to install an app.

Reach for a dedicated, system-wide tool when the built-in starts hurting: no internet on a flight or a train, long stretches of dictation where push-to-hold beats a bar that times out, privacy on text you don't want leaving your machine, custom words it keeps mishearing, or wanting one hotkey that behaves identically in every program. Below that bar, Win+H wins on price and zero setup. Above it, the gap is real.

If you're still deciding which side of that line you're on, the longer comparison in the Win+H alternatives guide lays out exactly where each option fits, without the marketing gloss.

Windows shipped voice typing years ago and hid it behind a shortcut nobody says out loud. Now you know it: Windows key + H, wait for "Listening," talk. For most quick jobs that's the entire answer, and it's free. The day you're offline, or you're dictating something longer than a text, or a word keeps coming back wrong, you'll know exactly which wall you hit — and which tool gets you over it. I wrote a fair bit of this by voice, in an app that doesn't care which text box my cursor is in. The internet went out twice while I did it. The dictation didn't notice.

Try voice typing that works offline too

Hold one hotkey, talk, release. The transcript lands in whatever text box your cursor is in — on a train, on a plane, or with the Wi-Fi down.

Free local mode for any signed-in account. No card required to start.

Denys Medvediev

I'm the one who reads our support email, and most probably the one dictating the replies.