By Denys Medvediev

Tutorial

Speech to text on Windows 11

Last updated: June 2026

Put your cursor in any text box, press Windows key + H on a hardware keyboard, and start talking. Your words land at the cursor. The built-in voice typing runs on Azure online speech recognition and needs an internet connection. For offline dictation in any app, install a dedicated tool. This guide sets up both, start to finish.

My older daughter once asked why my emails take so long to send. The honest answer is that I type at about 40 words a minute and I get interrupted roughly every ninety seconds. Voice typing fixed half of that. The trick on Windows 11 is one shortcut most people never find: press Windows key + H, and a small microphone toolbar appears over whatever you are typing into.

From there you talk, and the words land at your cursor. The opinion I will defend below: for anything past a quick note, the built-in tool is not the one I would reach for.

Two setup paths get you to working speech to text on Windows 11, and the difference between them is where the work happens. Path 1 is the built-in voice typing, which sends your audio to Microsoft's Azure servers, transcribes it there, and sends the text back. That is fine for a Teams message and a problem for a salary spreadsheet on a flight with no Wi-Fi.

Path 2 is a desktop app that does the transcription on your own machine, offline, in any window. By the end of this guide you will have each one running, and you will know which to keep. Most of the support email I get is from someone who picked the wrong path on day one. I am the one who reads it.

Path 1: press Win+H and start talking

The built-in Windows 11 voice typing toolbar: a microphone button, a settings gear, and the Listening label.

Prerequisites: Windows 11, an internet connection, a working microphone, and your cursor in a text box. No download or account. Time: under a minute.

1. Open any text box. A Word document, an email, a browser search bar, anywhere you can type.

2. Press Windows key + H on a hardware keyboard. A small floating toolbar appears with a microphone button.

3. Wait for the Listening label, then talk. Your words appear at the cursor.

4. Say "stop listening" or tap the microphone to stop. The toolbar closes and your dictation stays where the cursor was.

Expected result: the toolbar reads Listening, and the words you say show up at the cursor as you speak. Turning automatic punctuation on (the gear icon on the toolbar) lets it add commas and full stops based on what you say.

If it does nothing: voice typing uses online speech recognition powered by Azure, so it needs an internet connection, a working microphone, and a cursor sitting inside a text box. No connection, no transcription. The full fix list is two sections down.

Coming from an older tutorial that talks about Windows Speech Recognition (WSR)? Voice Access replaced that feature on Windows 11 22H2 and later, starting in September 2024. The old WSR control panel now lives on only in older Windows versions. So if a guide tells you to open a Speech Recognition wizard and you cannot find it, the guide is out of date, not your PC.

When the built-in voice typing is enough

I am not going to tell you to install software you do not need. For a lot of tasks, Win+H is the right answer, and it costs nothing.

Use the built-in tool when the dictation is short, you have a connection, and the stakes are low. A reply to a coworker on Teams. A quick note in OneNote. A search query you would rather speak than thumb out. It handles automatic punctuation, and it works in any standard text box across Windows 11. For 30-word bursts where you are online anyway, opening a second app would be slower than just talking.

A second built-in feature confuses people, so let me clear it up. Voice Access is not the same thing as voice typing. Voice Access lets you control the whole PC and author text by voice, and unlike Win+H, it runs offline using on-device speech recognition after a one-time language download. It needs Windows 11 version 22H2 or later. So if you need full hands-free PC control (clicking, scrolling, opening apps by voice), Voice Access is the built-in tool to reach for, not voice typing. Different jobs.

Where Win+H falls short (offline, accuracy, languages)

The built-in tool has three real ceilings. None of them are dealbreakers for a quick note. All three start hurting the moment you do longer or more serious work.

Offline

Voice typing needs the internet because the transcription happens on Azure servers, not your laptop. On a plane, on a train through a tunnel, or in a building that eats Wi-Fi, it stops working.

Accuracy

Microsoft publishes no accuracy figure for voice typing, and there is no neutral benchmark I would stake a claim on. What I can tell you is that a cloud model on a flaky connection, a built-in laptop mic, and a strong accent are three separate ways to get a transcript you have to clean up by hand.

Languages

Voice typing supports a fixed, Microsoft-maintained list of around forty languages, and you install each one before you can switch to it. That is plenty for most people and a wall for anyone working across a language Microsoft has not added.

The three real ceilings of built-in voice typing: offline, accuracy, and language coverage.

The privacy angle is the one I think about most. Your dictation (the email to your kid's school, the draft of a contract, the half-formed idea you would never say out loud in a meeting) leaves your machine and goes to a server. For a Teams message saying you are running five minutes late, that is nothing. For the things you care about, it is worth knowing where the audio goes.

Win+H not working? The three usual culprits

When Win+H does nothing, it is almost always one of three things. Check them in this order, from the most common cause to the least.

1. No internet, or no working mic.

Voice typing needs a connection and a microphone that Windows can hear. Open Settings, System, Sound and confirm your input device shows movement when you talk.

Test the fix: the toolbar should reach Listening instead of hanging.

2. The cursor is not in a text box.

Win+H only fires when your cursor is inside a field you can type into. Click into a Word document or an email body first, then press the shortcut.

Test the fix: the microphone toolbar appears the moment you press the keys.

3. A laptop function-key layer is stealing the H.

On some laptops the top-row or media keys remap things, and a keyboard utility can intercept the shortcut.

Test the fix: tap the microphone button on the touch keyboard instead. If dictation works there, the hardware shortcut is the problem, and you can remap the key in your manufacturer's keyboard utility.

Check the three usual culprits in order, from the most common cause to the least.
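
If you prefer a checklist you can read as code, here is a minimal Python sketch of that order. It is an illustration only: the VoiceTypingState fields and the first_culprit function are names I made up for this example, you fill them in by hand from what you see on screen, and nothing here queries Windows itself.

```python
from dataclasses import dataclass


@dataclass
class VoiceTypingState:
    """Observations gathered by hand; every field name here is hypothetical."""
    online: bool                    # the PC has an internet connection
    mic_shows_input: bool           # the input meter in Settings, System, Sound moves
    cursor_in_text_box: bool        # the caret sits in a field you can type into
    toolbar_appears: bool           # Win+H brings up the microphone toolbar
    touch_keyboard_mic_works: bool  # dictation works from the touch keyboard


def first_culprit(state: VoiceTypingState) -> str:
    """Walk the three checks in order and return the first one that fails."""
    if not state.online or not state.mic_shows_input:
        return "1: no internet, or no working mic"
    if not state.cursor_in_text_box:
        return "2: the cursor is not in a text box"
    if not state.toolbar_appears and state.touch_keyboard_mic_works:
        return "3: something is intercepting the hardware shortcut"
    if not state.toolbar_appears:
        # None of the three matched: look at language packs and pending updates.
        return "none of the three: check the language pack and Windows Update"
    return "no culprit found"
```

The order of the if statements is the whole point: a dead connection or a silent mic is checked before anything else, so you do not spend time remapping keys when the real problem is Wi-Fi.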

If none of those land, the deeper issue is usually a language pack that did not finish installing, or a Windows update mid-flight. That is also the point where I stop fighting the built-in tool and set up something I control end to end, which is Path 2 below. If it keeps failing after that, we wrote a separate guide to voice typing not working on Windows with the longer checklist.

Path 2: set up a dedicated dictation app

Whisper is the desktop app I build, and it does the three things Win+H cannot: it transcribes offline on your own CPU, it works through one system-wide hotkey in any application, and it lets you pick the engine for your hardware and languages instead of one fixed cloud model. Here is the full setup, start to finish.

Prerequisites: Windows 11, about 1 GB of free disk for a mid-size model, a microphone, and a free account (no payment method to start). You need a connection for the one-time download only; transcription afterward is offline. Time: 5 to 10 minutes, most of it the model download.

1. Download and install Whisper. Grab the installer from the download page and run it. Expected: the app opens to its main window.

2. Sign in. Create the free account when prompted; no card is required to start. Expected: you land on the main screen with Settings available.

3. Pick a local engine and download the model. Choose a Whisper model sized to your PC, or Parakeet for the fastest local option. Expected: a progress bar finishes and the model shows as ready.

4. Confirm the hotkey. The default Windows hotkey is Ctrl+Space: press and hold, talk, release. Change it in Settings if it clashes with something.

5. Test it in any app. Click into any text field (a browser, your code editor, a chat box), hold Ctrl+Space, say a sentence, release. The text lands at your cursor.

Expected result: with the model downloaded, you hold Ctrl+Space in any application, speak, release, and your words paste at the cursor with no internet in the loop after the download. Saying Hey whisper triggers an AI clean-up pass on the text before it lands, if you turn that on.

If the hotkey misfires: rebind it in Settings. I learned this one the hard way. The first version of the hotkey handler fired the recording-stop callback six times per real keypress on Windows, because the Windows input framework generates phantom Ctrl+Space release events at unpredictable intervals. It worked on a clean machine and broke on any laptop with a second language input enabled. It took telemetry, a 50ms guard that was not enough, and finally a 300ms debounce that was. My daughter's verdict stands: this is why dad's emails take forever.
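
For the curious, this is roughly what a debounce looks like. It is a minimal Python sketch, not the app's actual handler: ReleaseDebouncer, count_stops, and the event timings are made up for illustration, and the only number taken from the story above is the 300 ms window. It uses a leading-edge debounce, which lets the first release through and swallows any release that arrives within the window of the previous event.

```python
from typing import Callable, Iterable, Optional


class ReleaseDebouncer:
    """Leading-edge debounce: fire on the first release, ignore close followers."""

    def __init__(self, on_release: Callable[[], None], window_ms: int = 300) -> None:
        self._on_release = on_release
        self._window_ms = window_ms
        self._last_event_ms: Optional[int] = None  # last event seen, fired or not

    def handle(self, now_ms: int) -> bool:
        """Feed one release event; return True if the callback fired."""
        previous = self._last_event_ms
        self._last_event_ms = now_ms
        if previous is not None and now_ms - previous < self._window_ms:
            return False  # too soon after the previous event: treat as phantom
        self._on_release()
        return True


def count_stops(event_times_ms: Iterable[int], window_ms: int) -> int:
    """Replay release timestamps and count how often the stop callback fires."""
    fired = []
    debouncer = ReleaseDebouncer(lambda: fired.append(1), window_ms)
    for t in event_times_ms:
        debouncer.handle(t)
    return len(fired)


# Made-up timings: six release events produced by one real keypress.
PHANTOM_BURST_MS = [0, 40, 100, 180, 330, 520]
```

With this made-up burst, count_stops(PHANTOM_BURST_MS, 300) fires the stop callback once, while a 50 ms window still lets several events through. In a real handler you would feed handle() a monotonic clock reading, for example int(time.monotonic() * 1000), instead of replayed timestamps.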

On language coverage, the local Whisper engine handles 99 languages on its multilingual models, while the .en builds are English-only and a touch faster for that one job. The Parakeet engine runs 5 to 10 times faster than Whisper on CPU and covers English plus 24 European languages (25 in total), though it skips Asian languages and translate-to-English. If offline-first is the part that matters to you, our deeper guide to offline speech to text covers the engines in more detail.
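
Those coverage rules boil down to a short decision, sketched here in Python. Treat it as an assumption-laden illustration: pick_local_engine and the three engine labels are names invented for this example, not settings in the app, and you have to check for yourself whether your language is on Parakeet's list.

```python
def pick_local_engine(
    language_is_english: bool,
    parakeet_covers_language: bool,
    needs_translate_to_english: bool = False,
    prefer_speed: bool = False,
) -> str:
    """Return a hypothetical engine label based on the coverage described above."""
    # Parakeet is only an option for its supported languages and without translation.
    parakeet_ok = (
        (language_is_english or parakeet_covers_language)
        and not needs_translate_to_english
    )
    if prefer_speed and parakeet_ok:
        return "parakeet"
    # The English-only builds are a touch faster for plain English dictation.
    if language_is_english and not needs_translate_to_english:
        return "whisper-en"
    # Everything else falls back to the multilingual Whisper models.
    return "whisper-multilingual"
```

For example, pick_local_engine(False, True, prefer_speed=True) returns "parakeet", and the same call with needs_translate_to_english=True returns "whisper-multilingual".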

Win+H vs Voice Access vs a dedicated app

Three ways to put speech into text on Windows 11, side by side. The table covers only what each tool documents. No invented accuracy or speed scores.

Tool | Type | Works offline | Pricing model | Languages | Best for
Win+H voice typing | Built into Windows 11 | No (Azure online) | Free with Windows | ~40, fixed list | Short online notes in any text box
Voice Access | Built into Windows 11 (22H2+) | Yes (on-device) | Free with Windows | Limited set | Hands-free full PC control
Whisper (dedicated app) | Install on Windows + macOS | Yes (local CPU) | Free local tier; paid Cloud add-on | 99 on multilingual Whisper models | Offline dictation in any app

If your only need is a quick Teams reply while you are online, Path 1 wins on simplicity. It is already on your PC. The moment offline, any-app coverage, or a missing language enters the picture, Path 2 earns the install.

Local vs cloud: which Whisper mode for your PC

Whisper runs in two modes, and the choice comes down to your hardware and whether you want web access.

Local mode does everything on your machine. Pick a Whisper model sized to your PC: Base is around 140 MB and runs on almost anything, Small is around 480 MB, Medium around 1.5 GB, and the multilingual Large v3 is around 3 GB for the best accuracy if you have the RAM. Or pick Parakeet at around 600 MB for the fastest local option if you mostly work in English or European languages. None of it touches the internet after the download.
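
As a rough way to size a model to your machine, here is a small Python sketch that picks the largest Whisper model fitting in your free disk space. The sizes are the approximate figures from the paragraph above; the function name, the lowercase model labels, and the 20 percent headroom are my own assumptions, and the sketch looks at disk only, not RAM.

```python
from typing import Optional

# Approximate download sizes in MB, taken from the paragraph above.
WHISPER_MODELS_MB = {"base": 140, "small": 480, "medium": 1500, "large-v3": 3000}


def largest_whisper_model(free_disk_mb: float, headroom: float = 1.2) -> Optional[str]:
    """Return the biggest model whose size, plus headroom, fits on disk.

    headroom is an assumed safety factor, not a documented requirement.
    Returns None when even the smallest model does not fit.
    """
    fitting = [
        name
        for name, size_mb in WHISPER_MODELS_MB.items()
        if size_mb * headroom <= free_disk_mb
    ]
    if not fitting:
        return None
    return max(fitting, key=lambda name: WHISPER_MODELS_MB[name])
```

With 1,000 MB free this returns "small", because Medium at roughly 1.5 GB no longer fits. Whether the larger models run comfortably also depends on RAM, which this sketch does not model.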

The AI clean-up pass both local and cloud modes share, running before the text lands.

Cloud mode is the escape hatch. It uses your own OpenAI key: transcription through gpt-4o-mini-transcribe or gpt-4o-transcribe, and web search when you want a live answer pasted at the cursor. You bring the key; we take no cut.

Here is the opinion I will stand behind: try local mode first. If your Windows PC is from the last four years, you do not need cloud for everyday dictation, and local mode keeps your audio on your machine where it belongs. Cloud is the fallback for when you hit a wall, not the default. Whisper is free for the entire local pipeline once you sign in, with no payment method required to start. Cloud mode is part of the paid Pro tier. Details are on the pricing page. For the full local walkthrough, the voice to text on Windows guide goes step by step.

When to skip the dedicated app

I would rather you keep Win+H than install something you will not use. Skip a dedicated app, and stay on the built-in voice typing, when all of these are true:

  • You dictate short bursts, not long documents.
  • You are online whenever you dictate.
  • You work only in a language Microsoft's voice typing already covers.
  • Your audio is low-stakes and you do not care whether it leaves the machine.

Win+H is free, already installed, and good at exactly that job. The dedicated app earns its place the moment you cross one of those lines — a plane, a contract draft, a language Microsoft skipped, or any app that is not a standard text box.
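
The checklist above is an all-or-nothing test, which a few lines of Python make explicit. This is only a sketch of the rule as stated: DictationHabits, its field names, and recommended_path are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictationHabits:
    """One flag per bullet in the checklist; field names are made up."""
    short_bursts: bool               # short bursts, not long documents
    always_online: bool              # online whenever you dictate
    language_covered_by_win_h: bool  # your language is on Microsoft's list
    low_stakes_audio: bool           # you do not mind audio leaving the machine


def recommended_path(habits: DictationHabits) -> str:
    """Stay on Win+H only when every condition holds; otherwise go dedicated."""
    stay_on_win_h = all(
        (
            habits.short_bursts,
            habits.always_online,
            habits.language_covered_by_win_h,
            habits.low_stakes_audio,
        )
    )
    return "Path 1: Win+H" if stay_on_win_h else "Path 2: dedicated app"
```

A single False flips the answer: someone who dictates short notes in a covered language but sometimes works on a plane gets "Path 2: dedicated app".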

Honest pricing

Whisper's local mode is free for everyone who signs in: Whisper and Parakeet transcription, AI enhancement, history, presets, custom hotwords, hardware acceleration, model downloads, and the global hotkey, all of it, with no card required to start. Whisper Pro adds the Cloud features on top: OpenAI cloud transcription, cloud AI enhancement, and voice web search. The built-in Windows voice typing is also free, because it is part of Windows. The full plan breakdown lives on the pricing page. I would rather you read the exact numbers there than trust a figure in a blog post that drifts out of date.

Two paths, one decision. If you are online, the note is short, and the stakes are low, press Windows key + H and talk — it is free and already on your PC. The moment you need it on a plane, in any app, in a language Microsoft skipped, or with your audio staying on your own machine, set up the dedicated app instead. I crossed that line somewhere around the third meeting note I dictated one-handed while making lunchboxes, and I have not typed a long email since.

Try it offline on your own PC

Download Whisper, hold Ctrl+Space, talk, release — your words paste at the cursor in any app, with no internet in the loop.

Free local mode for any signed-in account. No card required to start. If Win+H already does everything you need, keep the shortcut — it is a good one.

Denys Medvediev

I'm the one who reads our support email, most probably by dictating the replies.