By Denys Medvediev

Troubleshooting

Why is my dictation so inaccurate?

Dictation is usually inaccurate because of setup, not because the software is broken — a bad mic, a noisy room, the wrong language, or the wrong model.

Last updated: June 2026

Dictation is usually inaccurate because of setup, not because the software is broken. The biggest culprits are a bad microphone, a noisy room, the wrong language setting, and a model that does not fit your machine. Clean audio in a quiet room with the right language gets most people to around 95% accuracy — about one wrong word in twenty.

I once watched a relative throw a headset across the room. It was the late 1990s, the computer was a Windows 98 desktop with 64 MB of RAM, and the software was Dragon NaturallySpeaking. The training took 45 minutes: you read a list of words out loud to "calibrate" it. Then it worked, sort of, at maybe 70% accuracy, with a four-second delay per sentence. Fifteen minutes to dictate one paragraph of a holiday letter. The headset survived. The dictation experiment did not.

I bring that up because the frustration in your question is old, but the cause has changed. Modern dictation does not need a 45-minute calibration ritual. When it gets words wrong now, it is almost never because the model is dumb. It is because the audio reaching the model is worse than you think, and a surprising amount of that is fixable in under a minute. Whisper's own local-mode accuracy lands between 95% and 99% on clean English audio, but that number assumes a close microphone, a quiet room, and the right language setting, and those are often not what you actually have.

This is a diagnostic, not a fix-listicle. We will figure out which of five things is breaking your transcription, in rough order of how often each one is the real reason. If you want the deep microphone-and-custom-words walkthrough, our guide to fixing dictation that types the wrong words owns that ground. This piece helps you find the cause first, so you fix the right thing.

What accuracy is actually realistic

Here is the number nobody puts on their marketing page. Speech recognition is measured in word error rate, or WER — the share of words the system gets wrong, counting substitutions, deletions, and insertions against what you actually said. Lower is better. A WER of zero is a perfect transcript; word accuracy is just one minus WER.
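
If you want to put a number on your own dictation, WER is easy to compute by hand or in a few lines of code. The sketch below is a minimal illustration, not how any particular dictation tool scores itself: it compares what you meant to say against what was typed, word by word, using a standard edit-distance calculation. The function name and the simple lowercase-and-split tokenization are assumptions made for the example; real benchmarks also normalize punctuation and numbers before comparing.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in the reference."""
    # Simplified tokenization (an assumption): lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: a spoken word is missing
                curr[j - 1] + 1,     # insertion: an extra word was typed
                prev[j - 1] + cost,  # match, or substitution if the words differ
            )
        prev = curr
    return prev[-1] / len(ref)


spoken = ("the quick brown fox jumps over the lazy dog and then runs "
          "back home before the sun sets again tonight")
typed = ("the quick brown fox jumps over the lazy dog and then runs "
         "back home before the son sets again tonight")

wer = word_error_rate(spoken, typed)
print(f"WER: {wer:.2%}, accuracy: {1 - wer:.2%}")
```

The sample sentence has twenty words and one substitution ("son" for "sun"), so it works out to a WER of 5% and an accuracy of 95%: the "one wrong word in twenty" case.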

On the clean LibriSpeech English benchmark, Whisper's medium English model records about 3% WER — roughly 97% accuracy. The small English model lands around 5.1% WER, about 95%. Those are clean-audio numbers: a quiet room, a good mic, a careful reader. Real life adds noise, accents, crosstalk, and jargon, and every one of those legitimately pushes WER up.

So what is normal? About 95% on decent English audio — one wrong word in twenty. That is not a defect. That is the tool working as designed. If you are sitting at 85% in a noisy kitchen on a built-in laptop mic, the software is not broken — the conditions are below what the model needs. The fix is the conditions, not a bigger model. Set the bar at "one small correction per paragraph" and most of the rage drains out of the experience.

The five suspects, in order of likelihood

When dictation goes wrong, the cause is almost always one of five things. Run down this list in order. The first two catch most cases.

  1. The language setting. You are speaking one language; the tool is listening for another, or guessing.
  2. The microphone. A built-in laptop mic a foot or more away is hearing your room more than your mouth.
  3. The room. Background noise, a TV, an echoey kitchen — the model transcribes all of it.
  4. The model. You picked one too heavy for your hardware, so it is slow or choking.
  5. The expectation. The audio is fine and the tool is fine; you are measuring against 100%, which nothing hits.

A 60-second self-test: dictate the same two sentences three times — once in a silent room close to the mic, once across the room, once with music playing. If accuracy swings hard between those takes, your problem is audio (suspects 2 and 3), and no software change will beat moving the mic closer and closing the door. If it is bad even on the silent close-up take, look at the language setting and the model. That one test sorts most people in a minute.
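
The decision logic of that self-test can be written down as a small sketch. Everything specific here is an assumption made for illustration: the function name, the 90% floor for a "bad" close-up take, and the 5-point drop that counts as a "hard swing" are placeholders, not thresholds from any tool. Each argument is the accuracy of one take, as a fraction between 0 and 1.

```python
def diagnose(close_quiet: float, across_room: float, with_music: float,
             floor: float = 0.90, swing: float = 0.05) -> list[str]:
    """Map the three self-test takes to the most likely suspects.

    floor and swing are illustrative thresholds (assumptions), not
    values taken from any dictation tool.
    """
    # Bad even on the best-case take: audio is not the main problem.
    if close_quiet < floor:
        return ["language setting", "model"]

    suspects = []
    # Accuracy collapses with distance: the microphone is the suspect.
    if close_quiet - across_room > swing:
        suspects.append("microphone")
    # Accuracy collapses with background sound: the room is the suspect.
    if close_quiet - with_music > swing:
        suspects.append("room")

    # Every take is fine and roughly equal: the tool is working as
    # designed, so the remaining suspect is the expectation itself.
    return suspects or ["expectation"]


print(diagnose(close_quiet=0.96, across_room=0.80, with_music=0.78))
```

With the sample numbers, both the distant take and the music take fall well below the close-up take, so the sketch points at the microphone and the room rather than at any software setting.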

Cause 1: the wrong language setting

This is the ten-second fix nobody checks first. If you know what language you are speaking, pick it explicitly in settings instead of leaving the tool on auto-detect. With a specific language set, the tool stops guessing which language it is hearing and spends all its effort on getting the words right, which makes it noticeably faster and more reliable.

The mismatch traps are real. Whisper's multilingual models cover 99 languages with auto-detect, but the English-only models are locked to English — feed them another language and you get nonsense. Local Parakeet handles English plus 24 European languages and nothing outside that set, so dictating Japanese into it will never work no matter how clean your mic is. And if you genuinely code-switch mid-sentence, you want a multilingual Whisper model with auto-detect, not an English-only one. Match the setting to the words coming out of your mouth and a chunk of "inaccuracy" disappears before you touch anything else.

Cause 2: your mic is doing more damage than your accent

People blame their accent. It is almost always the microphone. For years I blamed mine — turns out my voice was fine and my $0 laptop mic was the problem. Here is the opinion I will defend: "AI" does not fix bad audio. A $20 USB microphone does more for accuracy than any model upgrade — the microphone and a quiet room are the two biggest accuracy levers, ahead of which model you pick. Spend the money on hardware before you spend it on a bigger download.

The mechanism is dull and physical. A built-in laptop mic sits a foot or more from your mouth and picks up the desk, the fan, and the room. A headset boom or a USB mic six inches away hears your voice and not much else. The tool can only transcribe what reaches it, and a smeary, distant, noisy signal gives it less to work with — so it guesses, and guesses are how you get the wrong words. I will not re-teach the whole mic-and-vocabulary playbook here; our deep-dive on dictation typing the wrong words covers mic placement, input gain, and custom vocabulary in detail. For this article, the point is narrower: if your three-take test showed accuracy collapsing at distance, your mic is the suspect, not your voice.

Cause 3: the room, not the words

A mic cannot un-hear a room. If there is a TV on, a dishwasher running, an open-plan office behind you, or kids debating the rules of a board game two meters away, the model transcribes that energy alongside your voice. It does not know which sound is the one you meant.

The fix is embarrassingly low-tech: close the door, turn off the music, move away from the fan. Soft surfaces help — a room with a rug and curtains is kinder to a mic than a tiled kitchen with bare walls, where your voice bounces and arrives twice. You do not need acoustic foam. You need the dishwasher to finish its cycle. I have dictated school emails while making lunchboxes and the model kept up fine — but that is because the kitchen was quiet, not because the software is magic. The moment the blender starts, accuracy drops, and that is not a bug to file.

Cause 4: the model is wrong for your hardware

This is the one the competitors treat as a black box, and it matters. Bigger is not always better. Pick a model too heavy for your machine and it runs slow, falls behind, and the experience feels broken even when accuracy on paper is fine.

Whisper by Remskill does not pick a model for you. It presents three paths and lets you choose: Cloud mode using your own OpenAI key, local Parakeet, or local Whisper. Cloud mode runs on any hardware because it is just a network call. Locally, the math is about RAM. On an 8 GB machine, Parakeet (~600 MB), the Base model, and the Small model all run comfortably, while the Medium model will struggle. The largest Whisper models (Large v3 at ~3 GB, or Turbo) want 16 GB or more and benefit most from a discrete GPU. The best-accuracy multilingual option is Large v3, which supports 99 languages but needs that 16 GB headroom.
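
As a rough sketch, the RAM guidance above reduces to a lookup. The function name and the table are hypothetical and exist only for this example; they are not part of the app. Two gaps are filled with labeled assumptions: the Medium model is placed in the 16 GB tier because it struggles at 8 GB, and machines under 8 GB get no local suggestion because the guidance does not cover them.

```python
# Minimum RAM (in GB) at which each local model is expected to run
# comfortably, following the guidance in this section.
MIN_RAM_GB = {
    "Parakeet": 8,
    "Base": 8,
    "Small": 8,
    "Medium": 16,    # assumption: it struggles at 8 GB, so treat it as a 16 GB model
    "Large v3": 16,
    "Turbo": 16,
}


def local_models_for(ram_gb: float) -> list[str]:
    """Return the local models that should fit a machine with ram_gb of RAM.

    An empty list means nothing local is suggested (assumption for
    machines under 8 GB); Cloud mode runs on any hardware.
    """
    return [name for name, needed in MIN_RAM_GB.items() if ram_gb >= needed]


for ram in (4, 8, 16):
    choices = local_models_for(ram) or ["Cloud mode only"]
    print(f"{ram} GB: {', '.join(choices)}")
```

The list is deliberately a set of candidates rather than a single answer, since the choice among models that fit still depends on which languages you need and how much speed you want.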

The press-to-talk flow is the same whichever path you pick — hold the hotkey, speak, release, and the text pastes at your cursor. The default hotkey is Ctrl+Space on Windows and the Command+Option chord on macOS, both changeable in Settings. Unsure which model fits your laptop? Our guide to picking the right Whisper model maps each one to the hardware it needs. The rule of thumb: a model that fits and runs fast beats a bigger one that stutters.

When the tool really is the problem, and when it's just physics

Sometimes you have done everything right (close mic, quiet room, correct language, sensible model) and it still gets one word in fifteen wrong. That can be the real ceiling. Heavy accents the model has seen little of, dense technical jargon, two people talking over each other, a phone speaker on the other end: these legitimately push WER up, and no setting fully fixes them. For names and domain jargon, local Whisper and Cloud mode let you add a Custom Words list that biases recognition toward the right spelling; Parakeet does not take those hints. But "it learns my voice the more I use it" is a myth from the Dragon era. Modern speech-to-text does not adapt to your individual voice over time, and no amount of repetition trains it. The lever is the audio and the settings, not patience.

When to skip Whisper for this

If all you are doing is firing off a 20-word text or a quick note, do not download anything. Your operating system already dictates. On a Mac, Apple Dictation is built in and free — press the Microphone key or the keyboard shortcut, and on supported setups it processes on-device. It stops on its own after 30 seconds of silence, so it suits short bursts more than long-form writing. In Word, Microsoft's Dictate does the same with a microphone and an internet connection.

Reach for a dedicated tool once you are dictating full paragraphs, want it to work offline, or need accuracy on names and jargon the built-in tools fumble — our round-up of Apple Dictation alternatives covers the options. For a one-line reply, the free built-in tool is the right call.

Most of the time the answer to "why is my dictation so inaccurate" is not a confession about your voice. It is a foot of distance to the microphone and a dishwasher you forgot was running. Fix the audio, set the right language, pick a model your laptop can carry, and then judge it against 95%, not 100%. The relative with the Dragon headset was fighting 1999. You are not. You are mostly fighting your kitchen.

Want to find out in a minute?

Download Whisper and run the three-take test — you will know inside a minute whether it is the tool, the room, or just physics.

Denys Medvediev

I'm the one who reads our support email, most probably by dictating the replies.