Guide
Dictation software for doctors
This is dictation for a physician's own writing — emails, referral letters, personal notes, admin, research drafts — by voice in any desktop app. It is not a clinical-documentation, EHR, or medical-transcription tool, and it makes no compliance promises.
Last updated: June 2026

Dictation software for doctors, in the sense covered here, is a general-purpose tool a physician uses for their own writing — emails, referral letters, personal notes, admin, research drafts — by voice in any desktop app. A hotkey transcribes speech at the cursor. It runs offline in local mode and is not a clinical-documentation or compliance tool.
I'll start with what this is not, because the keyword "dictation software for doctors" pulls up two very different worlds and conflating them wastes your time. One world is clinical documentation — speaking patient notes into an electronic health record, with the accuracy, integration, and compliance machinery that demands. That is a specialist product category, and Whisper is not in it. I'll say so again later, and I'll point you at the right kind of tool when that's what you actually need.
The other world is everything else a physician writes in a day that has nothing to do with a patient record. The referral letter. The reply to a colleague. The note to the practice manager about the rota. The first messy draft of a paper. The email to the conference organiser. That writing is just typing, the same as anyone else does, and it's the part this guide is about. You can talk it instead of typing it, in any app on your machine, with one hotkey.
Here's the line I want to draw cleanly and never blur. Whisper is a productivity dictation tool. It turns your speech into text at the cursor in whatever app has focus. It is not a clinical or medical-records tool, it is not for protected health information, and it makes no HIPAA, EHR, or compliance guarantees. Don't use it to dictate patient notes. Use it for your own non-clinical writing.
Within that fence there's a lot of room. Two honest properties make it a sensible fit for a doctor's own writing in particular. Local mode runs fully on your machine, so the text of an email or a draft doesn't leave the laptop — a real property of where the processing happens, not a compliance certificate. And local Whisper takes a custom vocabulary, so the terminology you use every day stops coming out as nonsense. I'll set it up, show the everyday writing it's for, and tell you plainly when to walk away and buy a purpose-built medical product instead.
What this is, and what it is not

What this is: a general-purpose dictation tool that types your spoken words into any desktop app, so a physician can draft their own emails, referral letters, personal notes, admin messages, and research text by talking instead of typing. It behaves the same in your mail client, your word processor, and your browser, because it pastes at the cursor and doesn't care which app the cursor is in.
What this is not, stated plainly so there's no ambiguity: it is not a clinical-documentation tool, not an EHR or EMR add-on, not medical transcription, and not for patient records, diagnosis, or treatment. It makes no HIPAA, GDPR, or any other compliance promise. The honest reasons a doctor might still reach for it are mundane and true — long letters and drafts get tiring to type, and dictating your own correspondence is faster than typing it. That's the whole job. No health claim attaches to any of it.
The reason I keep the fence visible is that the two worlds get sold next to each other, and the gap matters. A clinical product is built around the patient record, with the integration and the compliance work that comes with it. A productivity tool like this one is built around your cursor and your own words. Same verb — dictation — completely different responsibility. If your writing is a patient note destined for a chart, this guide ends here and the "when you need a clinical tool" section is where you should go.
Press a hotkey, talk, text lands at the cursor
The mechanic is plain. You press a hotkey, speak, release, and the transcript pastes at your cursor in whatever text field has focus. Whisper holds a short tail after you let go, so your last word doesn't get clipped. Because it pastes at the operating-system cursor, the app underneath is just "any text box" — your email compose window, a Word document, a referral-letter template you keep in a doc, the body of a research draft. A small capsule appears while you speak so you know it's listening.
There's nothing to wire into a particular program. No plugin per app, no token to paste, no sync job. Your cursor is in the email, you talk, the words appear in the email. The same key fills the next sentence of a paper draft, or a message to the practice manager, or a note to yourself between tasks. One tool, every text field you'd type in anyway.
The hotkey is the one thing to set deliberately. On Windows it's Ctrl+Space; on Mac it's Command+Option, a modifier-only push-to-talk you hold while speaking and release to stop. Both are changeable in Settings if they clash with something you already use. (A hotkey conflict is the single most common "it's broken" report we get, and it's almost never a bug — it's two apps fighting over the same key, which is why every hotkey here is customisable.) If you've set up voice to text on Windows or on Mac before, this is the same muscle memory pointed at your own writing.
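If it helps to see the hold-and-release mechanic as logic, here is a minimal sketch of how a push-to-talk capture window with a short tail could be computed. It is an illustration only, not Whisper's code: the names `capture_window` and `Capture` are hypothetical, and the 0.3-second tail is an assumed value, since the guide only says the tail is "short".

```python
from dataclasses import dataclass

# Assumed value for illustration; the guide only says "a short tail".
TAIL_SECONDS = 0.3


@dataclass
class Capture:
    """The span of audio kept for transcription, in seconds."""
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def capture_window(key_down: float, key_up: float, tail: float = TAIL_SECONDS) -> Capture:
    """Record from key press until key release plus a short tail,
    so the last word is not clipped."""
    if key_up < key_down:
        raise ValueError("key_up must not precede key_down")
    return Capture(start=key_down, end=key_up + tail)


window = capture_window(key_down=10.0, key_up=12.5)
print(f"recorded {window.duration:.1f} s of audio")
```

The only design point worth noticing is that the recording ends slightly after the key comes up, which is what keeps a trailing syllable from being lost.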
Set it up in two minutes (Windows or Mac)
You need a Mac on Apple Silicon or a Windows 10-or-newer PC, a working microphone, and the app you'll actually write in — mail client, word processor, browser — open. The whole local pipeline is free for any signed-in account, with no payment method asked for at sign-up. Here's the sequence.
Step 1 — Install Whisper and sign in.
Download from the download page, install, and create a free account. No card. The whole local transcription pipeline opens right away.
You'll know it worked when the app's tray icon appears and the setup wizard offers to pick a model.
Step 2 — Pick a transcription path.
The app doesn't choose for you. You get three: Cloud (OpenAI, bring your own key), Local Parakeet, or Local Whisper. For correspondence and drafts you'd rather keep on the machine, start local — more on which one two sections down.
You'll know it worked when a model finishes downloading and shows as ready.
Step 3 — Confirm your hotkey.
Windows defaults to Ctrl+Space, Mac to Command+Option held as push-to-talk. On Mac, grant the Accessibility permission when prompted; without it, the paste-at-cursor can't reach other apps.
You'll know it worked when a test recording pastes into any text field.
Step 4 — Put your cursor in an email or a doc and talk.
Open your mail client or word processor, click where you'd type, hold the hotkey, say a sentence, release. The transcript appears where the cursor is.
You'll know it worked when your spoken sentence is sitting in the email or document as text.
The slow part is the model download, not the setup. Everything else is the four steps above. Once it runs, writing a long referral letter or a reply you'd been putting off stops being a typing task and starts being a talking task — which, at the end of a long day, is a different kind of tired.
The everyday, non-clinical writing it's for
Think of the writing in your day that isn't a patient record. The referral letter to a colleague, which is mostly prose you compose anyway. The email backlog — the conference reply, the message to the practice manager, the answer to a query from administration. The note to yourself about a follow-up or a reading you meant to chase. The first rough draft of a paper, a poster abstract, a teaching slide's worth of text. None of that is clinical documentation, and all of it is faster spoken than typed.
A long letter is where dictation earns its place. Sustained typing for most people sits around forty words a minute; talking runs closer to a hundred and forty-five. You won't produce a finished letter at speaking speed — nobody does — but you'll get the body of it down in roughly a third of the time, then tidy it. The point isn't to skip editing. It's to move the slow first pass from typing speed to talking speed, so the typing you do is correction, not composition.
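The arithmetic behind that estimate is easy to check. The sketch below uses the two ballpark speeds quoted above; the 600-word letter length is a hypothetical figure I picked for illustration.

```python
# Rough drafting-time comparison using the guide's ballpark speeds.
TYPING_WPM = 40     # sustained typing, words per minute
SPEAKING_WPM = 145  # ordinary speech, words per minute


def drafting_minutes(word_count: int, words_per_minute: float) -> float:
    """Minutes needed to get word_count words down at a given pace."""
    return word_count / words_per_minute


letter_words = 600  # hypothetical referral-letter length
typed = drafting_minutes(letter_words, TYPING_WPM)
spoken = drafting_minutes(letter_words, SPEAKING_WPM)

print(f"typed:  {typed:.1f} min")           # typed:  15.0 min
print(f"spoken: {spoken:.1f} min")          # spoken: 4.1 min
print(f"ratio:  {spoken / typed:.2f}")      # ratio:  0.28
```

The raw ratio comes out a little over a quarter. Allow for pauses and restarts and "roughly a third" is a fair working figure for the first pass, before any tidying.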
The honest opinion underneath this whole guide is that most productivity tools are typing problems in disguise. A faster email client, a better template, a tidier inbox — they're all scaffolding around the act of typing. The actual fix for "I spend my evenings answering correspondence" isn't a slicker app. It's not typing it. Talk the reply, fix the two words it got wrong, send it, go home. That's the structural win, and it has nothing to do with any patient.
Local or cloud: keeping your own text on the machine
For a doctor's own writing, the property worth understanding is where the audio is processed. Local mode runs fully on your machine — the words of an email or a draft are transcribed on the laptop and never sent anywhere. That's a statement about plumbing, not a compliance guarantee, and I won't dress it up as one. But it's a real and useful property when the thing you're dictating is your own correspondence and you'd rather it stay yours. Cloud mode sends the audio to OpenAI for transcription, which is the opposite trade. Here's how the three paths differ, because the app makes you pick.
The choice maps to what you're writing and what you care about:
- Local Parakeet — NVIDIA's TDT engine, around 600 MB, and the fastest local option — 5 to 10 times faster than Whisper on CPU. Covers English plus 24 other European languages, 25 in total. No translate-to-English, and no custom vocabulary. If you write in English and want quick, fully offline dictation for everyday letters and email, this is the simple pick.
- Local Whisper — slower than Parakeet on the same machine, but the multilingual builds cover 99 languages and can translate to English, and — the part that matters here — it takes a custom vocabulary. That's where your terminology stops becoming nonsense. Pick this if your drafts are dense with specialist terms, or you write in a language other than English. Default English model is around 480 MB. Still fully on your machine.
- Cloud (OpenAI, BYOK) — best raw accuracy and web access, using your own OpenAI key billed straight by OpenAI. Transcription runs on gpt-4o-mini-transcribe by default. It needs internet and sends audio off the machine, so it's the one path that leaves your laptop. The Cloud surface is part of Whisper Pro. For correspondence you'd rather keep local, this is the path I'd skip.
The boring truth is that for most of a doctor's own writing — letters, email, notes, draft prose — local is plenty. Both local engines run fully on your machine with nothing sent to a server. Cloud earns its place when you want top-tier accuracy on a hard recording or you need a fact pulled off the web mid-sentence. If keeping your own text on your own disk is part of why you're here, start local and leave cloud as the exception. None of this changes the fence: it's still not for patient records, whichever path you pick.
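The trade-offs above can be written down as a small decision helper. This is a sketch of the reasoning in this section, not anything the app exposes: `suggest_path`, `Path`, and every parameter name are hypothetical, and the rules simply encode the properties listed above.

```python
from enum import Enum


class Path(Enum):
    LOCAL_PARAKEET = "Local Parakeet"
    LOCAL_WHISPER = "Local Whisper"
    CLOUD = "Cloud (OpenAI, BYOK)"


def suggest_path(
    needs_custom_vocabulary: bool = False,
    needs_translation_to_english: bool = False,
    language_covered_by_parakeet: bool = True,
    audio_may_leave_machine: bool = False,
    wants_top_accuracy: bool = False,
) -> Path:
    """Suggest a transcription path from the properties described in the guide."""
    # Custom vocabulary is only available on Local Whisper.
    if needs_custom_vocabulary:
        return Path.LOCAL_WHISPER
    # Cloud is the exception: only when audio is allowed off the machine
    # and top-tier accuracy is the priority.
    if audio_may_leave_machine and wants_top_accuracy:
        return Path.CLOUD
    # Parakeet cannot translate to English and covers 25 languages only.
    if needs_translation_to_english or not language_covered_by_parakeet:
        return Path.LOCAL_WHISPER
    # Otherwise the fastest local engine is the simple pick.
    return Path.LOCAL_PARAKEET


print(suggest_path().value)                              # Local Parakeet
print(suggest_path(needs_custom_vocabulary=True).value)  # Local Whisper
```

Note that the helper never returns the cloud path unless you explicitly allow audio to leave the machine, which matches the "start local, leave cloud as the exception" advice. No branch makes any path suitable for patient records.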
Terminology, run-ons, and cleaning up a spoken draft
Raw dictation comes out as a run-on. You say "thanks for the referral I've reviewed the notes and I'd suggest we book a follow up in six weeks and copy the practice manager in," and that's the unpunctuated wall any speech engine hands you. Two things turn that into a letter you'd send: getting the terms right, and cleaning up the mechanics.
Terminology is where general dictation usually falls down, because a speech model guesses at words it doesn't expect. Local Whisper takes a custom vocabulary — your own list of specialist terms, drug names, abbreviations, proper nouns — and biases toward them, so the words you say every day stop coming out wrong. Parakeet and cloud transcription don't take that list, so if your drafts lean heavily on terminology, local Whisper is the path that protects it. For the mechanics — stripping the "ums," fixing the run-on, breaking a monologue into sentences — Whisper can run an AI cleanup pass. Say the activation phrase "Hey whisper" and the text gets enhanced before it lands. On a local model that runs through Ollama; in cloud mode it's gpt-5-mini by default.
Raw transcript:
thanks for the referral i've reviewed the notes and i'd suggest we book a follow up in six weeks and copy the practice manager in

After cleanup:
Thanks for the referral. I've reviewed the notes and I'd suggest we book a follow-up in six weeks, and copy the practice manager in.
A word on what the cleanup pass is for and what it isn't. It's a mechanics pass — punctuation, filler words, sentence breaks. It is not judgement about content, and it is certainly not a clinical check of anything. Treat it as a tidy-up of your own prose and read what it produced before you send it, the same as you'd reread anything you typed. The model fixes the run-on; you stay responsible for every word that goes out.
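To make "mechanics, not content" concrete, here is a deliberately crude, rule-based toy. It is not how Whisper's cleanup works (that runs through a language model, as described above). The function name `tidy_mechanics` and the filler list are my own assumptions for illustration.

```python
import re

FILLERS = ("um", "uh", "erm")  # hypothetical filler list


def tidy_mechanics(raw: str) -> str:
    """Strip filler words, fix capitalisation, and add a closing full stop.
    Touches mechanics only; it never changes what was said."""
    # Drop standalone fillers (whole words only), with any trailing comma.
    pattern = r"\b(?:" + "|".join(FILLERS) + r")\b,?\s*"
    text = re.sub(pattern, "", raw, flags=re.IGNORECASE)
    # Collapse runs of whitespace left behind.
    text = re.sub(r"\s+", " ", text).strip()
    if not text:
        return text
    # Capitalise the pronoun "i", including contractions such as "i've".
    text = re.sub(r"\bi\b", "I", text)
    # Capitalise the first character and close the sentence.
    text = text[0].upper() + text[1:]
    if text[-1] not in ".!?":
        text += "."
    return text


print(tidy_mechanics("um thanks for the referral uh i've reviewed the notes"))
# Thanks for the referral I've reviewed the notes.
```

Notice what the toy cannot do: it has no idea where one sentence ends and the next begins, so the run-on survives. That gap is why the real pass uses a model, and also why you still reread the result before sending.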
That same speak-then-clean flow pays off across all your writing — you can also keep your own quick notes by voice the same way, dropping a line into any notes app between tasks instead of typing it out.
When you need a clinical tool instead

This is the section that matters most, so I'll be blunt. If what you're dictating is clinical documentation — a patient note, anything going into an electronic health record, anything that is protected health information, or any medical transcription that carries compliance requirements — then Whisper is the wrong tool and you should stop reading and buy a purpose-built medical dictation product. The Dragon Medical class of software exists for exactly this: built around the patient record, integrated with EHR systems, and sold with the compliance machinery that clinical work demands. Whisper has none of that and claims none of it.
The reason isn't modesty. It's that a productivity dictation tool and a clinical-documentation product are answering different questions. One drops your own words into your own email. The other is responsible for accuracy, integration, and compliance in a regulated record about a patient. I'm not going to blur that line to keep you on this page. If your task lives in the chart, route to a medical product designed for it — that's the honest answer, and it's the one I'd give a colleague who asked.
For very short, non-clinical snippets, the right tool might already be free on your machine. On Windows, Windows key + H opens the built-in Voice Typing bar wherever your cursor is; it punctuates on its own but routes through Microsoft's servers and needs internet, so it isn't an offline option. On Mac, Dictation lets you speak to enter text anywhere you can type, and on Apple Silicon general text can be processed on-device. Below the bar of "a long letter or a real draft," use what's free. Reach for a dedicated tool when the writing gets long, the terminology gets dense, or you want one hotkey that behaves the same everywhere — and reach for a clinical product the moment a patient record is involved.
If the reason you care about local processing is keeping your own text off other people's servers, the broader case for private, on-device speech to text walks through what "local" actually means and where its limits are.
The whole guide is one fence and a lot of room behind it. The fence: this is not a clinical tool, not for patient records, no compliance promise. The room: every email, letter, note, and draft a physician writes that has nothing to do with a chart, talked instead of typed, in any app, offline if you want. I drafted most of this by voice into a text box that was not an EHR, with a tool that doesn't know what an EHR is. That's the point.
Try it on your next letter or email
Hold the hotkey, talk, release. The transcript lands wherever your cursor is — in your email, your draft, or any other app you write in. Not in a patient record.
Free local mode for any signed-in account. No card required to start.



