Voice typing software: free built-in options vs paid tools
Some of the best voice typing software is already on your machine. Here is which one to use, and when paying for a dedicated app actually makes sense.
Last updated: June 2026

Voice typing software listens through a microphone and writes down what you say, turning speech into editable text at around 145 words a minute against about 40 for typing. The good tools run system-wide, so the words land at the cursor. Some are free and built into Windows and Mac; paid apps add offline mode and AI cleanup.
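To put those two rates side by side, here is a small sketch of the arithmetic. The 145 and 40 words-per-minute figures are the rough averages quoted above, the 90-word note is only an example length, and real speeds vary from person to person.

```python
# Rough averages quoted above; individual speeds vary.
SPEAKING_WPM = 145
TYPING_WPM = 40


def seconds_needed(words: int, words_per_minute: float) -> float:
    """Time in seconds to produce `words` at a steady rate."""
    return words / words_per_minute * 60


note_length = 90  # example length, in words
spoken = seconds_needed(note_length, SPEAKING_WPM)
typed = seconds_needed(note_length, TYPING_WPM)

print(f"Spoken: {spoken:.0f} s, typed: {typed:.0f} s")
```

For a 90-word note that works out to roughly 37 seconds of talking against 135 seconds of typing.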
Talk, and the words land at your cursor
The first time it works, it feels like a small magic trick. You hold a key, say a sentence, let go, and the sentence is just there in your email. No keyboard. My younger relative once dictated a 90-word note to her grandmother before I had finished explaining what dictation was. The hard part was never the speaking. For two decades the hard part was the software being good enough to trust, and that part finally got solved.
This article is about which voice typing software is worth your time, including the free options you already own.
Most people stuck on a keyboard are doing it out of habit, not need. Typing is a learned compromise, a way to get thoughts out of your head and into a machine that does not have ears. Voice typing software removes the compromise. Around 2022 the question stopped being whether it works and became which one to use, and whether you need to pay for it.
The honest answer depends on three things: how long you dictate, whether you want it to work in every app, and whether you care that your words never leave your laptop. By the end of this you will know which path fits, and I will tell you when the free built-in option is all you need. I read most of the support email for our app, Whisper by Remskill, and a steady share of it comes from people who paid for a tool when the one already on their machine would have done the job. So I have a small stake in talking you out of a purchase.
What voice typing software is

Voice typing software is a program that captures audio from your microphone and converts it to written text using a speech recognition model. The older name is dictation software. The newer marketing name is AI dictation, which mostly means the same thing with a language model bolted on to fix punctuation and tone.
It comes in three shapes. Built-in dictation ships with your operating system: Windows Voice typing, Apple Dictation. Browser-based voice typing lives inside one app, like Google Docs Voice typing. And dedicated desktop apps install separately and work across everything you type in. The shape matters more than the brand. A browser tool that only writes inside Google Docs is useless for your Slack messages, no matter how good its accuracy is. The first question to ask about any tool is not how accurate it is but where it lets you type. Accuracy is now a solved problem for most of them; reach is not.
The thing that separates a serious tool from a toy is where it pastes. Built-in and dedicated desktop tools are system-wide: press the hotkey in any text field and the text appears there. That is the whole game. Everything else, accuracy, languages, AI cleanup, is a refinement on top of whether it types where you are looking.
A second thing separates the categories: what the model can hear. Some tools only handle English. Others handle dozens of languages and can switch mid-sentence. Whisper's English-only models support exactly one language, while its multilingual builds cover 99. NVIDIA's Parakeet sits in the middle at 25 languages, English plus 24 European ones. If you only ever write in English, none of that matters and you should pick on speed instead. If you draft in two languages before lunch, it matters a lot. Most people overestimate how many languages they need and underestimate how much they care about latency. The lag between letting go of the key and seeing text is the thing you feel every single time.
How it works (and why accuracy finally got good)
Under the hood the pipeline has three steps. Your microphone records a short clip of audio. A speech recognition model turns that audio into text. Then the text gets pasted, sometimes after a language model tidies it up.
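The three steps above can be sketched as a tiny pipeline. This is a minimal illustration, not how any particular app is built: the class name `DictationPipeline` and every callable in it are hypothetical stand-ins, and the stubs at the bottom replace the microphone and the speech model so the sketch runs anywhere.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DictationPipeline:
    """Record -> transcribe -> (optional cleanup) -> paste.

    Every callable here is a hypothetical stand-in, not a real app's API.
    """
    record: Callable[[], bytes]
    transcribe: Callable[[bytes], str]
    paste: Callable[[str], None]
    cleanup: Optional[Callable[[str], str]] = None

    def run(self) -> str:
        audio = self.record()            # step 1: capture a short clip
        text = self.transcribe(audio)    # step 2: speech model turns audio into text
        if self.cleanup is not None:     # optional: language model tidies it up
            text = self.cleanup(text)
        self.paste(text)                 # step 3: insert the text at the cursor
        return text


# Stub implementations so the sketch runs without a microphone or a model.
pasted = []
pipeline = DictationPipeline(
    record=lambda: b"fake-audio",
    transcribe=lambda audio: "um send the report by friday",
    paste=pasted.append,
    cleanup=lambda text: text.replace("um ", "").capitalize() + ".",
)
result = pipeline.run()
print(result)
```

Keeping cleanup as an optional slot mirrors the description: the raw transcript is already usable, and the tidy-up only decides how polished it is when it lands.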
The accuracy jump everyone noticed came from the model in the middle. The open-source Whisper model from OpenAI changed what good meant. It handles accents, background noise, and 99 languages on its multilingual variants, with no training step. That last part is the quiet revolution. You do not teach modern voice typing software your voice. You install it and talk.
I am old enough to remember when that was science fiction. In the late 1990s a relative ran Dragon NaturallySpeaking on a Windows 98 desktop with 64 MB of RAM. Setup meant reading a list of words aloud for 45 minutes so the software could calibrate to your voice. Then it worked, barely, at maybe 70% accuracy, with a four-second delay per sentence. It took fifteen minutes to dictate one paragraph of a holiday letter. The headset got thrown across the room. The headset survived; the experiment did not. More than twenty-five years later the same task takes ninety seconds and zero training. The hardware caught up to the idea.
Two flavors of the middle step exist today. Local processing runs the model on your own computer, offline, so your audio never leaves the machine; this is how offline speech to text works. Cloud processing sends the audio to a server, which can be faster on weak hardware but means your words travel. Which one you want depends on what you are dictating. A grocery list, who cares. Your client's contract, maybe care.
The third step, the cleanup, is where the AI in AI dictation lives. Raw transcription gives you a wall of words with no paragraph breaks and the occasional um. A language model can fix the punctuation, drop the filler, and even match a tone you ask for. In Whisper by Remskill that step is optional and runs locally through Ollama, or in the cloud through your own OpenAI key if you turn Pro on. You can also trigger it by voice: say the activation phrase, currently Hey whisper, and the text gets handed to the model instead of pasted raw. None of that changes the core trick. It just decides how polished the words are when they arrive.
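The voice trigger described above amounts to a routing decision on the transcript. Here is a minimal sketch of that idea. The phrase comes from the text; the function name `route_transcript` and the matching rules (case-insensitive, phrase must lead the sentence, trailing punctuation ignored) are my assumptions, not the app's documented behavior.

```python
import re
from typing import Tuple

ACTIVATION_PHRASE = "hey whisper"  # phrase named in the text; matching rules are assumed


def route_transcript(transcript: str) -> Tuple[str, str]:
    """Return (destination, text): 'model' if the phrase leads, otherwise 'paste'."""
    pattern = rf"^\s*{re.escape(ACTIVATION_PHRASE)}\b[\s,.:!]*"
    match = re.match(pattern, transcript, flags=re.IGNORECASE)
    if match:
        # Strip the phrase and hand the rest to the language model.
        return "model", transcript[match.end():].strip()
    # No trigger: the text is pasted raw.
    return "paste", transcript.strip()


print(route_transcript("Hey whisper, make this sound friendlier"))
print(route_transcript("See you at three"))
```

The word-boundary check keeps a sentence that merely starts with a similar word, such as "Hey whisperer", from being treated as a command.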
The free options you already have: Windows Voice typing, Apple Dictation, Google Docs
Before paying for anything, check what is already on your machine. Three free built-in options cover a lot of ground.
Windows Voice typing
On Windows 11, press the Windows logo key plus H in any text box and a voice typing bar appears. It is good for quick messages. The catch: besides a working microphone, it needs an internet connection, because the recognition happens in the cloud. It supports 43 languages per Microsoft's list. If you are offline on a train, it stops working. There is a full walkthrough in our guide to voice to text on Windows.
Apple Dictation
On a Mac, turn it on in System Settings, Keyboard, Dictation, then start it with the microphone key or your chosen shortcut. The current version transcribes text of any length and only stops after 30 seconds of silence, not after a hard time cap. On Apple Silicon it can process your speech on-device. For short notes it is free and fine; the longer setup lives in voice to text on Mac.
Google Docs Voice typing
Open a Google Doc in Chrome, Edge, or Safari, click Tools, then Voice typing, and a microphone box appears. It supports over 100 languages and regional variants. The hard limit is right there in the name: it only writes inside Google Docs and Slides. It will not type your email, your Slack, or your code.
The honest way to read these three: they are real tools, not demos, and for a large slice of people they are the end of the search. Where they stop is predictable. Windows Voice typing dies the moment you lose signal. Google Docs Voice typing never leaves the document. Apple Dictation is excellent on a Mac and absent everywhere else. If your work fits inside those edges, you are done. Close this tab and press the key. The paid category exists for the work that does not fit: all-day dictation, offline on a plane, every app instead of one, and audio that has to stay on your own disk.
The paid tools worth knowing (Dragon, Wispr Flow, Superwhisper, Voicy, Whisper by Remskill)
When the free tools run out of room, when you dictate all day or need offline mode or want AI cleanup, the paid category opens up. Here are the names worth knowing, with one honest line each.
I did not run these head-to-head on a stopwatch, so I will not pretend to. I picked the names below on three things: documented platform reach (does it work where you actually type), documented offline support (does your audio leave the machine), and documented language coverage. The table holds only facts each vendor publishes; I left speed and accuracy out of it because no neutral benchmark exists across all of them, and inventing one would be the exact thing I came here to talk you out of.
| Tool | Platform | Local / Cloud | Works offline | Pricing model | Languages | Best for |
|---|---|---|---|---|---|---|
| Windows Voice typing | Windows 11 | Cloud | No | Free, built in | 43 | Quick messages on a connected PC |
| Apple Dictation | macOS | Local on Apple Silicon | Yes (Apple Silicon) | Free, built in | Dozens | Short notes on a Mac |
| Google Docs Voice typing | Browser | Cloud | No | Free, browser feature | 100+ | Writing inside Google Docs only |
| Dragon by Nuance | Windows | Local | Yes | Paid, one-time license | English-focused | All-day dictation on Windows |
| Whisper by Remskill | Windows, macOS (Apple Silicon) | Local or Cloud (your key) | Yes (local engines) | Free local, paid Pro for cloud | 99 on Whisper multilingual | System-wide dictation, offline, any app |
Dragon by Nuance is the old guard. Dragon Professional v16 advertises dictation three times faster than typing with up to 99% recognition accuracy from first use, and it is optimized for Windows 11. That 99% is Nuance's own number, not a neutral benchmark. The catch: Dragon Professional is Windows-only, with no current Mac desktop version.
Wispr Flow, Superwhisper, and Voicy are the newer wave of AI dictation apps. They wrap a speech model in a clean interface and add tone or formatting cleanup. They are competent. The pattern across most of this category is the same architecture, a speech model, a UI, and a monthly invoice, and the invoice is where they differ most. If the invoice is the part that stings, we wrote up an honest Superwhisper alternative that keeps the whole local pipeline free forever.
Whisper by Remskill, our app, is a desktop tool for Windows and macOS on Apple Silicon. You press a hotkey, speak, and the text pastes at your cursor in any app. The default hotkey is Ctrl+Space on Windows and a Command+Option push-to-talk chord on Mac: hold both, release either to stop. What you choose is the engine, from three paths: local NVIDIA Parakeet (~600 MB, 5–10× faster than Whisper on CPU, English plus 24 European languages); local Whisper (eight models, 99 languages on the multilingual ones, translate-to-English); or Cloud mode, which uses your own OpenAI key for gpt-4o-mini-transcribe or gpt-4o-transcribe with no cut taken by us. All local transcription is pure-Rust, no Python. The full comparison of the wider field lives in our transcription software roundup.
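The push-to-talk chord is easiest to see as a small state machine: recording starts once both keys are down and stops as soon as either comes up. The sketch below is illustrative Python, not the app's own code, and the class name `PushToTalkChord` and the key names are hypothetical.

```python
class PushToTalkChord:
    """Recording starts when every chord key is held and stops when any is released."""

    def __init__(self, chord_keys):
        self.chord_keys = frozenset(chord_keys)
        self.held = set()
        self.recording = False

    def key_down(self, key: str) -> bool:
        if key in self.chord_keys:
            self.held.add(key)
            if self.held == self.chord_keys:
                self.recording = True   # both keys are down: start the clip
        return self.recording

    def key_up(self, key: str) -> bool:
        if key in self.chord_keys:
            self.held.discard(key)
            self.recording = False      # releasing either key ends the clip
        return self.recording


chord = PushToTalkChord({"command", "option"})
states = [
    chord.key_down("command"),  # one key only: not recording yet
    chord.key_down("option"),   # both held: recording
    chord.key_up("option"),     # either released: stopped
]
print(states)
```

Keys outside the chord are ignored, so typing a letter while holding the chord would not interrupt the clip in this sketch.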
This is also where my one opinion goes: try local mode first. If your Mac is Apple Silicon or your PC is from the last four years, you do not need the cloud for everyday dictation. Local Parakeet starts transcribing in well under two seconds on modern hardware, your audio never leaves the laptop, and cloud is the escape hatch for when you want OpenAI-grade accuracy or web search, not the default. Reach for the network when you hit a wall, not before. I am the kind of architect who reaches for the bigger, fancier solution by reflex and then talks myself back down. Local-first is me talking myself back down, in public, so you can skip the part where I waste a week.
The practical reason is hardware. A modern laptop already has a microphone and a processor fast enough to run a speech model on its own. Sending one paragraph of audio to a server and back, for a job your machine can do offline in under two seconds, is a habit left over from when laptops were too slow. They are not anymore. Cloud earns its keep for the hard cases: a noisy room, an unusual accent, a request that needs a live web answer pasted into your reply. For the daily flow of email, notes, and chat, local is faster to start, private by default, and free for signed-in users. The escape hatch is there when you need it; most days you will not.
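Written as a rule, local-first looks roughly like this. It is a sketch under stated assumptions: the function name `pick_engine`, its parameters, and the returned labels are hypothetical, and the language set holds only English as a placeholder because the full list of 25 Parakeet languages is not reproduced here.

```python
from typing import Set


def pick_engine(
    language: str,
    parakeet_languages: Set[str],
    wants_cloud: bool = False,
    has_openai_key: bool = False,
) -> str:
    """Local-first choice: cloud only when asked for and a key is present."""
    if wants_cloud and has_openai_key:
        return "cloud"
    if language in parakeet_languages:
        return "parakeet"   # faster local path where the language is covered
    return "whisper"        # local multilingual fallback


# Illustrative subset only; the real list is English plus 24 European languages.
PARAKEET_LANGUAGES = {"en"}

print(pick_engine("en", PARAKEET_LANGUAGES))
print(pick_engine("ja", PARAKEET_LANGUAGES))
print(pick_engine("en", PARAKEET_LANGUAGES, wants_cloud=True))  # no key, so it stays local
```

The cloud branch sits behind two conditions on purpose: it is something you opt into, not something the rule falls back to.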
When to skip a dedicated app and use the built-in one
Here is the part most best-software articles skip. If you send short messages, a 30-word text, a quick Slack reply, the free dictation already on your machine is all you need. Windows Voice typing (Windows key + H) and Apple Dictation are free, they are built in, and they work. Do not install or pay for anything to write one sentence. A dedicated app starts earning its place when you dictate often, need it to work offline on a plane, want it across every app and not just one, or care that your words stay on your device. Below that threshold, the boring truth is you already own the right tool.
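That threshold fits in a few lines. The sketch below just encodes the four conditions from the paragraph above; the function name and flag names are mine, and an empty result means the built-in tool is enough.

```python
def reasons_for_dedicated_app(
    *,
    dictates_often: bool = False,
    needs_offline: bool = False,
    needs_every_app: bool = False,
    wants_on_device: bool = False,
) -> list:
    """Return the reasons that apply; an empty list means built-in is enough."""
    checks = {
        "dictates often": dictates_often,
        "needs offline mode": needs_offline,
        "needs every app, not just one": needs_every_app,
        "wants words to stay on the device": wants_on_device,
    }
    return [reason for reason, applies in checks.items() if applies]


quick_replies = reasons_for_dedicated_app()
frequent_flyer = reasons_for_dedicated_app(dictates_often=True, needs_offline=True)

print(quick_replies or "built-in is enough")
print(frequent_flyer)
```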
What Whisper by Remskill costs
Whisper by Remskill is free for every signed-in user for the entire local pipeline: local Whisper, Parakeet, Ollama-based AI cleanup, history, custom hotkey, model downloads, with no payment method needed to sign up. Whisper Pro adds the Cloud surface: OpenAI cloud transcription, cloud AI enhancement, and voice web search through your own OpenAI key. Cloud mode bills you directly through OpenAI; we take no cut. The current plans and the Pro trial are on the pricing page. I am not going to quote numbers at you here; the page does that better than a paragraph can.
My relative threw a headset across a room in 1999 because dictation was a 45-minute chore that produced garbage. The headset outlived the experiment. More than twenty-five years later the chore is gone. You press a key and talk, and the words show up. The only real decision left is which tool, and for a lot of people the right answer is sitting on their machine already, switched off, waiting. My own kids will never know it was ever hard, which is the goal, even if it makes for a worse story at the dinner table.
Try the one you already own first
If it runs out of room, download Whisper and pick the engine that fits how you work.
Free for signed-in users on the full local pipeline. No card at sign-up.



