By Denys Medvediev

Guide

Voice typing software: free built-in options vs paid tools

Some of the best voice typing software is already on your machine. Here is which one to use, and when paying for a dedicated app actually makes sense.

Last updated: June 2026

Sleek laptop and glass of water on a bright office desk, set up for hands-free voice typing

Voice typing software listens through a microphone and writes down what you say, turning speech into editable text at around 145 words a minute against about 40 for typing. The good tools run system-wide, so the words land at the cursor. Some are free and built into Windows and Mac; paid apps add offline mode and AI cleanup.
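To put those two rates side by side, here is a back-of-the-envelope sketch. The 145 and 40 words-a-minute figures are the rough ones quoted above; the function name and the 300-word message are only illustrative, and the sum ignores any time spent fixing mistakes afterwards.

```python
# Rough speeds quoted in this article; real speeds vary by person and tool.
SPEAKING_WPM = 145
TYPING_WPM = 40


def minutes_needed(word_count: int, words_per_minute: float) -> float:
    """Return how many minutes it takes to produce word_count words."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return word_count / words_per_minute


if __name__ == "__main__":
    words = 300  # hypothetical message length
    spoken = minutes_needed(words, SPEAKING_WPM)
    typed = minutes_needed(words, TYPING_WPM)
    print(f"Dictating {words} words: about {spoken:.1f} min")
    print(f"Typing {words} words: about {typed:.1f} min")
    print(f"Speed-up: about {typed / spoken:.1f}x")
```

At those rates a 300-word message is roughly two minutes of talking against seven and a half of typing.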

Talk, and the words land at your cursor

The first time it works, it feels like a small magic trick. You hold a key, say a sentence, let go, and the sentence is just there in your email. No keyboard. My younger relative once dictated a 90-word note to her grandmother before I had finished explaining what dictation was. The hard part was never the speaking. For two decades the hard part was the software being good enough to trust, and that part finally got solved.

This article is about which voice typing software is worth your time, including the free options you already own.

Whisper's overlay finishing a dictation — the text pastes at your cursor.

Most people stuck on a keyboard are doing it out of habit, not need. Typing is a learned compromise, a way to get thoughts out of your head and into a machine that does not have ears. Voice typing software removes the compromise. Around 2022 the question stopped being whether it works and became which one to use, and whether you need to pay for it.

The honest answer depends on three things: how long you dictate, whether you want it to work in every app, and whether you care that your words never leave your laptop. By the end of this you will know which path fits, and I will tell you when the free built-in option is all you need. I read most of the support email for our own app, Whisper by Remskill, and a steady share of it comes from people who paid for a tool when the one already on their machine would have done the job. So I have a small stake in talking you out of a purchase.

What voice typing software is

Close-up of a digital audio interface showing a vibrant sound wave, illustrating speech captured as data

Voice typing software is a program that captures audio from your microphone and converts it to written text using a speech recognition model. The older name is dictation software. The newer marketing name is AI dictation, which mostly means the same thing with a language model bolted on to fix punctuation and tone.

There are three shapes it comes in. Built-in dictation ships with your operating system: Windows Voice typing, Apple Dictation. Browser-based voice typing lives inside one app, like Google Docs Voice typing. And dedicated desktop apps install separately and work across everything you type in. The shape matters more than the brand. A browser tool that only writes inside Google Docs is useless for your Slack messages, no matter how good its accuracy is. The first question to ask about any tool is not how accurate it is but where it lets you type. Accuracy is now a solved problem for most of them; reach is not.

The thing that separates a serious tool from a toy is where it pastes. Built-in and dedicated desktop tools are system-wide: press the hotkey in any text field and the text appears there. That is the whole game. Everything else (accuracy, languages, AI cleanup) is a refinement on top of whether it types where you are looking.

A second thing separates the categories: what the model can hear. Some tools only handle English. Others handle dozens of languages and can switch mid-sentence. OpenAI's Whisper models come in English-only builds that support exactly one language and multilingual builds that cover 99. NVIDIA's Parakeet sits in the middle at 25 languages, English plus 24 other European ones. If you only ever write in English, none of that matters and you should pick on speed instead. If you draft in two languages before lunch, it matters a lot. Most people overestimate how many languages they need and underestimate how much they care about latency. The lag between letting go of the key and seeing text is the thing you feel every single time.

How it works (and why accuracy finally got good)

Under the hood the pipeline has three steps. Your microphone records a short clip of audio. A speech recognition model turns that audio into text. Then the text gets pasted, sometimes after a language model tidies it up.

Whisper mid-transcription — the speech model turning audio into text.
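The three steps can be sketched as one small function. This is an illustration under stated assumptions, not how any particular app is built: the stage names (`record`, `recognize`, `clean`, `paste`) are hypothetical, and each stage is passed in as a plain function so the sketch runs without a microphone or a speech model.

```python
from typing import Callable, Optional

# Each stage is a plain function, so the sketch stays independent of any
# particular microphone, speech model, or paste mechanism.
Recorder = Callable[[], bytes]
Recognizer = Callable[[bytes], str]
Cleaner = Callable[[str], str]
Paster = Callable[[str], None]


def dictate(record: Recorder, recognize: Recognizer, paste: Paster,
            clean: Optional[Cleaner] = None) -> str:
    """Run one dictation: record, transcribe, optionally tidy, then paste."""
    audio = record()                 # step 1: capture a short clip
    text = recognize(audio).strip()  # step 2: speech model turns audio into text
    if text and clean is not None:   # optional language-model cleanup
        text = clean(text)
    if text:                         # nothing recognized means nothing to paste
        paste(text)                  # step 3: text lands at the cursor
    return text


if __name__ == "__main__":
    # Hypothetical stand-ins for real hardware and a real model.
    pasted = []
    result = dictate(
        record=lambda: b"fake-audio",
        recognize=lambda audio: " see you at nine tomorrow ",
        paste=pasted.append,
        clean=lambda t: t[0].upper() + t[1:] + ".",
    )
    print(result)
```

Keeping the stages separate is what lets a tool swap a local model for a cloud one, or switch cleanup on and off, without touching the rest.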

The accuracy jump everyone noticed came from the model in the middle. The open-source Whisper model from OpenAI changed what good meant. It handles accents, background noise, and 99 languages on its multilingual variants, with no training step. That last part is the quiet revolution. You do not teach modern voice typing software your voice. You install it and talk.

I am old enough to remember when that was science fiction. In the late 1990s a relative ran Dragon NaturallySpeaking on a Windows 98 desktop with 64 MB of RAM. Setup meant reading a list of words aloud for 45 minutes so the software could calibrate to your voice. Then it worked, barely, at maybe 70% accuracy, with a four-second delay per sentence. It took fifteen minutes to dictate one paragraph of a holiday letter. The headset got thrown across the room. The headset survived; the experiment did not. Twenty-five years later the same task takes ninety seconds and zero training. The hardware caught up to the idea.

Two flavors of the middle step exist today. Local processing runs the model on your own computer, offline, so your audio never leaves the machine; that is how offline speech to text works. Cloud processing sends the audio to a server, which can be faster on weak hardware but means your words travel. Which one you want depends on what you are dictating. A grocery list, who cares. Your client's contract, maybe care.
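That trade-off fits in a few lines. The function below is a sketch of the reasoning, not a setting from any real app: the name `choose_processing`, its three inputs, and the choice to default to local are all assumptions made for illustration.

```python
def choose_processing(sensitive: bool, online: bool, weak_hardware: bool) -> str:
    """Return "local" or "cloud" for one dictation, following the trade-off above."""
    if sensitive or not online:
        # Private text stays on the machine; being offline leaves no other choice.
        return "local"
    if weak_hardware:
        # A server can be faster than a slow laptop, at the cost of audio leaving it.
        return "cloud"
    # Assumed default: keep audio on the device when nothing argues against it.
    return "local"


if __name__ == "__main__":
    print(choose_processing(sensitive=False, online=True, weak_hardware=True))   # grocery list
    print(choose_processing(sensitive=True, online=True, weak_hardware=True))    # client contract
```

The grocery list on a slow laptop goes to the cloud; the client's contract stays local whatever the hardware.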

The third step, the cleanup, is where the AI in AI dictation lives. Raw transcription gives you a wall of words with no paragraph breaks and the occasional um. A language model can fix the punctuation, drop the filler, and even match a tone you ask for. In Whisper by Remskill that step is optional and runs locally through Ollama, or in the cloud through your own OpenAI key if you turn Pro on. You can also trigger it by voice: say the activation phrase, currently "Hey whisper", and the text gets handed to the model instead of pasted raw. None of that changes the core trick. It just decides how polished the words are when they arrive.
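Here is a minimal sketch of that routing decision, plus a crude rule-based stand-in for the cleanup itself. How the app actually detects the phrase is not described here, so the matching rule is an assumption, and `route` and `drop_fillers` are hypothetical names. A real language model does far more than the regular expression below.

```python
import re
from typing import Tuple

# The phrase comes from the article; the matching rules are assumed.
ACTIVATION = re.compile(r"^\s*hey whisper\b[\s,.:!]*", re.IGNORECASE)
# A tiny filler list for illustration only.
FILLERS = re.compile(r"\b(?:um+|uh+|erm+)\b[,.]?\s*", re.IGNORECASE)


def route(transcript: str) -> Tuple[str, str]:
    """Return ("enhance", rest) if the transcript opens with the phrase, else ("paste", text)."""
    match = ACTIVATION.match(transcript)
    if match:
        return "enhance", transcript[match.end():].strip()
    return "paste", transcript.strip()


def drop_fillers(text: str) -> str:
    """Remove a few filler words and capitalize the first letter."""
    cleaned = FILLERS.sub("", text)
    cleaned = re.sub(r"\s{2,}", " ", cleaned).strip()
    return cleaned[:1].upper() + cleaned[1:]


if __name__ == "__main__":
    print(route("Hey whisper, make this formal: see you soon"))
    print(route("see you soon"))
    print(drop_fillers("um, I think uh we should ship"))
```

The word boundary after the phrase keeps a sentence that merely starts with "hey whispering" from being treated as a command.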

The free options you already have: Windows Voice typing, Apple Dictation, Google Docs

Before paying for anything, check what is already on your machine. Three free built-in options cover a lot of ground.

The dictation already on your computer, no install needed: Windows (Win + H), macOS Dictation, and Google Docs Voice typing.

Windows Voice typing

On Windows 11, press the Windows logo key plus H in any text box and a voice typing bar appears. It is good for quick messages. The catch: besides a working microphone, it needs an internet connection, because the recognition happens in the cloud. It supports 43 languages per Microsoft's list. If you are offline on a train, it stops working. There is a full walkthrough in our guide to voice to text on Windows.

Apple Dictation

On a Mac, turn it on in System Settings, Keyboard, Dictation, then start it with the microphone key or your chosen shortcut. The current version transcribes text of any length and only stops after 30 seconds of silence, not after a hard time cap. On Apple Silicon it can process your speech on-device. For short notes it is free and fine; the longer setup lives in voice to text on Mac.

Google Docs Voice typing

Open a Google Doc in Chrome, Edge, or Safari, click Tools, then Voice typing, and a microphone box appears. It supports over 100 languages and regional variants. The hard limit is right there in the name: it only writes inside Google Docs and Slides. It will not type your email, your Slack, or your code.

The honest way to read these three: they are real tools, not demos, and for a large slice of people they are the end of the search. Where they stop is predictable. Windows Voice typing dies the moment you lose signal. Google Docs Voice typing never leaves the document. Apple Dictation is excellent on a Mac and absent everywhere else. If your work fits inside those edges, you are done. Close this tab and press the key. The paid category exists for the work that does not fit: all-day dictation, offline on a plane, every app instead of one, and audio that has to stay on your own disk.

When to skip a dedicated app and use the built-in one

Here is the part most best-software articles skip. If you send short messages (a 30-word text, a quick Slack reply), the free dictation already on your machine is all you need. Windows Voice typing (Windows key + H) and Apple Dictation are free, they are built in, and they work. Do not install or pay for anything to write one sentence. A dedicated app starts earning its place when you dictate often, need it to work offline on a plane, want it across every app and not just one, or care that your words stay on your device. Below that threshold, the boring truth is you already own the right tool.
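The threshold above can be written down as a short checklist. This is only a sketch of the rule of thumb in this section; the `Needs` fields and the `recommend` function are hypothetical names, and what counts as dictating "often" is left for you to judge.

```python
from dataclasses import dataclass


@dataclass
class Needs:
    """What one person needs from voice typing; field names are illustrative."""
    dictates_often: bool = False
    needs_offline: bool = False
    needs_every_app: bool = False
    wants_audio_on_device: bool = False


def recommend(needs: Needs) -> str:
    """Return "built-in" unless one of the four triggers from the text applies."""
    triggers = (
        needs.dictates_often,
        needs.needs_offline,
        needs.needs_every_app,
        needs.wants_audio_on_device,
    )
    return "dedicated app" if any(triggers) else "built-in"


if __name__ == "__main__":
    print(recommend(Needs()))                    # the occasional short message
    print(recommend(Needs(needs_offline=True)))  # dictating on a plane
```

Any single trigger is enough to tip the answer; with none of them set, the free tool wins.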

What Whisper by Remskill costs

Whisper by Remskill is free for every signed-in user for the entire local pipeline: local Whisper, Parakeet, Ollama-based AI cleanup, history, custom hotkey, model downloads, with no payment method needed to sign up. Whisper Pro adds the Cloud surface: OpenAI cloud transcription, cloud AI enhancement, and voice web search through your own OpenAI key. Cloud mode bills you directly through OpenAI; we take no cut. The current plans and the Pro trial are on the pricing page. I am not going to quote numbers at you here; the page does that better than a paragraph can.

My relative threw a headset across a room in 1999 because dictation was a 45-minute chore that produced garbage. The headset outlived the experiment. Twenty-five years later the chore is gone. You press a key and talk, and the words show up. The only real decision left is which tool, and for a lot of people the right answer is sitting on their machine already, switched off, waiting. My own kids will never know it was ever hard, which is the goal, even if it makes for a worse story at the dinner table.

Try the one you already own first

If it runs out of room, download Whisper and pick the engine that fits how you work.

Free for signed-in users on the full local pipeline. No card at sign-up.


Denys Medvediev

I'm the one who reads our support email and, most probably, dictates the replies.