Guide
Voice to text for note taking
Capture notes by talking instead of typing. A system-wide hotkey pastes your words at the cursor in any notes app — Notion, Obsidian, Apple Notes, OneNote, plain text. No app switching, no per-app plugin. An AI pass tidies the brain-dump after.
Last updated: June 2026

Voice to text for note taking works through a system-wide tool, not the notes app itself. Press a hotkey, speak, and the transcript pastes at the cursor in whatever app has focus — Notion, Obsidian, Apple Notes, OneNote, or a plain text file. It runs offline and free on local models, and an AI pass cleans the spoken draft.
Most note-taking is just typing with extra steps. You have a thought, you open the app, you find the right page, you type the thought, and by the time your fingers catch up it has already half-evaporated. The fastest way I've found to keep a thought is to say it out loud the second I have it, into whatever window happens to be open, and let the words land as text.
People search for "voice to text for note taking" expecting to pick the one app with the best dictation. That's the wrong question. Almost no notes app has good built-in dictation on desktop, and the ones that do only work inside themselves. The thing that actually works the same everywhere isn't an app feature. It's a hotkey that pastes at your cursor, and the cursor doesn't care which notes app it's sitting in.
Here's the part most pages dancing around this keyword won't say plainly. A note, in any app, is a text box. Notion's editor is a text box. An Obsidian note is a text box. Apple Notes, OneNote, a Stickies window, a .txt file open in any editor — all text boxes. Dictation that pastes at your cursor doesn't care which one it is.
So the real question isn't "which notes app has the best voice typing." It's "which dictation tool do I run on top of all of them." The answer is the one that works system-wide, runs offline if you want it to, and cleans up the spoken mess afterward. I'll show the why, the how, the two-minute setup, how it drops into each notes app, and — the part nobody else writes — when to skip the dedicated tool entirely.
Why talk your notes instead of typing them

The job a notes app is really doing is catching ideas before they leave. The bottleneck isn't the app. It's the gap between having the thought and getting it down. Typing is around 40 words a minute for most people. Talking is around 145. That's not a small edge; it's the difference between catching the idea whole and catching the half of it that survived the trip to the keyboard.
Dictation closes that gap in two ways. The first is raw speed — a paragraph of notes is fifteen seconds of talking instead of a minute of typing. The second is quieter and matters more: it lets you capture while your hands are busy. Standing at the whiteboard, walking the dog, washing up after the kids are in bed and the only good ideas of the day finally arrive. You don't sit down to take the note. You just say it.
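If you want to check the speed claim yourself, the arithmetic is short. This is a back-of-envelope sketch using the 40 and 145 words-a-minute figures from above; the 36-word paragraph length is my assumption, so swap in your own.

```python
TYPING_WPM = 40     # typing pace quoted above for most people
SPEAKING_WPM = 145  # talking pace quoted above


def seconds_to_capture(word_count: int, words_per_minute: float) -> float:
    """Seconds needed to get word_count words down at a given pace."""
    return word_count / words_per_minute * 60


# Assumed length of a short paragraph of notes; adjust to taste.
paragraph_words = 36

typing_seconds = seconds_to_capture(paragraph_words, TYPING_WPM)
speaking_seconds = seconds_to_capture(paragraph_words, SPEAKING_WPM)

print(f"typing:   {typing_seconds:.0f} s")
print(f"speaking: {speaking_seconds:.0f} s")
```

With those inputs the typed version takes a little under a minute and the spoken one about a quarter of that, which is where the "fifteen seconds instead of a minute" shorthand comes from.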
There's also the hands themselves. If your wrists are tired by 3pm, dictating your notes is a way to keep working without adding to the pile of keystrokes. I'm not going to dress that up as anything medical — it's a productivity thing. Fewer keystrokes, same notes. For a long capture session, your hands feeling fine at the end is reason enough.
Press a hotkey, talk, text lands in the note
This is the whole mechanic, and it's boring in the best way. You press a hotkey, you speak, you release, and the transcript pastes at your cursor, in whatever text field has focus. Whisper holds a short tail after you let go of the key, so your last word doesn't get clipped. Because it pastes at the OS cursor, a Notion block, an Obsidian note, and an Apple Notes card are all just "any text box." Same key, same behaviour, every app.
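For the curious, the hold, speak, release loop with its short tail can be modelled in a few lines. This is a toy simulation, not the app's code: the class name, the method names, and the 0.3-second tail are all assumptions made up for illustration, and timestamps are passed in by hand instead of coming from a real keyboard or microphone.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PushToTalk:
    """Toy model of hold-to-talk capture with a short tail after release."""

    tail_seconds: float = 0.3            # assumed value, not the app's setting
    recording: bool = False
    released_at: Optional[float] = None
    words: List[str] = field(default_factory=list)

    def key_down(self, now: float) -> None:
        """Start a fresh recording when the hotkey goes down."""
        self.recording = True
        self.released_at = None
        self.words.clear()

    def key_up(self, now: float) -> None:
        """Keep listening after release; the tail window starts here."""
        self.released_at = now

    def hear(self, word: str, now: float) -> None:
        """Keep a word if the key is held or the tail has not yet expired."""
        if not self.recording:
            return
        if self.released_at is not None and now - self.released_at > self.tail_seconds:
            self.recording = False
            return
        self.words.append(word)

    def transcript(self) -> str:
        return " ".join(self.words)


ptt = PushToTalk()
ptt.key_down(now=0.0)
ptt.hear("call", now=0.5)
ptt.hear("the", now=0.9)
ptt.key_up(now=1.0)
ptt.hear("printer", now=1.2)  # inside the tail, so it is kept
ptt.hear("um", now=2.0)       # after the tail, so it is dropped
print(ptt.transcript())
```

The point of the tail is visible in the last two calls: a word that trails the key release by a fraction of a second still makes it in, while anything said well after does not.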
That's the part the landing pages overcomplicate. There's no plugin to install into your notes app, no API token to paste, no sync job to babysit. Your cursor is in the note, you talk, the words appear in the note. A small capsule shows up while you speak so you know it's listening.
The hotkey is the one thing worth getting right up front. On Windows it's Ctrl+Space; on Mac it's Command+Option, a modifier-only push-to-talk you hold while speaking. Both are changeable in Settings if they clash with something you already use. (My younger daughter once told me a hotkey "didn't work" in her drawing app. It was a conflict, not a bug, which is how I learned the average person has no idea what a hotkey conflict even is. So now every hotkey is customisable.) If you've ever set up dictation on Windows or on Mac, this is the same muscle memory pointed at every app at once.
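A hotkey conflict is easier to see written down. The sketch below treats a hotkey as a set of keys and checks a candidate against bindings that are already taken. The function names and the list of existing bindings are hypothetical; no real app exposes its shortcuts this way, so read it as an illustration of the idea only.

```python
from typing import Dict, FrozenSet, Optional


def normalize(hotkey: str) -> FrozenSet[str]:
    """Treat 'Ctrl+Space' and 'space + ctrl' as the same chord."""
    return frozenset(part.strip().lower() for part in hotkey.split("+"))


def find_conflict(candidate: str, existing: Dict[str, str]) -> Optional[str]:
    """Return the name of the app already using this chord, or None."""
    wanted = normalize(candidate)
    for app_name, hotkey in existing.items():
        if normalize(hotkey) == wanted:
            return app_name
    return None


# Hypothetical bindings; real apps and their shortcuts will differ.
existing_bindings = {
    "Drawing app": "Ctrl+Space",
    "Launcher": "Alt+Space",
}

print(find_conflict("ctrl + space", existing_bindings))
print(find_conflict("Ctrl+Shift+D", existing_bindings))
```

The first lookup reports the drawing app and the second finds nothing, which is the whole diagnosis: pick a chord nothing else has claimed.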
Set it up in two minutes (Windows or Mac)
You need a Mac on Apple Silicon or a Windows 10-or-newer PC, a working microphone, and your notes app open — any of them. The whole local pipeline is free for any signed-in account, with no payment method asked for at sign-up. Here's the sequence.
Step 1 — Install Whisper and sign in.
Download from the download page, install, and create a free account. No card. The whole local transcription pipeline opens right away.
You'll know it worked when the app's tray icon appears and the setup wizard offers to pick a model.
Step 2 — Pick a transcription path.
The app doesn't choose for you. You get three: Cloud (OpenAI, bring your own key), Local Parakeet, or Local Whisper. For private notes, start local — more on that two sections down.
You'll know it worked when a model finishes downloading and shows as ready.
Step 3 — Confirm your hotkey.
Windows defaults to Ctrl+Space, Mac to Command+Option held as push-to-talk. On Mac, grant the Accessibility permission when prompted; without it, the paste-at-cursor can't reach other apps.
You'll know it worked when a test recording pastes into any text field.
Step 4 — Put your cursor in a note and talk.
Open whichever notes app you use, click into a note, hold the hotkey, say a sentence, release. The transcript appears where the cursor is.
You'll know it worked when your spoken sentence is sitting in the note as text.
The slow part is the model download, not the setup. Everything else is the four steps above. Once it's running, capturing a thought into any of your notes apps stops being a typing task and starts being a talking task.
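If you'd rather check the hardware requirement before downloading anything, here is a rough sketch using Python's standard `platform` module. The function name is mine, and the check is only an approximation of "Apple Silicon Mac or Windows 10 or newer". One known gap: a Python running under Rosetta on an Apple Silicon Mac reports `x86_64`, so a "no" on a Mac is worth double-checking.

```python
import platform


def meets_requirements(system: str, machine: str, release: str) -> bool:
    """Approximate check: Apple Silicon Mac, or Windows 10 or newer."""
    if system == "Darwin":
        # Native Python on Apple Silicon reports arm64.
        return machine == "arm64"
    if system == "Windows":
        # platform.release() gives strings such as "10" or "11" on Windows;
        # older values like "7" or "8.1" fail this test.
        return release.isdigit() and int(release) >= 10
    return False


# Check the machine this script is running on.
print(meets_requirements(platform.system(), platform.machine(), platform.release()))
```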
Notion, Obsidian, Apple Notes, OneNote — same hotkey
The reason a system-wide hotkey beats a per-app feature is that you stop relearning. The few notes apps with their own dictation only work inside themselves, and most don't have it at all on desktop. With one hotkey that pastes at the cursor, the flow is identical no matter which app you opened this morning.
In Notion, click into any block or a database field, hold the key, talk — the text drops into the block. In Obsidian, put the cursor in a note and the words land in the markdown, same as typing them. Apple Notes and OneNote both have ordinary text areas, so the cursor catches the transcript there too. Even a plain .txt file in any editor works, because to a paste-at-cursor tool a text file is no different from a fancy editor. For app-specific walkthroughs, the same flow is covered for dictating into Notion and into Obsidian.
There's a free productivity move hiding in this. Most people's notes live in two or three apps — work notes in one, personal in another, quick captures in a third. With a per-app tool you'd need each app to support voice, and you'd switch buttons every time. With the hotkey, the same gesture fills all of them, and it fills your email and your chat app too, because voice typing isn't really about notes apps — it's about the cursor. I switch apps roughly forty times an hour and don't want forty different dictation buttons to remember.
Local or cloud: which mode for private notes
For notes, try local mode first. A lot of what goes into a notes app is exactly the stuff you'd never want on someone else's server — a half-formed idea, a salary figure, a draft of a difficult email, a thought about a person. It would be a strange choice to keep all that in a local notes file and then route your voice through a cloud to get it there. If your Mac is Apple Silicon or your PC is from the last few years, local handles everyday note capture without complaint, and cloud becomes the escape hatch rather than the default.
Here's how the three paths differ, because the app makes you pick and I'd rather you pick well:
- Local Parakeet — NVIDIA's TDT engine, around 600 MB, and the fastest local option — 5 to 10 times faster than Whisper on CPU. Covers English plus 24 other European languages, 25 in total. No translate-to-English. If you take notes in English or another European language, this is the quick, fully offline pick.
- Local Whisper — slower than Parakeet on the same machine, but the multilingual builds cover 99 languages and can translate to English. The English-only builds are English-only, not 99. Pick this for Chinese, Japanese, Korean, or any translation work, which Parakeet can't do. Default English model is around 480 MB.
- Cloud (OpenAI, BYOK) — best accuracy and web access, using your own OpenAI key billed straight by OpenAI. Transcription runs on gpt-4o-mini-transcribe by default. Needs internet, so it's the one path that leaves your machine. The Cloud surface is part of Whisper Pro.
The boring truth is that for the kind of text most people put in their notes, local is plenty. Both local engines run fully on your machine with nothing sent to a server, which is the whole point if your notes are private. Cloud earns its place when you want top-tier accuracy on a hard recording or you need the model to pull a fact off the web mid-sentence. For a daily note habit, start local and only reach for cloud when local leaves you wanting.
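The choice between the three paths boils down to a few yes-or-no questions, so here it is as a small function. This is my reading of the trade-offs above written as a sketch; the names and parameters are invented for illustration and have nothing to do with the app's settings.

```python
from enum import Enum


class Path(Enum):
    LOCAL_PARAKEET = "Local Parakeet"
    LOCAL_WHISPER = "Local Whisper"
    CLOUD = "Cloud (OpenAI, BYOK)"


def choose_path(parakeet_covers_language: bool,
                needs_translation: bool = False,
                wants_cloud_extras: bool = False,
                must_stay_offline: bool = True) -> Path:
    """Pick a transcription path from the trade-offs described above.

    parakeet_covers_language: True for English or another of its 25 languages.
    wants_cloud_extras: top accuracy on a hard recording, or web access.
    """
    # Cloud is the one path that leaves the machine, so offline notes rule it out.
    if wants_cloud_extras and not must_stay_offline:
        return Path.CLOUD
    # Parakeet cannot translate and does not cover Chinese, Japanese, or Korean.
    if needs_translation or not parakeet_covers_language:
        return Path.LOCAL_WHISPER
    # Otherwise take the fastest local option.
    return Path.LOCAL_PARAKEET


print(choose_path(parakeet_covers_language=True).value)
print(choose_path(parakeet_covers_language=False).value)
```

English notes with the defaults land on Parakeet, and a language Parakeet doesn't cover lands on Whisper. Cloud only comes out when you ask for its extras and say the notes are allowed off the machine.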
Turning a spoken brain-dump into tidy notes
Raw dictation comes out as a run-on. You say "okay so three things for the launch first the pricing page second email the beta list third remind me to call the printer," and that's the unpunctuated wall any speech engine hands you. A spoken brain-dump is fast to produce and ugly to read. Cleaning it up is where the paths diverge.
Windows Voice Typing adds punctuation as you speak, and macOS Dictation handles basic punctuation when you say "comma" or "period." For heavier cleanup — stripping the "ums," fixing the run-ons, turning a spoken paragraph into something you'd actually keep — Whisper can run an AI pass. Say the activation phrase "Hey whisper" and the text gets enhanced before it lands. On a local model that runs through Ollama; in cloud mode it's gpt-5-mini by default.
Raw dictation: okay so three things for the launch first the pricing page second email the beta list third remind me to call the printer um before friday
After the AI pass: Three things for the launch: first, the pricing page; second, email the beta list; third, remind me to call the printer before Friday.
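To make the local cleanup pass less abstract, here is roughly what asking a local model to tidy a dictated note could look like through Ollama's HTTP API. This is a sketch under assumptions, not what the app actually sends: the prompt wording and the `llama3.2` model name are mine, and the URL is Ollama's default local address. It uses only the Python standard library.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_cleanup_request(raw_text: str, model: str = "llama3.2") -> urllib.request.Request:
    """Build a non-streaming request asking a local model to tidy dictation.

    The model name and the prompt are assumptions for illustration.
    """
    prompt = (
        "Rewrite this dictated note with punctuation and capitalisation. "
        "Remove filler words such as 'um'. Do not add new content.\n\n"
        + raw_text
    )
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def clean_up(raw_text: str) -> str:
    """Send the request and return the model's text. Needs Ollama running locally."""
    with urllib.request.urlopen(build_cleanup_request(raw_text)) as reply:
        return json.loads(reply.read())["response"]


# Example, with Ollama running and the model pulled:
# print(clean_up("okay so three things for the launch first the pricing page"))
```

Everything in that round trip stays on your machine, which is the reason to prefer it for private notes.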
Now the honest limit. Dictation gives you words — clean, punctuated words. It does not give you your notes app's structure. The cleanup pass can turn a run-on into a tidy sentence, but it won't build a Notion toggle, indent an Obsidian bullet, check a OneNote box, or apply a heading. Each app's own shortcuts do that. Dictate the sentence, then press Tab to nest, type # or - for the structure you want, the way you always do. Anyone promising "say make a checklist and watch it format" is selling you a demo, not a Tuesday. Get the words down fast by voice, shape the note with the keys you already know.
That same speak-then-clean flow pays off well beyond note taking — you can dictate clean prose into any app with the one hotkey, so a long note becomes a few spoken sentences instead of a paragraph you type out.
When to skip a dictation tool for notes

Sometimes a dedicated dictation tool is the wrong answer, and pretending otherwise would be dishonest. Two cases come up a lot, and in both I'd point you elsewhere.
The first is recording a meeting or a lecture to transcribe later. That's a different job. Dictation types what you say in real time at your cursor; it doesn't sit in the corner capturing a 90-minute conversation between several people and handing you a speaker-labelled transcript afterward. For that you want a transcription tool built for it: multi-speaker, post-meeting summaries, the works. Don't reach for a dictation hotkey to record a room; it's the wrong shape.
The second is quick capture on your phone. Whisper is desktop only, Windows and macOS, so when you're standing in a queue with a thought, your phone keyboard's built-in microphone already dictates into any notes app, free. Use it. I'm not going to tell you to install a desktop tool for a one-line capture you made on a phone.
And for short notes on the desktop itself, the built-ins are fine. On Windows, Windows key + H opens Voice Typing wherever your cursor is and punctuates on its own — the catch is it routes through Microsoft's servers and needs internet, so it isn't offline. On Mac, Dictation works in any text field, set up in System Settings under Keyboard, and on Apple Silicon general text can be processed on-device. Reach for a system-wide tool when the built-ins start hurting: long notes, multilingual capture, offline privacy on Windows, or wanting one hotkey that behaves the same in every notes app you keep. Below that bar, use what's free.
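The "use what's free until it hurts" rule fits in one small function. This is only my summary of the advice in this section, with made-up parameter names and labels; treat it as a sketch of the reasoning, not a feature of any tool.

```python
def pick_capture_tool(job: str, device: str,
                      long_or_multilingual: bool = False,
                      needs_offline: bool = False) -> str:
    """job: 'note' or 'meeting'; device: 'phone', 'windows', or 'mac'."""
    # Recording a room is a different job from dictating at the cursor.
    if job == "meeting":
        return "meeting transcription tool"
    # A phone keyboard's microphone already dictates into any notes app.
    if device == "phone":
        return "phone keyboard microphone"
    # Windows Voice Typing needs internet, so offline capture rules it out.
    offline_gap = needs_offline and device == "windows"
    if long_or_multilingual or offline_gap:
        return "system-wide dictation tool"
    # Short desktop notes: the built-ins are fine.
    if device == "windows":
        return "Windows Voice Typing (Win+H)"
    return "macOS Dictation"


print(pick_capture_tool("note", "windows"))
print(pick_capture_tool("note", "windows", needs_offline=True))
```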
If most of your dictation ends up in one specific app, the focused walkthrough for dictating into Obsidian covers the same cursor-is-the-integration logic for a single local-first markdown app.
No notes app needs to build a great microphone button, because the cursor is the integration. Talk into the note, get text, shape it with the shortcuts you already know. I dictated most of this guide into a text box, with a tool that doesn't care which box it is, then pasted the lot into my own notes. The only thing it didn't do was take the notes for me, which is probably for the best.
Take your next note by talking
Hold the hotkey, talk, release. The transcript lands in whatever note your cursor is in — Notion, Obsidian, Apple Notes, OneNote, plain text, and every other app too.
Free local mode for any signed-in account. No card required to start.



