By Denys Medvediev

Guide

Voice to text in Standard Notes

Standard Notes has no built-in dictation on desktop. The fix is a system-wide tool: press a hotkey, speak, and the transcript pastes at your cursor in any note. Keep it local and your voice never leaves the machine, which is the whole point of an encrypted notes app.

Last updated: June 2026


Voice to text in Standard Notes works through a system-wide tool, not the app itself. The Standard Notes desktop editor has no built-in dictation. A tool like Whisper fixes it: press a hotkey, speak, and the transcript pastes at the cursor in any note. Run it locally and the audio never leaves the machine.

I moved my private notes into Standard Notes for one reason — it encrypts everything before it leaves my laptop, and I don't have to take that on faith. The one thing I missed was talking into a note instead of typing it. So I went looking for a dictation setting. There isn't one. Standard Notes gives you a clean editor and not much else by design, and after a fair bit of poking, I'm confident it isn't hiding a microphone button from me.

People search for "voice to text in Standard Notes," find nothing in the app, and assume they missed a toggle. They didn't. The toggle was never built. The good news: the fix takes about two minutes, can run fully offline, and — if you set it up the way I'm about to describe — keeps your voice on the same machine that's already encrypting your notes.

Here's the thing most pages dancing around this keyword won't say plainly. A Standard Notes editor is just a text box, the same as Gmail or a search bar. Dictation that pastes at your cursor doesn't care which app the cursor is in.

So the real question isn't "how do I turn on voice typing in Standard Notes." There's no switch. The question is "which dictation tool do I run on top of it, and does that tool quietly ship my voice to a server." For an encrypted-by-default notes app, that second half matters more than usual. I'll walk the options, set one up in two minutes, and tell you when to skip the dedicated route entirely.

Does Standard Notes have built-in dictation?


No. The Standard Notes desktop app has no built-in speech-to-text, dictation, or voice-typing feature for writing into a note by voice. There's no microphone button in the editor, no voice command, no hidden preference. That isn't an oversight — Standard Notes leans deliberately minimal, a plain encrypted editor rather than a kitchen-sink workspace. If you've been combing settings for a dictation toggle, you can stop. It isn't there.

This is where it helps to know what Standard Notes is built around. Your note text is end-to-end encrypted before it ever leaves your device, which is the entire pitch. Any dictation you bolt on lives outside that boundary by definition — it's a separate tool turning your speech into characters, then handing those characters to the editor like a keyboard would. The question that actually matters isn't whether the editor can hear you. It's whether the thing that does the hearing keeps your audio on your machine or ships it somewhere. Hold that thought; it shapes the whole rest of this guide.

One note so you don't chase this on the wrong device: on a phone, you don't need any of it. Tap the microphone on your phone keyboard and dictate into a Standard Notes note like any other text field. Whisper is a desktop tool for Windows and macOS, so the phone keyboard mic is the practical route there. On the desktop app most people actually write in, you need a tool that sits on top of Standard Notes, and you want to pick that tool with privacy in mind.

Press a hotkey, talk, text lands in the note

This is the whole mechanic, and it's boring in the best way. You press a hotkey, you speak, you release, and the transcript pastes at your cursor, in whatever text field has focus. Whisper holds a short tail after you let go of the key, so your last word doesn't get clipped. Because it pastes at the OS cursor, a Standard Notes editor is just "any text box." Desktop app or the web version, same behaviour.
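The press, speak, release flow, including the short tail after release, can be sketched as a tiny state machine. This is a hypothetical illustration, not Whisper's actual code: the class name, the 0.3-second tail, and the idea of feeding timestamped text chunks in place of audio are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PushToTalkRecorder:
    """Hypothetical push-to-talk recorder that keeps a short tail after release."""

    tail_seconds: float = 0.3  # assumed value, chosen only for this sketch
    pressed_at: Optional[float] = field(default=None, init=False)
    released_at: Optional[float] = field(default=None, init=False)
    chunks: List[str] = field(default_factory=list, init=False)

    def press(self, now: float) -> None:
        # Hotkey down: start a fresh recording.
        self.pressed_at = now
        self.released_at = None
        self.chunks = []

    def release(self, now: float) -> None:
        # Hotkey up: remember when, but keep listening until the tail ends.
        self.released_at = now

    def feed(self, now: float, chunk: str) -> bool:
        """Offer a chunk of audio (a string here); return True if it was kept."""
        if self.pressed_at is None or now < self.pressed_at:
            return False
        if self.released_at is not None and now > self.released_at + self.tail_seconds:
            return False
        self.chunks.append(chunk)
        return True

    def captured(self) -> List[str]:
        return list(self.chunks)


rec = PushToTalkRecorder()
rec.press(now=0.0)
rec.feed(0.5, "remind me to rotate")
rec.release(now=1.0)
rec.feed(1.2, "the codes")  # inside the tail, kept
rec.feed(2.0, "unrelated")  # after the tail, dropped
print(rec.captured())  # ['remind me to rotate', 'the codes']
```

The point of the tail is visible in the last two calls: a word that arrives just after you let go still makes it into the transcript, while anything later is ignored.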

That's the part the landing pages overcomplicate. There's no extension to install into Standard Notes, no API token to paste into the app, no sync job to babysit. Your cursor is in a note, you talk, the words appear in the note. A small capsule shows up while you speak so you know it's listening:

The recording overlay: a small capsule that appears while you speak, so you know Whisper is listening.

The hotkey is the one thing worth getting right up front. On Windows it's Ctrl+Space; on Mac it's Command+Option, a modifier-only push-to-talk you hold while speaking. Both are changeable in Settings if they clash with something you already use. (My younger daughter once told me a hotkey "didn't work" in her drawing app. It was a conflict, not a bug, which is how I learned the average person has no idea what a hotkey conflict even is. So now every hotkey is customisable.) If you've ever set up dictation on Windows, this is the same muscle memory pointed at a different app.

Set it up in two minutes (Windows or Mac)

You need a Mac on Apple Silicon or a Windows 10-or-newer PC, a working microphone, and Standard Notes open in either the desktop app or the web version. The whole local pipeline is free for any signed-in account, with no payment method asked for at sign-up. Here's the sequence.
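If you'd rather check the hardware requirement than guess, a rough sketch with Python's standard `platform` module could look like this. The function name and the parsing are my own assumptions, not anything the installer runs, and there are known blind spots: a Python interpreter running under Rosetta on an Apple Silicon Mac reports an Intel machine type, and Windows Server editions report release strings this simple check rejects.

```python
import platform


def meets_requirements(system: str, machine: str, release: str) -> bool:
    """Return True for an Apple Silicon Mac or a Windows 10-or-newer PC."""
    if system == "Darwin":
        # Apple Silicon Macs report an arm64 machine type.
        return machine.lower() == "arm64"
    if system == "Windows":
        # platform.release() gives strings such as "10" or "11"; values that
        # are not whole numbers ("8.1", "Vista") are treated as too old.
        try:
            return int(release) >= 10
        except ValueError:
            return False
    return False


if __name__ == "__main__":
    print(meets_requirements(platform.system(), platform.machine(), platform.release()))
```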

Step 1 — Install Whisper and sign in.

Download from the download page, install, and create a free account. No card. The whole local transcription pipeline opens right away.

You'll know it worked when the app's tray icon appears and the setup wizard offers to pick a model.

Step 2 — Pick a local transcription path.

The app doesn't choose for you. You get three: Cloud (OpenAI, bring your own key), Local Parakeet, or Local Whisper. For private notes, pick one of the two local paths — more on why two sections down.

You'll know it worked when a model finishes downloading and shows as ready.

Step 3 — Confirm your hotkey.

Windows defaults to Ctrl+Space, Mac to Command+Option held as push-to-talk. On Mac, grant the Accessibility permission when prompted; without it, the paste-at-cursor can't reach other apps.

You'll know it worked when a test recording pastes into any text field.

Step 4 — Put your cursor in a Standard Notes note and talk.

Open a note, click into the editor, hold the hotkey, say a sentence, release. The transcript appears where the cursor is, in the note.

You'll know it worked when your spoken sentence is sitting in the Standard Notes editor as text.

The real Whisper desktop app on the settings screen, with the Transcription and AI panels open.

The slow part is the model download, not the setup. Everything else is the four steps above. Once it's running, capturing a thought into an encrypted note stops being a typing task and starts being a talking task — and with a local model picked, nothing about that thought leaves your laptop.

Keeping your voice as private as your notes

This is the section that matters most for a Standard Notes user, so I'll be blunt. If you chose an end-to-end encrypted notes app, routing your spoken words through a cloud transcription service to get them into that app is a contradiction. Your note text gets encrypted before it leaves your device; your voice, in that setup, doesn't. You'd be locking the front door and leaving the audio recording of yourself unlocking it on someone else's server.

Local mode closes that gap. Both local engines — Parakeet and local Whisper — run entirely on your machine through the pure-Rust transcription core. No audio upload, no API call, no account-linked transcript sitting in a vendor's logs. You can pull the network cable and dictation still works, which is the test I actually trust. The text lands at your cursor inside Standard Notes, which then encrypts it the way it encrypts everything else. The voice and the note both stay on the same machine, end to end.

I'm not neutral on this one, and I'll show my work rather than hand-wave. A team I worked with once let a contractor build an internal "AI dictation" prototype that called a cloud API for every utterance. The "smart retry" logic was a little too aggressive, so it transcribed the same standup recordings four times over. At quarter's end the manager opened the cloud-cost dashboard to a five-figure bill, and the CFO's takeaway wasn't "optimise the prompt" — it was "or we don't pay to send our meetings to a server in the first place." For a personal notes habit the bill isn't the risk; the principle is. If the app's whole reason to exist is that your data stays yours, the dictation feeding it should hold the same line.

Local or cloud: which mode for an encrypted note

For Standard Notes, I'd start local and treat cloud as the exception. The reason you're here is privacy, and the two local paths give you dictation that never touches a server. Cloud mode is genuinely better at a few things, but it's the one path that leaves your machine, so reach for it deliberately rather than by default. Here's how the three differ, because the app makes you pick and I'd rather you pick well:

  • Local Parakeet: NVIDIA's TDT engine, around 600 MB, and the fastest local option, 5 to 10 times faster than Whisper on CPU. Covers English plus 24 other European languages, 25 in total. No translate-to-English. If you write your notes in English or another European language, this is the quick, fully offline pick.
  • Local Whisper: slower than Parakeet on the same machine, but the multilingual builds cover 99 languages and can translate to English. The English-only builds cover English alone, not all 99. Pick this for Chinese, Japanese, Korean, or any translation work, which Parakeet can't do. The default English model is around 480 MB. Still fully offline.
  • Cloud (OpenAI, BYOK): best accuracy and web access, using your own OpenAI key billed directly by OpenAI. Transcription runs on gpt-4o-mini-transcribe by default. It needs internet, so your audio leaves the machine, which makes it the one path that breaks the local promise. The Cloud surface is part of Whisper Pro.
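The three-way choice above boils down to a short decision rule. Here it is as a sketch; the function and parameter names are hypothetical, and because the full list of Parakeet's 25 languages isn't given here, the caller passes in whether Parakeet covers their language rather than the code guessing.

```python
def pick_engine(
    parakeet_covers_language: bool,
    needs_translation: bool = False,
    accept_audio_leaving_machine: bool = False,
) -> str:
    """Pick a transcription path, local by default.

    accept_audio_leaving_machine should only be True when you have
    consciously decided cloud accuracy is worth it for this recording.
    """
    if accept_audio_leaving_machine:
        return "Cloud (OpenAI, BYOK)"
    if needs_translation or not parakeet_covers_language:
        # Translation to English and non-European languages need Local Whisper.
        return "Local Whisper"
    return "Local Parakeet"


print(pick_engine(parakeet_covers_language=True))   # Local Parakeet
print(pick_engine(parakeet_covers_language=False))  # Local Whisper
```

Cloud sits behind an explicit opt-in flag on purpose: for private notes, the default answer should never be the path that sends audio out.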

The boring truth is that for the kind of text most people put in an encrypted note — a journal entry, a half-formed idea, a password hint you'd never paste into a cloud doc — local is plenty. Both local engines run fully on your machine with nothing sent to a server, which is exactly the contract Standard Notes already makes for the note itself. Cloud earns its place when you want top-tier accuracy on a hard recording or you need the model to pull a fact off the web mid-sentence. For private notes, that's rarely the trade you want to make.

If you genuinely need cloud-grade accuracy on a specific note, the honest move is to make that choice consciously, knowing the audio leaves your machine for that recording, and switch back to local for the private stuff. The app keeps the toggle one click away precisely so you're never stuck. Most days, for most notes, I never touch it.

Punctuation and cleanup without leaving your machine

Raw dictation comes out as a run-on. You say "okay so move the recovery codes to the encrypted note tag it security and remind me to rotate them next month," and that's the unpunctuated wall any speech engine hands you. Cleaning it up is where the paths diverge — and for a privacy app, where the cleanup happens matters too.

Windows Voice Typing adds punctuation as you speak, and macOS Dictation handles basic punctuation when you say "comma" or "period." For heavier cleanup — stripping the "ums," fixing the run-ons, turning a spoken paragraph into something you'd actually keep in a note — Whisper can run an AI pass. Say the activation phrase "Hey whisper" and the text gets enhanced before it lands. On a local model that pass runs through Ollama on your own machine, so even the cleanup stays offline; in cloud mode it's gpt-5-mini by default, which does send the text out.

The overlay during the AI cleanup pass, before the tidied text lands at your cursor.

Raw

okay so move the recovery codes to the encrypted note tag it security and remind me to rotate them next month um before the renewal

Cleaned

Okay, so move the recovery codes to the encrypted note, tag it security, and remind me to rotate them next month before the renewal.
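To see what a cleanup pass that stays on your machine can look like, here is a minimal sketch against Ollama's local HTTP endpoint. It is not how the app wires things up internally: the prompt wording, the function names, and the `llama3.2` model tag are all assumptions for illustration, and it needs an Ollama server already running locally with that model pulled.

```python
import json
import urllib.request

# Ollama's default local endpoint; nothing here leaves the machine.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_cleanup_request(raw_text: str, model: str = "llama3.2") -> dict:
    """Build the JSON payload for a one-shot cleanup prompt (model tag is assumed)."""
    prompt = (
        "Add punctuation and capitalisation to this dictated text and remove "
        "filler words such as 'um'. Do not add or change any other words.\n\n"
        + raw_text
    )
    # stream=False asks Ollama for a single JSON reply instead of chunks.
    return {"model": model, "prompt": prompt, "stream": False}


def clean_transcript(raw_text: str, model: str = "llama3.2") -> str:
    """Send the cleanup prompt to a locally running Ollama server."""
    body = json.dumps(build_cleanup_request(raw_text, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"].strip()
```

Calling `clean_transcript(...)` on the raw sentence above should return something close to the cleaned version, though the exact wording depends on the model you run.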

A fair expectation to set: dictation gets you the words, not Standard Notes' own structure. The app's tags, its note titles, its editor choices — you still set those with the keys and clicks you already use. Dictate the sentence, then add the tag or rename the note the normal way. No dictation tool conjures an app's organisation into existence on command; anyone promising "say tag it security and watch it file itself" is selling you a demo, not a Tuesday. Get the words down fast by voice, shape the note with the controls you already know.

That same speak-then-clean flow pays off well beyond your notes — you can also dictate clean prose into any app with the one hotkey, so a long entry becomes a few spoken sentences instead of a paragraph you type out.

When to skip a dictation tool for Standard Notes


Sometimes the right tool is the free one already on your machine, and pretending otherwise would be dishonest. If you only drop short captures into Standard Notes — a quick line, a two-word reminder — your operating system covers it for nothing.

On Windows, press Windows key + H and the built-in Voice Typing bar opens wherever your cursor is, a Standard Notes editor included. It punctuates on its own and is fine for short bursts. One catch worth flagging for this audience specifically: Win+H routes your speech through Microsoft's servers and needs an internet connection, so it isn't an offline option. For a notes app whose entire premise is that your data stays on your device, that's a real mismatch, and a local model in Whisper is the more consistent choice when privacy is the point. On Mac, Dictation lets you speak to enter text anywhere you can type. You set it up in System Settings under Keyboard, and on Apple Silicon general text can be processed on-device, which keeps it local. Both are genuinely good for short snippets.

Reach for a dedicated, system-wide tool when the built-ins start hurting: long notes, multilingual work, wanting cleanup, or wanting one hotkey that behaves the same in Standard Notes, your email, and your editor — while keeping everything offline. Below that bar, use what's free, with the one caveat that on Windows "free" means "routed through Microsoft." I'm not going to tell you to install an app for a one-line reminder.

The same trade-off shows up if you also keep notes elsewhere — the logic in dictating into Notion is identical, because in both apps the cursor, not a built-in feature, is the real integration point.

Standard Notes never shipped a microphone button, and given how hard it works to stay minimal and private, I doubt it ever will. It doesn't need to, because the cursor is the integration. Talk into the note, get text, and if you keep it local, the audio stays on the same machine doing the encrypting. I dictated most of this guide into a text box that wasn't Standard Notes, with a tool that doesn't care which box it is and never sent a syllable to a server, then pasted the lot into my own encrypted note. That's the whole trick.

Try it in your next Standard Notes note

Hold the hotkey, talk, release. The transcript lands in whatever note your cursor is in — and stays on your machine if you keep it local.

Free local mode for any signed-in account. No card required to start.


Denys Medvediev

I'm the one who reads our support email, most probably by dictating the replies.