Guide
Voice to text in Logseq
Logseq has no built-in dictation on desktop. The fix is a system-wide tool: press a hotkey, speak, and the transcript pastes at your cursor in any Logseq block. Your OS dictation works too, for short captures.
Last updated: June 2026

Voice to text in Logseq works through a system-wide tool, not Logseq itself. The Logseq desktop app has no built-in dictation. The fix is a tool like Whisper: press a hotkey, speak, and the transcript pastes at the cursor in any Logseq block. Your operating system's dictation works too, for short notes.
I keep my daily journal in Logseq because I trust a folder of plain markdown more than I trust any cloud. The one thing I always wanted was to talk into a block instead of typing it. I went looking for the setting. There is no setting. Logseq does not have a microphone button, and after a fair bit of digging, I'm confident it isn't hiding one from me.
People search for "voice to text in Logseq," find nothing in the app, and assume they missed a toggle. They didn't. The toggle was never built. The good news is the fix takes about two minutes, runs fully offline if you want it to, and works in every other app you open as a bonus.
Here's the thing most pages dancing around this keyword won't say plainly. A Logseq block is just a text box, the same as Gmail or a search bar. Dictation that pastes at your cursor doesn't care which app the cursor is in.
So the real question isn't "how do I turn on voice typing in Logseq." There's no switch. The question is "which dictation tool do I run on top of Logseq," and the answer depends on whether you want free-and-built-in, Mac-only, or one offline hotkey that behaves the same everywhere. I'll walk through all of it, set one up in two minutes, and tell you when to skip the dedicated route.
Does Logseq have built-in dictation?

No. The Logseq desktop app has no built-in speech-to-text, dictation, or voice-typing feature for writing into a block by voice. There is no microphone button on a block, no voice command, no hidden preference. If you've been combing Settings for it, you can stop. It isn't there.
What does exist is a set of community plugins with "whisper" in the name, and this is where people get turned around. Those plugins transcribe an audio file or a YouTube link into text after the fact. They are useful, but they are not live dictation. You can't put your cursor in today's journal, talk, and watch words appear. They process a recording; they don't type for you while you think. Conflating the two costs an afternoon, and I'd rather you skip that afternoon.
The mobile picture is its own thing, and worth one sentence so you don't chase it on the wrong device: the newer Logseq mobile app has added some voice transcription, but that's a phone feature, and on a phone you'd just use the keyboard's microphone anyway. On the desktop graph most people actually live in, you need a tool that sits on top of Logseq. There are three honest categories, and the rest of this guide covers them.
Press a hotkey, talk, text lands in the block
This is the whole mechanic, and it's boring in the best way. You press a hotkey, you speak, you release, and the transcript pastes at your cursor, in whatever text field has focus. Whisper holds a short tail after you let go of the key, so your last word doesn't get clipped. Because it pastes at the OS cursor, a Logseq block is just "any text box." Desktop app or the browser version, same behaviour.
That's the part the landing pages overcomplicate. There's no plugin to install into Logseq, no API token to paste, no sync job to babysit. Your cursor is in a block, you talk, the words appear in the block. A small capsule shows up while you speak so you know it's listening.
The hotkey is the one thing worth getting right up front. On Windows it's Ctrl+Space; on Mac it's Command+Option, a modifier-only push-to-talk you hold while speaking. Both are changeable in Settings if they clash with something you already use. (My younger daughter once told me a hotkey "didn't work" in her drawing app. It was a conflict, not a bug, which is how I learned the average person has no idea what a hotkey conflict even is. So now every hotkey is customisable.) If you've ever set up dictation on Windows or on Mac, this is the same muscle memory pointed at a different app.
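If it helps to see the hold, speak, release sequence and the short tail as logic, here is a toy model. It is a sketch under assumptions, not the app's code: the class name `PushToTalk`, the 0.3 second tail, and the word-by-word "audio" are all invented for illustration, and real capture deals in audio frames rather than words.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PushToTalk:
    """Toy model of hold-to-talk capture with a short tail after key release."""
    tail_seconds: float = 0.3              # assumed value, not the app's real setting
    recording: bool = False
    release_time: Optional[float] = None
    captured: List[str] = field(default_factory=list)

    def key_down(self, now: float) -> None:
        # Pressing the hotkey starts a fresh capture.
        self.recording = True
        self.release_time = None
        self.captured = []

    def key_up(self, now: float) -> None:
        # Releasing does not stop capture at once; it opens the tail window.
        if self.recording:
            self.release_time = now

    def audio(self, now: float, word: str) -> None:
        # Keep audio while the key is held or the tail window is still open.
        if not self.recording:
            return
        if self.release_time is not None and now > self.release_time + self.tail_seconds:
            self.recording = False
            return
        self.captured.append(word)

    def transcript(self) -> str:
        # Stand-in for transcription followed by paste-at-cursor.
        return " ".join(self.captured)


ptt = PushToTalk(tail_seconds=0.3)
ptt.key_down(0.0)
ptt.audio(0.2, "review")
ptt.audio(0.6, "the")
ptt.key_up(0.8)
ptt.audio(0.9, "doc")      # inside the tail window, so it is kept
ptt.audio(1.5, "later")    # after the tail window, so it is dropped
print(ptt.transcript())    # review the doc
```

The tail is the only non-obvious part: without it, a word spoken just as the key comes up would be cut off.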
Set it up in two minutes (Windows or Mac)
You need a Mac on Apple Silicon or a Windows 10-or-newer PC, a working microphone, and Logseq open in either the desktop app or the browser. The whole local pipeline is free for any signed-in account, with no payment method asked for at sign-up. Here's the sequence.
Step 1 — Install Whisper and sign in.
Download from the download page, install, and create a free account. No card. The whole local transcription pipeline opens right away.
You'll know it worked when the app's tray icon appears and the setup wizard offers to pick a model.
Step 2 — Pick a transcription path.
The app doesn't choose for you. You get three: Cloud (OpenAI, bring your own key), Local Parakeet, or Local Whisper. For private journal notes, start local — more on that two sections down.
You'll know it worked when a model finishes downloading and shows as ready.
Step 3 — Confirm your hotkey.
Windows defaults to Ctrl+Space, Mac to Command+Option held as push-to-talk. On Mac, grant the Accessibility permission when prompted; without it, the paste-at-cursor can't reach other apps.
You'll know it worked when a test recording pastes into any text field.
Step 4 — Put your cursor in a Logseq block and talk.
Open your graph, click into a block, hold the hotkey, say a sentence, release. The transcript appears where the cursor is, in the block.
You'll know it worked when your spoken sentence is sitting in the Logseq block as text.
The slow part is the model download, not the setup. Everything else is the four steps above. Once it's running, the act of capturing a thought into your graph stops being a typing task and starts being a talking task.
A Logseq plugin vs. a system-wide hotkey
Most pages ranking for this keyword point you at a Logseq plugin or at Blurt, a dedicated Mac menu-bar tool that speaks straight into your outline. Those are fine answers, with one structural catch each. The whisper-style plugins transcribe audio files, not live speech into the block you're editing. And Blurt, by its own description, is macOS only — if you're on Windows, it isn't an option at all.
A system-wide hotkey sidesteps both limits. It pastes at the OS cursor regardless of which window owns it, so the same key that fills a Logseq block also fills your Gmail compose box, a Slack message, and a commit message. One tool, every text field, on both Windows and Mac. You don't relearn anything when you switch apps, and you don't need a different solution depending on your laptop.
If you're on a Mac and you only ever capture inside Logseq, Blurt is a tidy, focused pick and worth a look. The moment you're on Windows, or you want the same flow across every program you open, the system-wide route wins. I'd reach for the one hotkey because I switch apps roughly forty times an hour and don't want forty different dictation buttons to remember.
Local or cloud: which mode for a private graph
For Logseq, try local mode first. The whole reason a lot of us chose Logseq is that it's local-first plain text — a meeting recap, a half-formed idea, a journal entry you'd never want on someone's server. It would be a strange choice to keep your notes on your own disk and then route your voice through a cloud to get there. If your Mac is Apple Silicon or your PC is from the last few years, local handles everyday dictation without complaint, and cloud becomes the escape hatch rather than the default.
Here's how the three paths differ, because the app makes you pick and I'd rather you pick well:
- Local Parakeet — NVIDIA's TDT engine, around 600 MB, and the fastest local option — 5 to 10 times faster than Whisper on CPU. Covers English plus 24 other European languages, 25 in total. No translate-to-English. If you journal in English or another European language, this is the quick, fully offline pick.
- Local Whisper — slower than Parakeet on the same machine, but the multilingual builds cover 99 languages and can translate to English. The English-only builds are English-only, not 99. Pick this for Chinese, Japanese, Korean, or any translation work, which Parakeet can't do. Default English model is around 480 MB.
- Cloud (OpenAI, BYOK) — best accuracy and web access, using your own OpenAI key billed straight by OpenAI. Transcription runs on gpt-4o-mini-transcribe by default. Needs internet, so it's the one path that leaves your machine. The Cloud surface is part of Whisper Pro.
The boring truth is that for the kind of text most people put in Logseq, local is plenty. Both local engines run fully on your machine with nothing sent to a server, which is the entire point of a local-first graph. Cloud earns its place when you want top-tier accuracy on a hard recording or you need the model to pull a fact off the web mid-sentence. For a daily-journal habit, start local and only reach for cloud when local leaves you wanting.
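The three-way choice above boils down to a few rules of thumb, which can be written out as a small function. This is only my reading of the guidance in this section, not logic taken from the app; `pick_path` and its parameter names are hypothetical, and whether Parakeet covers your language is left as a yes/no input because the full language list isn't given here.

```python
from enum import Enum


class Path(Enum):
    LOCAL_PARAKEET = "Local Parakeet"
    LOCAL_WHISPER = "Local Whisper"
    CLOUD = "Cloud (OpenAI, BYOK)"


def pick_path(parakeet_covers_language: bool,
              needs_translation: bool = False,
              must_stay_offline: bool = True,
              local_left_you_wanting: bool = False) -> Path:
    """Encode this guide's rules of thumb for choosing a transcription path."""
    # Cloud is the escape hatch: only when leaving the machine is acceptable
    # and local results were not good enough.
    if local_left_you_wanting and not must_stay_offline:
        return Path.CLOUD
    # Parakeet is the fast pick, but it cannot translate to English and
    # only covers English plus 24 other European languages.
    if parakeet_covers_language and not needs_translation:
        return Path.LOCAL_PARAKEET
    # Everything else that stays local goes to a multilingual Whisper build.
    return Path.LOCAL_WHISPER


print(pick_path(parakeet_covers_language=True).value)    # Local Parakeet
print(pick_path(parakeet_covers_language=False).value)   # Local Whisper
```

Note the defaults: offline is assumed unless you say otherwise, which matches the start-local advice for a private graph.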
Punctuation, blocks, and Logseq markdown by voice
Raw dictation comes out as a run-on. You say "okay so review the architecture doc tag it project alpha and remind me Thursday," and that's the unpunctuated wall any speech engine hands you. Cleaning it up is where the paths diverge.
Windows Voice Typing adds punctuation as you speak, and macOS Dictation handles basic punctuation when you say "comma" or "period." For heavier cleanup — stripping the "ums," fixing the run-ons, turning a spoken paragraph into something you'd actually keep in your graph — Whisper can run an AI pass. Say the activation phrase "Hey whisper" and the text gets enhanced before it lands. On a local model that runs through Ollama; in cloud mode it's gpt-5-mini by default.
Raw: okay so review the architecture doc tag it project alpha and remind me thursday um before the standup
Cleaned: Okay, so review the architecture doc, tag it Project Alpha, and remind me Thursday before the standup.
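To show what the simplest slice of that cleanup involves, here is a rule-based stand-in. It is not the AI pass described above: it only strips a few filler words, capitalises the first letter, and adds a closing period, so it will not add commas or fix names the way a language model can. The filler list and the function name `tidy` are assumptions for illustration.

```python
import re

# Assumed filler list for illustration; extend it for your own speech habits.
FILLERS = ("um", "uh", "erm")
_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(FILLERS) + r")\b[,.]?\s*", re.IGNORECASE
)


def tidy(raw: str) -> str:
    """Strip filler words, collapse spaces, capitalise, and end with a period."""
    text = _FILLER_RE.sub("", raw)
    text = re.sub(r"\s+", " ", text).strip()
    if not text:
        return ""
    text = text[0].upper() + text[1:]
    if text[-1] not in ".!?":
        text += "."
    return text


raw = ("okay so review the architecture doc tag it project alpha "
       "and remind me thursday um before the standup")
print(tidy(raw))
# Okay so review the architecture doc tag it project alpha and remind me thursday before the standup.
```

The gap between this output and the fully punctuated version is the work the AI pass is doing.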
For Logseq's own structure (nested blocks, the #tag and [[page]] links, TODO markers), the honest answer is that voice gets you the text and Logseq's own syntax gets you the structure. Dictate the sentence, then press Tab to indent, type # for a tag, or type [[ for a page link, the way you always do. No dictation tool conjures Logseq's outline syntax into existence on command; anyone promising "say tag project alpha and watch it link" is selling you a demo, not a Tuesday. Get the words down fast by voice, shape the blocks with the keys you already know.
That same speak-then-clean flow pays off well beyond your graph — you can also dictate clean prose into any app with the one hotkey, so a long block becomes a few spoken sentences instead of a paragraph you type out.
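To make the Logseq syntax mentioned in this section concrete, here is what a finished block looks like as plain text once the TODO marker, page link, and tag are in place. The helper `logseq_block` is hypothetical and not part of any tool discussed here; it assumes the markdown file format, where each block starts with "- " and nesting is one tab per level.

```python
from typing import Sequence


def logseq_block(text: str, *, todo: bool = False, tags: Sequence[str] = (),
                 page: str = "", depth: int = 0) -> str:
    """Wrap dictated text in Logseq-flavoured markdown block syntax."""
    parts = []
    if todo:
        parts.append("TODO")
    parts.append(text.strip())
    if page:
        parts.append(f"[[{page}]]")
    for tag in tags:
        # Tags containing spaces need the #[[...]] form.
        parts.append(f"#[[{tag}]]" if " " in tag else f"#{tag}")
    # One tab per nesting level is assumed here.
    return "\t" * depth + "- " + " ".join(parts)


print(logseq_block("Review the architecture doc", todo=True,
                   tags=["project-alpha"], page="Standup"))
# - TODO Review the architecture doc [[Standup]] #project-alpha
```

In day-to-day use you'd still add these markers by hand in the editor; the function just spells out what ends up in the file.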
When to skip a dictation tool for Logseq

Sometimes the right tool is the free one already on your machine, and pretending otherwise would be dishonest. If you only drop short captures into Logseq — a quick journal line, a two-word reminder — your operating system covers it for nothing.
On Windows, press Windows key + H and the built-in Voice Typing bar opens wherever your cursor is, a Logseq block included. It punctuates on its own and is fine for short bursts. The catch: it routes through Microsoft's servers and needs an internet connection, so it isn't an offline option, which matters more than usual when the whole point of your graph is staying local. On Mac, Dictation lets you speak to enter text anywhere you can type, set up in System Settings under Keyboard, and on Apple Silicon general text can be processed on-device. And if you're a Mac user who lives entirely inside Logseq, Blurt is a focused, native pick built for exactly that.
Reach for a dedicated, system-wide tool when the built-ins start hurting: long notes, multilingual work, offline privacy on Windows, or wanting one hotkey that behaves the same in Logseq, your email, and your editor. Below that bar, use what's free. I'm not going to tell you to install an app for a one-line reminder.
The same trade-off shows up if you also keep notes elsewhere — the logic in dictating into Obsidian is identical, because both are local-first markdown apps where the cursor, not a plugin, is the real integration.
Logseq never shipped a microphone button, and after writing this I'm fairly sure it never will. It doesn't need to, because the cursor is the integration. Talk into the block, get text, shape it with the syntax you already know. I dictated most of this guide into a text box that wasn't Logseq, with a tool that doesn't care which box it is, then pasted the lot into my own graph. That's the whole trick.
Try it in your next Logseq block
Hold the hotkey, talk, release. The transcript lands in whatever block your cursor is in — and in every other app too.
Free local mode for any signed-in account. No card required to start.



