By Denys Medvediev


Voice to text in Roam Research

Roam Research has no built-in dictation. The fix is a system-wide tool: press a hotkey, speak, and the transcript pastes at your cursor in any Roam block. Your OS dictation works too, for short captures.

Last updated: June 2026


Voice to text in Roam Research works through a system-wide tool, not through Roam itself: a dictation app like Whisper that pastes at the cursor in any block, or your operating system's built-in dictation for short notes.

I keep a daily-notes page in Roam because the linked-thinking thing actually changed how I hold ideas — every block is a node, every [[page]] is a thread I can pull later. The one thing I always wanted was to talk a thought into a block instead of typing it. I went looking for the setting. There is no setting. Roam has no microphone button, and after a fair bit of digging, I'm confident it isn't hiding one from me.

People search for "voice to text in Roam Research," find nothing in the app, and assume they missed a toggle. They didn't. The toggle was never built. The good news is the fix takes about two minutes, runs fully offline if you want it to, and works in every other app you open as a bonus.

Here's the thing most pages dancing around this keyword won't say plainly. A Roam block is just a text box, the same as Gmail or a search bar. Dictation that pastes at your cursor doesn't care which app the cursor is in.

So the real question isn't "how do I turn on voice typing in Roam." There's no switch. The question is "which dictation tool do I run on top of Roam," and the answer depends on whether you want free-and-built-in, or one offline hotkey that behaves the same everywhere. I'll walk all of it, set one up in two minutes, and tell you when to skip the dedicated route.

Does Roam Research have built-in dictation?


No. Roam Research has no built-in speech-to-text, dictation, or voice-typing feature for writing into a block by voice. There is no microphone button on a block, no voice command, no hidden preference. Roam takes typed input. If you've been combing the menus for a dictation toggle, you can stop. It isn't there.

What does exist is a handful of Roam Depot extensions and a Live AI Assistant with "speech" in the description, and this is where people get turned around. Those transcribe an audio file you've already recorded — a meeting, an interview, a clip you uploaded with /upload — into text after the fact, usually by calling the OpenAI Whisper API with your own key. They are useful, but they are not live dictation. You can't put your cursor in today's daily note, talk, and watch words appear. They process a recording; they don't type for you while you think. Conflating the two costs an afternoon, and I'd rather you skip that afternoon.

The mobile picture is its own thing, and worth one sentence so you don't chase it on the wrong device: there are companion capture apps that send a speech-to-text note into your graph from a phone, but that's a phone feature, and on a phone you'd just use the keyboard's microphone anyway. On the desktop graph most people actually live in, you need a tool that sits on top of Roam. There are a couple of honest categories, and the rest of this guide covers them.

Press a hotkey, talk, text lands in the block

This is the whole mechanic, and it's boring in the best way. You press a hotkey, you speak, you release, and the transcript pastes at your cursor, in whatever text field has focus. Whisper keeps recording for a short tail after you let go of the key, so your last word doesn't get clipped. Because it pastes at the OS cursor, a Roam block is just "any text box." Whether you run Roam in the browser or in a desktop wrapper, the behaviour is the same; Roam has no way to tell the difference.
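The post doesn't say how long the tail is or how the audio is buffered, so here's a minimal sketch, assuming audio arrives as a list of samples at a fixed rate and using a 0.3-second tail that is my guess, not Whisper's setting. It shows why the tail matters: the slice runs past the release instead of stopping dead on it.

```python
def capture_window(samples: list[float], rate: int,
                   press_s: float, release_s: float,
                   tail_s: float = 0.3) -> list[float]:
    """Return the samples from key press to key release, plus a short tail.

    tail_s is an assumed value; the guide only says the tail is "short".
    """
    if rate <= 0:
        raise ValueError("sample rate must be positive")
    if release_s < press_s:
        raise ValueError("key released before it was pressed")
    # round() rather than int() so float error (e.g. 0.29 * 100) can't drop a sample
    start = round(press_s * rate)
    # Keep going past the release so the last word isn't clipped,
    # but never read beyond what the buffer actually holds.
    end = min(len(samples), round((release_s + tail_s) * rate))
    return samples[start:end]


# Toy buffer: 10 seconds at 10 samples per second.
buffer = list(range(100))
print(len(capture_window(buffer, 10, 1.0, 5.0, tail_s=0.5)))  # 45
print(len(capture_window(buffer, 10, 1.0, 5.0, tail_s=0.0)))  # 40
```

With `tail_s=0.0` the window stops exactly at the release sample, which is where a trailing word tends to get cut.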

That's the part the landing pages overcomplicate. There's no extension to install into Roam, no API token to paste, no sync job to babysit. Your cursor is in a block, you talk, the words appear in the block. A small capsule shows up while you speak so you know it's listening:

The recording overlay: a small capsule that appears while you speak, so you know Whisper is listening.
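The guide doesn't say how Whisper pastes. A common technique for paste-at-cursor tools is a clipboard swap: stash the clipboard, put the transcript on it, send the paste keystroke to the focused window, then restore the old contents. Here's a minimal sketch, assuming that approach, with the OS-specific calls passed in as hypothetical callbacks so the logic runs anywhere. On a Mac, sending a synthetic keystroke to another app is the kind of action that requires Accessibility permission.

```python
from typing import Callable


def paste_at_cursor(text: str,
                    get_clipboard: Callable[[], str],
                    set_clipboard: Callable[[str], None],
                    send_paste_keystroke: Callable[[], None],
                    wait: Callable[[], None] = lambda: None) -> None:
    """Paste text into whatever field has focus, then put the clipboard back.

    The callbacks are hypothetical stand-ins for OS calls, not Whisper's code.
    """
    saved = get_clipboard()
    set_clipboard(text)
    try:
        # Ctrl+V on Windows, Cmd+V on Mac, delivered to the focused window.
        send_paste_keystroke()
        # Real tools usually pause briefly so the target app finishes
        # reading the clipboard before it is restored.
        wait()
    finally:
        # Restore even if the keystroke failed, so the user's copy survives.
        set_clipboard(saved)


# Fakes standing in for the real clipboard and a focused Roam block.
clipboard = {"value": "something you copied earlier"}
roam_block: list[str] = []

paste_at_cursor(
    "review the architecture doc",
    get_clipboard=lambda: clipboard["value"],
    set_clipboard=lambda v: clipboard.update(value=v),
    send_paste_keystroke=lambda: roam_block.append(clipboard["value"]),
)
print(roam_block)          # ['review the architecture doc']
print(clipboard["value"])  # something you copied earlier
```

Nothing in this logic knows which app has focus, which is the point: the same call fills a Roam block or a Gmail draft.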

The hotkey is the one thing worth getting right up front. On Windows it's Ctrl+Space; on Mac it's Command+Option, a modifier-only push-to-talk you hold while speaking. Both are changeable in Settings if they clash with something you already use. (My younger daughter once told me a hotkey "didn't work" in her drawing app. It was a conflict, not a bug, which is how I learned the average person has no idea what a hotkey conflict even is. So now every hotkey is customisable.) If you've ever set up dictation on Mac, this is the same muscle memory pointed at a different app.
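A hotkey conflict is just two programs claiming the same key chord. Here's a minimal sketch of the comparison, assuming shortcuts are written as "+"-joined key names; the alias table and the drawing-app bindings are made up for illustration. It only compares strings, so it can't see what other programs have actually registered with the OS.

```python
# Hypothetical spelling variants; extend as needed.
ALIASES = {"control": "ctrl", "cmd": "command", "opt": "option"}


def normalize(hotkey: str) -> frozenset[str]:
    """Treat 'Ctrl+Space' and 'space + control' as the same chord."""
    keys = (part.strip().lower() for part in hotkey.split("+"))
    return frozenset(ALIASES.get(k, k) for k in keys if k)


def find_conflicts(candidate: str, bindings: dict[str, str]) -> list[str]:
    """List the actions already bound to the same chord as candidate."""
    wanted = normalize(candidate)
    return sorted(action for action, chord in bindings.items()
                  if normalize(chord) == wanted)


# Hypothetical bindings in some other app, for illustration only.
drawing_app = {"Brush picker": "Space+Control", "Undo": "Ctrl+Z"}
print(find_conflicts("Ctrl+Space", drawing_app))      # ['Brush picker']
print(find_conflicts("Command+Option", drawing_app))  # []
```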

Set it up in two minutes (Windows or Mac)

You need a Mac on Apple Silicon or a Windows 10-or-newer PC, a working microphone, and Roam open in your browser. The whole local pipeline is free for any signed-in account, with no payment method asked for at sign-up. Here's the sequence.

Step 1 — Install Whisper and sign in.

Download from the download page, install, and create a free account. No card. The whole local transcription pipeline opens right away.

You'll know it worked when the app's tray icon appears and the setup wizard offers to pick a model.

Step 2 — Pick a transcription path.

The app doesn't choose for you. You get three paths: Cloud (OpenAI, bring your own key), Local Parakeet, or Local Whisper. For private daily notes, start local; more on that two sections down.

You'll know it worked when a model finishes downloading and shows as ready.

Step 3 — Confirm your hotkey.

Windows defaults to Ctrl+Space, Mac to Command+Option held as push-to-talk. On Mac, grant the Accessibility permission when prompted; without it, the paste-at-cursor can't reach your browser.

You'll know it worked when a test recording pastes into any text field.

Step 4 — Put your cursor in a Roam block and talk.

Open your graph, click into a block, hold the hotkey, say a sentence, release. The transcript appears where the cursor is, in the block.

You'll know it worked when your spoken sentence is sitting in the Roam block as text.

The real Whisper desktop app on the settings screen, with the Transcription and AI panels open.

The slow part is the model download, not the setup. Everything else is the four steps above. Once it's running, the act of capturing a thought into your graph stops being a typing task and starts being a talking task.


A Roam extension vs. a system-wide hotkey

Most pages ranking for this keyword point you at a Roam Depot extension — the Live AI Assistant, the Otter importer, something with "speech" in the name. Those are fine tools, with one structural catch in common. They transcribe audio you've already recorded — a meeting file, an Otter session, a clip uploaded into a block — not live speech into the block you're editing right now. You record, then you transcribe, then you clean up the result. That's a transcription workflow, not dictation. They're solving "I have an hour of audio" rather than "I want to talk this sentence into my daily note."

A system-wide hotkey sidesteps that entirely. It pastes at the OS cursor regardless of which window owns it, so the same key that fills a Roam block also fills your Gmail compose box, a Slack message, and a commit message. One tool, every text field, on both Windows and Mac. You don't relearn anything when you switch apps, and nothing has to know it's Roam — the cursor does the integrating.

If you mostly have recordings to transcribe — calls, lectures, voice memos you already captured — a Depot extension that calls Whisper on the file is the right shape, and worth a look. The moment what you actually want is to think out loud into a fresh block, live, the system-wide route wins. I'd reach for the one hotkey because I switch apps roughly forty times an hour and don't want forty different dictation buttons to remember.

Local or cloud: which mode for a private graph

For Roam, try local mode first. A graph fills up with the unfiltered stuff — a half-formed idea, a meeting recap, a journal entry you'd never want on someone else's server. If you'd think twice before posting a block publicly, you'd probably think twice about routing your voice through a cloud to write it. If your Mac is Apple Silicon or your PC is from the last few years, local handles everyday dictation without complaint, and cloud becomes the escape hatch rather than the default.

Here's how the three paths differ, because the app makes you pick and I'd rather you pick well:

  • Local Parakeet: NVIDIA's TDT engine, around 600 MB, and the fastest local option, 5 to 10 times faster than Whisper on CPU. Covers English plus 24 other European languages, 25 in total. No translate-to-English. If you journal in English or another European language, this is the quick, fully offline pick.
  • Local Whisper: slower than Parakeet on the same machine, but the multilingual builds cover 99 languages and can translate to English. The English-only builds cover English alone. Pick this for Chinese, Japanese, Korean, or any translation work, which Parakeet can't do. The default English model is around 480 MB.
  • Cloud (OpenAI, BYOK): best accuracy and web access, using your own OpenAI key billed straight by OpenAI. Transcription runs on gpt-4o-mini-transcribe by default. Needs internet, so it's the one path that leaves your machine. The Cloud surface is part of Whisper Pro.
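The trade-offs in that list boil down to a few yes/no questions. Here's a minimal sketch of them as a decision function; the field names, the return strings, and the short language set are mine, not Whisper's settings, and the rules encode only what the list above says.

```python
from dataclasses import dataclass

# Illustrative subset only. The guide says Parakeet covers 25 European
# languages but doesn't list them; treat this set as an assumption.
PARAKEET_LANGS = {"en", "de", "fr", "es", "it"}


@dataclass
class Needs:
    language: str                    # ISO 639-1 code, e.g. "en" or "ja"
    translate_to_english: bool = False
    must_stay_offline: bool = True   # private graph: local by default
    want_top_accuracy: bool = False  # e.g. a hard recording
    needs_web: bool = False          # model must look something up


def choose_path(needs: Needs) -> str:
    """Pick one of the three transcription paths described in the guide."""
    if needs.needs_web and needs.must_stay_offline:
        raise ValueError("web access needs the cloud path, which leaves your machine")
    # Cloud is the one path that leaves the machine: an escape hatch, not the default.
    if (needs.want_top_accuracy or needs.needs_web) and not needs.must_stay_offline:
        return "cloud"
    # Parakeet is the fastest local engine but can't translate to English.
    if needs.language in PARAKEET_LANGS and not needs.translate_to_english:
        return "local-parakeet"
    # Everything else: a multilingual Whisper build (99 languages, translation).
    return "local-whisper"


print(choose_path(Needs("en")))                             # local-parakeet
print(choose_path(Needs("ja")))                             # local-whisper
print(choose_path(Needs("de", translate_to_english=True)))  # local-whisper
```

"local-whisper" here means a multilingual build; the English-only builds can neither handle Japanese nor translate.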

The boring truth is that for the kind of text most people put in Roam, local is plenty. Both local engines run fully on your machine with nothing sent to a server. Cloud earns its place when you want top-tier accuracy on a hard recording or you need the model to pull a fact off the web mid-sentence. For a daily-notes habit, start local and only reach for cloud when local leaves you wanting.

Punctuation, blocks, and Roam syntax by voice

Raw dictation comes out as a run-on. You say "okay so review the architecture doc tag it project alpha and remind me Thursday," and that's the unpunctuated wall any speech engine hands you. Cleaning it up is where the paths diverge.

Windows Voice Typing adds punctuation as you speak, and macOS Dictation handles basic punctuation when you say "comma" or "period." For heavier cleanup — stripping the "ums," fixing the run-ons, turning a spoken paragraph into something you'd actually keep in your graph — Whisper can run an AI pass. Say the activation phrase "Hey whisper" and the text gets enhanced before it lands. On a local model that runs through Ollama; in cloud mode it's gpt-5-mini by default.
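The guide only says the phrase triggers an AI pass that, on a local model, runs through Ollama. Here's a minimal sketch of that routing, assuming the phrase opens the utterance and the cleanup goes to Ollama's default local endpoint (`/api/generate` on port 11434, non-streaming). The prompt wording and the `llama3.2` model name are placeholders, not Whisper's.

```python
import json
import re
import urllib.request

# Assumption: "Hey whisper" opens the utterance; the rest is the note itself.
ACTIVATION = re.compile(r"^\s*hey[\s,]+whisper\b[\s,.!:]*", re.IGNORECASE)

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def split_activation(transcript: str) -> tuple[bool, str]:
    """Return (wants_cleanup, text) with the activation phrase stripped off."""
    match = ACTIVATION.match(transcript)
    if match:
        return True, transcript[match.end():]
    return False, transcript


def cleanup_request(text: str, model: str = "llama3.2") -> urllib.request.Request:
    """Build (but don't send) a non-streaming Ollama generate request."""
    body = {
        "model": model,  # placeholder model name
        "prompt": ("Add punctuation, remove filler words like 'um', and return "
                   "only the cleaned text:\n\n" + text),
        "stream": False,
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


wants_cleanup, text = split_activation("Hey whisper, okay so review the architecture doc")
if wants_cleanup:
    req = cleanup_request(text)
    # With Ollama running, the cleaned text would be:
    # json.load(urllib.request.urlopen(req))["response"]
```

Without the phrase, the transcript passes through untouched, which keeps plain dictation fast.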

Raw

okay so review the architecture doc tag it project alpha and remind me thursday um before the standup

Cleaned

Okay, so review the architecture doc, tag it Project Alpha, and remind me Thursday before the standup.
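For contrast, here's what a purely rule-based pass gets you. It's a deliberately naive sketch, not what Whisper's AI pass does: drop a few filler words, capitalize the first letter, add a closing period. Run on the raw example above, it removes the "um" but can't place the commas or know that "thursday" and "project alpha" are names, which is exactly the gap a language model fills.

```python
FILLERS = {"um", "uh", "erm"}  # illustrative list, not exhaustive


def naive_cleanup(raw: str) -> str:
    """Strip filler words, capitalize the first letter, end with a period."""
    words = [w for w in raw.split() if w.lower().strip(",.") not in FILLERS]
    text = " ".join(words)
    if not text:
        return text
    text = text[0].upper() + text[1:]
    if text[-1] not in ".!?":
        text += "."
    return text


raw = ("okay so review the architecture doc tag it project alpha "
       "and remind me thursday um before the standup")
print(naive_cleanup(raw))
# Okay so review the architecture doc tag it project alpha and remind me thursday before the standup.
```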

For Roam's own structure (nested blocks, #tag and [[page]] links, TODO markers), the honest answer is that voice gets you the text and Roam's own syntax gets you the structure. Dictate the sentence, then press Tab to indent the block, or type # for a tag or [[ for a page link the way you always do. No dictation tool conjures Roam's outline syntax into existence on command; anyone promising "say double-bracket project alpha and watch it link" is selling you a demo, not a Tuesday. Get the words down fast by voice, then shape the blocks with the keys you already know.
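To make the split concrete: the structure is plain-text markup typed around the dictated words. Here's a tiny sketch that spells that markup out, assuming Roam's usual syntax as I understand it ([[Page]] links, #tag or #[[multi word tag]], and {{[[TODO]]}} at the start of a block). The helper names are made up, and this is for seeing the markup, not a voice feature.

```python
def page_link(name: str) -> str:
    """A Roam page reference: [[Page Name]]."""
    return f"[[{name}]]"


def tag(name: str) -> str:
    # One-word tags can be written bare; tags with spaces need brackets.
    return f"#[[{name}]]" if " " in name else f"#{name}"


def todo(block_text: str) -> str:
    # The marker that turns a block into a TODO checkbox.
    return "{{[[TODO]]}} " + block_text


dictated = "Review the architecture doc before the standup"
block = todo(f"{dictated} {tag('Project Alpha')}")
print(block)
# {{[[TODO]]}} Review the architecture doc before the standup #[[Project Alpha]]
```

The dictated sentence stays untouched; only the markup around it is typed.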

That same speak-then-clean flow pays off well beyond your graph — you can also dictate clean prose into any app with the one hotkey, so a long block becomes a few spoken sentences instead of a paragraph you type out.

When to skip a dictation tool for Roam Research


Sometimes the right tool is the free one already on your machine, and pretending otherwise would be dishonest. If you only drop short captures into Roam — a quick daily-note line, a two-word reminder — your operating system covers it for nothing.

On Windows, press Windows key + H and the built-in Voice Typing bar opens wherever your cursor is, a Roam block included. It punctuates on its own and is fine for short bursts. The catch: it routes through Microsoft's servers and needs an internet connection, so it isn't an offline option, which matters more than usual when your graph is full of half-private thinking. On Mac, Dictation lets you speak text anywhere you can type; you turn it on in System Settings under Keyboard, and on Apple Silicon general text can be processed on-device. And if what you really have is recorded audio, such as a call or a lecture, a Roam Depot extension that transcribes the file is a better fit than any live-dictation tool.

Reach for a dedicated, system-wide tool when the built-ins start hurting: long notes, multilingual work, offline privacy on Windows, or wanting one hotkey that behaves the same in Roam, your email, and your editor. Below that bar, use what's free. I'm not going to tell you to install an app for a one-line reminder.

The same trade-off shows up if you also keep notes elsewhere — the logic in dictating into Obsidian is identical, because there too the cursor, not a plugin, is the real integration.

Roam never shipped a microphone button, and after writing this I'm fairly sure it never will. It doesn't need to, because the cursor is the integration. Talk into the block, get text, shape it with the [[ and # you already know. I dictated most of this guide into a text box that wasn't Roam, with a tool that doesn't care which box it is, then pasted the lot into my own graph. That's the whole trick.

Try it in your next Roam block

Hold the hotkey, talk, release. The transcript lands in whatever block your cursor is in — and in every other app too.

Free local mode for any signed-in account. No card required to start.


Denys Medvediev

I'm the one who reads our support email, and I usually dictate the replies.