By Denys Medvediev

Guide

Dictate your dissertation

A dissertation is too long to type past the blank page. Talk the first draft instead: press a hotkey, speak a chapter, and the words land in Word, Scrivener, or LaTeX. Then edit. The local mode is free and runs offline.

Last updated: June 2026

To dictate your dissertation, install a system-wide dictation tool, press a hotkey, and speak the draft into whatever editor you write in: Word, Scrivener, Google Docs, or a LaTeX file. The transcript pastes at the cursor, and you edit by keyboard afterward. A local engine runs fully offline and is free for any signed-in account.

The hardest page of a dissertation is the one with nothing on it yet. You have read the papers, you have the argument in your head, and the cursor sits there blinking while you decide how to start a sentence you have rewritten in your mind nine times. I have watched friends finishing PhDs lose whole evenings to that cursor. The thinking was done. The typing was the wall.

Talking is a way over the wall. You can say a rough version of a paragraph in the time it takes to type half of it, and a rough version on the page is something you can fix. A blank page is not. Dictating the first draft of a chapter is not about typing faster — it is about getting the bad version out so the good version has something to argue with.

Here is the part most pages about dissertation dictation skip. Your word processor is just a text field. So is Scrivener's editor, so is a Google Doc, so is the body of a `.tex` file in your code editor. A dictation tool that pastes at your cursor does not care which one you are in. There is no plugin to wire into your reference manager, no special "dissertation mode."

So the real question is not "which app supports dictation." It is "which dictation tool runs on top of the app I already write in," and for years-long, often unfunded work, two things matter more than they would for a quick email: it should run offline and on a free local tier, and it should learn the names and jargon your field throws at it. I will walk the workflow chapter by chapter, set one up in two minutes, and tell you the one job to give a different tool.

Why grad students talk the first draft

The job is not "write faster words." The job is "stop staring." A dissertation chapter is eight to twelve thousand words, and the first version of every section of it is going to be clumsy no matter how you produce it. The only question is whether you produce a clumsy draft in an afternoon by talking, or fail to produce a clean one for a week by typing. Talking wins because it is allergic to perfectionism. You cannot edit a sentence mid-breath the way you can mid-keystroke, so the words come out and stay out, and you fix them later.

There is a second reason, and it is a plain physical one. A dissertation is the longest thing most people will ever write, often over months of marathon sessions, and hands have opinions about that. Dictating part of the draft means part of the day's writing happens with your hands off the keyboard. I am not going to dress that up as a medical claim, because it is not one — it is a productivity and comfort point, the same as standing up every hour. If wrist strain is the specific thing on your mind, the longer write-up on dictation as a way to rest your hands covers the productivity side of that honestly. For the dissertation itself, the point is simpler: you can keep drafting on the days your hands would rather you didn't type.

And the boring truth is that most of a dissertation is not the elegant final prose. It is the scaffolding — the "in this chapter I argue," the summaries of what so-and-so found, the linking paragraphs between sections. That scaffolding is exactly the stuff that comes out fine by voice and reads no worse than if you had typed it. Save the keyboard for the sentences that actually need to be precise.

Press a hotkey, speak, the text lands in your editor

The mechanic is dull, which is the highest compliment I can pay it. You press a hotkey, you speak, you release, and the transcript pastes at your cursor in whatever has focus — a heading in Word, a document in Scrivener, a paragraph in a Google Doc, a comment block in your LaTeX file. Whisper holds a short tail after you let go of the key, so the last word of a long sentence does not get clipped. Because it pastes at the operating-system cursor, your editor is just "the text box that happens to be in front."

That is the part the tutorials overbuild. There is no integration to install into Word, no add-on for Scrivener, no token to paste into your reference manager. Your cursor is in the document, you talk, the words appear. A small capsule shows up while you speak so you know it is listening rather than ignoring you:

The recording overlay: a small capsule that appears while you speak, so you know Whisper is listening.

The hotkey is the one thing worth setting right before you start a long session. On Windows it is Ctrl+Space; on Mac it is Command+Option, a modifier-only push-to-talk you hold while speaking and release to stop. Both are changeable in Settings if they clash with something — and in a writing setup full of LaTeX shortcuts and reference-manager hotkeys, something usually does. If you have set up dictation on Windows or on Mac before, this is the same muscle, pointed at your thesis.

Set it up in two minutes (Windows or Mac)

You need a Mac on Apple Silicon or a Windows 10-or-newer PC, a working microphone, and your editor open — Word, Scrivener, a browser tab with Google Docs, or your LaTeX editor. The whole local pipeline is free for any signed-in account, with no payment method asked for at sign-up, which matters when the work is going to take years and the funding situation is what it is. Here is the sequence.

Step 1 — Install Whisper and sign in.

Download from the download page, install, and create a free account. No card. The whole local transcription pipeline opens right away.

You'll know it worked when the app's tray icon appears and the setup wizard offers to pick a model.

Step 2 — Pick a transcription path.

The app doesn't choose for you. You get three: Cloud (OpenAI, bring your own key), Local Parakeet, or Local Whisper. For a long offline draft with field-specific terms, local Whisper is the one to reach for — more on why two sections down.

You'll know it worked when a model finishes downloading and shows as ready.

Step 3 — Confirm your hotkey.

Windows defaults to Ctrl+Space, Mac to Command+Option held as push-to-talk. On Mac, grant the Accessibility permission when prompted; without it, the paste-at-cursor can't reach your editor.

You'll know it worked when a test recording pastes into any text field.

Step 4 — Put your cursor in your draft and talk.

Open the chapter, click where the next paragraph goes, hold the hotkey, say a few sentences, release. The transcript appears at the cursor, in the document.

You'll know it worked when your spoken paragraph is sitting in the chapter as text.

The real Whisper desktop app on the settings screen, with the Transcription and AI panels open.

The slow part is the one-time model download, not the setup. Everything after that is the four steps above. Once it runs, opening a chapter stops being "find the energy to type" and starts being "find the energy to talk," which on a tired Thursday is a much lower bar.

Drafting a chapter by voice, then teaching it your jargon

The workflow that works for long-form is talk in chunks, edit in passes. Do not try to dictate a polished chapter top to bottom — that is the typing mindset wearing a microphone. Instead, open your outline, put the cursor under a heading, and say the rough version of that section out loud the way you would explain it to a labmate over coffee. One section, a few hundred words, release the key, move to the next heading. You are filling the skeleton, not carving the statue. The carving is editing, and it comes later with the keyboard.

The thing that makes or breaks academic dictation is vocabulary. A dissertation is full of words no general transcriber expects — the methods you cite, the chemicals or constructs or theorems in your field, and worst of all the surnames. "Foucault," "Nyquist," "Bourdieu," a co-author's Polish or Korean name spelled exactly the way the citation needs it. A general engine will guess, and it will guess wrong, the same way autocorrect mangles a name it has never seen. This is where local Whisper earns its place: it supports custom vocabulary — you give it a list of hotwords, the author names and field terms you keep using, and it biases toward transcribing them correctly. Parakeet, the faster local engine, does not do hotwords, so for a jargon-heavy draft Whisper is the local pick. Cloud mode is strong on accuracy too, but the custom-vocabulary lever specifically is a local-Whisper feature.

Set that list up once at the start of the dissertation and it pays off for two years. Add the twenty or thirty terms and names that recur in your work, and the run-on you get back stops needing a find-and-replace for "Burdew" every paragraph. You will still fix things — no tool spells every name right on the first pass — but you are correcting the occasional miss instead of retyping every technical term you own.
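For the misses that still slip through, the find-and-replace can be scripted once rather than repeated by hand. The following is a minimal Python sketch, not a feature of the app: the `CORRECTIONS` table, the mishearings in it, and the `fix_names` function are all hypothetical, and you would fill the table with the misspellings you actually see in your own transcripts.

```python
import re

# Hypothetical mishearings mapped to the spelling the citation needs.
# These pairs are illustrative; build your own from the misses you see.
CORRECTIONS = {
    "Burdew": "Bourdieu",
    "Foo-co": "Foucault",
    "Nikewist": "Nyquist",
}

def fix_names(text: str, corrections: dict[str, str] = CORRECTIONS) -> str:
    """Replace whole-word mishearings, ignoring case, with the correct spelling."""
    for wrong, right in corrections.items():
        # \b keeps the match to whole words, so "Burdewski" is left alone.
        pattern = re.compile(rf"\b{re.escape(wrong)}\b", re.IGNORECASE)
        text = pattern.sub(right, text)
    return text

draft = "As burdew argues, habitus shapes practice; Burdew's later work extends this."
print(fix_names(draft))
# As Bourdieu argues, habitus shapes practice; Bourdieu's later work extends this.
```

Run it over a pasted chapter before the editing pass. It only catches the mishearings you have listed, so treat it as a backstop to the vocabulary list, not a replacement for it.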

Local or cloud for years-long, private work

For a dissertation, I would start local, and not only on principle. Unpublished research, an unfinished argument, interview material you are bound to keep confidential: none of that has any reason to travel to someone's server so you can type it with your voice. A local engine runs entirely on your machine with nothing sent anywhere, which is the same reasoning behind choosing a private, offline speech-to-text setup in the first place. It also has no per-minute cost and no internet requirement, which matters when the writing happens in a library basement with bad Wi-Fi over a couple of unfunded years.

The app does not choose for you, so pick with your actual draft in mind:

  • Local Parakeet: NVIDIA's TDT engine, around 600 MB, and the fastest local option, 5 to 10 times faster than Whisper on CPU. English plus 24 other European languages, 25 in total. No translate-to-English, and no custom vocabulary, so it is the wrong pick for a jargon-heavy thesis. Good for fast, plain-English drafting where the terms are ordinary.
  • Local Whisper: slower than Parakeet on the same machine, but it covers 99 languages, can translate to English, and crucially supports custom vocabulary and hotwords for your field's terms and cited names. For a dissertation full of surnames and jargon, this is the local engine to use. The default English model is around 480 MB; larger models trade speed for accuracy.
  • Cloud (OpenAI, BYOK): best raw accuracy and live web access, using your own OpenAI key billed straight by OpenAI. Transcription runs on gpt-4o-mini-transcribe by default. It needs internet, so it is the one path that leaves your machine: fine for non-sensitive sections, less ideal for confidential material. The Cloud surface is part of Whisper Pro.
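If it helps to see the trade-offs as a decision rule, the list above can be sketched in a few lines of Python. This is one reading of the trade-offs, not logic taken from the app: the function name, its parameters, and the returned labels are hypothetical.

```python
def pick_engine(
    confidential: bool,
    jargon_heavy: bool,
    needs_translation: bool = False,
    hard_recording: bool = False,
) -> str:
    """Suggest one of the three transcription paths for a drafting session."""
    # Cloud is the only path that leaves the machine, so it is considered
    # only for non-confidential material where top accuracy is the goal.
    if hard_recording and not confidential:
        return "cloud"
    # Custom vocabulary and translate-to-English are local Whisper features.
    if jargon_heavy or needs_translation:
        return "local-whisper"
    # Ordinary vocabulary, no translation: the faster local engine is enough.
    return "local-parakeet"

# A confidential, jargon-heavy thesis chapter:
print(pick_engine(confidential=True, jargon_heavy=True))  # local-whisper
```

The ordering is the design choice: confidentiality rules cloud out first, and only then does speed versus vocabulary decide between the two local engines.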

The honest answer is that for most of a dissertation, local Whisper with a good vocabulary list is plenty, and it costs nothing and stays on your laptop. Cloud earns its place when you want top-tier accuracy on a hard recording or you need a fact pulled off the web mid-sentence. For two years of confidential drafting, local is the default and cloud is the occasional escape hatch.

Turning a spoken chapter into prose you can submit

Raw dictation comes out as a run-on. You say "so this chapter examines how Foucault's notion of discipline maps onto modern workplace surveillance drawing on the empirical work in chapter three," and an unpunctuated wall is what a raw speech engine hands back. That is fine: it is a first draft, and first drafts are supposed to be ugly. The cleanup is where it becomes readable.

Windows Voice Typing adds punctuation as you speak, and macOS Dictation handles basic punctuation when you say "comma" or "period." For heavier cleanup — stripping the "ums," fixing the run-ons, breaking one breathless sentence into three — Whisper can run an AI pass before the text lands. Say the activation phrase "Hey whisper" and the text gets enhanced on the way in. On a local model that runs through Ollama, fully offline; in cloud mode it is gpt-5-mini by default. It tidies the mechanics so you can spend your editing time on the argument, not the commas.

Raw

so this chapter examines how foucaults notion of discipline maps onto modern workplace surveillance drawing on the empirical work in chapter three um and the interview data

Cleaned

This chapter examines how Foucault's notion of discipline maps onto modern workplace surveillance, drawing on the empirical work in Chapter Three and the interview data.
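To see which part of that cleanup is purely mechanical, here is a minimal rule-based sketch in Python. It is not how Whisper's AI pass works, since that runs through a language model. It is a hypothetical stand-in that strips filler words, collapses spaces, capitalizes the first letter, and closes the sentence. The names `FILLERS` and `tidy` are made up for illustration.

```python
import re

# Assumption: a short filler list; extend it for your own speech habits.
FILLERS = ("um", "uh", "er", "erm")

def tidy(raw: str) -> str:
    """Strip filler words, collapse spaces, capitalize, and close the sentence."""
    filler_pattern = re.compile(r"\b(?:" + "|".join(FILLERS) + r")\b", re.IGNORECASE)
    text = filler_pattern.sub("", raw)
    # Removing fillers leaves double spaces behind; collapse them.
    text = re.sub(r"\s+", " ", text).strip()
    if not text:
        return ""
    text = text[0].upper() + text[1:]
    if text[-1] not in ".!?":
        text += "."
    return text

print(tidy("so this chapter um draws on the interview data"))
# So this chapter draws on the interview data.
```

Rules like these will not restore the apostrophe in "Foucault's" or place the comma before "drawing on", which is the gap a language-model pass fills.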

What an AI pass will not do, and should not, is the academic editing. It will not check whether your citation supports the claim, fix a misremembered date, or notice that paragraph four contradicts paragraph one. That is your job, and it is the job, and dictation does not pretend otherwise. The honest sequence is: talk the rough draft, run the cleanup so the mechanics are sane, then read every line yourself with the keyboard and your supervisor's last set of comments open. The tool gets you a readable draft an hour earlier. It does not get you a defensible argument — that part is still on you, as it should be.

That speak-then-clean rhythm carries past the dissertation too — you can type faster with your voice in your email, your grant applications, and the eventual job-market cover letters, all with the same hotkey.

When dictation is the wrong tool for the job

Dictation drafts the words you say. It is not a transcription service for the words other people say, and confusing the two will cost you a frustrating afternoon. The most common mismatch in research work: turning a recorded interview, focus group, or field session into text. That is a different job. You are not drafting there — you are transcribing a multi-speaker recording, often with overlap, accents, and a need for speaker labels and timestamps. For that, reach for a dedicated transcription service built for audio files. A live-dictation hotkey is the wrong shape entirely; it listens to your microphone now, not to a two-hour MP3 from last Tuesday.

And for genuinely short bits, the right tool is the free one already on your machine. If you are dropping a one-line note into your reference manager or a quick comment in a shared doc, your operating system covers it. On Windows, press Windows key + H and the built-in Voice Typing bar opens wherever your cursor is. The catch: it routes through Microsoft's servers and needs internet, so it is not an offline option, which matters more than usual for confidential research. On Mac, Dictation lets you speak anywhere you can type, set up in System Settings under Keyboard, and on Apple Silicon general text can be processed on-device.

Reach for a dedicated, system-wide tool when the built-ins start hurting: long chapters, field jargon that needs a custom vocabulary, offline privacy for unpublished work, or wanting one hotkey that behaves the same in Word, Scrivener, and your LaTeX editor. Below that bar, use what is free, and for interview audio use something built for it. I am not going to tell you to dictate a dissertation chapter into the same tool you would use to transcribe a recording — those are two jobs, and pretending they are one is how people end up disappointed in both.

No editor ever shipped a "write my dissertation" button, and after a few years in the trenches you stop waiting for one. The cursor is the integration: talk into the document, get a rough draft, then earn the clean version with the keyboard and a lot of coffee. Get the bad draft out of your head and onto the page where you can fight with it. The fighting is the real work — dictation just gets you to the fight a few hours sooner, which on the days the page is blank is the whole game.

Talk your next chapter into existence

Open the draft, put your cursor under the heading, hold the hotkey, and say the rough version out loud. Edit it after. A blank page is harder than a bad one.

Free local mode for any signed-in account. No card required to start.

Denys Medvediev

I'm the one who reads our support email, most probably by dictating the replies.