Which Whisper model should I use?
There's no single right Whisper model — the right one depends on whether you care most about speed, accuracy, language, or disk space. This guide maps each shipped model to a use case so you can pick in about a minute, and tells you when to skip Whisper for Parakeet instead.
Last updated: June 2026

The best Whisper model depends on the job: pick a small English model for everyday English dictation, a multilingual model for other languages, the large model for top accuracy, or Turbo for near-large quality at higher speed. For mostly-English dictation where speed matters most, Parakeet beats Whisper. The app presents all of them and lets the user choose.
I get this question more than any other, usually phrased as "I downloaded the app, now which model do I pick." It's a fair question, and the honest first answer is that there isn't one model that wins. There's a model that wins for your machine, your language, and how much you care about waiting an extra half-second. So the app doesn't pick for you. It shows you the options and gets out of the way.
That sounds like a cop-out until you see the spread. The smallest English model is around 140 MB and runs on a laptop from 2016. The best multilingual one is around 3 GB and wants 16 GB of RAM. Between those two live six other choices plus a separate engine called Parakeet. Pick wrong and you either wait too long or transcribe in the wrong language. Pick right and you forget the model exists, which is the goal.
Here's the frame that makes the whole list click. Every model is a trade between four things: speed, accuracy, how many languages it knows, and how much disk and RAM it eats. You can't max all four. A 3 GB model is more accurate and knows more languages, but it's slower and won't fit on an 8 GB machine. A 140 MB model is instant but only does English and only so well.
So the real question isn't "which model is best." It's "which trade do I want." Once you know whether you're an English-only dictator on a modest laptop, a translator working across nine languages, or someone who just wants the fastest local option that exists, the choice falls out on its own. I'll walk the English-only models, the multilingual ones, where Parakeet beats all of them, and the one-line recommendation if you don't want to read the rest.
Start with one question: what do you care about most?

Before any model name, answer one question: which of these matters most to you right now — speed, accuracy, language coverage, or disk space? You only get to pick one as the priority, because the models trade against each other. Most people who agonise over this haven't decided what they're optimising for, which is why the list looks paralysing. It isn't. It's four short answers wearing eight names.
If you want speed and you speak English, you'll end up on a small English model or, more likely, on Parakeet. If you need a language other than English, you're in the multilingual family whether you like it or not. If you want the most accurate transcription you can get locally and you have the RAM for it, that's the large model. And if disk space is tight, the smallest model is your friend and the 3 GB one is off the table. That's the entire decision tree, and the rest of this guide just fills in the names.
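The decision tree above is small enough to write down as a function. The sketch below is only an illustration: the `Priority` enum, the `first_guess` name, and the returned labels are hypothetical and not part of the app, and the mapping simply restates the paragraph above.

```python
from enum import Enum


class Priority(Enum):
    SPEED = "speed"
    ACCURACY = "accuracy"
    LANGUAGES = "languages"
    DISK = "disk"


def first_guess(priority: Priority, english_only: bool) -> str:
    """Return a first-guess model family for one priority (illustrative only)."""
    # Any non-English audio forces the multilingual family, whatever else you want.
    if not english_only or priority is Priority.LANGUAGES:
        return "Whisper multilingual"
    if priority is Priority.SPEED:
        return "Parakeet or a small English model"
    if priority is Priority.ACCURACY:
        return "Whisper large (if you have the RAM)"
    # Remaining case: disk space is tight.
    return "Whisper Base"


print(first_guess(Priority.SPEED, english_only=True))
```

The language check comes first on purpose: it is the one constraint that overrides every other preference.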
One thing the app does on purpose: it never forces a default on you. There's no "recommended" badge nudging you toward the model that happens to make us look good in a benchmark. You see Cloud, you see Parakeet, you see the eight Whisper models split into English-only and multilingual, and you choose. If you've set up voice to text on Windows or on Mac before, this is the same screen pointed at a different question.
The English-only models, from tiny laptop to top accuracy
If you only ever dictate in English, the English-only models are the efficient pick: they drop the multilingual machinery and spend that budget on English instead. There are four, and they line up neatly from "old laptop" to "best English you can run locally." You press the hotkey, speak, release, and the transcript pastes at your cursor regardless of which one you chose; the only difference is speed and how often it nails a tricky word. A small capsule shows up while you talk so you know it's listening.
The smallest is Base, around 140 MB. It's the one to pick on a 2016 laptop or an 8 GB machine where you want dictation that just works without thinking about RAM. Above it sits Small at around 480 MB, the balanced English option: slower than Parakeet, but it supports hotword biasing, which Parakeet doesn't. Then Medium at around 1.5 GB, which wants 16 GB of RAM and gives you the highest plain-English accuracy in the family. (On a public benchmark the medium English model lands around 3% word error rate on clean audio; Small is closer to 5%. Real-world numbers depend far more on your microphone than on which of these you pick.)
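To make those percentages concrete: word error rate is the share of words the model gets wrong, so the gap is easy to work out. A rough sketch, using the approximate benchmark figures quoted above and a hypothetical 500-word dictation; the function name is made up for illustration.

```python
def expected_errors(word_count: int, wer: float) -> float:
    """Expected number of wrong words for a given word error rate (0.03 = 3%)."""
    return word_count * wer


words = 500  # hypothetical length of one dictated note
medium = expected_errors(words, 0.03)  # ~3% WER quoted for Medium English
small = expected_errors(words, 0.05)   # ~5% WER quoted for Small English
print(f"Medium: about {medium:.0f} wrong words, Small: about {small:.0f}")
```

On a note that long, the difference is roughly 15 corrections versus 25: noticeable, but small enough that a poor microphone can swamp it.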
The fourth one confuses people, so let me be plain about it. Turbo, which is the distil-large-v3 model, is also around 1.5 GB and is described as 6× faster than the large model with 99% of its accuracy. That sounds like a free lunch, and for English it nearly is — it's the pick when you want close-to-best English accuracy without the speed penalty of the full large model. The catch is the "English-only" label: these four know English and only English. The moment you need a second language, you've left this family entirely, which is the next section.
The multilingual models, for the other 98 languages
The moment your audio isn't English, you want a multilingual model. Whisper's multilingual builds cover 99 languages with auto-detect, and they're the only local path that can translate speech into English as it transcribes. The English-only models can't do that, and neither can Parakeet. So if you dictate in Ukrainian, draft a note in Japanese, or want a Spanish recording to come out as English text, this family is the answer, full stop.
There are four here too, and they mirror the English-only sizes. Small, around 480 MB, is the fast multilingual baseline — the overall default model the app ships with, because it's the safest first guess when nobody knows your language yet. Medium, around 1.5 GB, trades speed for noticeably better quality. Large v3, at around 3 GB, is the best accuracy you can get locally and the right pick for professional multilingual work, provided you have 16 GB of RAM to feed it. And Large v3 Turbo, around 1.62 GB, is the fast multilingual tier — most of the large model's quality at a fraction of the wait.
A word on the language count, because the marketing-safe number and the real one differ depending on what you mean. The multilingual models genuinely cover 99 languages; the English-only models cover exactly one. If you mostly speak English and occasionally hit a second European language, you have a faster option than any of these, and that's Parakeet — which is the next thing to understand, because it's the model people most often pick by mistake or skip by mistake.
When Parakeet beats Whisper, and when it doesn't

Parakeet isn't a Whisper model at all — it's NVIDIA's TDT engine, around 600 MB, and it's the fastest local option the app ships, described as 5 to 10 times faster than Whisper on CPU. If you have an older or laptop-class CPU with no spare GPU, that speed gap is the difference between dictation that feels instant and dictation that makes you wait. For everyday English work, Parakeet is the one I reach for first.
It covers English plus 24 other European languages — 25 in total — so for a lot of European users it's plenty. What it deliberately doesn't do is the Whisper-only stuff: no translate-to-English, no hotword biasing, no custom-vocabulary prompt. If your work is monolingual English (or one of those 24 European languages) and you just want it fast, Parakeet wins and the question's over. There's more on it in the Parakeet model breakdown if you want the full picture.
Whisper wins the moment you step outside that box. Need Chinese, Japanese, or Korean? Multilingual Whisper, because Parakeet doesn't speak them. Need to translate a recording into English? Whisper multilingual, the only local path that does it. Want to bias the model toward a list of product names or jargon so it stops mangling them? Whisper, via hotwords. The rule of thumb: Parakeet for English speed, Whisper for languages, translation, and control. The app ships both because neither one is the right answer for everyone.
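That rule of thumb can be sketched as a capability check. Everything here is illustrative: the `Engine` dataclass, its field names, and `engines_for` are hypothetical, and the capability flags only restate what this guide says about each engine rather than anything read from the app.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Engine:
    name: str
    translate_to_english: bool
    hotwords: bool
    cjk_languages: bool  # Chinese, Japanese, Korean


# Capability flags as described in this guide; faster engine listed first.
ENGINES = [
    Engine("Parakeet", translate_to_english=False, hotwords=False, cjk_languages=False),
    Engine("Whisper multilingual", translate_to_english=True, hotwords=True, cjk_languages=True),
]


def engines_for(need_translate=False, need_hotwords=False, need_cjk=False):
    """Return the names of engines that cover every requested capability."""
    return [
        e.name
        for e in ENGINES
        if (e.translate_to_english or not need_translate)
        and (e.hotwords or not need_hotwords)
        and (e.cjk_languages or not need_cjk)
    ]


print(engines_for())               # no special needs: both qualify
print(engines_for(need_cjk=True))  # Japanese audio: Whisper only
```

With no special requirements both engines qualify and the faster one comes first; any single Whisper-only requirement removes Parakeet from the list.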
Size, speed, and accuracy: how the trade actually works
It helps to see the three forces side by side, because every model is just a different point on the same triangle. Bigger files are more accurate and slower; smaller files are faster and lighter on RAM; and the special engines bend the curve. Here's the honest version of each force, since the app makes you pick and I'd rather you picked knowing the cost.
Three ways to read the lineup, depending on what's pinching you:
- If speed is the problem: reach for Parakeet first. It's around 600 MB and 5 to 10 times faster than Whisper on CPU. On a machine without a GPU, nothing local touches it for everyday English. The cost is no translate-to-English and no hotwords.
- If accuracy or language is the problem: go bigger in the Whisper family. Large v3 at around 3 GB is the best local accuracy and covers 99 languages, but it wants 16 GB of RAM. Turbo variants get you most of that quality with far less waiting. Small and Medium are the sensible middle.
- If disk space or RAM is the problem: stay small (Base at around 140 MB), or skip local entirely and use Cloud mode, which runs on any hardware because it's just a network call to OpenAI with your own key. Cloud is part of Whisper Pro and needs internet.
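The sizes quoted in this guide can be put in one table and filtered by how much disk you are willing to spend. A minimal sketch: the figures are the approximate ones from this article, rounded to whole megabytes, and the dictionary and `models_within` helper are hypothetical names, not anything the app exposes.

```python
# Approximate download sizes in MB, as quoted in this guide.
MODEL_SIZES_MB = {
    "Base (English)": 140,
    "Small (English)": 480,
    "Medium (English)": 1500,
    "Turbo (English, distil-large-v3)": 1500,
    "Small (multilingual)": 480,
    "Medium (multilingual)": 1500,
    "Large v3 Turbo (multilingual)": 1620,
    "Large v3 (multilingual)": 3000,
    "Parakeet": 600,
}


def models_within(budget_mb: int) -> list[str]:
    """Names of models whose download fits the budget, smallest first."""
    fitting = [(size, name) for name, size in MODEL_SIZES_MB.items() if size <= budget_mb]
    # Sorting the (size, name) pairs orders by size, then alphabetically on ties.
    return [name for size, name in sorted(fitting)]


print(models_within(600))
```

This only filters on download size. RAM is a separate constraint, and the guide gives RAM figures for just a few of the models, so it is left out here.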
The boring truth is that for most people, on a recent machine, the difference between the mid-sized models is smaller than the difference your microphone makes. A $20 USB mic does more for accuracy than jumping from Small to Large — the public Whisper benchmarks back this up, and I've watched it play out on my own desk more than once. So don't agonise over Medium versus Large on day one. Pick something that fits your RAM, get dictating, and upgrade the model later if a word keeps coming out wrong. The model you'll actually keep is the one that's fast enough that you forget it's there.
Try one, then switch in two clicks if it's wrong
Here's the part that takes the pressure off the whole decision: you are not marrying the model you pick first. Switching is two clicks in Settings, and the only real cost is the download for whichever model you move to. So the right strategy isn't to research for an hour — it's to make a reasonable first guess, dictate with it for a day, and switch if it annoys you. The whole local pipeline is free for any signed-in account, with no payment method asked for at sign-up, so trying a few models costs you nothing but disk space.
Step 1 — Open Settings and find the Transcription panel.
That's where the model list lives, split into English-only and multilingual, with Parakeet and Cloud alongside. Nothing is pre-selected as "the best."
You'll know you're in the right place when you see the model list with sizes next to each name.
Step 2 — Make your first guess from the section above.
English and want speed: Parakeet. English and want accuracy: Small or Medium English. Other languages: a multilingual model. Tight on RAM: Base.
You'll know it worked when the model finishes downloading and shows as ready.
Step 3 — Dictate with it for a day.
Use it on real work, not a test sentence. You learn more from one afternoon of actual notes than from any benchmark chart.
You'll know it's the right model when you stop noticing it and just talk.
Step 4 — Switch if it's wrong.
Too slow, pick something smaller or Parakeet. Missing a language or mangling words, go multilingual or larger. Two clicks, one download, done.
You'll know it worked when the new model loads and your next recording uses it.
People treat this like a one-way door, and it isn't. The first model I ever ran wasn't the one I kept; I started on a multilingual model out of habit, realised I was dictating in English all day, and moved to Parakeet for the speed. Took two clicks and a coffee's worth of download. Treat your first pick as a draft.
The quick recommendation, if you skipped to the end
If you read nothing else, here it is. English, want it fast, modest machine: Parakeet. English, want the best local accuracy: the Medium English model, or Turbo if you want that accuracy without the wait. Another language, or you need translation: a multilingual model, Small to start and Large v3 if accuracy matters and you have 16 GB of RAM. Tight on disk or RAM: Base. Want top-tier accuracy, have an internet connection, and are fine using your own OpenAI key: Cloud. That's the whole map.
Whichever you pick, the raw transcript comes out as a run-on, and that's true of every speech engine, not just ours. You say "okay so set the meeting model to medium and remind me to test the large one later," and that's the unpunctuated wall you get back. Whisper can run an AI cleanup pass to fix the punctuation and strip the filler before the text lands — say the activation phrase "Hey whisper" and it tidies up first. On a local model that runs through Ollama; in cloud mode it's gpt-5-mini by default.
Raw transcript: okay so set the meeting model to medium and remind me to test the large one later um maybe parakeet for the quick stuff
After cleanup: Okay, so set the meeting model to Medium and remind me to test the Large one later, maybe Parakeet for the quick stuff.
One honest caveat that belongs at the end of any "which model" guide: if all you ever do is drop a 30-word note into a text field, you may not need to pick a model at all. On Windows, the built-in Voice Typing bar opens with Windows key + H wherever your cursor is — it punctuates on its own and is free, though it routes through Microsoft's servers and needs internet. On a Mac, Dictation in System Settings does the same, and on Apple Silicon general text can be processed on-device. Below the threshold where accuracy and length start to hurt, use what's already on your machine. We start being worth the download when you're doing real volume, want offline privacy, or need a language and control the built-ins don't offer. I'm not going to tell you to install an app to dictate a grocery list.
The "best" Whisper model is the one you stop thinking about. Pick the trade you care about, make a first guess, and switch in two clicks if it annoys you. I've shipped systems where the architecture diagram was wrong by the second commit, so I have a healthy respect for "just try it and adjust." Your model choice is lower stakes than that, and a lot easier to undo. Start somewhere. The download is the slow part; the deciding shouldn't be.