OpenAI Whisper for Windows
OpenAI Whisper is a free, open-source speech-to-text model under the MIT License. On Windows it normally runs through Python and the command line to transcribe audio files. Whisper by Remskill bundles those models into a desktop app so you can dictate live into any app instead.
Last updated: June 2026

OpenAI Whisper is a free, open-source model. On a fresh Windows machine the official version wants Python, ffmpeg, and the command line to transcribe files. If you have a file, free GUI tools like Buzz or Whisper Desktop handle it. If you want to talk and watch your words land at the cursor in any app, Whisper by Remskill bundles the same models with nothing to build and a free local tier.
What people mean by "OpenAI Whisper for Windows"
The boring truth is that "OpenAI Whisper" is two different things wearing the same name, and the search results mix them up daily.
The first thing is the model. Whisper is a speech recognition model OpenAI open-sourced under the MIT License, so the code and the trained weights are both free to download and use. It ships in six sizes (tiny, base, small, medium, large, and turbo) that trade speed for accuracy, and four of them (tiny through medium) also come in an English-only variant. It is multilingual, and it can translate speech to English with one flag. That is genuinely impressive, and it is genuinely free.
The second thing is the way you actually run it. The official Whisper is a Python package. You install it with pip, you install the ffmpeg command-line tool alongside it, and then you feed it an audio file from a terminal. (The command line, or terminal, is where you type commands at the computer instead of clicking; most people on Windows have never opened it on purpose.) If "terminal", "pip", and "ffmpeg" already sound like a Saturday you did not plan on having, you have found the gap this whole article is about.
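For a sense of what "feed it an audio file from a terminal" looks like, here is a small Python sketch that assembles the command the `openai-whisper` package installs as `whisper`. The `--model`, `--language`, and `--task translate` flags come from the project's README; the helper name `build_whisper_command` and the file `interview.mp3` are hypothetical, and the sketch only runs the command if both the CLI and the file are actually present.

```python
import os
import shlex
import shutil
import subprocess
from typing import List, Optional


def build_whisper_command(audio_path: str, model: str = "turbo",
                          language: Optional[str] = None,
                          translate: bool = False) -> List[str]:
    """Assemble arguments for the `whisper` CLI (hypothetical helper)."""
    cmd = ["whisper", audio_path, "--model", model]
    if language:
        # Skips automatic language detection, e.g. language="French".
        cmd += ["--language", language]
    if translate:
        # The "one flag": translate the speech to English instead of transcribing it.
        # The README notes turbo is not trained for translation, so pair this
        # with a multilingual model such as medium or large.
        cmd += ["--task", "translate"]
    return cmd


if __name__ == "__main__":
    cmd = build_whisper_command("interview.mp3", model="medium", translate=True)
    print(shlex.join(cmd))  # whisper interview.mp3 --model medium --task translate
    # Only run it when the CLI and the (hypothetical) file actually exist here.
    if shutil.which("whisper") and os.path.exists("interview.mp3"):
        subprocess.run(cmd, check=True)
```

Typed by hand, that printed line is the whole workflow: one command per file, after Python and ffmpeg are already in place.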
So when someone types "OpenAI Whisper for Windows" into Google, they usually want one of two answers. Either: how do I get this free model transcribing my files without a computer-science degree? Or: I just want to talk and have my words appear, can this thing do that? Those are different needs, and they want different tools. I will answer both, and I will be honest about which tool wins each one.
The free model is great. The setup is the catch.
Here is the part the product pages skip. Whisper the model costs nothing. Whisper the experience, on a fresh Windows machine, costs you an afternoon.
To run the official OpenAI Whisper you install Python, then install the Whisper package, then install ffmpeg and make sure Windows can find it, then open a terminal and run a command for each file. Nothing here is hard for a developer. All of it is a wall for everyone else: the writer, the lawyer, the student, the salesperson, my own mother, who agreed to try dictation on the third demo and would have agreed to exactly zero demos involving the phrase "add ffmpeg to your PATH".
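Those install steps boil down to three things a machine either has or lacks. A minimal preflight sketch, using only the Python standard library, can check them before you try to transcribe anything. The function name is hypothetical, and the Python 3.8 floor is an assumption; check the Whisper README for the versions it currently supports. Note that the pip package is named `openai-whisper` but imports as `whisper`.

```python
import importlib.util
import shutil
import sys
from typing import Dict


def whisper_preflight() -> Dict[str, bool]:
    """Report which prerequisites of the official Whisper setup this machine has."""
    return {
        # Assumed minimum version; the Whisper README lists what it actually supports.
        "python_ok": sys.version_info >= (3, 8),
        # The pip package is `openai-whisper`, but the module it installs is `whisper`.
        "whisper_installed": importlib.util.find_spec("whisper") is not None,
        # Whisper calls ffmpeg to decode audio, so Windows must be able to find it on PATH.
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for check, ok in whisper_preflight().items():
        print(f"{check:<18} {'ok' if ok else 'MISSING'}")
```

A "MISSING" next to ffmpeg means Windows cannot find it: either it is not installed, or its folder is not on your PATH, which is exactly the step that loses people.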
There are friendlier on-ramps, and they are worth knowing. Whisper.cpp is a plain C/C++ port of the same model: MIT licensed, fast, comfortable on an ordinary CPU, and free of Python entirely. But you still drive it from the command line, and often build it from source first. It is a beautiful piece of engineering aimed squarely at people who enjoy compilers. The rest of this article is for the people who do not.
When you want the command-line Whisper instead (or a file transcriber)
I am going to send you somewhere else now, because this is the honest part.
If what you actually have is an audio file (a recorded interview, a podcast episode, a Teams call you saved, a voice memo), then our app is the wrong tool, and I would rather tell you that than sell you a mismatch. We do live dictation: you talk, and the words land at your cursor. We do not take an existing file and transcribe it. Different job.
For that job, three free tools are genuinely good, and they are built exactly for it:
- Buzz transcribes and translates audio files offline using OpenAI's Whisper. It is MIT licensed and runs on Windows. If you want a real window with buttons instead of a terminal, start here.
- Whisper Desktop (Const-me) is a Windows GUI app. You unzip it, run WhisperDesktop.exe, point it at a file, and it transcribes using your GPU via DirectCompute. It is MPL-2.0 licensed, and fast on a decent graphics card.
- whisper.cpp is the lean option if you are comfortable at the command line and want raw speed with no Python.
That is not me being diplomatic for the sake of it. Sending you to the right tool when it is not ours is the whole reason you should believe the rest of this. If you have a file, go use Buzz. If you have a microphone and a sentence in your head, keep reading.
What Whisper by Remskill actually does on Windows
We took the same open-source Whisper models, plus a second engine, and wrapped them in a Windows app so there is nothing to build and nothing to type into a terminal.
You install one app, about 25 MB. You sign in. You press the hotkey, which is Ctrl + Space by default and fully remappable. You talk. You let go. The text appears at your cursor in whatever app you were already in: Word, Outlook, the browser, Slack, a code editor, the search box. No file, no terminal, no GPU required; all local transcription runs on your CPU.
Under the hood you pick from three paths, because we do not pick a model for you:
- Local Whisper (8 models) is the open-source Whisper you came here for, bundled and ready. English-optimized from Base (~140 MB) up to Medium (~1.5 GB), plus multilingual builds up to Large v3 (~3 GB). The multilingual builds cover 99 languages and can translate to English.
- Parakeet (NVIDIA TDT, ~600 MB) is a separate engine, 5 to 10 times faster than Whisper on CPU, covering English plus 24 other European languages. No translate-to-English. Pick it if you want speed and you mostly work in English.
- Cloud (OpenAI, BYOK) lets you bring your own OpenAI key for top-end accuracy and web search; we take no cut. This is the one Pro feature.
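If you want to reason through that choice yourself, the trade-offs in the list reduce to a few questions, sketched below in Python. This is a hypothetical rule of thumb, not how the app behaves (the app leaves the choice to you), and it covers only the two local engines. The engine labels are illustrative, and the language set you pass in is up to you; the sample in the usage block is a partial subset, not Parakeet's full list.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DictationNeed:
    language: str                      # ISO 639-1 code you will speak, e.g. "en"
    translate_to_english: bool = False
    prefer_speed: bool = False


def suggest_local_engine(need: DictationNeed, parakeet_languages: frozenset[str]) -> str:
    """Turn the local-engine trade-offs into a hypothetical rule of thumb."""
    if need.translate_to_english:
        return "whisper-multilingual"  # Parakeet has no translate-to-English
    if need.language not in parakeet_languages:
        return "whisper-multilingual"  # only the multilingual Whisper builds reach 99 languages
    if need.prefer_speed:
        return "parakeet"              # the faster of the two engines on CPU
    if need.language == "en":
        return "whisper-english"       # English-optimized builds, Base through Medium
    return "whisper-multilingual"


if __name__ == "__main__":
    # A partial, illustrative subset; check the Parakeet model card for the full list.
    sample = frozenset({"en", "de", "fr", "es"})
    print(suggest_local_engine(DictationNeed("en", prefer_speed=True), sample))  # parakeet
    print(suggest_local_engine(DictationNeed("ja"), sample))  # whisper-multilingual
```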
The local pipeline (every Whisper model, Parakeet, AI cleanup via Ollama, history, presets, custom hotkey, model downloads) is free for any signed-in user, with no card at signup. The Cloud path is the paid Pro tier; you can see the numbers on the pricing page.
Why a real Windows dictation app is harder than pip install
Here is the thing nobody warns you about when they say "just wrap Whisper in a UI."
The model is the easy part. Getting a hotkey to behave on Windows is not. The first version of our hotkey handler fired the stop-recording callback six times for one real keypress. It worked perfectly on a Mac. It worked perfectly on a clean Windows install. It fell apart on real customer machines, the ones with a language input method enabled, which on Windows generates phantom Ctrl + Space release events at unpredictable moments. It took days of telemetry, then a 50ms debounce that was not enough, then a 300ms debounce that finally was. I learned more about the Windows input method framework than any person should, and I have a master's degree. My older daughter's verdict, when I explained it: "this is why dad's emails take forever."
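The fix in that story is a debounce: after one release event, ignore any further releases that arrive too soon after it. Here is a minimal leading-edge debounce in Python, assuming release events arrive as plain callbacks. It is an illustration, not our actual handler; the class and function names are hypothetical, the timestamps it replays are invented, and only the 50 ms and 300 ms windows come from the story above.

```python
import time
from typing import Callable, List, Optional


class DebouncedRelease:
    """Leading-edge debounce for hotkey-release events (illustrative sketch).

    The first release fires on_stop; any release arriving less than window_s
    after the previous release (fired or not) is treated as a phantom and
    ignored, so a burst of events collapses into one stop-recording call.
    """

    def __init__(self, on_stop: Callable[[], None], window_s: float = 0.300,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._on_stop = on_stop
        self._window_s = window_s
        self._clock = clock  # injectable so a burst can be replayed deterministically
        self._last_event: Optional[float] = None

    def release(self) -> bool:
        """Handle one release event; return True if it triggered on_stop."""
        now = self._clock()
        quiet = self._last_event is None or now - self._last_event >= self._window_s
        self._last_event = now  # every event, phantom or not, restarts the window
        if quiet:
            self._on_stop()
        return quiet


def replay(window_s: float, timestamps: List[float]) -> int:
    """Feed invented release timestamps (seconds) through the debouncer; count stops."""
    stops: List[str] = []
    ticks = iter(timestamps)
    handler = DebouncedRelease(lambda: stops.append("stop"), window_s,
                               clock=lambda: next(ticks))
    for _ in timestamps:
        handler.release()
    return len(stops)


if __name__ == "__main__":
    # Hypothetical burst: one real release at t=0 followed by five phantoms,
    # then a second genuine release at t=1.0.
    burst = [0.000, 0.020, 0.045, 0.090, 0.150, 0.210, 1.000]
    print("50 ms window:", replay(0.050, burst), "stop calls")   # 4
    print("300 ms window:", replay(0.300, burst), "stop calls")  # 2
```

With this invented burst, a 50 ms window still lets three extra stops through because some phantoms land more than 50 ms apart, while 300 ms collapses the burst to one stop plus one for the genuine second release. A real handler would also pair each release with a preceding press; the debounce only filters the noise.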
That is the difference between a model and a product. The free Whisper gives you a transcription of a file. A dictation app has to survive the real Windows desktop, in real apps, while you do something else. The model never sees that fight. We do, and we lost it for about a week first.
When the built-in Windows tool is all you need
Tell people when not to buy your thing, and they might believe you about the rest. So: if you only dictate the occasional short note, you may not need any of this. Windows 11 has a built-in voice typing tool you open with Win + H. It is free and fine for a couple of lines, though it routes your audio to Microsoft's online speech recognition rather than running on your machine. For a quick Teams reply, that is plenty.
We start being worth the install around the point where you are drafting real text (long emails, briefs, lecture summaries, code comments, marketing variants) and you want it to stay on your machine, in 99 languages, with the same hotkey everywhere. If your day is two-line chats, you are done. If your day is writing, keep the app.
OpenAI Whisper is a free, open-source model, and on Windows it normally wants Python, ffmpeg, and a terminal to transcribe files. If you have a file, Buzz or Whisper Desktop will do it for free with a real window. If what you actually want is to talk and watch your words land at the cursor in any app, with no build, no command line, running locally on your CPU, that is what we made.
For the longer treatment of the free-versus-paid landscape, see voice to text on Windows. To choose between our two local engines, see Whisper vs Parakeet.
Dictate your first sentence in about a minute
Download Whisper by Remskill for Windows, sign in with no card required, press Ctrl + Space, and talk. The local pipeline is free for as long as you use it.
Free local transcription forever. No payment method at signup. The Cloud tier is the only paid feature.



