How to convert mp3 to text

To convert an mp3 to text, run the file through a speech-to-text tool. The free, private route is a local open-source app like Buzz or the OpenAI Whisper command line, which transcribe on your own machine. The faster-to-start route is a web converter you upload to.

Last updated: June 2026

So you have an mp3 and you need the words inside it. A recorded interview, a voice memo, a podcast episode, a lecture you saved off your phone. The job is the same in every case: take audio, get text you can edit.

The good news is that this is a solved problem in 2026, and most of the ways to do it are free. The slightly annoying news is that the tools all have names that sound the same, so let me sort them out.

There are three honest routes. Run a free local tool on your own computer (most private, no upload, costs nothing). Use a Mac app built for the job. Or upload the file to a web service that transcribes it on a server, which is fastest to start, though the audio leaves your machine. The right one depends on whether you care more about privacy or convenience, and how technical you feel today.

I should say the awkward part early, because it would be dishonest to bury it. The app my team makes, Whisper by Remskill, does not convert mp3 files. It is a live dictation tool. You hold a hotkey, you talk, and your words appear in whatever you are typing into. Different job entirely. I will explain where it fits near the end, but if you came here to convert an existing recording, the tools below are the ones you want.

The free, private route is a local open-source tool

If you do not want your recording sitting on someone else's server, run the transcription on your own computer. The engine almost everyone uses for this is OpenAI Whisper, released under the MIT license, free to use, free to read, free to run. It is the same family of model that powers a lot of the paid apps you have seen advertised.

There are a few ways to actually use it, from "I am comfortable in a terminal" to "please give me a button to click".

OpenAI Whisper (Python command line)

Install it with pip (the package is called openai-whisper), install the ffmpeg tool it depends on, then point it at your file: whisper recording.mp3 --model turbo. It reads the mp3, transcribes it, and writes the transcript out as a plain text file, alongside subtitle and JSON versions by default. There are six model sizes, from a tiny fast one to a large accurate one, so you can trade speed for accuracy. It is multilingual and can even translate non-English audio into English, though the turbo model is not trained for translation, so pick medium or large for that job. The catch is the setup. pip and ffmpeg are not hard, but they are not nothing. I once spent twenty minutes fixing an ffmpeg path on a fresh laptop. I have a master's degree.
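
If you would rather call Whisper from a script than from the terminal, the same package exposes a small Python API. The sketch below is a minimal example, assuming the openai-whisper package and ffmpeg are already installed. The helper names transcript_path and transcribe_to_text are made up for illustration and are not part of Whisper.

```python
from pathlib import Path


def transcript_path(audio_path: str) -> Path:
    """Return the .txt path that sits next to the audio file."""
    return Path(audio_path).with_suffix(".txt")


def transcribe_to_text(audio_path: str, model_name: str = "turbo") -> Path:
    """Transcribe one audio file locally and save the text beside it."""
    # Imported here so the helper above works even before Whisper is installed.
    import whisper

    model = whisper.load_model(model_name)  # downloads the model on first use
    result = model.transcribe(audio_path)   # ffmpeg decodes the mp3 for Whisper
    out_path = transcript_path(audio_path)
    out_path.write_text(result["text"].strip() + "\n", encoding="utf-8")
    return out_path
```

Calling transcribe_to_text("recording.mp3") should leave a recording.txt next to the audio. The first run is slower because the model has to download.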

whisper.cpp

Same Whisper model, rewritten in plain C and C++ so it runs fast with no Python and no heavy dependencies. It can run on the CPU alone and is tuned hard for Apple Silicon Macs. Also MIT licensed. You build it from source and run it from the command line, so it is squarely for the comfortable-in-a-terminal crowd. It is the lean option if you have a lot of files to chew through.
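
For the lot-of-files case, a small script can queue the work. This is a sketch under assumptions, not whisper.cpp's official workflow. It assumes your build put the command-line binary at ./build/bin/whisper-cli (older builds call it main), that you downloaded a model to models/ggml-base.en.bin, and that ffmpeg is on your PATH. whisper.cpp has traditionally expected 16 kHz WAV input, so the sketch converts each mp3 first. Newer builds may read mp3 directly, in which case you can drop that step.

```python
import subprocess
from pathlib import Path

# Assumed locations; adjust to match your own build and model download.
WHISPER_BIN = "./build/bin/whisper-cli"
MODEL_PATH = "models/ggml-base.en.bin"


def ffmpeg_command(mp3: Path) -> list[str]:
    """Build the ffmpeg call that converts an mp3 to 16 kHz mono 16-bit WAV."""
    wav = mp3.with_suffix(".wav")
    return ["ffmpeg", "-y", "-i", str(mp3),
            "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", str(wav)]


def whisper_cpp_command(wav: Path) -> list[str]:
    """Build the whisper.cpp call; -otxt asks for a plain text transcript."""
    return [WHISPER_BIN, "-m", MODEL_PATH, "-f", str(wav), "-otxt"]


def transcribe_folder(folder: str) -> None:
    """Convert and transcribe every mp3 in a folder, one after another."""
    for mp3 in sorted(Path(folder).glob("*.mp3")):
        subprocess.run(ffmpeg_command(mp3), check=True)
        subprocess.run(whisper_cpp_command(mp3.with_suffix(".wav")), check=True)
```

Building the commands as lists keeps file names with spaces intact, and check=True stops the run at the first file that fails instead of silently skipping it.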

Buzz

This is the one I send non-technical people to. Buzz is a normal desktop app with a normal window. You open it, you pick your mp3, it transcribes offline on your machine. It is built on OpenAI Whisper, it can transcribe and translate, and it runs on macOS, Windows, and Linux. MIT licensed and free. No terminal, no pip, no ffmpeg wrangling. If you have one file and you want it done with the least fuss, this is the answer.

Whisper Desktop (Const-me)

A Windows app for people with a graphics card. It transcribes audio files and uses the GPU to do it quickly, which matters when your file is long. It is open source under the MPL-2.0 license. Windows only. If you are on a PC with a decent GPU and a two-hour recording, this is the fast lane.

On a Mac, a dedicated app saves you the setup

If you are on a Mac and the command line is not your idea of a good evening, MacWhisper is built for exactly this. You drag an audio or video file into it and it transcribes on-device, so nothing leaves your machine. It runs the same OpenAI Whisper models, plus NVIDIA's Parakeet engine, and it does the file-transcription job well. It also exports to the formats you actually need, like subtitle files for video.

MacWhisper is file-first by design: recordings in, text out. That is the whole point of it, and it is good at it. I am pointing it out specifically because it is the closest thing to a one-click Mac answer for the exact thing you searched.
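
For a sense of what a subtitle export contains, here is a minimal sketch of the SRT format, which is just numbered blocks of timestamps and text. It assumes segments shaped like the ones OpenAI Whisper's Python API returns (start and end in seconds, plus text). It is not MacWhisper's code, and the function names are my own.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp SRT uses."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Turn timed segments into the text of an .srt subtitle file."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{timing}\n{seg['text'].strip()}\n")
    # A blank line separates one subtitle block from the next.
    return "\n".join(blocks)
```

In practice you rarely need to write this yourself, since the tools above export subtitles directly. It is mostly useful for seeing why a subtitle file needs timing data that a plain transcript throws away.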

A web converter is fastest to start, but your audio leaves your machine

The other route needs no install at all. Plenty of web services let you upload an mp3, wait a minute, and download a transcript. No setup, no model to download, works from a phone or a borrowed laptop. For a quick one-off, that convenience is real, and I am not going to pretend otherwise.

Here is the one strong opinion in this article, and I will back it with the obvious reason rather than hand-waving. When you upload a recording to a web converter, the audio leaves your computer and lands on someone else's server. For a podcast you are about to publish anyway, who cares. For a recorded HR call, a doctor's note, or a client meeting where a salary number or a patient name gets said out loud, that is a privacy decision you are making, often without reading the page that tells you how long the file is kept. A local tool does the same job and the audio never goes anywhere. Cloud-only transcription is, for sensitive recordings, a privacy disaster waiting to be transcribed.

If a web converter is genuinely the right call for you, the transcription-service landscape is worth a look. I have written about that crowd elsewhere. Start with the fast-transcription walkthrough and the audio-to-text converter guide, which both cover the upload route and the local one side by side.

Pick accuracy and language with the model, not the marketing

Whatever tool you land on, accuracy mostly comes down to two things you control: the model size and the microphone the audio was recorded on. Bigger models are slower and more accurate. Smaller models are faster and lighter. Most of the local tools above let you pick, because they are all running the same underlying Whisper models under different buttons.
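
To make the trade-off concrete, here is a rough picker. The memory and speed figures are the approximate ones listed in the OpenAI Whisper README at the time of writing, so treat them as ballpark numbers that may change. pick_model is a hypothetical helper, not part of any tool above, and it uses memory need as a crude stand-in for accuracy.

```python
# Approximate figures from the OpenAI Whisper README:
# (name, memory needed in GB, speed relative to the large model).
MODELS = [
    ("tiny", 1, 10),
    ("base", 1, 7),
    ("small", 2, 4),
    ("medium", 5, 2),
    ("turbo", 6, 8),
    ("large", 10, 1),
]


def pick_model(memory_gb: float, min_speed: int = 1) -> str:
    """Return the biggest model that fits in memory and is fast enough."""
    fitting = [
        name
        for name, need_gb, speed in MODELS
        if need_gb <= memory_gb and speed >= min_speed
    ]
    if not fitting:
        raise ValueError("no Whisper model fits those limits")
    # MODELS is ordered by memory need, so the last match is the biggest.
    return fitting[-1]
```

With 12 GB to spare, pick_model(12) lands on large, while pick_model(12, min_speed=4) settles for turbo because large is too slow for that limit.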

The boring truth nobody selling you a "smart AI" converter wants to say out loud: a clean recording on a cheap USB mic beats a muddy one run through the biggest model. The tool cannot un-hear the air conditioner. If your mp3 was recorded across a room on a laptop mic, manage your expectations and maybe re-record if you still can.

Where Whisper by Remskill fits, and where it does not

Now the honest bit I promised. Whisper by Remskill does not take your mp3 and turn it into text. It is built for a different moment.

It is a live dictation tool. You press a hotkey (Ctrl+Space on Windows by default, remappable), you talk, and your words get typed straight into whatever app you are in: your email, your doc, a Slack message, a code comment. The transcription happens locally as you speak, and the text lands at your cursor a beat after you stop. No file, no upload, no record-then-convert loop.

The shipped post-dictation overlay — a live dictation finishing at your cursor, not a file being converted.

So when is that the tool you actually want? When the words you need do not exist as a recording yet, because they are still in your head. If your real goal was never "convert this file" but "get my own spoken words into a document fast", you skip the recording entirely. You think it, you say it, it is typed. The whole local pipeline is free, and it runs on Windows and Mac (Apple Silicon). I once dictated a teacher email, a grocery list, and a reply to my sister in the time it took the kettle to boil, then forgot to actually pour the tea. The tool worked. I did not.

The live Whisper by Remskill app — sidebar, transcription panel, and AI instruction cards. This is the real interface, not a screenshot.

For the full picture of how the live, offline transcription works under the hood, the offline speech-to-text guide goes deeper. But if you have a recording sitting in your downloads folder right now, go back up the page. Buzz or the Whisper command line is what you want, not us.

If you just need this once

One file, one time, no plans to do it again? Open Buzz, drop your mp3 in, let it run. It is free, it works offline, and you will not have installed anything you have to maintain. That is the whole recommendation. Save the terminal tools for the day you have fifty files instead of one.

The fastest way to convert an mp3 is to not have an mp3. But for the recording you already have, a free local tool gets you there without sending it anywhere.

Whisper by Remskill is for live dictation, not file conversion

If your goal is getting your own spoken words into a document without typing, see how live dictation works. For converting a recording you already have, Buzz is the free answer above.

Free local pipeline. Windows and Mac (Apple Silicon).

Denys Medvediev

I'm the one who reads our support email, most probably by dictating the replies.
