By Denys Medvediev

Guide

How to transcribe interviews

To transcribe an interview automatically, run the recording through a speech-to-text tool: a free open-source option like Buzz or OpenAI Whisper on your own computer for privacy, or a cloud transcription service when you also need speaker labels and a polished editor. Pick local for free and private, cloud for diarization.

Last updated: June 2026

I'll say the awkward part first, because it saves you ten minutes. Whisper by Remskill, the app this blog belongs to, does not transcribe interview recordings. It is live dictation: you hold a hotkey, speak, and the words land at your cursor in any app. That is a different job from feeding it a one-hour recording of two people and getting back a labelled transcript. So this guide is about the tools that actually do the interview job, written by someone who'd rather send you to the right one than pretend we're it.

An interview transcript is harder than it sounds for one reason: speakers. A plain transcription tool gives you a wall of text. What you usually want is "Interviewer:" and "Subject:" in front of each turn. That's called diarization, and not every tool does it. The split that matters is local versus cloud. Local tools run on your laptop, cost nothing, and never upload your audio. Cloud services upload the file but tend to handle speaker labels and give you an editor. Below is the honest map, then the part where I tell you exactly where we fit and where we don't.

The free, private way runs on your own computer

If the interview is sensitive (a source who needs protecting, a patient, an internal exec) the recording should never leave your machine. The free open-source tools transcribe entirely on-device.

OpenAI's Whisper is the model most of these are built on. It's released under the MIT license, you install it with a single pip command, and it transcribes audio files from the command line. It ships in six sizes, four with English-only variants, so you trade speed for accuracy depending on your hardware. It's multilingual and can even translate speech to English as it transcribes. The catch for interviews: base Whisper writes down the words, but it does not label who said them. Speaker diarization needs extra tooling bolted on, or a cloud service that bakes it in.
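If you'd rather script it than type commands by hand, here is a minimal sketch of the same job through Whisper's Python package. It assumes you've run `pip install -U openai-whisper` and have ffmpeg installed; the file name `interview.mp3` and the helper names `format_timestamp`, `format_segments`, and `transcribe_file` are placeholders of mine, not part of the library. Notice what comes back: timestamps and words, no speaker names.

```python
def format_timestamp(seconds: float) -> str:
    """Render a position in seconds as HH:MM:SS."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_segments(segments) -> str:
    """Turn Whisper-style segments (dicts with start, end, text) into lines."""
    lines = []
    for seg in segments:
        stamp = format_timestamp(seg["start"])
        lines.append(f"[{stamp}] {seg['text'].strip()}")
    return "\n".join(lines)


def transcribe_file(path: str, model_name: str = "base", translate: bool = False) -> str:
    """Transcribe one recording on this machine and return timestamped lines."""
    import whisper  # imported lazily so the helpers above work without it

    # Smaller models are faster, larger ones are more accurate.
    model = whisper.load_model(model_name)
    # task="translate" asks Whisper to output English instead of the source language.
    task = "translate" if translate else "transcribe"
    result = model.transcribe(path, task=task)
    return format_segments(result["segments"])


# Example call (needs the package, ffmpeg, and a real file):
# print(transcribe_file("interview.mp3", model_name="small"))
```

The model name is the speed-versus-accuracy knob from the paragraph above: start small, and move up a size only if the transcript is too rough.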

If a command line makes your eye twitch, Buzz is the easy button. It's a graphical app that transcribes and translates audio offline on your personal computer, powered by Whisper, and it's MIT-licensed and available on macOS, Windows, and Linux. Drag in the recording, pick a model, wait, read the transcript. For most people transcribing an interview for free, this is the shortest path.

Two more worth knowing. whisper.cpp is a plain C/C++ port of Whisper that can run on the CPU alone and is heavily optimized for Apple Silicon: faster, no Python, but you build it and drive it from the command line. And MacWhisper is a Mac app built around on-device Whisper and NVIDIA's Parakeet that leads with file transcription, which is exactly the interview use case. All of these keep the audio on your machine. None of them, on their own, hands you clean speaker labels.
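To make "extra tooling bolted on" concrete, here is a rough sketch of the usual glue step: take the timestamped segments from a transcription tool, take speaker turns from a separate diarization tool, and give each segment the speaker whose turn overlaps it most. The `Turn` shape, the function names, and the overlap rule are my assumptions for illustration. Real diarizers have their own output formats, and this is not how any particular tool does it.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One stretch of speech reported by a diarizer (times in seconds)."""
    speaker: str
    start: float
    end: float


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of time two intervals share, or 0.0 if they do not touch."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments, turns, unknown: str = "Unknown"):
    """Pair each segment's text with the speaker whose turn overlaps it most."""
    labelled = []
    for seg in segments:
        best_speaker, best_shared = unknown, 0.0
        for turn in turns:
            shared = overlap(seg["start"], seg["end"], turn.start, turn.end)
            if shared > best_shared:
                best_speaker, best_shared = turn.speaker, shared
        labelled.append((best_speaker, seg["text"].strip()))
    return labelled


def render(labelled) -> str:
    """Merge consecutive segments from the same speaker into one labelled line."""
    lines = []
    previous = None
    for speaker, text in labelled:
        if speaker == previous:
            lines[-1] += " " + text
        else:
            lines.append(f"{speaker}: {text}")
            previous = speaker
    return "\n".join(lines)
```

Segments that overlap no turn at all come out as "Unknown", which is a useful flag for the spots you'll want to check by ear.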

Cloud services add speaker labels and an editor

This is the fork where you decide what your privacy is worth. The dedicated transcription services upload your recording to their servers, run it, and give you back a transcript that usually names the speakers and drops it into an editor where you can fix names and export. That convenience is real, and for a public podcast or a panel you're fine with sharing, it's the better tool.

If that's your need (multi-speaker labels, timestamps, a clean editing surface) look at the established meeting-and-recording transcription category rather than a dictation app. I've written up that landscape in the Otter.ai alternatives piece and the Rev alternatives one; both cover the cloud tools that do diarization and editing properly.

Here's the opinion, and it comes with a bill attached. A team I worked with once had a contractor build an internal dictation prototype that called a cloud AI for every utterance. The manager opened the cost dashboard at the end of the quarter and found a five-figure number, most of it from re-transcribing standup recordings four times over because the retry logic was too eager. The CFO's response was short: or we could stop paying to upload meetings that already have notes. The money was the small problem. The bigger one was that a quarter of internal calls now lived on a vendor's servers. Cloud transcription is genuinely the right call for speaker labels and editing. It is the wrong call for a recording you'd never want leaving the building. Pick on that axis, not on the marketing.
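For the curious, the retry problem in that story has a cheap guard. This is a sketch of the idea, not the contractor's code: key each transcript by a hash of the audio bytes, so a retry on the same recording returns the stored result instead of paying for another upload. `TranscriptCache` and the `transcribe` callable are hypothetical names, and the in-memory dict stands in for whatever storage you'd actually use.

```python
import hashlib


class TranscriptCache:
    """Remember transcripts by content hash so repeat requests are free."""

    def __init__(self, transcribe):
        # transcribe: any callable taking audio bytes and returning text
        # (hypothetical stand-in for a paid cloud call).
        self._transcribe = transcribe
        self._done = {}
        self.paid_calls = 0  # how many requests reached the paid service

    def get(self, audio: bytes) -> str:
        key = hashlib.sha256(audio).hexdigest()
        if key not in self._done:
            self.paid_calls += 1
            self._done[key] = self._transcribe(audio)
        return self._done[key]
```

Four requests for the same standup recording then cost one call, not four. It does nothing for the bigger problem in the story, which is that the audio still left the building once.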

How to pick, in one breath

There are three kinds of people who land on this page: the privacy-conscious, the deadline-driven, and the ones who just want speaker names without thinking about it. Two of them should go local.

  • Need it free and private: Buzz (easiest) or Whisper on your own machine. Audio never uploads.
  • Need speaker labels and a polished editor: a cloud transcription service. Audio uploads; that's the trade.
  • On a Mac, want a file-first app: MacWhisper, on-device.

The honest tiebreaker: if the recording is sensitive, the answer is local, full stop. If it's a public talk and you want diarization handed to you, cloud earns its keep. Most interview transcription is the first kind, which is why I lead with the free local tools. If raw turnaround time is your worry, the guide to transcribing audio fast walks through the speed knobs.
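If it helps to see the tiebreaker as a rule, here it is as a toy function. The function and parameter names are mine, and it flattens real judgment calls into booleans, so treat it as a summary of the advice above and not as a recommendation engine.

```python
def pick_tool(sensitive: bool, needs_speaker_labels: bool, wants_mac_app: bool = False) -> str:
    """Apply the tiebreaker: sensitivity decides first, speaker labels second."""
    local = "MacWhisper (local)" if wants_mac_app else "Buzz or Whisper (local)"
    if sensitive:
        # Sensitive audio stays on the machine, even if that means
        # adding speaker labels yourself afterwards.
        return local
    if needs_speaker_labels:
        # Public material where diarization matters: the upload is the trade.
        return "cloud transcription service"
    return local
```

The order of the checks is the point: a sensitive recording goes local even when you'd like speaker labels handed to you.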

Where Whisper by Remskill actually fits

The shipped post-dictation overlay — what one live, fully-local dictation looks like the moment it finishes. There is no "upload a recording" button here.

Now the part where I draw the line clearly, because the worst outcome of this article would be you downloading our app expecting it to chew through a recording. It won't. Whisper by Remskill is dictation-first: you hold a hotkey and speak, and your live speech is transcribed and pasted at the cursor in whatever app you're in. There's no "upload an interview file" button, and there's no speaker diarization, because it's built for one voice: yours, in the moment.

So where does an interviewer use it? Around the interview, not on it. Dictating your prep questions into a doc before you walk in. Talking out your follow-up notes the second the conversation ends, while the impressions are fresh and your hands are still holding a coffee. Drafting the write-up by voice once the transcript exists. The default hotkey on Windows is Ctrl+Space, fully remappable, and the whole local pipeline is free for any signed-in user with no card at signup. There's also a paid cloud tier that adds OpenAI-powered transcription and web search for live dictation, but that's still about typing with your voice, not transcribing a two-person recording.

That's the real desktop app — sidebar, transcription panel, and AI instruction cards — not a screenshot.

Use it for the writing around the interview. Use Buzz or a cloud service for the interview itself. Different tools, different jobs. I'd rather you knew that before you installed anything.

A last word

Most of the interview recordings worth transcribing are the ones you'd least like to upload: the off-the-record aside, the source who trusted you, the patient. That's the whole reason the free local tools earn their place, because the file stays on your laptop. I once spent a weekend tuning model settings to clean up my own muddy audio before I noticed the real problem was the laptop mic sitting six inches from a fan. I have a master's degree. Buy the microphone first.

And when the transcript is done and it's time to actually write the piece, that's when our app stops being a bystander.

Dictate the write-up, once the transcript exists

Transcribe the interview with Buzz or a cloud service. Then download Whisper by Remskill and dictate the piece — the one part of the interview workflow we were built for. The local pipeline is free, with no card at signup.

Free local dictation forever. No payment method at signup. We don't transcribe recordings — use a local tool or a cloud service for that.

Denys Medvediev

I'm the one who reads our support email, and I most likely dictate the replies.