Meeting transcription software
One search term, two very different jobs. Some tools send a bot to join your call and write shared notes. Some run on a recording you already have, offline, on your own laptop. Here is how to tell which one you actually need.
Last updated: June 2026

Meeting transcription software turns spoken conversations from a call into searchable, written text. It works two ways: real-time, where the transcript appears live as people speak, and post-meeting, where a recording is processed afterward for cleaner, speaker-labeled, timestamped notes. Most tools auto-join calls through calendar sync and a meeting bot; a few work bot-free from a recording you already have.
The first time I watched a team rack up a real bill for transcribing meetings, the number had five digits and the meetings already had notes. A contractor had wired up an internal AI dictation prototype that called the cloud API for every utterance, with retry logic so aggressive it transcribed the same standup recording four times. The manager opened the cost dashboard at quarter-end. The room got quiet.
The boring truth about this category is that picking the wrong tool gets expensive, in money, privacy, or time, usually before anyone notices.
That is the whole point of this article. Teams have wanted hands-off meeting notes for a decade, and the tools finally do the job well. The catch is that they do wildly different jobs under one search term. Some join your video call as a bot and write shared notes for everyone. Some run on a recording after the fact, offline, on your own laptop.
Below I walk through both paths, name the tools worth knowing, and stay honest about the one job our own app does not do: it is not a meeting bot. As the person who reads our support email, I can tell you most of the confusion in this category comes from people picking the wrong kind of tool on day one. I have answered that same email enough times to recognize it from the subject line.
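Part of the bill in that opening story came from retry logic that transcribed the same standup recording four times. A minimal sketch of the usual guard, assuming a hypothetical `transcribe` callable that stands in for a billable cloud API: key each request on a hash of the audio bytes, so a retry of the same recording is answered from a cache instead of being billed again. The in-memory dict is for illustration only; a real system would persist it.

```python
import hashlib
from typing import Callable, Dict


def transcribe_once(
    audio: bytes,
    cache: Dict[str, str],
    transcribe: Callable[[bytes], str],
) -> str:
    """Return a transcript for `audio`, calling `transcribe` at most once per unique recording.

    `transcribe` is a hypothetical stand-in for a paid cloud call.
    """
    # Key on the audio content itself, so a retry or re-upload of the same
    # recording maps to the same entry instead of triggering a new billable call.
    key = hashlib.sha256(audio).hexdigest()
    if key not in cache:
        cache[key] = transcribe(audio)
    return cache[key]


if __name__ == "__main__":
    billable_calls = []

    def fake_cloud_transcribe(audio: bytes) -> str:
        billable_calls.append(len(audio))  # record each call that would be billed
        return "standup notes"

    cache: Dict[str, str] = {}
    standup = b"standup-recording"
    for _ in range(4):  # an aggressive retry loop
        transcribe_once(standup, cache, fake_cloud_transcribe)
    print(len(billable_calls))  # 1
```

Hashing the content rather than the file name means a renamed or re-uploaded copy of the same recording still hits the cache.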
What meeting transcription software does (and the two ways it works)

Underneath the marketing, every tool here does one thing: it takes audio and produces text. The audio is human speech from a meeting. The text is a transcript. Everything else — summaries, action items, speaker labels, search — is built on top of that one conversion.
The category splits on when the conversion happens.
- Real-time transcription runs while people are talking. The words appear on screen a second or two behind the speaker. This is what you get from a live caption track in Zoom or Microsoft Teams, and from notetaker bots that show a running transcript during the call. It is useful in the moment: for accessibility, for following along, for catching a name you missed.
- Post-meeting transcription runs on a recording after the call ends. The tool has the whole file, so it can take its time. It cleans up false starts, labels who said what, adds timestamps, and stitches together a readable document. Post-meeting processing produces cleaner, speaker-labeled, timestamped text than the live version. The trade is that you wait for it.
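The stitching step is easier to picture with a small example. The sketch below assumes an upstream diarization step has already produced segments tagged with a speaker label and a start time; the `Segment` type and `stitch` function are hypothetical, not any tool's API. It merges consecutive segments from the same speaker into one turn and prints each turn with a timestamp.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    speaker: str  # label assigned upstream by a diarization step (assumed)
    start: float  # seconds from the start of the recording
    text: str


def format_time(seconds: float) -> str:
    """Render seconds as HH:MM:SS, dropping fractions of a second."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def stitch(segments: List[Segment]) -> List[str]:
    """Merge consecutive same-speaker segments into timestamped, speaker-labeled lines."""
    merged: List[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        if merged and merged[-1].speaker == seg.speaker:
            # Same speaker kept talking: extend the turn, keep its original start time.
            merged[-1].text += " " + seg.text
        else:
            # Copy so the caller's segments are never mutated.
            merged.append(Segment(seg.speaker, seg.start, seg.text))
    return [f"{s.speaker} {format_time(s.start)} {s.text}" for s in merged]


if __name__ == "__main__":
    raw = [
        Segment("Maria", 0.0, "Let us start"),
        Segment("Maria", 2.5, "with the launch date."),
        Segment("Tom", 5.0, "I think we slip a week."),
    ]
    for line in stitch(raw):
        print(line)
```

A live transcript has to emit words before it knows whether the speaker will keep going; a post-meeting pass sees every segment first, which is what makes this kind of merge possible.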
Most of the well-known tools (Otter, Fireflies, Fathom, tl;dv) do both, and by default they get the audio the same way: a bot joins the call (Fathom and tl;dv also market bot-free capture modes). You connect your Google or Outlook calendar, the tool sees a meeting with a video link, and it sends a participant into the room to listen and record. That little recording tile you have seen in a Zoom grid is the whole model in one frame.
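Calendar-driven auto-join starts with spotting the video link in an event. A hypothetical sketch of that detection step, using simplified regular expressions for Zoom, Google Meet, and Teams URLs. Production notetakers typically read conferencing details from the calendar provider's API rather than pattern-matching free text, so treat these patterns as illustrative only.

```python
import re
from typing import Optional, Tuple

# Simplified, illustrative URL patterns (an assumption, not any vendor's spec).
MEETING_PATTERNS = {
    "zoom": re.compile(r"https://[\w.-]*zoom\.us/j/\d+"),
    "google_meet": re.compile(r"https://meet\.google\.com/[a-z]{3}-[a-z]{4}-[a-z]{3}"),
    "teams": re.compile(r"https://teams\.microsoft\.com/l/meetup-join/\S+"),
}


def find_meeting_link(event_text: str) -> Optional[Tuple[str, str]]:
    """Return (platform, url) for a recognized video link, checking platforms in order."""
    for platform, pattern in MEETING_PATTERNS.items():
        match = pattern.search(event_text)
        if match:
            return platform, match.group(0)
    return None  # no video link: a bot would have nothing to join


if __name__ == "__main__":
    print(find_meeting_link("Weekly sync\nJoin: https://meet.google.com/abc-defg-hij"))
    print(find_meeting_link("Lunch at noon"))
```

An event with no recognizable link is simply skipped, which is also why in-person meetings never get a bot.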
A quieter third path hides behind the search term: transcribing a recording you already have, on your own machine, with no bot and no call. That is dictation-and-transcription software rather than a meeting notetaker, and it is where our own app lives. More on that below, including the honest part about when it is the wrong choice.
When a meeting bot is the right call (and Whisper is not)
If your problem is "something should join my Zoom, Teams, or Google Meet call, capture everyone, and hand the whole team shared notes afterward," you want a meeting bot. Whisper does not do this. It will not auto-join a call, it does not record other participants, and it does not run multi-speaker diarization across a video meeting. Pretending otherwise would waste your afternoon.
For that job, the right picks are the bot-based notetakers. Otter.ai joins Zoom, Microsoft Teams, and Google Meet to write and share notes automatically, and it has a free Basic plan if you want to try the model before paying. Fireflies.ai joins by invitation or by auto-joining your calendar meetings, and its free tier includes unlimited transcription with limited AI summaries. tl;dv records Google Meet, Zoom, and Teams, markets a no-bot-required capture mode, and offers a free-forever plan with no time limit. Fathom has a free-forever plan with unlimited recordings and a choice of bot-free (in beta) or bot capture.
Here is the part of the article where I send you elsewhere on purpose. Otter is for meetings. Whisper is for writing. They are different categories, and paying for the wrong one is the most common mistake in this whole space. If you need multi-speaker diarization across a recorded call, calendar auto-join, and a summary in the team channel by the time the meeting ends, a bot notetaker does a job our app was never built to do. We make the act of writing-by-voice fast; they make the act of capturing a room automatic. Pick the category first, the tool second.
How accurate is AI meeting transcription, really?

The honest answer: better than you expect on clean audio, worse than you hope on a real meeting. The category lands around 85 to 95 percent accuracy on clear, single-language audio, dropping with background noise, accents, jargon, and people talking over each other. Human-verified services climb back toward 99 percent, because a person fixes what the model missed.
In our own testing, local mode typically lands between 95 and 99 percent accuracy, with larger models scoring higher. I want to be careful here: that is our measurement on our software, not an independent head-to-head against Otter or Fireflies, and I am not going to invent one. Anyone who hands you a single accuracy percentage for meeting transcription without telling you the audio conditions is selling, not measuring.
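Vendors rarely say how they compute an accuracy percentage. One common convention is accuracy = 1 - word error rate (WER), where WER counts the word substitutions, deletions, and insertions needed to turn the machine transcript into a human reference, divided by the number of reference words. A minimal sketch under that assumption; it only lowercases and splits on whitespace, while real benchmarks also normalize punctuation and numbers, which moves the score.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # Word-level Levenshtein distance, one row at a time.
    # prev[j] = edits between the first (i - 1) reference words and first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from the output
                curr[j - 1] + 1,     # insertion: extra word in the output
                prev[j - 1] + cost,  # match, or substitution of one word for another
            )
        prev = curr
    return prev[-1] / len(ref)


def accuracy(reference: str, hypothesis: str) -> float:
    # WER can exceed 1.0 when the output has many extra words, so clamp at zero.
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis))


if __name__ == "__main__":
    ref = "let us start with the launch date"
    hyp = "let us start with the lunch date"
    print(round(accuracy(ref, hyp), 3))  # 0.857
```

One wrong word in a seven-word sentence already costs about 14 points, which is why the same tool can look great on a clean demo clip and mediocre on a crowded call.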
Here is the part nobody markets, because there is no upsell in it. The microphone matters more than the model. A twenty-dollar USB mic does more for your transcript than jumping from a small model to the largest one. Most bad transcripts I have seen were not model failures. They were a laptop mic picking up an air conditioner, four people sharing one room and one speakerphone, or a Bluetooth headset cutting the first word of every sentence. Fix the audio first. The AI cannot un-hear a kettle.
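If you want a number before blaming the model, a rough signal-to-noise estimate of the recording helps. The sketch below is a crude heuristic, not a standard measurement: it splits 16-bit mono PCM into short frames, treats the quietest 10 percent of frames as the noise floor and the loudest 10 percent as speech, and reports the ratio in decibels. The frame length and share are arbitrary choices, and the file name in the usage comment is hypothetical.

```python
import array
import math
import sys
import wave
from typing import List, Sequence


def read_pcm16_mono(path: str) -> array.array:
    """Load a 16-bit mono WAV file as signed integer samples."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM")
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return samples


def frame_rms(samples: Sequence[int], frame_len: int) -> List[float]:
    """Root-mean-square level of each complete frame."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        levels.append(math.sqrt(sum(s * s for s in frame) / frame_len))
    return levels


def rough_snr_db(samples: Sequence[int], frame_len: int = 400, share: float = 0.1) -> float:
    """Crude SNR: loudest frames stand in for speech, quietest for the noise floor."""
    levels = sorted(frame_rms(samples, frame_len))
    if not levels:
        raise ValueError("recording is shorter than one frame")
    k = max(1, int(len(levels) * share))
    noise = sum(levels[:k]) / k
    speech = sum(levels[-k:]) / k
    return 20 * math.log10(speech / max(noise, 1e-9))


# Usage (hypothetical file): print(rough_snr_db(read_pcm16_mono("standup.wav")))
```

A low result means the hum or chatter sits close to the voices. No model setting fixes that; a closer microphone does.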
Two more things drive accuracy under the hood. One is how the tool decides where one person stops and another starts, which gets harder when people talk over each other (the reason any transcript of my family at dinner would read like a single 400-word run-on). The other is custom vocabulary support: the ability to feed the tool the product names, surnames, and acronyms that no general model has ever seen. Our app supports custom vocabulary and hotword biasing on its local Whisper engine, and many meeting bots offer something similar. If your calls are full of jargon, that single setting is worth more than a model upgrade.
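Engines that support hotword biasing apply it while decoding the audio. As a simpler stand-in that shows the idea, here is a hypothetical post-processing pass using Python's standard `difflib`: it snaps near-miss words to the spelling in a custom vocabulary. This is a different and weaker technique than decoder biasing, and it is not necessarily how our app or any meeting bot implements the feature.

```python
import difflib
from typing import Iterable


def apply_vocabulary(transcript: str, vocabulary: Iterable[str], cutoff: float = 0.85) -> str:
    """Replace near-miss words with the canonical spelling from a custom vocabulary."""
    canonical = {term.lower(): term for term in vocabulary}
    fixed = []
    for word in transcript.split():
        # Leave short words alone so "we" or "it" is never rewritten to a product name.
        if len(word) < 4:
            fixed.append(word)
            continue
        match = difflib.get_close_matches(word.lower(), list(canonical), n=1, cutoff=cutoff)
        fixed.append(canonical[match[0]] if match else word)
    return " ".join(fixed)


if __name__ == "__main__":
    print(apply_vocabulary("we shipped remskil today", ["Remskill", "Parakeet"]))
```

The length floor and the 0.85 cutoff are arbitrary guards against rewriting ordinary words; punctuation handling is left out, and joining on single spaces flattens line breaks.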
Bot-free and offline: transcribing a recording you already have
Here is the path the search term keeps quiet, and the one our app is built for. You do not always need software to join a meeting. Sometimes you already have the recording — a voice memo from a one-on-one, an interview, a webinar export, a clip a colleague sent — and you just need clean text from it, on your own machine, without a bot in anyone's call.
Dictation-and-transcription software like Whisper fits here, and earns its keep on privacy. Everything in local mode runs on your laptop. The audio never leaves the device: no server in the loop, no vendor logs, no cloud cost meter. Your boss's salary discussion, the legal recording, the HR conversation: none of it should land in a third party's storage because you needed a transcript. Local-first is not a feature here. It is the whole point.
The app runs two local engines, both pure Rust through transcribe-rs, with no Python sidecar slowing the launch. The first is OpenAI's open-source Whisper model, which in its multilingual builds covers 99 languages and can translate to English, with model sizes from Base at about 140 MB up to Large v3 at about 3 GB. The English-only builds are exactly that, English only, and they tend to run a little leaner. The second engine is NVIDIA's Parakeet TDT, about 600 MB, described in-app as 5 to 10 times faster than Whisper on CPU and covering English plus 24 European languages (25 in total), with no translate-to-English mode. Pick Parakeet for speed if you mostly work in English. Pick Whisper if you need translation or a language Parakeet does not cover.
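That rule of thumb reduces to two checks. A minimal sketch, assuming the caller supplies the set of languages the installed Parakeet model covers; the text does not list them, so `parakeet_languages` and the language codes in the usage line are hypothetical inputs.

```python
from typing import Collection


def choose_engine(language: str, need_english_translation: bool,
                  parakeet_languages: Collection[str]) -> str:
    """Return "parakeet" or "whisper" following the rule of thumb in the text."""
    if need_english_translation:
        return "whisper"  # Parakeet has no translate-to-English mode
    if language in parakeet_languages:
        return "parakeet"  # faster on CPU for the languages it covers
    return "whisper"  # multilingual Whisper builds cover far more languages


if __name__ == "__main__":
    print(choose_engine("en", False, {"en", "de", "fr"}))
```

Translation is checked first on purpose: even for a language Parakeet handles, needing English output rules it out.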
The interaction is the same one I use all day. You hold the hotkey (Ctrl+Space on Windows; on a Mac, the Command+Option push-to-talk chord, where you hold both keys and release either to stop), speak, and the text lands at your cursor in whatever app is focused. A small overlay shows the state while it works. For a recording rather than live speech, you point the app at the file and get the transcript back. If you want the dictation side specifically, our offline speech-to-text guide goes deeper on running everything on-device.
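The Mac chord behaves like a small state machine: recording starts once every key in the chord is down and stops as soon as any one of them comes up. A hypothetical sketch of that logic with made-up key names. A real app would receive these events from an OS-level keyboard hook, and the `"stop"` result is where transcription and pasting at the cursor would happen.

```python
from typing import Optional, Set


class ChordPushToTalk:
    """Record while every key in the chord is held; stop when any of them is released."""

    def __init__(self, chord: Set[str]) -> None:
        self.chord = frozenset(chord)
        self.held: Set[str] = set()
        self.recording = False

    def key_down(self, key: str) -> Optional[str]:
        self.held.add(key)
        if not self.recording and self.chord <= self.held:
            self.recording = True
            return "start"  # begin capturing audio
        return None

    def key_up(self, key: str) -> Optional[str]:
        self.held.discard(key)
        if self.recording and key in self.chord:
            self.recording = False
            return "stop"  # hand the captured audio to the engine
        return None


if __name__ == "__main__":
    ptt = ChordPushToTalk({"command", "option"})
    print(ptt.key_down("command"), ptt.key_down("option"), ptt.key_up("option"))
```

Stopping on the first release, rather than waiting for both keys, means a slightly sloppy release never leaves the microphone open.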
A Cloud option exists too, for people who want the latest OpenAI models and voice-driven web search in the same tool: you bring your own OpenAI key and say "Hey whisper" to route the text through the AI. For transcribing a recording you already hold, though, local mode is the answer, and it is free for any signed-in user.
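Routing on a spoken wake phrase can be as simple as checking how the transcript begins. A hypothetical sketch, since the app's actual matching rules are not documented here: if the dictation opens with "Hey whisper" in any capitalization, the rest goes to the AI; otherwise the text is pasted as-is.

```python
import re
from typing import Tuple

# Hypothetical matcher: the phrase must open the dictation, optionally followed by punctuation.
WAKE_PHRASE = re.compile(r"^\s*hey[\s,]+whisper\b[\s,.:;!?]*", re.IGNORECASE)


def route(transcript: str) -> Tuple[str, str]:
    """Return ("ai", request) for wake-phrase dictations, else ("paste", text)."""
    match = WAKE_PHRASE.match(transcript)
    if match:
        return "ai", transcript[match.end():]
    return "paste", transcript


if __name__ == "__main__":
    print(route("Hey whisper, summarize this thread"))
    print(route("Send the notes to Tom"))
```

Anchoring the match at the start keeps a sentence that merely mentions the app from being sent to the AI by accident.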
The other tools worth knowing
This category is crowded, and the search results are dominated by lists ranking six to ten tools each. Here is a plain map so you are not reading ten reviews to learn what each one is for. Every capability below comes from each tool's own pages.
- Otter.ai — the default meeting notetaker. Bot joins Zoom, Teams, and Meet; free Basic plan with 300 monthly minutes, paid Pro and Business tiers above it. Transcription in six languages: English, Spanish, French, German, Japanese, Chinese.
- Fireflies.ai — bot joins by invite or calendar auto-join. Free plan with unlimited transcription and limited AI summaries; advertises 100+ languages across tiers.
- tl;dv — records Meet, Zoom, and Teams, markets a no-bot-required mode, transcribes in 30+ languages, free-forever plan with no time limit and no card required.
- Fathom — free-forever plan with unlimited recordings, plus a choice of bot-free (beta) or bot capture; paid Premium, Team, and Business tiers above.
- Notta — has a meeting bot for Zoom, Teams, and Meet and a free tier; its own help center lists around 58 languages.
- Zoom and Teams, built in — before you buy anything, check what you already pay for. Zoom transcribes cloud recordings and offers AI Companion real-time transcription in 46 languages on eligible paid plans. Microsoft Teams has built-in live transcription in 50-plus spoken languages; live translated transcription needs Teams Premium.
Here is the same map as a table, with only the parts you can verify on each tool's own pages. No accuracy or speed numbers, because nobody has run them head-to-head on the same audio, and I will not invent the test.
| Tool | Capture | Local/Cloud | Works offline | Pricing model | Languages | Best for |
|---|---|---|---|---|---|---|
| Otter.ai | Bot joins the call | Cloud | No | Free tier + per-user paid | 6 | The default team notetaker |
| Fireflies.ai | Bot by invite or auto-join | Cloud | No | Free tier + per-user paid | 100+ | Generous free transcription |
| tl;dv | Records call, no-bot-required mode | Cloud | No | Free-forever + paid | 30+ | No bot in the meeting grid |
| Fathom | Bot-free (beta) or bot | Cloud | No | Free-forever + paid | Not stated on its pricing page | Unlimited free recordings |
| Notta | Bot joins the call | Cloud | No | Free tier + paid | ~58 (its help center) | A bot plus a free tier |
| Zoom / Teams (built in) | Native to the call | Cloud | No | Included in eligible paid plans | Zoom 46, Teams 50+ | What you already pay for |
| Whisper by Remskill | No call; transcribes a file or dictation | Local (Cloud optional) | Yes | Free local tier + Pro | 99 (Whisper multilingual), 25 (Parakeet) | Private, bot-free, on-device |
If your meetings already run on a paid Zoom or Teams plan, the built-in transcription may be all you need, and you are not adding another subscription or another bot to the call.
What I'd pick for each situation
I read the support email, so I see the wrong-tool regret often enough to have opinions. Here is how I would choose.
- You want notes from a team video call, automatically, shared with everyone. Use a bot notetaker. Otter if you want the polished default, Fireflies or Fathom if you want a generous free tier, tl;dv if no-bot-in-the-grid matters to you.
- You are already on a paid Zoom or Teams plan. Try the built-in transcription before paying for a third tool.
- You have a recording and want clean text, privately, on your own machine. This is the bot-free, offline path: Whisper, or another local transcription tool. The audio stays on the device.
- You want to write by voice (emails, docs, notes during or after the call) at the cursor, in any app. That is dictation, and it is the job Whisper was built for. Our comparison of transcription software lays out the dictation-versus-meeting-notes split in more detail.
- You need a guaranteed near-perfect transcript for a legal or compliance record. Use a human-verified service. AI alone tops out below 99 percent on real audio.
The mistake to avoid is paying for a meeting bot to do dictation, or expecting a dictation tool to join your calls. Different categories. Pick the one that matches the job. I have built software for fifteen years and still bought the wrong tool for a job last year, so this is not a lecture from someone who got it right the first time.
Pricing, in flat numbers
Most tools here have a free tier worth trying before any card comes out. Otter, Fireflies, tl;dv, and Fathom all offer a free plan, with paid tiers when you need more minutes, more seats, or unlimited storage. The bot notetakers generally charge per user per month, which adds up fast across a team.
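Per-user pricing is simple multiplication, which is exactly why it creeps up. A worked sketch with a hypothetical $20 per user per month; substitute the real figure from whichever pricing page you are comparing.

```python
def annual_cost(seats: int, price_per_user_month: float, months: int = 12) -> float:
    """Per-user-per-month pricing: the bill scales linearly with headcount."""
    return seats * price_per_user_month * months


if __name__ == "__main__":
    hypothetical_price = 20.0  # placeholder, not any vendor's actual price
    for seats in (1, 5, 15, 50):
        print(seats, "seats:", annual_cost(seats, hypothetical_price), "per year")
```

At that placeholder price, a 15-person team lands at 3,600 a year before anyone upgrades a tier.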
Whisper is free for every signed-in user across the entire local pipeline — both engines, AI enhancement through Ollama, history, presets, custom vocabulary, the hotkey, all of it — with no payment method asked for at signup. The paid tier adds the Cloud surface for people who want OpenAI's models and voice web search. Exact numbers for monthly, yearly, lifetime, and team seats live on the pricing page. I would rather you start free and decide for yourself than take a price out of context here.
Pick the kind of tool before the brand. If a bot should join your call, use a notetaker. If a recording on your laptop should become private text, use something offline. The five-figure bill I watched a team run up came from never asking which job they were paying for, and that is a meeting nobody needed a transcript of.
Try the bot-free path on a recording you already have
Download Whisper, point it at a recording, and watch clean text come back — on your own machine, with no bot in anyone's call.
Free for every signed-in user across the entire local pipeline. No payment method at signup.



