By Denys Medvediev

Tutorial

Voice to text in Zendesk: calls vs your replies

Zendesk Talk transcribes the customer's call. It doesn't type your reply. For dictating the reply, note, or macro you actually write, a system-wide hotkey tool like Whisper handles it — offline, in the Agent Workspace, and in every side app you live in.

Last updated: June 2026

Close-up of a support agent's desk with a headset resting on printed charts and ticket documents

Voice to text in Zendesk splits into two different things. Zendesk Talk transcribes the recorded customer call or voicemail and attaches the transcript to the ticket log — it does not type your reply. For dictating the reply, note, or macro you actually write, Zendesk has no native feature; a system-wide tool like Whisper handles that with a hotkey.

Most agents searching this want one thing: to stop typing the same answer for the fortieth time today and just say it out loud. The speaking-versus-typing gap is real — most people speak around 150 words a minute and type maybe a third of that under queue pressure, which is the whole case for dictation. So the search makes sense. The confusion is what Zendesk's voice features actually do, because they sound like dictation and aren't. I spent a good twenty minutes in Zendesk's docs convincing myself I'd missed the agent-dictation toggle. I hadn't. There isn't one. Let me draw the line cleanly, then show you the part that works.
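To put rough numbers on that gap, here is a back-of-envelope sketch. The 150 words a minute and the "about a third of that" typing rate come from the paragraph above; the reply length and replies-per-day figures are assumptions I picked for illustration, so swap in your own.

```python
# Back-of-envelope estimate. Only the speaking rate and the "a third of that"
# typing rate come from the article; the workload numbers are assumptions.
SPEAK_WPM = 150
TYPE_WPM = SPEAK_WPM / 3  # about 50 words a minute under queue pressure


def minutes_to_produce(words: int, wpm: float) -> float:
    """Minutes needed to produce `words` words at `wpm` words per minute."""
    return words / wpm


def daily_minutes_saved(replies_per_day: int, words_per_reply: int) -> float:
    """Minutes saved per day by speaking every reply instead of typing it."""
    typed = minutes_to_produce(words_per_reply, TYPE_WPM)
    spoken = minutes_to_produce(words_per_reply, SPEAK_WPM)
    return replies_per_day * (typed - spoken)


if __name__ == "__main__":
    # Hypothetical workload: 40 replies of 90 words each.
    print(f"{daily_minutes_saved(40, 90):.0f} minutes saved")  # prints "48 minutes saved"
```

The estimate ignores the time you spend reading the transcript back and fixing it, so treat it as an upper bound rather than a promise.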

Here is the short version. Zendesk's voice tooling lives on the phone channel. It listens to the customer. The thing you're picturing — you, talking, and your words landing in the reply box — is a different category, and it lives at the operating-system level, not inside Zendesk. Once you see that split, the whole thing stops being confusing.

Press a hotkey, talk, and your reply types itself

The mechanic is one key. You hold a global hotkey, you talk, you release, and the text lands wherever your cursor is sitting — the public reply, an internal note, a macro body, a Guide article. On Windows the default is Ctrl+Space; on macOS it's Command+Option held as push-to-talk. No menu, no upload, no "click record." The same key works in the Zendesk Agent Workspace and in every other app you jump to between tickets — Slack, Teams, Gmail, Notion.

That last part matters more than it sounds. Whisper is a native desktop app for Windows and macOS, not a browser extension. So when you alt-tab out of the Zendesk tab to ping engineering in Slack about a bug, the same hotkey still works. A browser extension stops at the edge of the tab. The same OS-level reach is why the trick works in your CRM too — agents use it the same way for voice to text in Salesforce and dictation in HubSpot.

Zendesk Talk transcribes the call. It does not type your reply.

Headset resting on customer-service charts and documents on an agent's desk

This is the line everyone trips on, so here it is plainly. Per Zendesk's own call-transcription FAQ, Talk takes a recorded phone call between a customer and an agent, and after the call ends, it adds the transcript and a summary to the ticket conversation log as internal notes. Only recorded calls get transcribed. Zendesk also transcribes voicemail audio, which Zendesk prices at around a cent a minute.

All of that is the voice channel. It transcribes the call the customer is on. It is genuinely useful — if you want a written record of a spoken call attached to the ticket, that is exactly Zendesk's job, and you should use it.

What it is not is agent dictation. None of those features let you speak your typed reply into the composer. The boring truth is Zendesk has no native feature for that. A Zendesk employee confirmed it in the company's own community forum: real-time voice transcription was roadmap-only and slipped from early 2024 to a later quarter, and even that item was about the call channel, not agent dictation. A separate request thread asking for speech-to-text typing went unanswered. The in-thread workaround a staffer suggested was turning on Chrome's live captions, which tells you how far this is from a real feature. When the official answer to "can I dictate my replies" is "have you tried the browser's accessibility menu," the honest answer is no.

What Zendesk actually has for voice, and what it doesn't

People expect three voice features from Zendesk, and only two of them exist. Here's the honest map:

  • Recorded calls — Zendesk transcribes them and files the transcript in the ticket log.
  • Voicemail audio — Zendesk transcribes it too, feeding triage and summaries.
  • Your typed reply, dictated by voice — Zendesk does not do this at all.
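If you ever pull a ticket's conversation through Zendesk's Ticket Comments API, the same split shows up in the data: each comment carries a `public` flag, and internal notes (where the call transcript is filed) have it set to false. The sketch below separates the two in a payload you have already fetched. The sample data is invented, and I'm not assuming any special marker on transcript notes beyond that flag.

```python
from typing import Any


def split_comments(payload: dict[str, Any]) -> tuple[list[str], list[str]]:
    """Return (public_bodies, internal_bodies) from a ticket-comments payload."""
    public: list[str] = []
    internal: list[str] = []
    for comment in payload.get("comments", []):
        # Treat a missing flag as public, so nothing is mistaken for internal.
        target = public if comment.get("public", True) else internal
        target.append(comment.get("body", ""))
    return public, internal


# Invented sample shaped like a comments listing; not real ticket content.
SAMPLE = {
    "comments": [
        {"id": 1, "public": True, "body": "Hi, my order never arrived."},
        {"id": 2, "public": False, "body": "Call transcript: customer reports missing parcel."},
        {"id": 3, "public": True, "body": "Sorry about that, a replacement is on the way."},
    ]
}

if __name__ == "__main__":
    public, internal = split_comments(SAMPLE)
    print(len(public), "public,", len(internal), "internal")
```

Nothing in that payload is something you dictated. The replies in it were typed by someone, which is the gap the rest of this article is about.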

So if you came here hoping to talk your way through the queue, Zendesk's voice features won't get you there. They're built around the customer's audio, not your keyboard. Whisper sits in that gap — it's an operating-system-level dictation tool, so it works inside the Agent Workspace composer the same way typing does, because to the browser it's just text arriving at the cursor.

How to dictate into a Zendesk ticket reply, note, or macro

The live recording overlay: a small indicator that appears while you talk, so you know Whisper is listening — nothing that hijacks your screen.

The setup is short. Here's the whole thing, start to finish.

  1. Install Whisper for Windows or macOS and sign in. The entire local pipeline is free for signed-in users, with no card at signup.
  2. Pick a model and let it download. The one-time download runs from about 140 MB to 3 GB depending on the model you choose. After that, transcription needs no internet.
  3. Open a ticket in the Agent Workspace and click into the field you want — the public reply, an internal note, or the body of a macro you're editing.
  4. Hold the hotkey and talk. Ctrl+Space on Windows, Command+Option on macOS. (If you're setting this up on a PC, the Windows voice-to-text walkthrough covers the hotkey in more detail.) Say the reply the way you'd say it to the customer's face.
  5. Release the key. The text lands at the cursor in the focused field. Read it, fix anything, send.
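If it helps to see steps 4 and 5 as logic, here is a minimal sketch of a hold-to-talk flow. It is my own illustration, not Whisper's source: the `PushToTalk` class, the `transcribe` callback, and the `insert_text` callback are all hypothetical stand-ins for the hotkey listener, the local model, and "type at the cursor."

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PushToTalk:
    """Hypothetical hold-to-talk controller: record while held, insert on release."""

    transcribe: Callable[[list[bytes]], str]  # stand-in for a local speech model
    insert_text: Callable[[str], None]        # stand-in for typing at the cursor
    _recording: bool = False
    _chunks: list[bytes] = field(default_factory=list)

    def key_down(self) -> None:
        if not self._recording:  # ignore key auto-repeat while the key is held
            self._recording = True
            self._chunks = []

    def audio(self, chunk: bytes) -> None:
        if self._recording:  # audio outside a hold is dropped
            self._chunks.append(chunk)

    def key_up(self) -> None:
        if not self._recording:
            return
        self._recording = False
        text = self.transcribe(self._chunks).strip()
        if text:  # nothing is inserted for an empty transcript
            self.insert_text(text)


if __name__ == "__main__":
    focused_field: list[str] = []  # pretend this is the reply composer
    ptt = PushToTalk(
        transcribe=lambda chunks: f"({len(chunks)} audio chunks transcribed)",
        insert_text=focused_field.append,
    )
    ptt.audio(b"ignored")  # key not held yet, so this is dropped
    ptt.key_down()
    ptt.audio(b"a")
    ptt.audio(b"b")
    ptt.key_up()
    print(focused_field)  # ['(2 audio chunks transcribed)']
```

The controller never asks which field it is writing to. It hands text to whatever has focus, which is why clicking the right field first is on you.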

The recording overlay above shows what you'll see while you talk: a small live indicator, nothing that hijacks your screen. The first time the reply just appears in the composer, it feels slightly illegal. That feeling fades around ticket five. The relief in your hands doesn't.

The whole app, live


That's the real desktop app embedded above — not a screenshot, the actual thing. Poke around it. The settings, the model list, the hotkey config are all there. What you see is what installs.

Clean up the dictation automatically


Spoken language has stray "um"s and runs sentences together. Whisper can run an optional AI cleanup pass over the raw transcript — punctuation, casing, and a light tone tidy-up — before it pastes. In the free local mode that cleanup runs on your machine through Ollama; with Whisper Pro it runs through your own OpenAI key. For a public reply that a QA lead is going to read, that pass is the difference between "spoken notes" and "a reply that passes review."
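For a feel of what a local cleanup pass can look like, here is a sketch that sends a raw transcript to an Ollama server on your own machine. The `/api/generate` endpoint, the default port, and the `stream` flag are Ollama's; the prompt wording, the `llama3.2` model name, and the function names are my assumptions, not what Whisper actually sends.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "llama3.2"  # assumption: use any model you have pulled locally


def build_cleanup_request(raw: str, model: str = MODEL) -> dict:
    """Build the JSON body for a one-shot, non-streaming cleanup call."""
    prompt = (
        "Fix punctuation and casing in this dictated support reply. "
        "Remove filler words such as 'um' and 'uh'. Do not add or remove facts. "
        "Return only the cleaned text.\n\n" + raw
    )
    return {"model": model, "prompt": prompt, "stream": False}


def clean_up(raw: str) -> str:
    """Send the transcript to a local Ollama server and return its answer.

    Needs a running Ollama server with MODEL available; nothing leaves localhost.
    """
    body = json.dumps(build_cleanup_request(raw)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())["response"].strip()


# Example call, once Ollama is running:
#   clean_up("um so your refund was uh issued on monday")
```

The "do not add or remove facts" instruction is the part worth keeping in any prompt you write yourself. A cleanup pass that improves your grammar and changes the refund date is not a cleanup pass.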

It handles over 90 languages in both modes, roughly the same number my seven-year-old uses to negotiate bedtime. That matters if your ticket queue switches between English, Spanish, and German before lunch. The language count applies to the multilingual models, which reach 99-plus languages; the English-only variants cover English alone.

Why local and offline matters when you handle customer data

A device wrapped in a chain and padlock, symbolizing private, locked-down data that never leaves the machine

Here's the one opinion I'll spend in this article: dictation that only runs in the cloud, with no offline option, is a privacy disaster when you're a support agent. You read out a customer's email, their order, sometimes their home address or a card dispute. With a cloud-only tool, all of that makes a detour through a third party's servers — for no reason other than you wanted to talk instead of type. A tool that can run the whole thing on your own machine doesn't ask you to make that trade.

Whisper's local mode runs entirely offline. The audio never leaves your machine; the only time it touches the network is the one-time model download. The customer PII you speak into a reply stays on the device. The browser-extension and cloud dictation tools that dominate this search can't say that — they ship your audio out to be transcribed. If your support org handles regulated data, "the audio never left the laptop" is a sentence your security team will want to hear.

What it won't do (the honest limits)

No tool deserves a clean bill of health, so here's where Whisper stops.

It pastes into one focused field at a time. It does not fill a whole multi-field ticket form, and it does not decide which field your words belong in — they go wherever the cursor is. That means you have to mind the difference between the public reply and the internal note before you talk. Dictate into the wrong one and you can leak an internal note straight to the customer. The cursor does exactly what you point it at, which is either a feature or a confession depending on where you pointed it. Click first, then talk.

It inserts text, not formatting. It won't drive the composer's bold button or build a bulleted list by voice — it types words into CKEditor, the same as your keyboard would. And like every dictation tool, it's weakest on strings that aren't words: account IDs, order numbers, SKU codes, error codes. I've watched it turn "ticket ZD dash four four oh two" into something with one too many fours, which is exactly the kind of detail a customer notices. It transcribes what you say, but eyeball any code before you hit send.
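One way to make that eyeballing systematic is to pull out every token that contains a digit and check those against the ticket before sending. This is a hypothetical helper of my own, not a Whisper feature, and the pattern is deliberately crude.

```python
import re

# Any token that contains at least one digit, e.g. "ZD-4402" or "99813".
# Letters-only codes (say, "E_TIMEOUT") are NOT caught by this pattern.
CODE_LIKE = re.compile(r"\b[\w-]*\d[\w-]*\b")


def tokens_to_verify(text: str) -> list[str]:
    """Return the digit-bearing tokens in a dictated reply, in order of appearance."""
    return CODE_LIKE.findall(text)


if __name__ == "__main__":
    reply = "Your ticket ZD-4402 covers order 99813."
    print(tokens_to_verify(reply))  # ['ZD-4402', '99813']
```

It will also flag harmless numbers like "2 boxes," which is fine. The goal is a short list to check against the ticket, not a verdict.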

When to skip Whisper and use something else

Minimal office desk with a keyboard and monitor, framing a decision about which support tool to reach for

I'd rather you used the right tool than ours. If you need a written record of a spoken call attached to the ticket, that's Zendesk Talk call transcription — that is Zendesk's job, not Whisper's, and it's already built into your phone channel. Don't reach for a dictation app to solve a call-logging problem.

If you just want to dictate the occasional reply and don't want to install anything, your operating system already ships a free option. Windows has Voice Typing on Win+H; macOS has built-in Dictation. Both work system-wide, including in the Zendesk composer. They're single-platform, lean on the cloud by default, and give you less control — but for an agent on one machine who dictates twice a day, free and already-installed is a fair trade.

Reach for Whisper when you want to clear the queue by voice every day, want it offline so customer data stays put, want one hotkey across Zendesk and the side apps you live in, and want it free without a card. The local pipeline is free from signup; the Pro Cloud tier adds a 7-day trial. The current numbers live on the pricing page.

Zendesk listens to the customer's call. It was never built to type your half of the conversation. That second job — you talking, your words landing in the reply box — is the one that turns a 200-ticket day into something your wrists forgive. Click the field, hold the key, talk. Download Whisper and clear one ticket by voice. If your hands don't thank you by lunch, go back to typing.

Clear your next ticket by voice

Click the field, hold the key, talk, release. The reply lands at the cursor — in the Zendesk Agent Workspace and in every side app you live in.

Free local mode for any signed-in account. No card required to start.

Photo of Denys Medvediev

Denys Medvediev

I'm the one who reads our support email, and I most probably dictate the replies.