By Denys Medvediev


The honest Aqua Voice alternative

Aqua Voice is a cloud dictation app for Mac, Windows, and iPhone that types your speech into any app in real time, fixing grammar and formatting as you go. The strongest private alternative is Whisper by Remskill, whose local pipeline runs offline, is free, and never sends your audio anywhere.

Last updated: June 2026


Let me get the conflict of interest out of the way. This is a comparison piece, and I built one of the two things in it. So I am going to credit Aqua Voice where it earns it, which is more places than you might expect from a competitor's blog, and I am going to be specific about the one spot where we genuinely win.

Whisper by Remskill is a private Aqua Voice alternative whose entire local pipeline runs offline and is free forever, with no word cap and no payment method at signup. Aqua Voice is real-time, polished, and refines your speech in the cloud as you talk.

What this comparison is, and who built it

The boring truth is that Aqua Voice is good at the thing it does. It is real-time, it is polished, and it processes every word as you speak, fixing phrasing and grammar on the fly. If you are already paying for it and happy, you can probably close this tab. There is a whole section near the end that tells you exactly when to stay.

For everyone still reading, the difference is one word: cloud. Aqua sends your audio to its servers to do that real-time magic. We do it on your laptop, for free, and the audio never leaves the room.

That is the entire argument, and I will spend the rest of the article showing it rather than asserting it. No fake review counts, no invented user numbers. Just two feature lists and a table you can check against both homepages.

What Aqua Voice actually does

Aqua Voice runs on Mac and Windows, with an iPhone app too. It does live, real-time dictation that fits into every app you already use, with no setup ritual. Press, talk, and refined text appears. The refinement is the selling point: it fixes grammar, cleans up phrasing, and formats as you speak.

Under the hood it is a proprietary cloud engine. The marketing calls it Avalon on the paid tier and the Aqua Engine on the free one, and there are no open weights to inspect. It understands 49 languages. That is a real number on a real homepage, and I am not going to minimise it.

On data handling, Aqua is more careful than most cloud tools. Its own site says nothing is stored on its servers on the Starter and Pro tiers, with Zero Data Retention reserved for Enterprise. That is a fair policy. It is also still a policy. Your audio leaves your machine, travels to Aqua's cloud, gets transcribed, and comes back. You are trusting a promise instead of trusting physics.

There is a free Starter tier capped at 1,000 words, then a Pro plan billed annually, a Team plan, and Enterprise. I am not quoting the figures here, because pricing pages move and you should read theirs and ours straight from the source. The shape is what matters: the free tier runs out, and everything past it is a subscription.

What you get with Whisper by Remskill

Here is the part where I describe the thing I built, then let you judge the table. Whisper by Remskill is two products on one hotkey. The free tier is the whole local pipeline. You get the 8 Whisper transcription models, the Parakeet engine, fully-offline AI cleanup through Ollama, transcription history, presets, hotwords, hardware acceleration, model downloads, and a custom hotkey. No payment method at signup, and no word cap. You make an account, download the app, press the hotkey, and talk.
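To make "fully-offline AI cleanup through Ollama" concrete, here is a minimal sketch of what a cleanup call to a local Ollama server can look like. It is not the app's actual code. It assumes Ollama is running on its default local port, and the model name `llama3.2`, the prompt wording, and both function names are placeholders I picked for illustration.

```python
import json
import urllib.request

# Assumption: Ollama's default local address. The request never leaves localhost.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_cleanup_request(raw_text: str, model: str = "llama3.2") -> dict:
    """Build the JSON body for a one-shot, non-streaming cleanup call.

    The model name and prompt are illustrative placeholders.
    """
    prompt = (
        "Fix punctuation, casing, and obvious dictation errors in the text "
        "below. Return only the corrected text.\n\n" + raw_text
    )
    return {"model": model, "prompt": prompt, "stream": False}


def clean_up(raw_text: str, model: str = "llama3.2") -> str:
    """Send the transcript to a locally running Ollama server and return the result."""
    body = json.dumps(build_cleanup_request(raw_text, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"].strip()
```

The only network address in the sketch is localhost, which is the point: the cleanup step can run with the Wi-Fi switched off, provided the model has already been downloaded.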

The live Whisper by Remskill app — sidebar, transcription panel, and AI instruction cards. This is the real interface, not a screenshot.

You pick your local engine based on what you need, not what we push. Whisper gives you 99 languages, translate-to-English, custom vocabulary, beam-size control, and hotword biasing, all at the cost of speed. Parakeet is the NVIDIA TDT engine, about 600 MB, and it runs 5 to 10 times faster than Whisper on a CPU, covering English plus 24 other European languages. Pick Parakeet for speed and English. Pick Whisper for languages, translation, or fine control. Neither one is the default. That is your call.
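If it helps to see that rule of thumb written down, here is a small sketch of it as a function. The function name, its parameters, and the four-language sample set are my own illustration, not anything in the app, and the sample set is deliberately not the full Parakeet language list.

```python
# Illustrative subset only. Parakeet covers English plus 24 other European
# languages; this is NOT the full list, just enough to show the logic.
PARAKEET_LANGUAGES_SAMPLE = frozenset({"en", "de", "fr", "es"})


def choose_engine(
    language: str,
    translate_to_english: bool = False,
    needs_fine_control: bool = False,
    parakeet_languages: frozenset = PARAKEET_LANGUAGES_SAMPLE,
) -> str:
    """Return "parakeet" or "whisper" following the rule of thumb above."""
    # Translation and fine control (beam size, hotword biasing) are Whisper features.
    if translate_to_english or needs_fine_control:
        return "whisper"
    # Parakeet is the faster choice when it covers the spoken language.
    if language in parakeet_languages:
        return "parakeet"
    # Anything else falls back to Whisper's wider language coverage.
    return "whisper"
```

So `choose_engine("en")` picks Parakeet, while `choose_engine("ja")` or `choose_engine("de", translate_to_english=True)` picks Whisper.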

Local accuracy typically lands between 95% and 99%, and it all runs on your CPU with no GPU required. The app itself is about 25 MB on disk. All of that happens on your machine. No round trip, no servers, no promise to trust.

If you want the cloud, we have it too, and it is bring-your-own-key. The Pro tier adds OpenAI cloud transcription. You paste your own OpenAI key and pick the model: gpt-4o-mini-transcribe at about $0.003 a minute, or gpt-4o-transcribe for higher quality. The AI enhancement runs on gpt-5-mini by default, and there is also web search at your cursor through OpenAI's Responses API. We take no cut on top of OpenAI's rate. Your key, your bill — read the pricing page for the Pro figures.
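For a feel of what bring-your-own-key costs in practice, here is a back-of-envelope sketch using the roughly $0.003 per minute rate quoted above. The function name and the usage numbers are hypothetical, and the rate is approximate, so check OpenAI's pricing page before you rely on the result.

```python
from decimal import Decimal

# Approximate per-minute rate for gpt-4o-mini-transcribe quoted in the article.
# Treat it as an assumption and confirm it against OpenAI's current pricing.
MINI_TRANSCRIBE_USD_PER_MINUTE = Decimal("0.003")


def estimate_cost_usd(minutes_per_day, days, rate=MINI_TRANSCRIBE_USD_PER_MINUTE):
    """Estimate the OpenAI transcription bill in USD, rounded to cents."""
    total_minutes = Decimal(str(minutes_per_day)) * days
    return (total_minutes * rate).quantize(Decimal("0.01"))


# 30 minutes of dictation a day over 22 working days is 660 minutes.
print(estimate_cost_usd(30, 22))  # prints 1.98
```

That figure covers transcription only. AI enhancement and web search calls are billed by OpenAI separately, on the same key.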

The platform story is where I have to be blunt. Windows and macOS on Apple Silicon both ship today. There is no iPhone app, no iPad app, no Android. If you dictate from your phone, this is where Aqua is plainly ahead.

Aqua Voice vs Whisper by Remskill, side by side

The table nobody else seems to fill with real rows. There are no prices in it; check both pricing pages for those.

Feature comparison between Aqua Voice and Whisper by Remskill

| Feature | Aqua Voice | Whisper by Remskill |
| --- | --- | --- |
| Platforms | Mac, Windows, iOS | Mac (Apple Silicon) and Windows; no mobile app |
| Where audio is processed | Cloud; audio leaves your machine | On your machine, offline |
| Free tier | Free up to 1,000 words | Entire local pipeline free forever, no word cap |
| Works with no internet | No, it is cloud-based | Yes, fully offline on the local pipeline |
| Engine | Proprietary cloud (Avalon / Aqua Engine), no open weights | 8 Whisper models + Parakeet, open models |
| Bring your own OpenAI key | Not offered | Yes; Cloud is BYOK, no markup |
| Languages | 49 | 99 on multilingual Whisper, 25 on Parakeet |
| Translate to English | Not stated | Yes, on Whisper models |
| Real-time refinement as you speak | Yes; its signature feature | Cleanup runs after the utterance, via Ollama or cloud AI |
| Mobile dictation | Yes, iPhone | No mobile app |
| Pricing model | Subscription past the free cap | Free local; optional Pro for Cloud |

A few honest reads of this table. Aqua's real-time refinement is genuinely slick, its 49 languages cover most of what people dictate, and it ships a mobile app, which we do not. Those are not small wins. On every row about offline use, privacy, the free local pipeline, or language count, the gap runs the other way.

Your audio never leaving the machine is the whole point

This is what most people came here to compare, so let me be concrete. Aqua's data policy is good. It says nothing is stored on their servers on the consumer tiers. I believe them. But "we don't store it" is not the same as "it never left." Your audio still travels to a cloud to be transcribed, because that is how a cloud engine works. With Whisper by Remskill's local pipeline, there is no server in the loop at all. The model loads into your RAM, your microphone feeds it, and the text appears: on a flight in airplane mode, in a SCIF, on a train through a tunnel. You are not trusting a promise. There is nothing to promise.

The shipped post-dictation overlay — what one free, fully-offline local dictation looks like the moment it finishes.

Here is the one opinion I will spend in this article. Cloud-only dictation is a privacy disaster waiting to be transcribed. Your boss's salary spreadsheet, the email to your kid's school, the legal brief you are drafting on the train: none of that should leave your laptop because you wanted to type with your voice. A team I worked with once had a contractor build an internal cloud dictation prototype that called an AI API for every utterance. It transcribed the same standup recordings four times over because the "smart retry" logic was too aggressive. The manager opened the cloud-cost dashboard at the end of the quarter and found a five-figure bill. The contractor's fix was "let's optimise the prompt." The CFO's fix was "or we stop sending meetings we already have notes for to a server." Local-first was the cheaper answer and the more private one, in the same sentence.

Your laptop already has a microphone and a CPU. It does not need a server in the loop to type one paragraph. The cloud is a great escape hatch and a strange default.

When to stay on Aqua Voice

This section earns the rest of the article. There are real reasons to stay, and I am not going to pretend otherwise.

You want a fully managed cloud experience and don't care about offline

This is the big one. Aqua is real-time, hosted, and zero-setup. If your audio passing through a vendor's server does not bother you, and you would rather not download a model or think about engines, the managed cloud experience is genuinely lower-friction. Keep the thing that works.

You specifically want its real-time formatting

Aqua refines phrasing and fixes grammar as you speak, in the moment, before the text lands. Our cleanup runs after the utterance rather than mid-sentence. If that live, word-by-word refinement is the feature you fell in love with, theirs does it and ours works differently.

You dictate from your phone

Aqua ships an iPhone app. We ship on Windows and Mac on Apple Silicon, and there is no mobile app on our roadmap. If your daily dictation happens on a phone, you need their tool.

For everyone else — solo writers, marketers, salespeople, students, parents answering a teacher's email while making dinner, anyone whose words shouldn't leave the room — start with our free local tier and see whether you ever hit a wall. There is no word cap and no internet required.

If you only remember one thing

Most dictation comparison articles end by telling you to transform your workflow. This one ends smaller. The thing voice typing fixes is the gap between having something to say and getting it into the document. Aqua closes that gap in real time, in the cloud, and charges past the free cap. We close it on your machine, offline, with the local part free. If your words can live on someone else's server, Aqua is a fine pick. If they can't, or you'd just rather they didn't, that is exactly the line we were built for.

If you want the deeper version of that argument, I wrote it up in our piece on offline speech to text, and there is also our superwhisper alternative comparison if you are weighing local options.

Try the free local pipeline first

Download Whisper by Remskill, make an account with no card required, press the hotkey, and dictate. Your audio stays on your machine, and if you never need the Cloud tier, you never pay.

Free local transcription forever, fully offline. No payment method at signup. The 7-day Cloud trial asks for a card only at upgrade.


Denys Medvediev

I'm the one who reads our support email, and most probably the one dictating the replies.