Cohere Transcribe: transcribe speech in 14 languages, in the browser or locally

01 What it does

Cohere Transcribe is a 2B-parameter automatic speech recognition (ASR) model covering 14 languages. Its editorial appeal is that you do not have to choose between a polished demo and a genuinely open workflow. You can start with the public WebGPU demo to sanity-check short clips in the browser, then move to the Transformers or vLLM path documented on the model card when you want offline or production-style runs. A good first test is a short interview clip, meeting excerpt, or voice note where you care more about named entities and general readability than about diarization.

02 How to try it

Start with the public WebGPU Space in a recent Chromium-based browser and test one short, clean clip in your target language. That gives you a quick read on transcript quality without setting up a full stack. If the output looks promising, move to the model repository and follow the documented Transformers or vLLM examples for local or server-side inference. The API-backed Gradio Space works well as a product demo, but the WebGPU and offline paths are the stronger proof that this is not just a closed, hosted workflow.
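If your recording is long, one low-effort way to get a short test clip is to cut the first few seconds locally before uploading it. The sketch below assumes the source is an uncompressed PCM WAV file and uses only Python's standard-library `wave` module; the function name `trim_wav` and the file names in the usage note are hypothetical, and nothing here reflects which audio formats the demo itself accepts.

```python
import wave


def trim_wav(src, dst, seconds: float) -> float:
    """Copy the first `seconds` of a PCM WAV file from src to dst.

    src and dst may be file paths or binary file objects (hypothetical helper,
    assumes uncompressed PCM WAV input). Returns the clip length written, in seconds.
    """
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        # Never request more frames than the file actually contains.
        n_frames = min(reader.getnframes(), int(seconds * reader.getframerate()))
        frames = reader.readframes(n_frames)
    with wave.open(dst, "wb") as writer:
        # Keep the original channel count, sample width, and sample rate;
        # the frame count in the header is corrected when the file is closed.
        writer.setparams(params)
        writer.writeframes(frames)
    return n_frames / params.framerate
```

For example, `trim_wav("interview.wav", "interview_30s.wav", 30)` would write a 30-second excerpt. The clip keeps the source's sample rate and channel count, so no resampling is implied.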

03 Caveat

Treat this as a transcription model, not a full speech pipeline: it does not do speaker diarization, does not provide timestamps, and works best when you already know which language you want to transcribe.

04 What you can do with it

Turn interviews, calls, or voice notes into editable text without defaulting straight to a hosted API workflow.
Compare multilingual transcription quality across a few supported languages before committing to a larger ASR stack.
Prototype browser-side transcription UX for internal tools, note-taking, or lightweight media workflows.
Pressure-test whether an open ASR model is already good enough for meeting summaries, rough captions, or searchable archives.
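For the comparison and pressure-testing uses above, word error rate (WER) against a hand-checked reference transcript is a common, simple yardstick: the minimum number of word substitutions, insertions, and deletions separating the model's output from the reference, divided by the number of reference words. This is a minimal sketch, not Cohere's evaluation method; the helper names are hypothetical, and the lowercasing and punctuation stripping in `normalize` are an assumed normalization you may want to adjust.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance over tokens (substitutions, insertions, deletions)."""
    # prev[j] holds the distance between the reference prefix so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference token missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra token in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[-1]


def normalize(text: str) -> list[str]:
    """Assumed normalization: lowercase, replace punctuation with spaces, split on whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return cleaned.split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref, hyp) / len(ref)
```

For example, `word_error_rate("the quick brown fox", "the quick brown box")` is 0.25 (one substitution over four reference words). For languages written without spaces between words, whitespace splitting is not meaningful; computing the same ratio over characters (character error rate), for instance by passing `list(text)` to `edit_distance`, is the usual substitute.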

Neural Expedition · Useful open-source AI, curated without hype.