Voxtral

launch: July 2025

powered by: Voxtral Small 24B, Voxtral Mini 3B, Voxtral Realtime

goblin vibe check:

solid pick if you need voice commands or audio q&a and you're already comfortable running mistral models locally

open-source speech understanding family from mistral, built for transcription, audio q&a, and voice instructions rather than plain asr alone.

context: 32k tokens

speed: <200ms

cost: $0.003 per min
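At the listed rate of $0.003 per minute, a budget estimate is just audio minutes × rate. A minimal sketch, assuming billing is prorated on audio duration (the proration rule is an assumption; the card only gives the per-minute rate). `estimate_cost` is a hypothetical helper, not part of any Mistral SDK:

```python
from decimal import Decimal, ROUND_HALF_UP

# Listed rate from the card; assumed to apply per minute of input audio.
RATE_PER_MIN = Decimal("0.003")

def estimate_cost(audio_seconds: float, rate_per_min: Decimal = RATE_PER_MIN) -> Decimal:
    """Estimate cost in USD, assuming billing is prorated on audio duration."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    # Decimal avoids binary floating-point error in money arithmetic.
    minutes = Decimal(str(audio_seconds)) / Decimal(60)
    # Round to four decimal places for display.
    return (minutes * rate_per_min).quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)

if __name__ == "__main__":
    # One hour of audio: 60 min * $0.003 = $0.18
    print(estimate_cost(3600))  # prints 0.1800
```

Using `Decimal` instead of `float` keeps small per-minute rates from accumulating rounding error when summed over many files.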


key features

Audio reasoning beyond plain transcription

13+ native languages

Realtime variant for live agents

Speaker diarization and timestamps

spec & usage

Handles up to 30 minutes of audio for transcription, or up to 40 minutes for summarization and Q&A

Built on the Mistral Small 3.1 backbone with a causal audio encoder and transformer decoder

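Apache 2.0 release; Voxtral Mini 3B runs locally on a consumer GPU like an RTX 4090, while Small 24B needs about 48 GB for bf16 weights alone, so a 24 GB card requires quantization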

Context biasing helps it lock onto niche technical terms and proper nouns
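The 30- and 40-minute limits line up with the 32k-token context if audio is encoded at roughly 12.5 tokens per second (an assumption about the encoder's output rate, not stated on this card): 40 min × 60 s × 12.5 = 30,000 tokens, leaving about 2.7k tokens for the prompt and answer. The sketch below checks that budget and splits longer recordings into windows that respect each limit. `audio_tokens`, `plan_chunks`, and the constants are hypothetical names for illustration:

```python
import math

# Assumptions (not from the card): audio is tokenized at ~12.5 tokens/second,
# and "32k" means a 32,768-token context window.
AUDIO_TOKENS_PER_SEC = 12.5
CONTEXT_TOKENS = 32_768

# Per-request audio limits listed on the card, in seconds.
LIMITS_SEC = {"transcription": 30 * 60, "qa": 40 * 60}

def audio_tokens(seconds: float) -> int:
    """Approximate number of context tokens consumed by `seconds` of audio."""
    return math.ceil(seconds * AUDIO_TOKENS_PER_SEC)

def plan_chunks(total_sec: float, task: str = "transcription") -> list[tuple[float, float]]:
    """Split a recording into (start, end) windows no longer than the task's limit."""
    if total_sec <= 0:
        return []
    limit = LIMITS_SEC[task]
    n = math.ceil(total_sec / limit)
    return [(i * limit, min((i + 1) * limit, total_sec)) for i in range(n)]

if __name__ == "__main__":
    used = audio_tokens(LIMITS_SEC["qa"])
    print(used, CONTEXT_TOKENS - used)  # prints 30000 2768
    print(plan_chunks(75 * 60))  # prints [(0, 1800), (1800, 3600), (3600, 4500)]
```

Fixed-length windows are the simplest split; in practice you may prefer cutting at silences so words are not split across chunks.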

scope:

audio, voice, search, agent, api, local, open-source, real-time, research, free
