Voxtral
launch: July 2025
powered by: Voxtral Small 24B, Voxtral Mini 3B, Voxtral Realtime
goblin vibe check:
solid pick if you need voice commands or audio q&a and you're already comfortable running mistral models locally
open-source speech understanding family from mistral, built for transcription, audio q&a, and voice instructions rather than plain asr alone.
context: 32k tokens
speed: <200ms
cost: $0.003 per min
key features
Audio reasoning beyond plain transcription
13+ native languages
Realtime variant for live agents
Speaker diarization and timestamps
spec & usage
Processes up to 30 minutes of audio for transcription or 40 minutes for summarization and Q&A
Built on the Mistral Small 3.1 backbone with a causal audio encoder and transformer decoder
Apache 2.0 release that can run locally on consumer GPUs like an RTX 4090
Context biasing helps it lock onto niche technical terms and proper nouns
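
To make the duration limits and per-minute price above concrete, here is a minimal Python sketch that splits a long recording into windows that fit the 30-minute transcription limit and estimates the cost at the listed $0.003 per minute. The helper names `plan_chunks` and `estimate_cost` are hypothetical and not part of any Mistral SDK. The sketch also assumes billing scales linearly with audio length, which the card does not confirm.

```python
import math

# Figures copied from the card above; treat them as approximate.
MAX_TRANSCRIPTION_MIN = 30   # per-request limit for transcription
MAX_QA_MIN = 40              # per-request limit for summarization and Q&A
COST_PER_MIN_USD = 0.003     # listed price per minute of audio


def plan_chunks(duration_min: float,
                limit_min: float = MAX_TRANSCRIPTION_MIN) -> list[tuple[float, float]]:
    """Split a recording into (start, end) windows, in minutes, that each fit the limit.

    Hypothetical helper: it only plans the windows, it does not cut audio.
    """
    if duration_min <= 0:
        return []
    n = math.ceil(duration_min / limit_min)
    size = duration_min / n  # equal windows avoid a tiny trailing chunk
    return [(i * size, min((i + 1) * size, duration_min)) for i in range(n)]


def estimate_cost(duration_min: float) -> float:
    """Estimate cost in USD, assuming linear per-minute billing (an assumption)."""
    return round(duration_min * COST_PER_MIN_USD, 4)


if __name__ == "__main__":
    for start, end in plan_chunks(75):
        print(f"window {start:.1f} to {end:.1f} min")
    print(f"estimated cost: ${estimate_cost(75):.3f}")
```

A 75-minute recording comes out as three 25-minute windows. For summarization or Q&A, pass `limit_min=MAX_QA_MIN` instead. Cutting on silence or speaker turns rather than at fixed offsets would likely give cleaner transcripts, but that is beyond this sketch.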
scope:
audio, voice, search, agent, api, local, open-source, real-time, research, free