OmniVoice
last release: April 28, 2026
goblin vibe check:
solid if you need tts in languages the big services ignore, but expect some setup time
massively multilingual zero-shot tts model for generating natural speech and cloning voices across hundreds of languages.
speed:
40x realtime
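As a rough illustration of the speed figure: the listing does not define "40x realtime", so this sketch assumes it means 40 seconds of audio generated per second of compute. The function name is hypothetical and not part of OmniVoice.

```python
def estimated_synthesis_seconds(audio_seconds: float, realtime_factor: float = 40.0) -> float:
    """Estimate compute time, assuming speed = audio length / compute time."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


# A 10-minute clip (600 s of audio) at the listed 40x figure.
print(estimated_synthesis_seconds(600))  # 15.0
```

Actual throughput will depend on the GPU, batch size, and text length, so treat the result as an upper-bound estimate.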
key features
600+ language zero-shot TTS
3-second voice cloning
Instruction-based voice design
Non-verbal expression tags like [laughter] and [sigh]
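To see which expression tags a script contains before sending it to the model, a small pre-check like the one below may help. It is only a sketch: it assumes tags are lowercase words in square brackets, modeled on the two examples above, and `find_expression_tags` is a hypothetical helper rather than an OmniVoice function.

```python
import re

# Assumption: tags look like the two listed examples, i.e. [lowercase_word].
TAG_PATTERN = re.compile(r"\[([a-z_]+)\]")


def find_expression_tags(text: str) -> list[str]:
    """Return bracketed tag names found in a script, in order of appearance."""
    return TAG_PATTERN.findall(text)


script = "Well [sigh] that went better than expected [laughter]"
print(find_expression_tags(script))  # ['sigh', 'laughter']
```

The full set of supported tags is not given in this listing, so the pattern deliberately matches any bracketed word rather than a fixed whitelist.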
spec & usage
Diffusion-style non-autoregressive architecture initialized from a pretrained language model
Trained on a 581,000-hour multilingual dataset and outputs at 24 kHz
Apache 2.0 release that can run locally on consumer GPUs such as RTX 4090 and 5090 cards
Supports explicit phonetic overrides with Pinyin or CMU dictionary controls
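For a feel of what the 24 kHz output rate means in storage terms, here is a minimal sketch. The 16-bit mono PCM format is an assumption made for the arithmetic; the listing only states the sample rate.

```python
SAMPLE_RATE_HZ = 24_000  # output rate stated in the listing


def pcm_size_bytes(duration_seconds: float, bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Size of uncompressed PCM audio; 16-bit mono defaults are an assumption."""
    n_samples = round(duration_seconds * SAMPLE_RATE_HZ)
    return n_samples * bytes_per_sample * channels


# One minute of audio: 60 s * 24,000 samples/s * 2 bytes.
print(pcm_size_bytes(60))  # 2880000
```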
limitations
Full inference still wants roughly 6.5 GB of VRAM
Fine-tuning for bespoke voices is weaker than its zero-shot performance
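Before setting anything up, it may be worth checking whether a local GPU clears the VRAM figure above. The sketch below shells out to `nvidia-smi` and compares total memory against roughly 6.5 GB; reading "GB" as GiB is an assumption, and all function names are hypothetical.

```python
import shutil
import subprocess

REQUIRED_MIB = 6.5 * 1024  # listing says roughly 6.5 GB; GB-as-GiB is an assumption


def parse_total_mib(smi_output: str) -> list[int]:
    """Parse one integer (MiB) per line; non-numeric lines are skipped."""
    return [int(s) for s in (line.strip() for line in smi_output.splitlines()) if s.isdigit()]


def gpus_with_enough_vram(totals_mib: list[int], required_mib: float = REQUIRED_MIB) -> list[int]:
    """Return indices of GPUs whose total memory meets the requirement."""
    return [i for i, total in enumerate(totals_mib) if total >= required_mib]


def query_total_mib() -> list[int]:
    """Ask nvidia-smi for total memory per GPU; returns [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    return parse_total_mib(result.stdout) if result.returncode == 0 else []


if __name__ == "__main__":
    print(gpus_with_enough_vram(query_total_mib()))
```

This checks total memory, not free memory, so other processes already using the GPU can still cause an out-of-memory error.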
scope:
audio, voice, research, local, open-source, free, real-time