ElevenLabs has released Eleven v3 (alpha), a text-to-speech model that incorporates emotional control tools and multi-speaker dialogue capabilities for multimedia content applications.
This experimental version of the company's speech synthesis technology adds new expressiveness features. The model generates voices with different emotions through specific audio tags and supports conversations between multiple speakers, features developed in response to demand from the audiovisual sector.
The system supports more than 70 languages and uses tags inserted in the text to modify tone and vocal expression. Users can place tags such as [whispers], [sighs] or [excited] directly in their scripts to produce specific effects. Multiple tags can also be combined in the same phrase to create more complex expressions.
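As an illustration of how tagged scripts might be handled before they are sent for synthesis, here is a minimal Python sketch. It assumes tags are lowercase words in square brackets, as in the examples above. The helper names `extract_tags` and `strip_tags` are hypothetical and are not part of any ElevenLabs library.

```python
import re

# Assumption: an audio tag is lowercase text in square brackets, e.g. [whispers].
TAG_PATTERN = re.compile(r"\[([a-z][a-z ]*)\]")


def extract_tags(script: str) -> list[str]:
    """Return the audio tags found in a script, in order of appearance."""
    return TAG_PATTERN.findall(script)


def strip_tags(script: str) -> str:
    """Return the spoken text only, with tags removed and whitespace collapsed."""
    without_tags = TAG_PATTERN.sub("", script)
    return re.sub(r"\s+", " ", without_tags).strip()


# Several tags combined in one script, including two in the same phrase.
script = "[whispers] I think someone is coming. [sighs] [excited] It's her!"

tags = extract_tags(script)
plain_text = strip_tags(script)
```

A helper like this can be used to check a script for unintended tags, or to produce a plain-text transcript alongside the generated audio.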
The multi-speaker dialogue functionality is exposed through an API that accepts a JSON structure in which each object represents one speaker's turn. The system automatically manages transitions between voices, changes of tone and conversational interruptions, producing a single cohesive audio file that simulates natural conversation.
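The structure described above could be assembled roughly as follows. This is a sketch under stated assumptions: the field names `voice_id` and `text`, the voice identifiers, and the `build_dialogue` helper are illustrative placeholders rather than the documented request schema, and no network call is made.

```python
import json


def build_dialogue(turns: list[tuple[str, str]]) -> list[dict[str, str]]:
    """Turn (voice, line) pairs into a list with one object per speaker turn.

    The keys "voice_id" and "text" are assumed names for illustration only.
    """
    return [{"voice_id": voice, "text": line} for voice, line in turns]


# Hypothetical voice identifiers; real ones would come from the account in use.
turns = [
    ("voice_a", "[excited] Did you hear the news?"),
    ("voice_b", "[sighs] Not yet. Go on, tell me."),
    ("voice_a", "[whispers] They finally shipped it."),
]

dialogue = build_dialogue(turns)

# Serialized form, as it might appear in a request body.
payload = json.dumps(dialogue, indent=2)
```

Keeping each turn as a separate object preserves speaker order and lets audio tags travel inside the `text` field of the turn they apply to.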
This version was developed for sectors that require greater vocal expressiveness, such as film production, video game development, education and accessibility tools. According to the developers, the main limitation was no longer technical audio quality but the ability to generate nuanced emotions and believable dialogue.
The v3 model requires more careful prompting than previous versions. For applications that need real-time response or conversational use, the recommendation is to stay on the v2.5 Turbo or Flash models while a real-time version of v3 is in development.
This update is part of the evolution since the launch of Multilingual v2, which had already found adoption in professional productions across various sectors. The new model seeks to cover expressive needs that previous versions did not fully satisfy in advanced multimedia content applications.