Hume unveils Octave, an AI model that goes beyond reading text aloud: it interprets what the text means and produces natural, expressive voices that reflect its emotion and context.
Hume has introduced Octave, a text-to-speech system that takes a different approach from conventional ones. Rather than simply pronouncing words, the model, which its creators describe as the first large language model for text-to-speech, interprets a text's context and emotional content. It adjusts tone, rhythm, and timbre accordingly, whispering in an intimate scene or adopting a calm voice for an explanation, much like an actor reading a script.
In a test with 180 evaluators, Octave outperformed ElevenLabs, a notable competitor. Across 120 varied examples, ranging from movie narrators to medieval characters, evaluators preferred Octave 71.6% of the time for audio quality, 51.7% for naturalness, and 57.7% for how closely the voice matched its description. These results highlight its ability to adapt to diverse styles and needs.
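The article reports preference shares but not how many individual judgments sit behind each one. As a rough illustration of what a pairwise preference percentage means, and why whether a figure such as 51.7% is distinguishable from a 50/50 split depends on the number of judgments, here is a minimal Python sketch. It is not Hume's evaluation method: counting ties in the denominator is an assumption, and the vote counts in the usage note are hypothetical.

```python
import math


def preference_rate(wins: int, losses: int, ties: int = 0) -> float:
    """Share of pairwise judgments in which system A was preferred over system B.

    Assumption: ties count toward the total; some studies drop them instead.
    """
    total = wins + losses + ties
    if total == 0:
        raise ValueError("no judgments recorded")
    return wins / total


def wilson_interval(wins: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (about 95% when z = 1.96)."""
    if total == 0:
        raise ValueError("no judgments recorded")
    p = wins / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Hypothetical counts: 1,000 judgments per criterion (not reported in the article).
    for label, wins in [("audio quality", 716), ("naturalness", 517)]:
        rate = preference_rate(wins, 1000 - wins)
        low, high = wilson_interval(wins, 1000)
        print(f"{label}: {rate:.3f} preferred, 95% CI [{low:.3f}, {high:.3f}]")
```

With these made-up counts, the naturalness interval straddles 0.5 while the audio-quality interval sits well above it; with the real (unreported) counts, the picture could differ.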
The system includes tools such as Voice Design, which creates a unique voice from a detailed description, such as an empathetic counselor or a medieval knight. It also offers Acting Instructions, which let users adjust emotion and style in real time. Hume plans to add voice cloning soon, which will need only five seconds of audio to replicate a voice.
Octave is now available on platform.hume.ai and via API, making it suitable for audiobooks, podcasts, and interactive apps. Alongside it, Hume has launched Expressive TTS Arena, a public platform where anyone can compare advanced voice systems and test how they handle complex, expressive texts.
Octave initially supports English and Spanish and is still evolving. Beyond synthesizing speech, it explores how people express themselves, paving the way for future AI applications.