Hume AI releases TADA, an open-source text-to-speech system that aligns text and audio to eliminate content errors and runs five times faster than current systems.
Hume AI has released TADA (Text-Acoustic Dual Alignment), a voice generation system that addresses one of the most common problems in current large language model-based systems: the mismatch between how text and audio are represented.
Conventional text-to-speech systems generate between 12.5 and 75 acoustic frames per second of audio, compared with just 2 or 3 text tokens in the same second. This gap forces models to handle very long sequences, which slows processing and increases the risk of the system skipping words or inserting non-existent content, a flaw known as hallucination.
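The imbalance can be made concrete with a quick back-of-the-envelope calculation. The rates below come from the figures quoted above; the helper function is purely illustrative:

```python
# Compare sequence lengths for a 10-second utterance at the acoustic
# frame rate (upper bound, 75/s) vs. the text token rate (~3/s).

def sequence_length(rate_per_second: float, duration_s: float) -> int:
    """Number of frames or tokens a model must handle for the given duration."""
    return round(rate_per_second * duration_s)

duration = 10.0  # seconds of audio

acoustic_frames = sequence_length(75, duration)  # 750 frames
text_tokens = sequence_length(3, duration)       # 30 tokens

print(acoustic_frames, text_tokens, acoustic_frames / text_tokens)
# → 750 30 25.0
```

At the upper end of the frame-rate range, the model's acoustic sequence is 25 times longer than the text it corresponds to, which is exactly the gap TADA's one-vector-per-token scheme closes.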
TADA resolves this imbalance with a tokenization scheme that assigns exactly one continuous acoustic vector per text token. As a result, text and audio are processed in parallel and at the same rate, without compressing the audio or adding extra intermediate layers.
In terms of speed, the system achieves a real-time factor of 0.09, more than five times faster than comparable LLM-based text-to-speech systems. In tests with over 1,000 samples from the LibriTTS-R dataset, the model produced zero hallucinations. In human evaluations on expressive, long-form speech, it scored 4.18 out of 5 for speaker similarity and 3.78 out of 5 for naturalness, ranking second overall.
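For context, the real-time factor (RTF) is the time spent synthesizing divided by the duration of the audio produced, so values below 1 mean faster than real time. A minimal sketch, with illustrative numbers consistent with an RTF of 0.09:

```python
# RTF = synthesis time / audio duration. An RTF of 0.09 means the
# system generates audio roughly 11x faster than it plays back.

def rtf(synthesis_seconds: float, audio_seconds: float) -> float:
    return synthesis_seconds / audio_seconds

# Hypothetical example: a 60-second clip synthesized in 5.4 seconds.
print(round(rtf(5.4, 60), 2))
# → 0.09
```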
The model's compact size allows it to run on mobile devices without relying on cloud services. Its low token rate also stretches the context window: it can fit up to 700 seconds of audio into a 2,048-token context, compared with around 70 seconds for conventional systems under the same conditions.
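The context figures imply the token rates involved; a quick check, derived purely from the numbers quoted above:

```python
# Token rates implied by fitting 700 s (TADA) vs. 70 s (conventional)
# of audio into the same 2,048-token context window.

CONTEXT = 2048

tada_rate = CONTEXT / 700          # ≈ 2.93 tokens/s, close to a text-token rate
conventional_rate = CONTEXT / 70   # ≈ 29.3 frames/s, within the 12.5-75 range

print(round(tada_rate, 2), round(conventional_rate, 2))
# → 2.93 29.26
```

The roughly 10x difference in rates maps directly onto the 10x difference in audio duration per window.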
Hume AI is releasing two versions: a one-billion-parameter model for English and a three-billion-parameter multilingual model supporting eight languages. Both are available on Hugging Face under an open-source license. The researchers themselves acknowledge limitations still to be resolved, including potential speaker drift during very long generations and reduced text quality when generating text and speech simultaneously.