AI models require large amounts of memory to operate quickly. Google Research has introduced TurboQuant, a compression algorithm that cuts that memory use by up to six times without any loss of accuracy.
AI models, like those powering virtual assistants or modern search engines, work by processing enormous amounts of information. To do so quickly, they keep part of that information in a kind of working memory, known as the key-value (KV) cache. It works much like notes taken while studying, which save you from rereading an entire book each time. The problem is that this memory takes up a lot of space. It becomes a bottleneck that slows systems down and drives up operating costs.
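The following toy Python sketch makes the "working memory" idea concrete. It models a key-value cache, the usual name for this memory, for a single attention head. It is an illustration under simplifying assumptions, not how any production model is written:

- the class name `ToyKVCache` is hypothetical;
- keys, values and queries are passed in directly instead of being produced by learned projections;
- everything is a plain Python list.

It shows the two properties the paragraph describes. Past results are reused instead of recomputed, and the stored data grows with every token.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class ToyKVCache:
    """Hypothetical toy cache: one key and one value vector per token already processed."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, key, value):
        # Stored once per token, then reused for every later token.
        self.keys.append(key)
        self.values.append(value)

    def attend(self, query):
        # Compare the new token's query with every cached key (scaled dot product).
        scale = math.sqrt(len(query))
        weights = softmax([dot(query, k) / scale for k in self.keys])
        # Mix the cached values according to those weights.
        dim = len(self.values[0])
        return [sum(w * v[i] for w, v in zip(weights, self.values)) for i in range(dim)]

    def stored_numbers(self):
        # Memory grows linearly with the number of tokens seen so far.
        return sum(len(k) + len(v) for k, v in zip(self.keys, self.values))


cache = ToyKVCache()
cache.append([1.0, 0.0], [1.0, 2.0])
cache.append([0.0, 1.0], [3.0, 4.0])
print(cache.attend([0.0, 0.0]))  # equal weights -> [2.0, 3.0]
print(cache.stored_numbers())    # 8
```

Every new token adds another key and value. Over a long conversation or document, this cache therefore becomes a dominant memory cost.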
Google Research has developed TurboQuant, a technique that drastically reduces the space taken up by this working memory without making the model any less accurate. In tests, the team compressed that information by up to six times with no loss of accuracy, storing each value in just 3 bits. On specialised hardware such as Nvidia H100 GPUs, the system also ran up to eight times faster than an uncompressed 32-bit version.
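Simple arithmetic shows how much storing fewer bits per value saves. The sketch below computes the raw size of a key-value cache and compares a 32-bit baseline with the 3-bit width mentioned above. Two assumptions apply:

- The model shape (32 layers, 8 key-value heads, 128 dimensions per head, 100,000 tokens) is a hypothetical example.
- The formula counts only raw bits and ignores any extra per-vector data a real compression scheme may keep.

For that reason, its raw ratio (32/3, about 10.7) is not the same quantity as the article's measured six-fold figure.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, tokens, bits_per_value):
    """Raw size in bytes of the cached keys and values (no metadata counted)."""
    # Factor 2: one key vector and one value vector per head, layer and token.
    values_stored = 2 * layers * kv_heads * head_dim * tokens
    return values_stored * bits_per_value / 8


# Hypothetical model shape, chosen only for illustration.
shape = dict(layers=32, kv_heads=8, head_dim=128, tokens=100_000)

for bits in (32, 16, 3):
    size_gb = kv_cache_bytes(**shape, bits_per_value=bits) / 1e9
    print(f"{bits:>2}-bit cache: {size_gb:.2f} GB")
```

With these assumed numbers, it prints about 26.21 GB at 32 bits, 13.11 GB at 16 bits and 2.46 GB at 3 bits. Size scales linearly with both the bit width and the number of tokens kept in memory.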
The approach combines two mathematical techniques. The first reorganises the data more compactly, much as a location can be described by an angle and a distance rather than by X and Y coordinates: less information is needed to convey the same thing. The second spends just one extra bit to correct the small errors that compression introduces. It acts as an automatic error corrector that preserves the accuracy of the final result.
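The two ideas in the paragraph above can be illustrated with a toy Python sketch. It is not Google's TurboQuant algorithm, only a simplified stand-in for each step:

- **Compaction step.** Coordinates are grouped in pairs, and each pair is rewritten as a distance and an angle. Only the angle is squeezed into a few bits; distances stay at full precision to keep the example short.
- **Correction step.** A simple sign-plus-scale scheme, chosen for illustration, stores one sign bit per coordinate and a single shared scale. It is not the correction method TurboQuant actually uses.

All function names and the sample vector are hypothetical.

```python
import math


def to_polar(x, y):
    """Angle-and-distance description of a point, as in the article's analogy."""
    return math.hypot(x, y), math.atan2(y, x)


def quantize_angle(theta, bits):
    """Snap an angle in [-pi, pi] to one of 2**bits evenly spaced levels."""
    levels = 2 ** bits
    step = 2 * math.pi / levels
    index = round((theta + math.pi) / step) % levels  # wrap pi onto -pi
    return index * step - math.pi


def polar_quantize(vector, bits):
    """Toy scheme: pair up coordinates, keep each radius, store each angle in `bits` bits."""
    out = []
    for x, y in zip(vector[0::2], vector[1::2]):
        r, theta = to_polar(x, y)
        q = quantize_angle(theta, bits)
        out.extend([r * math.cos(q), r * math.sin(q)])
    return out


def one_bit_correction(original, approx):
    """Stand-in corrector: one sign bit per coordinate plus one shared scale."""
    residual = [o - a for o, a in zip(original, approx)]
    scale = sum(abs(r) for r in residual) / len(residual)
    signs = [1 if r >= 0 else -1 for r in residual]  # the single extra bit each
    return [a + s * scale for a, s in zip(approx, signs)]


def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


v = [0.9, 0.3, -0.2, 0.7, 0.5, -0.6, -0.8, -0.1]
approx = polar_quantize(v, bits=3)
corrected = one_bit_correction(v, approx)
print(squared_error(v, approx), squared_error(v, corrected))
```

This particular correction can never make the approximation worse. After correction, the squared error equals the error before it minus n·s², where n is the number of coordinates and s is the shared scale.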
One of the most notable practical advantages is that it requires neither retraining nor fine-tuning. TurboQuant applies directly to existing models, which makes adoption considerably easier. Google notes that the technique also improves semantic search engines, which let search tools understand the meaning of a query rather than match exact keywords.
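Semantic search illustrates the "no retraining" point well, because documents are compared through embedding vectors that already exist. In the toy sketch below, the four-dimensional embeddings and titles are made-up numbers. A plain uniform scalar quantizer stands in for TurboQuant; it is not Google's method. Stored vectors are compressed to 3 bits per coordinate after the fact, the query stays at full precision, and the ranking is compared with the uncompressed one.

```python
def quantize(vector, bits, lo=-1.0, hi=1.0):
    """Uniform scalar quantizer, applied to existing vectors with no retraining."""
    levels = 2 ** bits - 1  # number of steps between lo and hi
    step = (hi - lo) / levels
    codes = [round((min(max(x, lo), hi) - lo) / step) for x in vector]  # small integers
    return [lo + c * step for c in codes]  # decode back to approximate floats


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def search(query, docs, bits=None):
    """Rank document titles by dot-product similarity; optionally compress stored vectors."""
    scored = []
    for title, embedding in docs.items():
        stored = embedding if bits is None else quantize(embedding, bits)
        scored.append((dot(query, stored), title))
    return [title for _, title in sorted(scored, reverse=True)]


# Hypothetical embeddings; real ones have hundreds of dimensions.
docs = {
    "flights to Rome": [0.9, 0.1, 0.4, -0.1],
    "pasta recipes": [-0.2, 0.9, 0.3, 0.2],
    "gardening tips": [0.1, -0.3, -0.8, 0.5],
}
query = [0.8, 0.2, 0.5, 0.0]

print(search(query, docs))          # full precision
print(search(query, docs, bits=3))  # 3-bit stored vectors
```

Both calls return the same ranking for these numbers. When scores are closer together, coarse quantization can reorder results, so accuracy has to be measured rather than assumed.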
The research is backed by theoretical proofs showing that the results come close to the best efficiency that is mathematically achievable.