IBM has introduced Granite 4.0, a family of language models designed for enterprise environments that combines Transformer and Mamba-2 architectures. The company claims the models reduce memory consumption by up to 70%. They are the first open-source models to obtain ISO 42001 certification.
IBM has launched Granite 4.0, a family of large language models built on a hybrid architecture designed to reduce computational resource consumption in enterprise environments. The new models interleave Mamba-2 layers with Transformer layers in a 9:1 ratio (nine Mamba-2 layers for every Transformer layer), a configuration that, according to IBM, allows long contexts to be processed with lower RAM usage. The Tiny and Small models also include mixture-of-experts (MoE) blocks with shared experts that improve parameter efficiency.
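The 9:1 layer ratio can be pictured as a repeating layer schedule. The following is a minimal sketch, assuming nine Mamba-2 layers for each attention (Transformer) layer; the function name, the 40-layer depth and the position of the attention layer within each group are hypothetical choices for illustration, not IBM's published configuration.

```python
def hybrid_layer_schedule(num_layers: int, mamba_per_attention: int = 9) -> list[str]:
    """Build a list of layer types for a hybrid stack.

    Assumption: one attention layer closes each group of
    `mamba_per_attention` Mamba-2 layers. The real ordering may differ.
    """
    period = mamba_per_attention + 1
    return [
        "attention" if (i + 1) % period == 0 else "mamba2"
        for i in range(num_layers)
    ]


# Hypothetical depth of 40 layers, chosen only to make the ratio easy to see.
schedule = hybrid_layer_schedule(40)
counts = {kind: schedule.count(kind) for kind in ("mamba2", "attention")}
print(counts)  # {'mamba2': 36, 'attention': 4}
```

With this schedule only one layer in ten performs attention, which is the property the memory savings would depend on.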
The company has introduced three initial variants: Micro, Tiny and Small. Each is available in Base and Instruct versions and targets different enterprise use cases and deployment scenarios. IBM plans to release additional versions, including larger (Medium) and smaller (Nano) models, before the end of 2025.
One of the standout aspects of this generation is the ISO 42001 certification obtained by the Granite family, which makes these the first open-source language models to achieve this accreditation. The ISO 42001 standard evaluates artificial intelligence management systems in areas such as data privacy, explainability and accountability.
Granite 4.0 models have been trained on a corpus of 22 trillion tokens from curated enterprise sources. According to IBM, the Mamba-2 layers keep a fixed-size state, so their memory requirements remain constant regardless of context length, whereas the attention computation in conventional Transformer models grows quadratically with context length. This makes it easier to process long documents or long conversations without a proportional increase in the resources required.
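A back-of-the-envelope estimate can illustrate why fewer attention layers help with long contexts. The sketch below assumes that each attention layer stores a key-value cache that grows with every token, while each Mamba-2 layer keeps a fixed-size state. It counts cache memory only, not attention compute, and every number in it (layer counts, head sizes, state size, 2-byte values) is a hypothetical placeholder rather than a Granite 4.0 specification.

```python
def kv_cache_bytes(seq_len, attention_layers, kv_heads=8, head_dim=128, bytes_per_value=2):
    """Key-value cache size: two tensors (keys and values) per attention layer."""
    return 2 * attention_layers * kv_heads * head_dim * seq_len * bytes_per_value


def ssm_state_bytes(mamba_layers, state_values_per_layer=1_000_000, bytes_per_value=2):
    """Fixed-size recurrent state: independent of how many tokens were processed."""
    return mamba_layers * state_values_per_layer * bytes_per_value


MIB = 2**20

for seq_len in (8_192, 131_072):
    # Hypothetical 40-layer models: all attention vs. 4 attention + 36 Mamba-2.
    pure = kv_cache_bytes(seq_len, attention_layers=40)
    hybrid = kv_cache_bytes(seq_len, attention_layers=4) + ssm_state_bytes(36)
    print(f"{seq_len:>7} tokens: transformer {pure / MIB:,.0f} MiB, hybrid {hybrid / MIB:,.0f} MiB")
```

Under these assumptions the hybrid stack's cache still grows with context length, because a few attention layers remain, but far more slowly than in the all-attention stack.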
In terms of performance, Granite 4.0-H-Small achieves competitive results on benchmarks such as IFEval, which evaluates instruction-following capability, and the Berkeley Function Calling Leaderboard v3, which measures function-calling accuracy. IBM has worked with companies such as EY and Lockheed Martin to validate the models' performance in real use cases.
The company also offers uncapped indemnification for intellectual property claims related to content generated by Granite models when they are used on watsonx.ai.
The models are available on IBM watsonx.ai and through platforms such as Hugging Face, Ollama, NVIDIA NIM and Replicate. IBM has established collaborations with hardware manufacturers such as Qualcomm and AMD to optimize performance across different device types, from servers to mobile devices.