IBM’s Granite 4.0: hybrid models with up to 70% less memory consumption

02/10/2025

IBM has introduced Granite 4.0, language models designed for enterprise environments that combine Transformer and Mamba-2 architectures. The company claims they reduce memory consumption by up to 70%. They are the first open source models with ISO 42001 certification.

IBM has announced the launch of Granite 4.0, a family of large language models with a hybrid architecture designed to reduce computational resource consumption in enterprise environments. The new models interleave Transformer layers with Mamba-2 layers in a 9:1 ratio, a configuration that, according to IBM, allows long contexts to be processed with lower RAM usage. The Tiny and Small models also include mixture-of-experts (MoE) blocks with shared experts that improve parameter efficiency.
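As a rough illustration of what a 9:1 interleaving means in practice, the sketch below generates a layer-type schedule with nine Mamba-2 layers per attention layer. The function name and block structure are illustrative assumptions, not IBM's actual implementation.

```python
def build_layer_schedule(num_blocks: int) -> list[str]:
    """Hypothetical 9:1 schedule: each block is nine Mamba-2 layers
    followed by one Transformer (attention) layer. Illustrative only;
    Granite 4.0's real layer ordering may differ."""
    schedule: list[str] = []
    for _ in range(num_blocks):
        schedule.extend(["mamba2"] * 9)   # state-space layers
        schedule.append("attention")      # one attention layer per block
    return schedule

layers = build_layer_schedule(num_blocks=4)
print(len(layers))                                          # 40 layers
print(layers.count("mamba2") // layers.count("attention"))  # 9
```

Because only one layer in ten keeps a key-value cache, most of the model's per-token state stays fixed in size as the context grows.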

The company has introduced three initial variants: Micro, Tiny and Small. Each is available in Base and Instruct versions, designed for different enterprise use cases and corporate deployments. IBM plans to release additional versions, including larger (Medium) and smaller (Nano) models, before the end of 2025.

One of the standout aspects of this generation is the ISO 42001 certification obtained by the Granite family, making these the first open source language models to achieve this accreditation. The ISO 42001 standard evaluates artificial intelligence management systems on aspects such as data privacy, explainability and accountability.

Granite 4.0 models have been trained on a corpus of 22 trillion tokens from curated enterprise sources. The hybrid architecture allows memory requirements to remain nearly constant regardless of context length, whereas in conventional Transformer models the key-value cache grows linearly with context length and the attention computation grows quadratically. This facilitates processing extensive documents or long conversations without a proportional increase in required resources.
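A back-of-the-envelope comparison makes the scaling difference concrete. All dimensions below (layer counts, head sizes, state sizes) are illustrative assumptions and do not reflect Granite 4.0's actual configuration.

```python
def kv_cache_bytes(context_len: int, n_layers: int = 32, n_kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_elem: int = 2) -> int:
    """Transformer KV cache: keys + values per layer, per token.
    Grows linearly with context length."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * context_len

def mamba_state_bytes(n_layers: int = 32, d_state: int = 128,
                      d_inner: int = 4096, bytes_per_elem: int = 2) -> int:
    """Mamba-2 keeps a fixed-size recurrent state per layer,
    independent of context length."""
    return n_layers * d_state * d_inner * bytes_per_elem

for ctx in (4_096, 32_768, 131_072):
    print(f"{ctx:>7} tokens -> {kv_cache_bytes(ctx) / 2**30:.1f} GiB KV cache")
print(f"fixed Mamba-2 state: {mamba_state_bytes() / 2**20:.0f} MiB")
```

With these assumed dimensions the KV cache goes from 0.5 GiB at 4K tokens to 16 GiB at 128K, while the recurrent state stays at 32 MiB, which is the intuition behind IBM's claim of near-constant memory with context length.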

In terms of performance, Granite 4.0-H-Small achieves competitive results in benchmarks such as IFEval, which evaluates instruction-following capability, and Berkeley Function Calling Leaderboard v3, which measures precision in function call execution. IBM has worked with companies like EY and Lockheed Martin to validate these models' performance in real use cases.

The company also offers unlimited indemnification for intellectual property claims related to content generated by Granite models when used in watsonx.ai.

The models are available on IBM watsonx.ai and open source platforms like Hugging Face, Ollama, NVIDIA NIM and Replicate. IBM has established collaborations with hardware manufacturers like Qualcomm and AMD to optimize performance across different device types, from servers to mobile equipment.

Key points

  • Granite 4.0 combines Transformer and Mamba-2 architectures in 9:1 ratio to reduce memory consumption by up to 70%
  • First open source language models to obtain ISO 42001 certification for AI management
  • Specifically designed for enterprise environments with three variants: Micro, Tiny and Small
  • Trained with 22 trillion tokens from curated enterprise sources
  • Hybrid architecture keeps memory requirements nearly constant regardless of context length
  • Validated by companies like EY and Lockheed Martin in real use cases
  • Unlimited indemnification for intellectual property claims on watsonx.ai
  • Available on watsonx.ai, Hugging Face, Ollama, NVIDIA NIM and Replicate
  • Collaborations with Qualcomm and AMD for optimization across different devices
