IBM Corp. on Thursday open-sourced Granite 4, a language model series that combines elements of two different neural network architectures.
The family includes four models at launch, ranging in size from 3 billion to 32 billion parameters. IBM claims they can outperform comparably sized models while using less memory.
Granite-4.0-Micro, one of the smallest models in the lineup, is based on the Transformer architecture that powers most large language models. The architecture’s flagship feature is its so-called attention mechanism, which enables an LLM to review a snippet of text, identify the most relevant parts and weight them more heavily during the decision-making process.
The three other Granite 4 models combine an attention mechanism with processing components based on the Mamba neural network architecture, a Transformer alternative. The technology’s main selling point is that it’s more hardware-efficient.
Like Transformer models, Mamba can identify the most important pieces of data in a prompt and adjust its processing accordingly. The difference is that it does so using not an attention mechanism but rather a so-called state space model. That’s a mathematical structure originally used for tasks such as calculating the flight path of spacecraft.
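At its core, a state space model processes a sequence by repeatedly updating a fixed-size hidden state. The sketch below shows that recurrence in its simplest scalar form; it is an illustration of the general technique only, not IBM's or Mamba's actual implementation, and the coefficient values are made up for the toy example.

```python
# Minimal discrete state space model (SSM) recurrence, illustrative only:
#     h_t = A * h_{t-1} + B * x_t      (state update)
#     y_t = C * h_t                    (readout)
# Real SSM-based models like Mamba use learned matrices and selective
# parameterizations; this scalar version just shows the mechanics.

def ssm_scan(A, B, C, xs):
    """Run a scalar linear state space model over an input sequence."""
    h = 0.0
    ys = []
    for x in xs:
        h = A * h + B * x   # state update: one fixed-size state, however long xs is
        ys.append(C * h)    # readout at each step
    return ys

# Toy system: the state is an exponentially decaying sum of past inputs.
print(ssm_scan(0.5, 1.0, 1.0, [1.0, 0.0, 0.0]))  # -> [1.0, 0.5, 0.25]
```

Because each step only touches the current state, memory per token is constant, which is the property the article describes.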
The Transformer architecture’s attention mechanism requires a significant amount of memory to process long prompts. Every time the length of a prompt doubles, the attention mechanism’s RAM usage quadruples. Mamba models require a fraction of the memory, which reduces inference costs.
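The scaling difference comes down to simple arithmetic: attention builds an n-by-n score matrix over the prompt, while an SSM carries a fixed-size state. The figures below are element counts used to illustrate the growth rates, not real RAM measurements for any particular model.

```python
# Back-of-the-envelope illustration of the scaling described above.

def attention_scores(n_tokens):
    """Elements in one n x n attention score matrix."""
    return n_tokens * n_tokens

def ssm_state(state_size=16):
    """Elements in a fixed-size SSM hidden state (size is arbitrary here)."""
    return state_size

for n in (1_000, 2_000, 4_000):
    print(f"{n} tokens: attention {attention_scores(n):>10,} | ssm {ssm_state()}")
# Doubling the prompt from 1,000 to 2,000 tokens quadruples the attention
# matrix (1,000,000 -> 4,000,000) while the SSM state does not grow at all.
```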
The Granite 4 series is based on Mamba-2, the latest release of the architecture, which debuted early last year. Mamba-2 compresses one of the technology’s core components into about 25 lines of code, which enables it to perform some tasks using less hardware than the original version of the architecture.
The most advanced Granite 4 model, Granite-4.0-H-Small, includes 32 billion parameters. It has a mixture-of-experts design that activates 9 billion parameters to answer prompts. IBM envisions developers using the model for tasks such as processing customer support requests.
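A mixture-of-experts model holds many parameter sets ("experts") but runs only a few per token, which is how a 32-billion-parameter model can activate just 9 billion parameters per prompt. The sketch below shows the general top-k routing idea; the expert functions, scores, and sizes are invented for illustration and say nothing about Granite’s actual router design.

```python
# Illustrative top-k mixture-of-experts routing sketch (not Granite's
# actual design). A router scores every expert for each input, but only
# the k highest-scoring experts run, so only a fraction of the model's
# total parameters is active at a time.

def top_k_experts(router_scores, k=2):
    """Return the indices of the k highest-scoring experts, in index order."""
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    return sorted(ranked[:k])

def moe_forward(x, experts, router_scores, k=2):
    """Sum the outputs of only the selected experts."""
    active = top_k_experts(router_scores, k)
    return sum(experts[i](x) for i in active)

# Toy example: 4 scalar "experts"; the router favors experts 1 and 3,
# so experts 0 and 2 never execute for this input.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
scores = [0.1, 0.7, 0.05, 0.6]
print(top_k_experts(scores, k=2))        # -> [1, 3]
print(moe_forward(3.0, experts, scores)) # -> (3.0 * 2) + (3.0 * 3.0) = 15.0
```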
The two other Mamba-Transformer algorithms in the series, Granite-4.0-H-Tiny and Granite-4.0-H-Micro, feature 7 billion and 3 billion parameters, respectively. They’re designed for latency-sensitive use cases that prioritize speed over processing accuracy.
IBM compared the memory requirements of Granite-4.0-H-Tiny and its previous-generation Granite 3.3 8B model in an internal benchmark test. The former used 15 gigabytes of RAM, one-sixth of what Granite 3.3 8B required. IBM says its new models also provide increased output quality.
“While the new Granite hybrid architecture contributes to the efficiency and efficacy of model training, most improvement in model accuracy are derived from advancements in our training (and post-training) methodologies and the ongoing expansion and refinement of the Granite training data corpus,” IBM staffers wrote in a blog post.
Granite 4 is available via IBM’s watsonx.ai service and more than a half-dozen third-party platforms, including Hugging Face. Down the line, the company plans to bring the models to Amazon SageMaker JumpStart and Microsoft Azure AI. IBM also plans to expand the Granite 4 lineup with new algorithms that will offer more advanced reasoning capabilities.
Image: IBM