IBM Corp. today announced the release of Granite 4.0 Nano, a family of extremely small generative artificial intelligence models designed to run at the edge, on devices or in browsers.
The company said the models deliver strong performance for their size and are its smallest yet.
The Granite 4.0 Nano family includes four instruct models and their base model counterparts, ranging from 350 million to 1.5 billion parameters. Parameters are the internal values that a large language model learns during training to understand context from user text queries and generate answers.
Larger LLMs need increased computing power and energy, leading to increased operational costs. They also require specialized hardware, such as powerful graphics processing units and substantial machine memory. Tiny LLMs require far less compute and memory, meaning that they can run on consumer hardware, such as laptops, PCs and mobile devices.
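As a rough illustration of why parameter count drives hardware needs, the sketch below estimates the memory required just to hold a model's weights. The bytes-per-parameter figures (16-bit, 8-bit and 4-bit storage) are common conventions assumed here for illustration, not numbers published by IBM, and the estimate leaves out activations, caches and runtime overhead.

```python
# Back-of-the-envelope estimate of weight memory: parameters x bytes per parameter.
# The precisions below are illustrative assumptions, not IBM's published figures.

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "4-bit": 0.5}

# Parameter counts taken from the article's stated range.
MODEL_PARAMS = {"350M": 350e6, "1.5B": 1.5e9}


def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory for the weights alone, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / (1024 ** 3)


def footprint_table() -> dict:
    """Return {model: {precision: GiB}} rounded to two decimals."""
    return {
        model: {
            precision: round(weight_memory_gib(params, nbytes), 2)
            for precision, nbytes in BYTES_PER_PARAM.items()
        }
        for model, params in MODEL_PARAMS.items()
    }


if __name__ == "__main__":
    for model, row in footprint_table().items():
        print(model, row)
```

Under these assumptions, a 1.5 billion-parameter model stored at 16 bits needs a little under 3 GiB for its weights, which is within reach of many laptops and some phones, while a model with tens of billions of parameters would not be.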
The tradeoff is reduced accuracy and less contextual knowledge, because information is trimmed from the models to shrink them. But with advanced compression techniques, a lot of knowledge and capability can be packed into a smaller size.
Because they run locally, very small LLMs avoid sending sensitive data to cloud servers, which enhances privacy and security. They also provide offline access to reasoning and allow complete control and customization, and they can be cost-effective because they don’t incur cloud expenses.
The family includes Granite 4.0 H 1B and Granite 4.0 H 350M, models with 1.5 billion and 350 million parameters, respectively, that feature the model family’s hybrid architecture. Two alternative versions use a traditional transformer design and are intended for environments where hybrid workloads may not yet have optimized support.
Granite 4 models have a specialized architecture developed by IBM that combines an additional algorithm with the transformer design that powers most LLMs. Transformers use an attention algorithm to understand and generate text by focusing on the most important parts of an input. IBM hybridized the transformer with processing components based on the Mamba neural network architecture, which is more hardware-efficient than traditional transformers.
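To make the attention idea concrete, here is a minimal sketch of textbook scaled dot-product attention in plain Python. It is a generic single-head illustration, not IBM's implementation, and it does not show the Mamba-based components of the hybrid design; the function names and toy inputs are assumptions made for this example.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors.

    Every query is scored against every key, so the work grows with
    the product of the number of queries and the number of keys.
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to each key, scaled by sqrt(dim).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


if __name__ == "__main__":
    keys = [[10.0, 0.0], [0.0, 10.0]]
    values = [[1.0, 0.0], [0.0, 1.0]]
    # The query lines up with the first key, so the output is close to the first value.
    print(attention([[10.0, 0.0]], keys, values))
```

The weights produced by the softmax step are what "focusing on the most important parts of an input" means in practice: inputs whose keys match the query closely contribute most to the output.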
Competition is intense among models with roughly 1 billion parameters or fewer, where developers focus on performance and capability. Rivals include the Qwen models from Alibaba Group Ltd., Liquid Foundation Models from Liquid AI Inc. and Gemma models designed by Google LLC.
IBM stated that Granite Nano models perform better than several similarly sized models across various benchmarks in general knowledge, math, coding and safety. Additionally, the Nano models outperformed competitors for agentic workflows, including instruction following and tool calling in IFEval, or Instruction-Following Evaluation, and Berkeley’s Function Calling Leaderboard v3.
Granite 4.0 H 1B reached top marks in accuracy on IFEval at 78.5, compared with Qwen3 1.7B at 73.1 and Gemma 3 1B at 59.3. In tool calling, the same model scored 54.8 on Berkeley’s leaderboard, compared with Qwen3 at 52.2 and Gemma 3 at 16.3.
IBM released all the Granite 4.0 Nano models under the open-source Apache 2.0 license, which is highly permissive. The license allows for broad commercial and research use.
