Nvidia introduced Llama Nemotron large language models (LLMs) and Cosmos Nemotron vision language models (VLMs) with a special emphasis on agent-powered workflows such as customer support, fraud detection, product supply chain optimization, and more. Models in the Nemotron family come in Nano, Super, and Ultra sizes to better fit the requirements of diverse systems.
AI agents are a new frontier in the evolution of generative AI, says Nvidia, aiming to create systems that can act autonomously to carry out complex tasks. This requires combining the language skills displayed by LLMs with the ability to perceive and interact with the environment.
To be effective, many AI agents need both language skills and the ability to perceive the world and respond with the appropriate action.
That explains why the Nemotron model family includes models derived from Meta's Llama models as well as new Cosmos Nemotron VLMs that can analyze and respond to images and video captured in the user's environment.
The availability of agents with vision capabilities, says Nvidia, could make it feasible to analyze videos from industrial cameras in a multitude of environments in real time to help detect incidents, reduce defects, or guide humans through some course of action. Currently, according to the company, less than 1% of video from industrial cameras is watched live by humans.
Nvidia says it trained the Llama Nemotron models to efficiently execute a number of common agentic tasks, so a single model can be used where multiple specialized models would normally be required.
The models are pruned to reduce latency and improve compute efficiency, then retrained using a high-quality dataset with distillation and alignment methods to increase accuracy across tasks. This results in smaller models with high accuracy and throughput.
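Nvidia does not detail its retraining recipe here, so the following is only a generic sketch of what a distillation objective can look like, not the company's actual method. It assumes the classic setup in which a smaller (for example, pruned) student model is trained to match the temperature-softened output distribution of a larger teacher. The function names, the temperature value, and the toy logits are all hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits into probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The temperature**2 factor is the usual rescaling that keeps gradient
    magnitudes comparable across temperatures. This is an illustrative
    assumption, not Nvidia's published training objective.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Toy logits over three tokens (hypothetical values).
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]   # nearly agrees with the teacher
far_student = [0.5, 1.0, 4.0]     # disagrees with the teacher

print(distillation_loss(teacher, close_student))
print(distillation_loss(teacher, far_student))
```

A student whose predictions track the teacher's gets a loss near zero, while one that disagrees is penalized more heavily; minimizing this quantity during retraining is one way a pruned model can recover accuracy.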
Nemotron models are optimized for distinct compute requirements: Nano targets PC application developers, Super provides high performance on a single GPU, and Ultra is designed for data-center-scale applications.
The Nvidia Nemotron ecosystem also includes Nvidia NeMo to customize models with proprietary data, and NeMo Aligner to better align a model to follow instructions and generate human-preferred responses. Additionally, Nvidia provides Nvidia AI Blueprints as a tool to quickly create AI agents by using NIM microservices as building blocks to serve Nemotron models.
On a related note, Nvidia also announced its Cosmos world foundation models, which are specially tailored to generate physics-aware videos for robotics and autonomous vehicles.