AI has enabled numerous advances in the PC world across very diverse sectors, such as gaming, content creation, productivity, and development. Currently, around 600 Windows applications and games are ready to run AI locally, and RTX graphics cards offer all the performance needed to do so.
During the Ignite event, Microsoft and NVIDIA announced new tools to help Windows developers build and optimize AI-powered applications on PCs with RTX graphics cards. These new tools take advantage of the specialized hardware in these GPUs, which include components such as Tensor Cores and the Optical Flow Accelerator.
Within the ecosystem of solutions offered by NVIDIA, ACE is without doubt one of the best known. It allows developers to create digital humans, and it has everything it takes to revolutionize the world of avatars, agents, and virtual assistants. It will also make it possible to create much more realistic NPCs in games, which will bring an important generational leap.
The NVIDIA Nemovision-4B-Instruct model, coming soon, uses the latest NVIDIA VILA and NVIDIA NeMo frameworks to distill, tune, and quantize it until it is small enough to run on RTX GPUs with the accuracy developers need. This model allows digital humans to understand visual imagery in the real world and on screen, and to provide relevant responses.
Multimodality serves as the foundation for agentic workflows, and it previews a future in which digital humans will be able to reason and take action independently or with minimal help from the user.
NVIDIA also introduced the Mistral NeMo Minitron 128k Instruct family, a set of small, large-context language models designed for streamlined and efficient digital human interactions, launching soon. Available in 8B, 4B, and 2B parameter versions, the family offers flexible options to balance speed, memory usage, and accuracy on RTX AI PCs.
All of these versions can handle large data sets in a single pass, eliminating the need to segment and reassemble data. Built in the GGUF format, they improve efficiency on low-power devices, and every version supports multiple programming languages.
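As an illustration of how a developer might run a GGUF build of a long-context model like these locally, here is a minimal sketch using the llama-cpp-python library. The model filename and exact context size are assumptions for the example, not files NVIDIA has published:

```python
# Minimal sketch: loading a long-context GGUF model with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="mistral-nemo-minitron-2b-instruct.gguf",  # hypothetical filename
    n_ctx=131072,      # 128k-token context window
    n_gpu_layers=-1,   # offload all layers to the RTX GPU
)

# A single call can take a prompt far larger than typical 4k-8k windows,
# avoiding the manual chunking and reassembly the article mentions.
document = open("report.txt").read()  # placeholder input file
out = llm("Summarize the following document:\n" + document, max_tokens=256)
print(out["choices"][0]["text"])
```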
Lastly, NVIDIA announced updates to NVIDIA TensorRT Model Optimizer (ModelOpt). With these, the company aims to give developers an improved path for optimizing models for deployment in ONNX Runtime. The latest updates let TensorRT ModelOpt optimize models into an ONNX checkpoint and deploy them within ONNX runtime environments, using GPU execution providers such as CUDA, TensorRT, and DirectML.
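For context, deploying such an ONNX checkpoint follows the standard ONNX Runtime pattern of requesting GPU execution providers in priority order. The provider names below are real ONNX Runtime identifiers; the model file and input shape are placeholders for whatever checkpoint ModelOpt produces:

```python
# Minimal sketch: running an ONNX model with GPU execution providers.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "model_optimized.onnx",  # hypothetical ModelOpt-optimized checkpoint
    providers=[
        "TensorrtExecutionProvider",  # TensorRT, if available
        "CUDAExecutionProvider",      # CUDA fallback
        "DmlExecutionProvider",       # DirectML on Windows
        "CPUExecutionProvider",       # final fallback
    ],
)

input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)  # illustrative shape
outputs = session.run(None, {input_name: dummy})
```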
TensorRT ModelOpt includes advanced quantization algorithms such as INT4 Activation-Aware Weight Quantization (AWQ). Compared to other tools, such as Olive, this method further reduces the model's memory footprint and improves performance on RTX GPUs. During deployment, models can have a memory footprint up to 2.6 times smaller than FP16 models, which translates into superior performance with minimal loss of accuracy, with the added advantage that they can run on a wider range of PCs.
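To see where a figure in that range comes from, a quick back-of-the-envelope calculation compares FP16 and INT4 weight storage. The parameter count here is illustrative, and the gap between the 4x weight-only ratio and the article's up-to-2.6x figure reflects data that stays in higher precision, such as activations and quantization scales:

```python
# Back-of-the-envelope sketch of why INT4 weights shrink the footprint.
params = 4e9                # a 4B-parameter model, chosen for illustration

fp16_bytes = params * 2     # FP16: 2 bytes per weight
int4_bytes = params * 0.5   # INT4: 4 bits (0.5 bytes) per weight

print(f"FP16 weights: {fp16_bytes / 1e9:.1f} GB")                # ~8.0 GB
print(f"INT4 weights: {int4_bytes / 1e9:.1f} GB")                # ~2.0 GB
print(f"Weight-only reduction: {fp16_bytes / int4_bytes:.1f}x")  # 4.0x
```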