World of Software
Copyright © All Rights Reserved. World of Software.

NVIDIA Dynamo Planner Brings SLO-Driven Automation to Multi-Node LLM Inference

News Room
Last updated: 2026/01/31 at 5:08 AM
Published 31 January 2026

Microsoft and NVIDIA have released Part 2 of their collaboration on running NVIDIA Dynamo for large language model inference on Azure Kubernetes Service (AKS). The first announcement aimed for a raw throughput of 1.2 million tokens per second on distributed GPU systems. This release focuses instead on developer productivity and operational efficiency, which it addresses through automated resource planning and dynamic scaling.

The new capabilities center on two integrated components: the Dynamo Planner Profiler and the SLO-based Dynamo Planner. Together, these tools address the “rate matching” challenge in disaggregated serving. Disaggregated serving splits an inference workload into prefill operations, which process the input context, and decode operations, which generate output tokens, and runs the two phases on separate GPU pools. Rate matching is the problem of sizing those pools so that neither phase holds back the other. Without the right tools, teams spend a lot of time determining the optimal GPU allocation for each phase.
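
As a rough illustration of rate matching, the sketch below sizes the two pools so that each can keep up with the token demand it receives. The function name, its parameters, and the per-worker throughput figures are hypothetical and are not taken from Dynamo. It also considers throughput only, whereas real planning has to respect latency targets as well.

```python
import math

def match_rates(request_rate, avg_input_tokens, avg_output_tokens,
                prefill_tokens_per_sec, decode_tokens_per_sec):
    """Return (prefill_workers, decode_workers) needed to keep up with demand.

    All inputs are illustrative assumptions:
      request_rate            - incoming requests per second
      avg_input_tokens        - prompt tokens per request (prefill work)
      avg_output_tokens       - generated tokens per request (decode work)
      prefill_tokens_per_sec  - tokens/s one prefill worker can process
      decode_tokens_per_sec   - tokens/s one decode worker can generate
    """
    prefill_demand = request_rate * avg_input_tokens   # tokens/s to prefill
    decode_demand = request_rate * avg_output_tokens   # tokens/s to decode
    prefill_workers = max(1, math.ceil(prefill_demand / prefill_tokens_per_sec))
    decode_workers = max(1, math.ceil(decode_demand / decode_tokens_per_sec))
    return prefill_workers, decode_workers

# Long prompts with short answers need more prefill than decode capacity.
print(match_rates(10, 4000, 100, 20000, 2000))  # prints (2, 1)
```

With these made-up numbers, the prefill pool needs twice the workers of the decode pool. A workload with short prompts and long answers would tilt the ratio the other way.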

The Dynamo Planner Profiler is a pre-deployment simulation tool that automates the search for the best configuration. Instead of manually testing parallelization strategies and GPU counts, which can consume hours of GPU time, developers declare their requirements in a DynamoGraphDeploymentRequest (DGDR) manifest. The profiler then sweeps the configuration space automatically, testing different tensor parallelism sizes for both the prefill and decode stages to find settings that maximize throughput while staying within latency limits.
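
The idea of such a sweep can be sketched in a few lines. This is not the profiler's actual implementation or API: `sweep`, `fake_measure`, the dictionary keys, and the latency limits are all hypothetical, and the performance model is invented purely so the example runs.

```python
from itertools import product

def sweep(measure, tp_sizes, ttft_limit_ms, itl_limit_ms):
    """Try every (prefill_tp, decode_tp) pair and keep the best one within limits.

    `measure` is a hypothetical callback returning a dict with
    'ttft_ms', 'itl_ms', and 'tokens_per_sec_per_gpu' for one configuration.
    Returns ((prefill_tp, decode_tp), result), or None if nothing fits.
    """
    best = None
    for prefill_tp, decode_tp in product(tp_sizes, repeat=2):
        result = measure(prefill_tp, decode_tp)
        if result["ttft_ms"] > ttft_limit_ms or result["itl_ms"] > itl_limit_ms:
            continue  # violates a latency limit, so discard it
        if best is None or (result["tokens_per_sec_per_gpu"]
                            > best[1]["tokens_per_sec_per_gpu"]):
            best = ((prefill_tp, decode_tp), result)
    return best

def fake_measure(prefill_tp, decode_tp):
    # Made-up model: more tensor parallelism lowers latency but also
    # lowers per-GPU throughput. This is not real benchmark data.
    return {
        "ttft_ms": 1200 / prefill_tp,
        "itl_ms": 60 / decode_tp,
        "tokens_per_sec_per_gpu": 1000 / (prefill_tp + decode_tp),
    }

config, stats = sweep(fake_measure, [1, 2, 4, 8], ttft_limit_ms=500, itl_limit_ms=30)
print(config)  # prints (4, 2)
```

Under this toy model the sweep settles on the smallest tensor parallelism sizes that still meet both limits, because every extra GPU lowers per-GPU throughput.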

The profiler includes an AI Configurator mode that can simulate performance in approximately 20 to 30 seconds based on pre-measured performance data. This lets teams iterate rapidly on configurations before allocating physical GPU resources. The output is a tuned configuration that maximizes what the teams call “Goodput”: the highest throughput achievable while staying within the set limits for Time to First Token and Inter-Token Latency.
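
One way to make Goodput concrete is to measure throughput for a single configuration while counting only the requests that met both latency limits. The sketch below takes that interpretation, which is an assumption rather than the announcement's formal definition. The record keys, the sample data, and the limits are hypothetical.

```python
def goodput(requests, window_sec, ttft_limit_ms, itl_limit_ms):
    """Output tokens per second, counting only requests that met both limits.

    Each request is a dict with hypothetical keys:
    'ttft_ms', 'itl_ms' (mean inter-token latency), and 'output_tokens'.
    """
    good_tokens = sum(
        r["output_tokens"]
        for r in requests
        if r["ttft_ms"] <= ttft_limit_ms and r["itl_ms"] <= itl_limit_ms
    )
    return good_tokens / window_sec

sample = [
    {"ttft_ms": 320, "itl_ms": 22, "output_tokens": 200},  # meets both limits
    {"ttft_ms": 640, "itl_ms": 21, "output_tokens": 180},  # TTFT too slow
    {"ttft_ms": 410, "itl_ms": 35, "output_tokens": 150},  # ITL too slow
    {"ttft_ms": 450, "itl_ms": 28, "output_tokens": 100},  # meets both limits
]
print(goodput(sample, window_sec=10, ttft_limit_ms=500, itl_limit_ms=30))  # prints 30.0
```

Raw throughput for the same window would be 63 tokens per second, so the gap between the two numbers shows how much work was delivered too slowly to count.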

Once a system enters production, the SLO-based Dynamo Planner takes over as a runtime orchestration engine. The component is “LLM-aware”: unlike traditional load balancers, it monitors inference-specific cluster state, such as key-value cache load in the decode pool and the depth of the prefill queue. The Planner uses the performance bounds produced by the profiler to scale prefill and decode workers, which helps the system meet its service level goals as traffic patterns change.
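
A scaling decision driven by these two signals might look like the following minimal sketch. It is not the Planner's actual logic: the `ClusterState` fields, the `plan` function, and the threshold defaults are hypothetical stand-ins, and scale-down is omitted for brevity.

```python
from dataclasses import dataclass

@dataclass
class ClusterState:
    prefill_workers: int
    decode_workers: int
    prefill_queue_depth: int      # requests waiting for prefill
    kv_cache_utilization: float   # fraction of decode KV cache in use, 0.0 to 1.0

def plan(state, max_queue_per_prefill_worker=8, max_kv_utilization=0.85):
    """Return the desired (prefill_workers, decode_workers).

    The thresholds stand in for the performance bounds a profiler would
    supply; the default values here are invented for illustration.
    """
    prefill, decode = state.prefill_workers, state.decode_workers
    # A deep prefill queue means new requests wait too long for their first token.
    if state.prefill_queue_depth > max_queue_per_prefill_worker * prefill:
        prefill += 1
    # A nearly full KV cache means the decode pool cannot admit more sequences.
    if state.kv_cache_utilization > max_kv_utilization:
        decode += 1
    return prefill, decode

# A burst of long prompts: the queue backs up while decode still has headroom.
print(plan(ClusterState(1, 1, prefill_queue_depth=20, kv_cache_utilization=0.40)))  # prints (2, 1)
```

Keeping the two checks independent is the point of the design: each pool grows only when its own signal says it is the bottleneck.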

The announcement illustrates these capabilities with a detailed airline assistant scenario, in which a Qwen3-32B-FP8 model powers an airline mobile app under strict service level agreements: 500 milliseconds for Time to First Token and 30 milliseconds for Inter-Token Latency. During normal operations with short passenger queries, the system runs with one prefill worker and one decode worker. When a weather disruption leads 200 users to send complex rerouting requests, the Planner detects the spike and scales up to two prefill workers while keeping a single decode worker. The teams report that the new worker comes online within minutes, allowing the system to maintain its latency targets during the spike.

This release builds on the framework introduced in the original Dynamo announcement, which InfoQ covered in December 2024. In that earlier article, Azure and NVIDIA explained how Dynamo’s design splits compute-heavy and memory-bound tasks across different GPUs, so that teams can optimize each phase independently and match resources to workload needs. For example, an e-commerce app’s prefill task may process thousands of tokens, while its decode task generates only short descriptions.

The move from manual setup to automated, SLO-driven resource management shows how teams can better handle large language model deployment on Kubernetes. The Planner components translate latency requirements into GPU allocation and scaling decisions, with the aim of lowering the operational burden of running disaggregated inference architectures. For organizations running reasoning-heavy or long-context LLMs, this kind of automation can make complex multi-node GPU setups easier to manage and help meet service level goals as traffic patterns change.
