
Enhancing Reliability Using Service-Level Prioritized Load Shedding at Netflix QCon SF 2025

News Room
Last updated: 2025/11/20 at 5:20 AM
News Room Published 20 November 2025

At the recent QCon San Francisco, Netflix Staff Software Engineers Anirudh Mendiratta and Benjamin Fedorka shared insights into the company's reliability strategy, detailing how its load-shedding techniques evolved into Service-Level Prioritized Load Shedding. The approach is designed to maintain a seamless viewing experience for millions of users, particularly during unpredictable traffic spikes that overwhelm reactive autoscaling and capacity buffers.

According to Mendiratta and Fedorka, the fundamental problem Netflix faces is sudden traffic spikes during major content launches, which often exceed provisioned server capacity. They state that relying on autoscaling alone is insufficient: it reacts too slowly to a sudden, massive spike, and proactively scaling for the theoretical maximum peak is prohibitively expensive.

To address this, Netflix introduced a conceptual model to quantify resilience using two key buffers:

  • Success Buffer: The amount of traffic a service can handle above the baseline without latency degradation.
  • Failure Buffer: The capacity reserved to gracefully reject excess requests, preventing cascading failure and allowing the service to maintain stability until the spike subsides.

The goal of effective load shedding is to utilize the Failure Buffer to gracefully degrade service, ensuring the system handles some requests rather than collapsing entirely.
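The two buffers can be made concrete with a small sketch. It assumes each buffer can be expressed as a request rate, which the talk summary does not specify; the class name `CapacityModel`, its fields, and the numbers in the usage note are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapacityModel:
    """Hypothetical per-service capacity figures, in requests per second."""

    baseline_rps: float       # typical steady-state traffic
    success_limit_rps: float  # highest rate served without latency degradation
    failure_limit_rps: float  # highest rate at which excess can still be rejected cleanly

    @property
    def success_buffer(self) -> float:
        # Headroom above baseline that is still served normally.
        return self.success_limit_rps - self.baseline_rps

    @property
    def failure_buffer(self) -> float:
        # Extra headroom in which the service stays stable by rejecting requests.
        return self.failure_limit_rps - self.success_limit_rps

    def regime(self, rps: float) -> str:
        """Classify an incoming request rate against the two buffers."""
        if rps <= self.success_limit_rps:
            return "serve_all"
        if rps <= self.failure_limit_rps:
            return "shed_excess"
        return "at_risk"
```

With `CapacityModel(1000, 1500, 4000)`, a spike to 3000 requests per second falls in the `shed_excess` regime: the service is past its Success Buffer but still inside its Failure Buffer, so it can reject the excess rather than collapse.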

The critical breakthrough was the realization that not all requests are equally valuable during an overload event. Previously, Netflix employed “equal opportunity” load shedding, which dropped requests indiscriminately, regardless of their importance. The new approach drops low-priority requests first, preserving the system’s Success Buffer for high-priority, user-critical requests.

Key Prioritization Scenarios:

Priority Type  | Example Request                     | Impact
---------------|-------------------------------------|----------------------------------------------------
High           | User-initiated playback             | Critical for user experience. Preserved under load.
Low            | Prefetch requests, background tasks | Non-critical. Dropped first to free up capacity.
Data Gateways  | Writes (over reads)                 | Prevents data loss; reads are easily retryable.

Crucially, Netflix shifted the load-shedding decision from the centralized API Gateway down to the individual service level. This allows critical requests to dynamically repurpose, or "steal", capacity from non-critical requests within the same application instance, maximizing resource utilization under duress. This granular control also covers backend-to-backend and batch traffic, which bypasses the API Gateway.
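One way to picture capacity stealing inside a single instance is an in-process admission check with two thresholds. This is a minimal sketch under stated assumptions, not Netflix's implementation: the class `PriorityLimiter`, its parameters, and the choice of in-flight request count as the capacity signal are all hypothetical.

```python
class PriorityLimiter:
    """Admit requests by in-flight count, favoring critical traffic.

    Assumption: non-critical requests are admitted only while the instance
    is lightly loaded, so under pressure every remaining slot is available
    to critical requests.
    """

    def __init__(self, max_in_flight: int, non_critical_limit: int) -> None:
        if not 0 <= non_critical_limit <= max_in_flight:
            raise ValueError("non_critical_limit must be within [0, max_in_flight]")
        self.max_in_flight = max_in_flight
        self.non_critical_limit = non_critical_limit
        self.in_flight = 0

    def try_acquire(self, priority: str) -> bool:
        """Return True and take a slot if the request is admitted."""
        limit = self.max_in_flight if priority == "critical" else self.non_critical_limit
        if self.in_flight >= limit:
            return False  # shed this request
        self.in_flight += 1
        return True

    def release(self) -> None:
        """Give back a slot once a request completes."""
        self.in_flight -= 1
```

With `PriorityLimiter(max_in_flight=10, non_critical_limit=6)`, non-critical requests are shed once six requests are in flight, while critical requests keep being admitted until all ten slots are taken. A production limiter would also need thread safety, which this sketch omits.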

To manage load shedding across hundreds of microservices, Netflix developed an automated platform focusing on three pillars: Priority Assignment, Central Configuration, and Automated Validation.

  1. Priority Assignment: Request priority is determined early (e.g., via request headers) and is propagated downstream. The system is designed to prevent services from escalating priority but allows them to degrade it.
  2. Configuration: Utilization metrics (CPU, latency, concurrency) are aggregated, and a per-cluster, unique load-shedding function is automatically generated, mapping utilization and priority to a rejection probability. For instance, non-critical shedding may start at 60% CPU utilization, while critical shedding begins at 80%.
  3. Validation: The Chaos Automation Platform (CHAP) and failure injection testing are used to experiment with, validate, and safely roll out configurations. This ensures that every cluster has the requisite Success and Failure buffer before major content releases.
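The first two pillars can be sketched in a few lines. The 60% and 80% CPU starting points come from the example above; everything else is an assumption made for illustration, including the linear ramp, the `SHED_FULL` end points, the two-level priority names, and the function names. The generated per-cluster functions described in the talk may look quite different.

```python
import random

# Most important first. The two-level scheme is a simplification.
PRIORITY_ORDER = ["critical", "non_critical"]

# CPU utilization at which shedding starts (figures from the example above).
SHED_START = {"non_critical": 0.60, "critical": 0.80}
# CPU utilization at which every request is rejected (assumed values).
SHED_FULL = {"non_critical": 0.80, "critical": 1.00}


def propagate_priority(incoming: str, requested: str) -> str:
    """Pick the priority to send downstream.

    A service may degrade the priority it received but never escalate it,
    so the less important of the two values wins.
    """
    return max(incoming, requested, key=PRIORITY_ORDER.index)


def rejection_probability(cpu: float, priority: str) -> float:
    """Map CPU utilization (0.0 to 1.0) and priority to a rejection probability."""
    start, full = SHED_START[priority], SHED_FULL[priority]
    if cpu <= start:
        return 0.0
    if cpu >= full:
        return 1.0
    return (cpu - start) / (full - start)  # assumed linear ramp


def should_shed(cpu: float, priority: str, rng=random.random) -> bool:
    """Randomly reject a request according to its rejection probability."""
    return rng() < rejection_probability(cpu, priority)
```

At 70% CPU, this sketch rejects about half of non-critical requests and no critical ones. Passing `rng` explicitly keeps the decision deterministic in tests.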


To prevent the “thundering herd” problem caused by clients retrying shed requests, Netflix introduced prioritized retry strategies. When server-side shedding is active, retries are scaled back or halted; under heavy load, only high-priority requests are retried. This prevents client behavior from amplifying the overload while ensuring critical requests have a chance to succeed once the system stabilizes.
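A client-side retry budget along these lines might look like the following sketch. The function name, the two boolean signals, and the specific retry counts are hypothetical; the summary does not say how clients learn that a server is shedding or how many retries they are allowed.

```python
def retry_budget(
    priority: str,
    shedding_active: bool,
    heavy_load: bool,
    default_retries: int = 3,
) -> int:
    """Return how many retries a client may attempt for one request.

    Assumptions: `shedding_active` and `heavy_load` are signals the client
    derives from server responses, and the counts below are illustrative.
    """
    if not shedding_active:
        return default_retries  # normal operation
    if heavy_load:
        # Only high-priority requests are retried under heavy load.
        return 1 if priority == "critical" else 0
    return 1  # shedding is active: scale retries back for everyone
```

For example, `retry_budget("non_critical", shedding_active=True, heavy_load=True)` returns 0, so a shed prefetch request is dropped rather than sent again to an already overloaded service.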

At the end of the talk, Mendiratta and Fedorka shared the following key takeaways:

  • Load Shedding is a Safety Buffer: It protects the system from total collapse by ensuring service degradation rather than failure.
  • Prioritization is Paramount: By shedding low-priority requests, reliability is maximized for the user’s core experience (e.g., watching a show).
  • Automation is Key to Scale: Centralized tooling automates configuration and validation of unique service-level load-shedding functions across a massive microservice fleet.

Lastly, Mendiratta and Fedorka shared a link to resources (including slides).
