Anderson Parra, Staff Software Engineer at SeatGeek, presented Shielding the Core: Architecting Resilience with Multi-Layer Defenses at QCon London 2026, where he discussed strategies for handling significant traffic spikes that can overwhelm even a well-designed infrastructure.
Parra kicked off his presentation by describing the environment in which SeatGeek operates. This included what Parra characterized as a “traffic stampede.” The problem isn’t the traffic, he maintained, it’s when the traffic arrives faster than a system can adapt.
As shown in the picture below, there are several signals that indicate when a system may collapse.
The Noisy Neighbor Problem arises in a multi-tenant system when one tenant consumes a disproportionate share of shared resources, degrading performance for the other tenants.
The Scaling Gap is defined as the period when scaling lags behind demand. Systems must survive the scaling gap, Parra maintained, and this is where shielding the core begins.
The strategy to shield the core is threefold: Absorb the Burst by handling sudden traffic spikes before they reach core systems; Control the Flow by applying fairness, rate limits and admission control; and Protect the Core by keeping critical services stable during demand spikes.
The defense layer deployed by SeatGeek uses a multi-shield approach:
- Edge Shield
- Gateway Shield
- Platform Shield
Edge Shield
The responsibilities of the Edge Shield include: a Cache that serves requests without hitting the origin; a Queue to absorb sudden traffic bursts; and a Filter to detect bots and invalid traffic.
Using the Cache as a resilience mechanism addresses a self-reinforcing failure loop: as failures increase, fewer responses are served from the cache; fewer cache hits send more traffic to the origin; and more traffic causes more failures.
Parra maintained that everything changes when the cache is combined with rate limiting: the service remains stable, the cache warms up more safely, and the origin load decreases.
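To make the combination concrete, here is a minimal sketch of a cache that caps how many misses may reach the origin per time window. It is not SeatGeek's implementation: the class name `RateLimitedCache`, the fixed-window budget, and the `fetch_origin` callable are all assumptions chosen for illustration.

```python
import time


class RateLimitedCache:
    """Illustrative cache that caps origin fetches per fixed time window."""

    def __init__(self, fetch_origin, max_origin_calls, window_seconds,
                 clock=time.monotonic):
        self.fetch_origin = fetch_origin          # hypothetical origin call
        self.max_origin_calls = max_origin_calls  # origin budget per window
        self.window_seconds = window_seconds
        self.clock = clock
        self.cache = {}
        self.window_start = clock()
        self.calls_in_window = 0

    def get(self, key):
        # Cache hit: serve without touching the origin.
        if key in self.cache:
            return 200, self.cache[key]

        # Cache miss: start a new window if the current one has expired.
        now = self.clock()
        if now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.calls_in_window = 0

        # Budget exhausted: shed the request instead of piling onto the origin.
        if self.calls_in_window >= self.max_origin_calls:
            return 429, None

        # Within budget: fetch once and warm the cache for later requests.
        self.calls_in_window += 1
        value = self.fetch_origin(key)
        self.cache[key] = value
        return 200, value
```

During a cold start, only a bounded number of misses reach the origin in each window, so the cache fills gradually while already-cached keys continue to be served.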
SeatGeek also implements a Virtual Waiting Room that absorbs the traffic and controls the flow.
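The idea of a waiting room can be sketched as a first-in, first-out queue that admits a fixed number of visitors per interval. The talk summary does not describe how SeatGeek's waiting room works internally, so the `WaitingRoom` class, the `admit_per_tick` parameter, and the FIFO ordering below are assumptions.

```python
from collections import deque


class WaitingRoom:
    """Illustrative FIFO waiting room with a fixed admission rate."""

    def __init__(self, admit_per_tick):
        self.admit_per_tick = admit_per_tick  # assumed admission budget
        self.queue = deque()

    def join(self, visitor_id):
        # The burst is absorbed here: joining is cheap and never hits the core.
        self.queue.append(visitor_id)
        return len(self.queue)  # 1-based position in line

    def tick(self):
        # Called once per interval: release at most admit_per_tick visitors.
        admitted = []
        while self.queue and len(admitted) < self.admit_per_tick:
            admitted.append(self.queue.popleft())
        return admitted
```

However many visitors join at once, the downstream system sees at most `admit_per_tick` new arrivals per interval.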
Gateway Shield
The responsibilities of the Gateway Shield include: a Rate Limit that controls the rate of requests; Fair Access that protects legitimate users; and Validation that rejects invalid traffic.
The use of rate limiting involves a Rate Limit Gate that protects the platform from overload. The gate allows client requests through during normal traffic, but responds with HTTP 429 (Too Many Requests) during high traffic spikes.
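One common way to build such a gate is a token bucket. The sketch below is illustrative only: the summary does not say which algorithm SeatGeek uses, and the `RateLimitGate` name and its `rate_per_second` and `burst` parameters are assumptions.

```python
import time


class RateLimitGate:
    """Illustrative token-bucket gate returning an HTTP-style status code."""

    def __init__(self, rate_per_second, burst, clock=time.monotonic):
        self.rate = rate_per_second   # tokens refilled per second
        self.capacity = burst         # maximum tokens the bucket can hold
        self.tokens = float(burst)    # start with a full bucket
        self.clock = clock
        self.last = clock()

    def check(self):
        # Refill tokens for the elapsed time, capped at the bucket capacity.
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now

        # Spend one token per request; reject when the bucket is empty.
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return 200
        return 429  # Too Many Requests
```

Under normal traffic the bucket refills faster than it drains and every request passes; during a spike the bucket empties and the excess requests receive a 429.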
Traffic comes from two sources: humans, meaning fans who legitimately want to purchase tickets; and automated agents, consisting of sophisticated bots and distributed automation. The SeatGeek Fair Access Policy applies rate limits per user account and per consumer API key. Limits by IP address are used as a fallback.
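The policy can be pictured as choosing a rate-limit key per request and counting against that key. In this sketch the `Request` fields, the key prefixes, the preference of user account over API key, and the simple per-key counter are all assumptions; the source only states that IP address is the fallback.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    ip: str
    user_id: Optional[str] = None   # set for authenticated users
    api_key: Optional[str] = None   # set for API consumers


def rate_limit_key(req: Request) -> str:
    """Pick the identity to limit on; IP is used only as a fallback."""
    if req.user_id:
        return f"user:{req.user_id}"
    if req.api_key:
        return f"consumer:{req.api_key}"
    return f"ip:{req.ip}"


class FairAccessLimiter:
    """Illustrative per-key request counter with a fixed limit."""

    def __init__(self, limit):
        self.limit = limit
        self.counts = Counter()

    def allow(self, req: Request) -> bool:
        key = rate_limit_key(req)
        self.counts[key] += 1
        return self.counts[key] <= self.limit
```

Because limits are keyed by identity rather than by address, an account that exhausts its budget does not consume the budget of another account sharing the same IP.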
Platform Shield
The responsibilities of the Platform Shield include: Resource Isolation that applies CPU limits and scheduling priorities to prevent noisy neighbors; Prioritization that protects critical paths; and Observability Signals that draw on queue size, CPU saturation and scaling signals.
Parra described a scenario of three services (labeled A | B | C) and compared them with, and without, isolation and the subsequent cascading events (or Noisy Neighbor Problem) when service A is affected. Without isolation, when service A suffers a significant increase in CPU time, service B suffers from an increase in latency followed by a collapse in service C. Conversely, limiting CPU time in service A provides stability in both service B and service C.
Mapping the Flow of Signals and Scaling includes:
Spike in traffic –> Increase in queue size (a signal) –> Reaction by the scaling mechanism, which invokes the Horizontal Pod Autoscaler (HPA) –> Increase in capacity (more available pods) –> Decrease in queue size.
Signals originate from all three layers of the SeatGeek defense system. Parra stated that every system needs signals and that a resilient system depends on early ones: the earlier the scaling mechanism reacts, the faster the queue in the Flow of Signals and Scaling drains.
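The scaling step in that flow can be illustrated with the core formula documented for the Kubernetes Horizontal Pod Autoscaler, which scales the replica count by the ratio of the observed metric to its target. The sketch below assumes the metric is queue size per pod; the function name, the `max_replicas` cap, and the numbers are hypothetical, and the real HPA adds behavior omitted here, such as a tolerance band and stabilization windows.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     max_replicas):
    """Core HPA-style calculation: ceil(current * observed / target)."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    # Clamp to a sane range; at least one pod, at most max_replicas.
    return max(1, min(desired, max_replicas))


# A spike pushes the queue to 200 items per pod against a target of 50.
pods = desired_replicas(current_replicas=2, current_metric=200,
                        target_metric=50, max_replicas=10)
print(pods)
```

The earlier the queue signal is observed, the sooner this calculation runs and the smaller the backlog that the added pods have to drain.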
The Four Core Principles are: Composition, where resilience is layered; Protect the Core, to preserve critical paths; Observe Pressure, because signals reveal stress; and Controlled Failure, to fail gracefully if necessary.
The best signals appear before failure, and Parra concluded by stating “Internet stampedes are inevitable; system collapse, however, is not.”
More details on this topic may be found in this white paper.
