Key Takeaways
- The Hub and Spoke pattern provides clear service boundaries and solves data consistency issues by creating a single interface for all internal and external communication.
- Cell-based architecture reduces blast radius by splitting traffic across countries, user types, and platforms, multiplying available scaling capacity with each split.
- Multi-layer caching can reduce database load to under ten percent, enabling smaller clusters and more cost-effective multi-region strategies.
- Multi-region requires transparent cost-benefit communication: builders must present tradeoffs clearly to stakeholders who ultimately own the risk and business impact of downtime decisions.
- Serverless represents delegation, not complexity: managed services enable small teams to focus on business logic rather than infrastructure maintenance.
In streaming, the challenge is immediate: customers are watching TV right now, not planning to watch it tomorrow. When systems fail during prime time, there is no recovery window; viewers leave and may not return. One and a half years ago, at ProSiebenSat.1 Media SE, we faced the challenge of scaling streaming applications for international users.
The task fell to a team of two developers with no prior AWS experience. Through iterative improvements, the team completely transformed core services, removed single points of failure, and significantly increased availability and scalability. The reality of such transformations is that no blueprint exists, only continuous iteration and learning to make systems progressively better.
The original architecture was straightforward at a high level: a worker subscribing to topics on Kafka, performing transformations, and storing data in the database.
Another component, an API behind GraphQL, handled frontend requests. There is nothing inherently wrong with this architecture when implemented correctly.
However, this particular implementation hadn’t grown with the business. Servers were constantly overloaded, and the database struggled under pressure. Every traffic spike caused system-wide crashes.
The database ran on a single node with no cache, and data remained inconsistent across services. Adding to the complexity, six services operated without standards. This situation demanded a fundamental change in approach.
The solution involved moving to serverless, not because serverless is trendy, but because teams could then focus on what matters most: the code. Over eighteen months, the architecture evolved from basic services to different types of multi-regional configurations, including active-passive setups in varying degrees depending on service severity.

This serverless architecture resolved the core problems because it scales automatically. Leveraging AWS managed services meant that availability and resiliency concerns became AWS’s responsibility. The team also solved a major pain point: deployment time dropped from one and a half hours to minutes. This foundation enabled further evolution toward more affordable multi-region capabilities.
Background
ProSiebenSat.1 Media SE is one of Europe’s largest broadcasting conglomerates, operating television channels and streaming services across multiple countries. The company’s streaming application, Joyn, serves the DACH region of Germany, Austria, and Switzerland, continuously handling millions of requests. Management expectations typically appear simple on paper: everything available, everything scalable, and cheap. The reality is that these are conflicting goals. The quality of infrastructure directly reflects user experience, and in the streaming business, downtime is immediately visible. Highly available and scalable services cannot be achieved at minimal cost, a fact that must be clear to decision-makers.
User expectations are straightforward: open the app and play a video. Users don’t care whether a major football league match is being broadcast in Germany or another live event is happening elsewhere; they expect seamless access regardless of traffic spikes. Meeting these expectations proves extremely difficult when underlying systems don’t function properly.
This discussion addresses two major issues. First, data quality: situations where users would see a video on one page, check its details, navigate to a different page, and find the video unavailable. Data consistency problems plagued the system. Second, improving the scalability and resiliency of the overall system.
Data Consistency and Quality

Examining the original architecture revealed a critical problem: one team owned multiple services with no shared standards. Each service subscribed to the same Kafka topic but performed completely different validations and transformations. Some persisted data to the database, others didn’t. Some threw errors, others didn’t. This erratic behavior resulted in situations where the same video appeared available on one page and unavailable on another. Investigating issues that should have taken minutes instead consumed hours.
Another problem emerged with the company bus architecture: internal services communicating with each other exposed internal state to the company bus, a recognized antipattern. This exposure occurred because using Kafka properly requires additional infrastructure, and teams often take shortcuts rather than implementing solutions correctly from the start.
The Hub and Spoke Pattern
The solution required establishing clear boundaries through an infrastructure pattern called Hub and Spoke, or what we called the “bus mesh”.

Three actors participate in this pattern. Kafka serves as the company bus and event store where all events reside. EventBridge handles message fan-out. EventBridge Pipe acts as a point-to-point middleman intercepting messages for validation, transformation, and processing.
The elegance of this pattern lies in each service interfacing only with its local bus, EventBridge. Whether a service communicates within its own boundaries, with other microservices, or with the company bus, there is only one interface: EventBridge. Messages route through rules, similar to Pub/Sub, but the subscribers, whether SQS, SNS, or other services, remain hidden behind this abstraction layer. The pattern also solved the single-source-of-truth problem by ensuring everything passes through EventBridge before fanning out to internal teams.
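As a minimal sketch of what this looks like from a single service's perspective, the boto3 calls below create one rule on a service-local bus and attach a hidden SQS subscriber. The bus, rule, event, and queue names are hypothetical; a real deployment would define all of this in infrastructure as code rather than API calls.

```python
# Minimal sketch (Python/boto3): routing on a service-local EventBridge bus.
# Bus, rule, event, and queue names are hypothetical placeholders.
import json
import boto3

events = boto3.client("events")

# A rule on the local bus selects the events this subscriber cares about...
events.put_rule(
    Name="video-metadata-updated",
    EventBusName="catalog-local-bus",           # the service's only interface
    EventPattern=json.dumps({
        "source": ["company.kafka.bridge"],      # events fanned in from Kafka
        "detail-type": ["VideoMetadataUpdated"],
    }),
)

# ...and fans them out to targets (SQS here) that stay hidden behind the bus.
events.put_targets(
    Rule="video-metadata-updated",
    EventBusName="catalog-local-bus",
    Targets=[{"Id": "catalog-worker-queue",
              "Arn": "arn:aws:sqs:eu-central-1:123456789012:catalog-worker"}],
)
```

Because every producer and consumer only ever touches its local bus, swapping a subscriber from SQS to Lambda, or adding a new one, never requires changes on the publishing side.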
Sparse vs. Full State Messages
Event-driven applications require careful consideration of the tradeoffs between two message types: sparse and full state. Sparse messages contain only basic information, requiring subscribers to fetch additional data, which in turn means building APIs capable of handling massive request volumes. Full state messages contain everything subscribers need, but once there are more than a couple of publishers and subscribers, property changes become breaking changes. However, having complete information simplifies processing. Network considerations also become important when moving megabytes instead of kilobytes.
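The difference is easiest to see in the payloads themselves. The fields below are hypothetical examples, not the actual event schema:

```python
# Hypothetical payloads illustrating the two message styles.
sparse_event = {
    "type": "VideoUpdated",
    "video_id": "vid-123",          # subscribers must call an API for everything else
}

full_state_event = {
    "type": "VideoUpdated",
    "video_id": "vid-123",
    "title": "Example Title",
    "availability": {"de": True, "at": True, "ch": False},
    "assets": ["..."],              # everything included; any renamed or removed
}                                   # property breaks every subscriber at once
```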

For streaming media, the choice was clear. Kafka supports messages up to thirty to forty megabytes, which is normal for media streaming. EventBridge handles only 256 kilobytes, representing a completely different constraint.
The Claim Check Pattern

The claim check pattern solved this limitation. Leveraging Amazon EventBridge Pipe’s enrichment feature, messages are intercepted for transformation and validation, then stored in S3. The S3 key moves to EventBridge, which fans out to all consumers who access S3 to fetch data. This approach provides an API that scales automatically without custom development and maintenance.
The beauty of this solution lies in its simplicity: S3 handles the storage and retrieval of large payloads while EventBridge manages the lightweight message routing. Consumer services retrieve only the data they need, when they need it, without overwhelming any single component. These two patterns resolved the primary data consistency issues that had plagued the system for years.
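A minimal sketch of the claim check flow follows, assuming a hypothetical bucket, bus, and event type. In the production setup, the interception happens inside the EventBridge Pipes enrichment step rather than in hand-written publisher code:

```python
# Claim-check sketch (Python/boto3). Bucket, bus, and event names are placeholders.
import json
import uuid
import boto3

s3 = boto3.client("s3")
events = boto3.client("events")
BUCKET = "media-claim-check"   # hypothetical bucket

def publish_large_payload(payload: dict) -> None:
    """Store the full payload in S3 and publish only its key (the claim check)."""
    key = f"videos/{uuid.uuid4()}.json"
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(payload))
    events.put_events(Entries=[{
        "EventBusName": "catalog-local-bus",
        "Source": "catalog.enrichment",
        "DetailType": "VideoMetadataUpdated",
        "Detail": json.dumps({"s3_key": key}),   # well under the 256 KB limit
    }])

def handle_event(event: dict) -> dict:
    """Consumer redeems the claim check by fetching the full payload from S3."""
    key = event["detail"]["s3_key"]
    obj = s3.get_object(Bucket=BUCKET, Key=key)
    return json.loads(obj["Body"].read())
```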
Data Replication as an Alternative
An alternative approach worth considering is data replication rather than event-driven architecture. Using Postgres pglogical replication, data from Kafka can be normalized in a database, with other services receiving only partial data through table replication, perhaps two tables out of twenty. Both approaches are valid depending on the company’s requirements.
Tradeoffs exist between the approaches. Event-driven architecture provides decoupling: events are sent, subscribers receive them and rebuild the data in whatever format they choose, whether Aurora, DynamoDB, text files, or spreadsheets. Data replication, by contrast, creates a single database technology governing all services: starting with Postgres means every service must use Postgres.

The source database becomes a bottleneck; all subscribers need databases of equal or greater size, and schema changes break everything. Teams return to coordinating deployments.
Data replication also introduces significant operational complexity: managing subnets, security groups, and CIDR blocks because databases must reside in separate networks.
Scalability and Resilience
With data consistency resolved, focus shifted to resiliency and availability. When all users arrive simultaneously, only two outcomes exist: the architecture scales gracefully, or fails spectacularly. The real problems weren’t Lambda or clusters; missing autoscaling rules, absent caching, and ignored best practices were the culprits. This realization drove the move to serverless and managed services.
Service Architecture and SLAs
The service combination includes Amazon Route 53 for DNS, though relying solely on this combination exposes systems to DNS issues on the public internet. CloudFront or Global Accelerator provides better solutions by routing requests through edge points into AWS’s private network. CloudFront functions as a CDN with edge caching capabilities. The front door, either an application load balancer or API Gateway, routes requests to computational services.
API Gateway operates at a higher networking level, providing CORS and compression automatically. Both front doors route to Lambda and Fargate, the serverless compute options. Lambda scales from zero to one thousand instances in milliseconds with minimal configuration, just correct code and appropriate memory settings. Fargate offers more control but introduces more failure modes. Behind these sit caching layers and databases: Aurora, DynamoDB for NoSQL, and RDS.
Building for availability requires understanding SLAs. Surprisingly, API Gateway and Lambda don’t offer the highest availability; the optimal combination is an application load balancer with Lambda. These numbers are theoretical foundations, but without implementing best practices such as circuit breakers, retries, and timeouts, even a 99.99% available service like Lambda will fail. Best practices remain essential regardless of infrastructure choices. The architecture provides the foundation, but the code determines whether these availability numbers are actually achieved in production.
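The reason the combination matters is that the composite availability of serially dependent services is the product of their individual figures. The SLA percentages below are assumptions for illustration, not quoted contract values; check the current AWS SLA pages for real numbers:

```python
# Rough composite-availability arithmetic for serially dependent services.
# SLA figures are illustrative assumptions, not authoritative values.
def composite(*slas: float) -> float:
    result = 1.0
    for sla in slas:
        result *= sla
    return result

combos = {
    "API Gateway + Lambda": composite(0.9995, 0.9999),   # assumed 99.95% and 99.99%
    "ALB + Lambda":         composite(0.9999, 0.9999),   # assumed 99.99% and 99.99%
}
for name, sla in combos.items():
    hours_down = (1 - sla) * 365 * 24
    print(f"{name}: {sla:.4%} available, ~{hours_down:.1f} h/year of budgeted downtime")
```

Under these assumptions the difference is a few hours of tolerated downtime per year before any code-level failure is even considered.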
Database Selection
Database selection presents similar considerations. AWS’s fully serverless databases, currently only DynamoDB and Aurora DSQL, function as fire-and-forget APIs without VPC or subnet concerns. These services handle all the underlying complexity, allowing developers to focus entirely on data modeling and access patterns. Aurora or RDS requires VPC and subnet management, adding operational overhead but providing more traditional relational database capabilities.
Database selection is typically the hardest decision in any architecture. Relational databases can solve any problem, but the tradeoff involves operational complexity versus cost and reliability versus simplicity. Each service offers distinct benefits. RDS single node with replicas can reach production, which works until it doesn’t. Considerations include replication lag, failover, and split-brain scenarios. The fundamental question becomes: how much downtime is acceptable when problems occur?

Builders must present these problems to decision-makers and require explicit decisions. High availability and low cost cannot coexist in the same solution. The difference between single-region and multi-region can mean the difference between satisfied customers and angry customers on social media. These theoretical numbers become critically important when building international applications.
The service template incorporates the discussed services plus Momento Cache instead of Redis or Valkey, delegating cache management to a third party. The infrastructure design philosophy centers on never going down and recovering gracefully. DynamoDB global tables or an Aurora global database replicate data to another region for disaster recovery. Regional services such as the application load balancer, Lambda, and Fargate are all managed.

Cell-Based Architecture
Before implementing multi-region, several iterations improved application availability and scalability. Cell-based architecture represents one key pattern.

With three countries, Germany, Austria, and Switzerland, a single Fargate service or Lambda could serve all requests. If that service fails or hits its limits, entire applications go down, affecting all users across all regions simultaneously. Splitting traffic by country and user type (paid versus free) transforms one Lambda into six separate instances. With Lambda scaling from zero to one thousand instances in milliseconds, this multiplication provides a concurrent capacity of six thousand rather than one thousand.
Further splitting by platform, with five platforms including iOS, Android, web, and smart TV applications, yields thirty Lambdas and a concurrent capacity of thirty thousand. The code is written once and deployed through CI pipelines, making the number of Lambda functions irrelevant from a development perspective.
The key benefit is a reduced blast radius: if one cell experiences problems, the others continue operating normally. This architecture also enables targeted deployments, releasing, for example, to iOS free users in Germany first, and allowing testing and monitoring before a broader rollout.
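A small sketch of how such cells can be addressed from a single code base follows. The dimension values and the playback-api naming convention are hypothetical, chosen only to show the multiplication and the targeted-rollout idea:

```python
# Hypothetical cell-addressing sketch: one code base, many deployed cells.
from itertools import product

COUNTRIES = ["de", "at", "ch"]
TIERS = ["free", "paid"]
PLATFORMS = ["ios", "android", "web", "smarttv", "other"]

def cell_for(country: str, tier: str, platform: str) -> str:
    """Map a request to the Lambda function or alias that serves its cell."""
    return f"playback-api-{country}-{tier}-{platform}"

cells = [cell_for(c, t, p) for c, t, p in product(COUNTRIES, TIERS, PLATFORMS)]
print(len(cells))                      # 30 cells -> 30 x 1,000 concurrent executions
print(cell_for("de", "free", "ios"))   # natural target for a first, narrow rollout
```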
Cell-based architecture can extend to databases. With fully serverless databases like DynamoDB and DSQL, no extra cost is incurred. With RDS, Aurora, or OpenSearch, costs multiply quickly. For many scales, splitting databases doesn’t make sense. Instead, implementing caching, absent from the previous architecture, proves more effective.
Multi-Layer Caching Strategy
Streaming applications serve repetitive content. Profile information for actors remains constant across millions of requests. Using databases as expensive caches serves no purpose and wastes resources.
Three cache layers provide the solution:
- CloudFront at the edge, for repetitive requests that can be served without reaching origin servers.
- In-memory storage within Lambda or Fargate instances, for hot keys accessed frequently within short time windows.
- Momento as a dedicated cache service in front of the database, for everything else.
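A minimal read-through sketch of the two inner layers is shown below, with CloudFront assumed to sit in front of everything at the edge. The remote cache and database callables are stand-ins for Momento and the actual data store, and the TTL is an assumed value:

```python
# Read-through sketch of the in-process and dedicated cache layers.
import time
from typing import Any, Callable, Optional

_local: dict[str, tuple[float, Any]] = {}   # hot keys inside this Lambda/Fargate instance
LOCAL_TTL = 30                              # seconds; assumed value

def get_cached(key: str,
               remote_get: Callable[[str], Optional[Any]],
               remote_set: Callable[[str, Any], None],
               load_from_db: Callable[[str], Any]) -> Any:
    now = time.time()
    hit = _local.get(key)
    if hit and hit[0] > now:                # layer 1: in-memory hot keys
        return hit[1]

    value = remote_get(key)                 # layer 2: dedicated cache (e.g. Momento)
    if value is None:
        value = load_from_db(key)           # layer 3: the database, ideally a small
        remote_set(key, value)              # fraction of total reads

    _local[key] = (now + LOCAL_TTL, value)
    return value
```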
This layered approach reduced database utilization to under ten percent, and even as low as five percent, during prime time viewing hours. Smaller database clusters become viable instead of massive ones sized purely for request handling capacity. This cost reduction enables serverless scalability and opens the door to active-active strategies that would otherwise be prohibitively expensive.
With an Aurora global database, writes occur only in one region, a suboptimal setup. Active-active configurations such as DynamoDB global tables, which allow writes in multiple regions, are preferable.
Data Plane Automation
Investment in data plane automation followed. Application load balancer health monitoring enables automatic failover: when regional problems occur, Amazon Route 53 shifts traffic to a different region. Alarms monitoring CPU and memory emit events that shift traffic between Fargate and Lambda. Traffic is served through multiple compute services simultaneously, with routing decisions based on current traffic patterns.
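One plausible shape for a single automation step is sketched below: a Lambda handler reacting to a CloudWatch alarm state change delivered via EventBridge and adjusting weighted target groups on the load balancer. The ARNs, the alarm semantics, and the weights are placeholders rather than the production values:

```python
# Sketch: shift ALB traffic weights between Fargate and Lambda on alarm events.
# ARNs, alarm meaning, and weights are placeholders.
import boto3

elbv2 = boto3.client("elbv2")

LISTENER_ARN = "arn:aws:elasticloadbalancing:...:listener/app/playback/..."
FARGATE_TG   = "arn:aws:elasticloadbalancing:...:targetgroup/playback-fargate/..."
LAMBDA_TG    = "arn:aws:elasticloadbalancing:...:targetgroup/playback-lambda/..."

def handler(event, _context):
    # CloudWatch Alarm State Change events carry the new state in detail.state.value.
    state = event["detail"]["state"]["value"]            # e.g. "ALARM" on high CPU
    fargate_weight, lambda_weight = (20, 80) if state == "ALARM" else (90, 10)
    elbv2.modify_listener(
        ListenerArn=LISTENER_ARN,
        DefaultActions=[{
            "Type": "forward",
            "ForwardConfig": {"TargetGroups": [
                {"TargetGroupArn": FARGATE_TG, "Weight": fargate_weight},
                {"TargetGroupArn": LAMBDA_TG, "Weight": lambda_weight},
            ]},
        }],
    )
```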
Multi-Region Strategy
Multi-region architecture isn’t necessary for everyone; the described practices make services more scalable and available while reducing blast radius. However, true resiliency requires multi-region deployment. Not every service needs it.
Services requiring multi-region active-active are those whose failure brings down entire applications. If bookmarks fail, the impact is minimal. All services should be prepared for multi-region deployment with appropriate strategies: backup and restore, pilot light, warm standby, or active-active.
The primary obstacle to multi-region adoption is often the organizational mindset. Resistance to infrastructure complexity creates cultures that brace for problems and wait for them to pass. Today, multi-region implementation has become straightforward as cloud providers continue improving deployment capabilities across regions.
The builder’s responsibility is transparency. Technical staff members don’t decide business priorities, they present options clearly to management. When managers understand and accept responsibility for potential downtime during prime time, infrastructure decisions become their accountability. If multi-region costs X and only one issue occurs annually, perhaps X exceeds the revenue loss.
The discussion isn’t about catastrophic failures like Frankfurt’s eight-hour outage in 2021. Smaller, constantly recurring issues matter: DNS problems preventing CloudFront from reaching origins, Lambda recycling causing cold starts with domino effects, and Fargate tasks disappearing unexpectedly.
Everything auto-recovers, but incidents trigger calls involving multiple people, VPs, and CTOs. The cost of that response time adds up. The question isn’t whether multi-region is appropriate, but how to make it affordable.
Making Multi-Region Affordable
Multi-region inherently costs more than single-region deployments. Databases require data replication across regions, and all replication costs money through both storage and data transfer charges. Making multi-region affordable means evolving services strategically rather than simply duplicating everything. Switching from API Gateway to application load balancer yielded ninety percent savings on routing costs, requiring only code-level CORS headers and compression implementation, straightforward changes that paid significant dividends.
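A sketch of the code-level work this shift implies follows, assuming a hypothetical route and allowed origin: the handler itself now returns CORS headers and a gzip-compressed body, which the ALB Lambda integration passes through as base64.

```python
# Sketch of a Lambda handler behind an ALB that does its own CORS and compression.
# Allowed origin and response shape are assumptions.
import base64
import gzip
import json

ALLOWED_ORIGIN = "https://app.example.com"   # placeholder; normally configuration

def handler(event, _context):
    body = json.dumps({"items": []})                  # whatever this route returns
    compressed = gzip.compress(body.encode("utf-8"))
    return {
        "statusCode": 200,
        "isBase64Encoded": True,                      # required for binary bodies via ALB
        "headers": {
            "Content-Type": "application/json",
            "Content-Encoding": "gzip",
            "Access-Control-Allow-Origin": ALLOWED_ORIGIN,
            "Access-Control-Allow-Methods": "GET,OPTIONS",
            "Access-Control-Allow-Headers": "Content-Type,Authorization",
        },
        "body": base64.b64encode(compressed).decode("ascii"),
    }
```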
Another technique showing sixty percent cost reduction involves dynamic switching between Fargate and Lambda based on traffic patterns. While conventional wisdom suggests Fargate is cheaper than Lambda when examining raw pricing, costs depend heavily on scale and use patterns. Analysis revealed that under thirty to fifty million daily requests, Fargate actually costs more than Lambda due to the baseline costs of running containers continuously.
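A back-of-the-envelope model makes the crossover visible. Every price and sizing figure below is an assumption chosen only to illustrate the shape of the curves, not an AWS list price or a measured value:

```python
# Illustrative cost model for the Lambda vs. Fargate crossover. All figures assumed.
import math

LAMBDA_PER_REQUEST = 0.0000002      # USD per invocation (assumed)
LAMBDA_GB_SECOND   = 0.0000167      # USD per GB-second (assumed)
LAMBDA_MEM_GB, LAMBDA_DURATION_S = 0.5, 0.05

FARGATE_TASK_HOURLY = 0.10          # USD per task-hour (assumed)
REQUESTS_PER_TASK_PER_SECOND = 50   # from hypothetical load testing
MIN_TASKS = 10                      # always-on baseline across AZs and cells (assumed)

def lambda_cost(daily_requests: int) -> float:
    per_request = LAMBDA_PER_REQUEST + LAMBDA_GB_SECOND * LAMBDA_MEM_GB * LAMBDA_DURATION_S
    return daily_requests * per_request

def fargate_cost(daily_requests: int) -> float:
    avg_rps = daily_requests / 86_400
    tasks = max(MIN_TASKS, math.ceil(avg_rps / REQUESTS_PER_TASK_PER_SECOND))
    return tasks * FARGATE_TASK_HOURLY * 24

for daily in (10_000_000, 30_000_000, 50_000_000, 100_000_000):
    print(f"{daily:>12,}/day  Lambda ≈ ${lambda_cost(daily):6.2f}"
          f"   Fargate ≈ ${fargate_cost(daily):6.2f}")
```

Under these assumptions the crossover lands in the tens of millions of requests per day: below it, the always-on container baseline dominates and Lambda wins; above it, well-utilized Fargate tasks become cheaper per request.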
Traffic calculations determine how many requests each Fargate task can handle, enabling dynamic shifting between compute services. Lambda remains ready for overflow at all times with zero cost when idle. Instead of scaling Fargate during traffic spikes, a process that takes five to six minutes to provision new tasks, traffic shifts immediately to Lambda.
Lambda may experience some cold starts, but the response is instantaneous compared to waiting for container scaling. During night hours with minimal viewership, Fargate scales to zero while Lambda handles all remaining traffic efficiently. This approach optimizes costs while maintaining availability guarantees.
Automation is essential for managing this complexity: tracking everything through metrics and alarms, then automating response actions, builds an infrastructure requiring no human intervention during incidents. The goal is self-healing systems that detect problems and respond before engineers even receive notifications. By the time incident emails arrive and engineers log in to investigate, issues have already been resolved automatically.
This automation transforms the economics of multi-region deployment: instead of requiring dedicated operations staff to monitor dashboards around the clock, the infrastructure manages itself. The bottom line is clear: costs aren’t eliminated but become reasonable relative to the protection gained. The investment in automation pays for itself through reduced operational overhead and faster incident resolution.
Conclusion
The journey from a fragile, monolithic architecture to a resilient multi-region system demonstrates that transformation is achievable even with limited resources and experience. A team of two developers, starting with no AWS expertise, successfully eliminated single points of failure, achieved data consistency, and built infrastructure that scales automatically without human intervention. The key insight is that serverless and managed services aren’t about following trends; they represent strategic delegation that frees engineers to focus on business value.
Multi-region architecture, once considered prohibitively complex and expensive, becomes attainable when approached incrementally: first solving data consistency with event-driven patterns, then implementing cell-based isolation and intelligent caching, and finally automating traffic management between compute services. The cost isn’t eliminated, but becomes reasonable relative to the protection gained. Most importantly, technical decisions must be made transparent to business stakeholders who ultimately own the risk and reward tradeoffs.
