In a recent blog post, Netflix engineers described how they scaled Muse, the company’s internal application for data-driven creative insights, to handle trillion-row datasets.
Muse helps creative and launch teams at Netflix understand which artwork and video assets resonate with audiences, and its growth required supporting advanced filtering and audience-affinity analysis at massive scale.
To meet these demands, Netflix reports that it redesigned the data serving layer, cutting query latencies by about 50% while maintaining accuracy and responsiveness.
Muse started as a Spark-powered dashboard backed by a modest Apache Druid cluster. Over time, creative teams asked for features like outlier detection, notifications, and media comparisons, while data volumes grew to trillions of rows a year. Meeting these needs required lower latency and a more capable serving layer.
A major challenge came with audience affinities: algorithmically inferred labels grouping members by tastes such as “Character Drama fans” or “Pop Culture enthusiasts”. Adding these many-to-many relationships to impression and playback data created a surge in complexity that pushed the limits of the original architecture.
Muse Application GUI
Another challenge came from two of Muse's key metrics: impressions, the number of times an asset is shown, and qualified plays, which link playback events back to impressions. Both require counting distinct users, an expensive operation at Netflix scale.
To address this, the team adopted HyperLogLog sketches from Apache DataSketches, which provide estimates within about one percent error.
Sketches are built in two places: during Druid ingest and in Spark ETL jobs that merge daily sketches into all-time aggregates. This approach reduced query latencies by roughly 50% across the organization’s common OLAP patterns.
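As a rough illustration of this pattern, the sketch below builds daily HLL sketches and merges them into an all-time aggregate using the Apache DataSketches Java API. The lgK value, user IDs, and class name are illustrative choices, not Netflix's actual configuration.

```java
import org.apache.datasketches.hll.HllSketch;
import org.apache.datasketches.hll.Union;

public class DistinctUserEstimate {
    public static void main(String[] args) {
        // One sketch per day of impression data (hypothetical user IDs).
        // lgK = 12 is an illustrative size/accuracy trade-off.
        HllSketch day1 = new HllSketch(12);
        HllSketch day2 = new HllSketch(12);
        for (long userId = 0; userId < 100_000; userId++) {
            day1.update(userId);
        }
        for (long userId = 50_000; userId < 150_000; userId++) {
            day2.update(userId);
        }

        // Merge daily sketches into an all-time aggregate, the same kind of
        // roll-up the article describes happening in Spark ETL jobs.
        Union allTime = new Union(12);
        allTime.update(day1);
        allTime.update(day2);

        // The estimate approximates the true distinct count (150,000 here)
        // to within a small relative error, without storing every user ID.
        System.out.printf("estimated distinct users: %.0f%n",
                allTime.getResult().getEstimate());
    }
}
```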
To further reduce load on Druid, Netflix turned to Hollow, its in-house library for in-memory key/value stores. Hollow feeds are built from Iceberg tables, with producer servers pushing updates and Spring Boot consumers refreshing cached datasets. This setup allows Muse to serve precomputed aggregates such as distinct country availability, all-time asset metrics, and metadata directly from memory.
Query times dropped from hundreds of milliseconds to tens, while also shielding Druid from high-concurrency requests. The team notes that the trade-off is higher memory use and more complex request routing, but the result has been greater stability and responsiveness.
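A minimal producer/consumer sketch of this pattern, using Hollow's filesystem-based publisher and a hypothetical AssetMetrics record, might look like the following. In Muse the feeds are built from Iceberg tables and consumed by Spring Boot services; the directory, record type, and values here are purely illustrative.

```java
import com.netflix.hollow.api.consumer.HollowConsumer;
import com.netflix.hollow.api.consumer.fs.HollowFilesystemAnnouncementWatcher;
import com.netflix.hollow.api.consumer.fs.HollowFilesystemBlobRetriever;
import com.netflix.hollow.api.producer.HollowProducer;
import com.netflix.hollow.api.producer.fs.HollowFilesystemAnnouncer;
import com.netflix.hollow.api.producer.fs.HollowFilesystemPublisher;

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class AssetMetricsFeed {

    // Hypothetical record holding precomputed all-time metrics for an asset.
    public static class AssetMetrics {
        long assetId;
        long allTimeImpressions;
        long allTimeQualifiedPlays;

        AssetMetrics(long assetId, long impressions, long qualifiedPlays) {
            this.assetId = assetId;
            this.allTimeImpressions = impressions;
            this.allTimeQualifiedPlays = qualifiedPlays;
        }
    }

    public static void main(String[] args) throws Exception {
        Path publishDir = Paths.get("/tmp/muse-hollow");
        Files.createDirectories(publishDir);

        // Producer side: publish a new snapshot of precomputed aggregates.
        HollowProducer producer = HollowProducer
                .withPublisher(new HollowFilesystemPublisher(publishDir))
                .withAnnouncer(new HollowFilesystemAnnouncer(publishDir))
                .build();

        producer.runCycle(state -> {
            state.add(new AssetMetrics(1L, 1_200_000L, 340_000L));
            state.add(new AssetMetrics(2L, 950_000L, 210_000L));
        });

        // Consumer side: a service keeps an in-memory copy and refreshes it
        // when a new data version is announced.
        HollowConsumer consumer = HollowConsumer
                .withBlobRetriever(new HollowFilesystemBlobRetriever(publishDir))
                .withAnnouncementWatcher(new HollowFilesystemAnnouncementWatcher(publishDir))
                .build();
        consumer.triggerRefresh();

        // Lookups are now served from local memory rather than hitting Druid.
        System.out.println("loaded data version: " + consumer.getCurrentVersionId());
    }
}
```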
Current Muse Architecture
Finally, Netflix spent time tuning Druid itself: the team adjusted how data is split across nodes, resized segments to make scans more efficient, and filtered out unused columns before storage.
They also used Druid’s ability to store multiple values in a single field to better handle audience affinities.
These changes, combined with earlier improvements, cut query times roughly in half and made the system more consistent under heavy load.
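To give a flavor of how multi-value dimensions and ingest-time sketches come together at query time, here is a hedged sketch of a Druid SQL query issued over the Avatica JDBC driver. The datasource, column names, and endpoint are hypothetical, and the query assumes Druid's multi-value string functions and the druid-datasketches extension are available.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class AffinityQuery {
    public static void main(String[] args) throws Exception {
        // Hypothetical Druid SQL endpoint; requires the Avatica JDBC driver on the classpath.
        String url = "jdbc:avatica:remote:url=http://localhost:8888/druid/v2/sql/avatica/";

        // MV_CONTAINS filters on a multi-value dimension (audience affinities stored
        // as multiple values in one column); APPROX_COUNT_DISTINCT_DS_HLL reads the
        // HLL sketch column built at ingest time (druid-datasketches extension).
        String sql =
            "SELECT asset_id, " +
            "       APPROX_COUNT_DISTINCT_DS_HLL(user_id_sketch) AS distinct_users " +
            "FROM muse_impressions " +
            "WHERE MV_CONTAINS(audience_affinities, 'Character Drama fans') " +
            "GROUP BY asset_id";

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                System.out.println(rs.getLong("asset_id") + " -> " + rs.getLong("distinct_users"));
            }
        }
    }
}
```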
To ensure accuracy and trust, Netflix ran the legacy and new metric stacks in parallel, validating results through automated Jupyter comparisons and in-app tools that highlighted differences.
The rollout was staged segment by segment, supported by shadow testing and fine-grained feature flags for safe rollback.
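The article does not detail how the comparisons were implemented, but conceptually the check boils down to flagging metrics whose legacy and new values diverge by more than the expected sketch error. The sketch below is one illustrative way to express that; the tolerance and data structures are assumptions, not Netflix's tooling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class MetricParityCheck {
    // Hypothetical tolerance: ~1% HLL error plus a small buffer.
    private static final double RELATIVE_TOLERANCE = 0.02;

    /** Returns asset IDs whose new-stack value diverges from the legacy value. */
    static List<Long> findMismatches(Map<Long, Double> legacy, Map<Long, Double> next) {
        List<Long> mismatches = new ArrayList<>();
        for (Map.Entry<Long, Double> entry : legacy.entrySet()) {
            double expected = entry.getValue();
            Double actual = next.get(entry.getKey());
            if (actual == null) {
                mismatches.add(entry.getKey());
                continue;
            }
            double relativeDiff = Math.abs(actual - expected) / Math.max(expected, 1.0);
            if (relativeDiff > RELATIVE_TOLERANCE) {
                mismatches.add(entry.getKey());
            }
        }
        return mismatches;
    }
}
```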
Looking ahead, Netflix plans to extend Muse to support ‘Live’ and Games, incorporate synopsis data, and refine metrics that distinguish between “effective” and “authentic” promotional assets.