Key Takeaways
- Optimizing data indexing and layout can significantly reduce retrieval times and improve storage efficiency.
- Categorizing and prioritizing relevant data based on specific factors, such as location or delivery times, enhances query accuracy and speed.
- Sharding techniques, like geo-sharding, help balance system load and improve search efficiency in complex, large-scale systems.
- Parallelizing queries and processing multiple match types simultaneously can boost search performance and relevance.
- Ensuring consistency across different discovery surfaces leads to a smoother and more intuitive user experience.
As software engineers, we are constantly striving to build systems that are not only functional but also efficient and scalable. In a world where users demand faster, more accurate results, optimizing search performance has become a key focus in modern application development.
This article is based on our presentation at QCon San Francisco 2024, where we explored the evolving landscape of data indexing, retrieval, and ranking. With platforms like Uber Eats handling complex queries across massive datasets, optimizing search is now a critical challenge, one that requires advanced strategies such as indexing, sharding, and parallel query processing.
The complexity of search systems continues to grow, making the balance between speed, relevance, and scalability more crucial than ever. This article explores the techniques behind these optimizations and their impact on both user experience and system performance.
Expanding Selection in Uber Eats: The nX Approach
Selection on Uber Eats is a complex problem that varies based on perspective. For the merchant onboarding team and operations, success is measured by onboarding as many restaurants and stores as possible. For consumers, selection can mean different things. Some prioritize fast delivery times, while others seek their favorite restaurants or opportunities to discover new places within the app. The challenge is to accommodate these diverse needs while ensuring a seamless discovery experience.
Beyond the conceptual aspects of selection, there are significant technical hurdles to overcome. As the business has expanded, particularly during and after the pandemic, Uber Eats has incorporated new verticals beyond restaurants, such as grocery stores, retail stores, and even item/package delivery. This expansion has introduced ranking and recommendation complexity into the platform’s selection infrastructure.
A key scaling difference lies between onboarding restaurants and onboarding grocery stores. Restaurant menus usually feature twenty to thirty items, while grocery stores can carry over one hundred thousand stock-keeping units (SKUs). This variation requires an advanced indexing system to present relevant options to users efficiently.
Another major development in Uber Eats’ selection strategy has been expanding the geographical reach of deliveries. Previously, users could only order from restaurants within a ten-to-fifteen-minute radius. Now, the platform enables users to order from merchants up to roughly an hour away. For example, a user in one city can place an order from a restaurant or retail store in another city and have it delivered through Uber Eats. This expanded reach introduces further technical challenges, particularly in logistics and fulfillment.
The primary focus of Uber Eats’ selection strategy is to maximize the quantity of available options across all discovery surfaces within the app. Personalization, while also an important aspect, is a separate challenge that requires distinct solutions. By continuously evolving its indexing, ranking, and recommendation technologies, Uber Eats aims to create a more comprehensive and dynamic selection for users, ensuring that they can quickly and efficiently find what they want, whether it is a favorite restaurant, a new culinary experience, or a much-needed grocery item.
Discovery surfaces such as Home Feed, Search, Suggestions, and Ads are crucial in connecting users with available options. Home Feed serves as the primary entry point for orders, featuring carousels based on user history, storefront listings, and shortcut sections for promotions and cuisines. Search functionality includes restaurant, dish, and cuisine searches, while suggestions help users discover similar or alternative options dynamically. Ads enhance visibility, helping merchants to reach relevant customers effectively. Ensuring consistency across these discovery surfaces is key to delivering a seamless and intuitive user experience.
How users discover content on Uber Eats: A look at Feed, Search, and Recommendations
Uber Eats Architecture: From Infrastructure to Application Layer
The architecture of Uber Eats spans multiple layers, from the infrastructure side to the application layer, ensuring seamless retrieval and placement of stores and restaurants. At the foundation is the infrastructure layer, which stores and indexes all available merchants and items. This serves as the core dataset from which relevant stores are retrieved.
Search Architecture: A Layered View of the Retrieval and Ranking Pipeline
The retrieval layer optimizes for recall, fetching a wide range of potential stores that are then refined for relevance by the ranking system. The first-pass ranking focuses on precision, using lexical matching to align user queries with retrieved documents. The hydration layer adds business logic, factoring in promotions, membership benefits, and estimated delivery times. Finally, the second-pass ranking personalizes results based on the user’s order history and conversion rates to deliver the most relevant options.
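To make the four stages concrete, here is a minimal sketch of the pipeline in Python. The data model, scoring heuristics, and function names are illustrative assumptions, not Uber’s implementation; the point is the recall-then-precision flow.

```python
from dataclasses import dataclass

@dataclass
class Store:
    store_id: str
    name: str
    lexical_score: float = 0.0   # set by first-pass ranking
    eta_minutes: int = 0         # set by hydration
    final_score: float = 0.0     # set by second-pass ranking

def retrieve(query: str, index: list[Store]) -> list[Store]:
    """Recall-oriented: fetch a wide candidate set of potential stores."""
    return [s for s in index if query.lower() in s.name.lower()]

def first_pass_rank(query: str, candidates: list[Store]) -> list[Store]:
    """Precision-oriented lexical matching between query and documents."""
    for s in candidates:
        s.lexical_score = 1.0 if s.name.lower().startswith(query.lower()) else 0.5
    return sorted(candidates, key=lambda s: s.lexical_score, reverse=True)

def hydrate(candidates: list[Store], etas: dict[str, int]) -> list[Store]:
    """Business logic: promotions, membership benefits, delivery estimates."""
    for s in candidates:
        s.eta_minutes = etas.get(s.store_id, 60)
    return candidates

def second_pass_rank(candidates: list[Store], past_orders: set[str]) -> list[Store]:
    """Personalization from order history; a real ranker uses conversion models."""
    for s in candidates:
        s.final_score = (s.lexical_score
                         + (0.5 if s.store_id in past_orders else 0.0)
                         - 0.01 * s.eta_minutes)
    return sorted(candidates, key=lambda s: s.final_score, reverse=True)

stores = [Store("s1", "Pizza Palace"), Store("s2", "Planet Pizza")]
hits = second_pass_rank(
    hydrate(first_pass_rank("pizza", retrieve("pizza", stores)), {"s1": 20, "s2": 45}),
    past_orders={"s2"},
)
```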
As Uber Eats expanded its selection, it encountered significant scaling challenges. An initial attempt to increase the number of retrieved stores led to a fourfold increase in latency, prompting a deeper investigation into inefficiencies in data ingestion, query performance, and ranking.
Dot size reflects candidate volume; red dots indicate newly included stores based on greater Haversine distance.
One of the key challenges was indexing and querying. Uber Eats relies on H3 hexagonal geolocation indexing to determine delivery zones, but the ingestion process inaccurately categorized stores as either “close” or “far”. This misclassification led to ranking inconsistencies in which stores that were actually nearby were deprioritized.
Another issue was the quadratic growth of the search space. Because the searched area scales with the square of the delivery radius, widening the radius from fifteen minutes to an hour multiplies the area roughly sixteenfold, dramatically increasing the number of candidates being processed. Even a small adjustment in retrieval parameters could therefore cause significant performance slowdowns.
Store distribution created another challenge. As search areas broadened, distant stores were often shown higher in the feed than nearby options. This led to a dilemma: while high-converting stores are important, customer experience suffers when faraway choices overshadow more convenient local options.
Haversine Rings of Retrieval: Inner Stores Are Few, Outer Growth Is Disproportionate
To address these issues, Uber Eats needed to refine its indexing, retrieval, and ranking strategies. The focus shifted to balancing scale with efficiency, ensuring that the platform could surface the most relevant stores without compromising performance. These optimizations were critical in maintaining a smooth user experience while supporting Uber Eats’ growing selection.
Uber Eats’ Search Platform
The Search platform that powers Uber Eats, handling tens of millions of requests daily, is built on Apache Lucene and follows a Lambda architecture for data ingestion. This setup includes batch ingestion via Spark and real-time ingestion through a streaming path, ensuring up-to-date search results.
From data to discoverability: Ingestion, indexing, and Lucene-backed search orchestrated by a streaming architecture
One key feature is priority-aware ingestion, allowing high-priority requests to be processed first, maintaining data freshness. Uber also relies heavily on geo-sharding to optimize its geospatial search use cases. Additionally, custom index layouts and query operators enhance search efficiency by leveraging offline document ranking and early termination to speed up queries.
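As a rough illustration of priority-aware ingestion, the sketch below drains high-priority updates before lower-priority ones using an in-process priority queue. In Uber’s pipeline this discipline is realized in front of Kafka rather than in a single process, so treat the mechanism, names included, as an assumption for exposition.

```python
import heapq
import itertools

class PriorityIngestQueue:
    """Drains high-priority index updates first; a counter keeps FIFO order
    within the same priority level."""
    HIGH, NORMAL, LOW = 0, 1, 2

    def __init__(self) -> None:
        self._heap: list = []
        self._tiebreak = itertools.count()

    def submit(self, update: dict, priority: int = NORMAL) -> None:
        heapq.heappush(self._heap, (priority, next(self._tiebreak), update))

    def drain(self):
        while self._heap:
            _, _, update = heapq.heappop(self._heap)
            yield update

q = PriorityIngestQueue()
q.submit({"store_id": "s1", "op": "menu_update"}, PriorityIngestQueue.LOW)
q.submit({"store_id": "s2", "op": "availability"}, PriorityIngestQueue.HIGH)
print([u["op"] for u in q.drain()])  # ['availability', 'menu_update']
```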
The search platform architecture consists of three main components:
- Batch Indexing Pipeline – Spark jobs process raw data, convert it into search documents, partition the documents into shards, and generate Lucene indexes, which are stored in an object store.
- Streaming/Real-Time Updates Path – Updates are ingested via a Streaming Service, which maps documents to specific Kafka partitions for real-time updates; there is a one-to-one mapping between Kafka partitions and index shards (see the routing sketch after this list). Kafka acts as a write-ahead log, enabling graceful handling of ingestion spikes, priority-aware ingestion, replication, and fault tolerance.
- Serving Stack – The searcher node retrieves indexes from storage, catches up with streaming updates, and executes queries. A stateless aggregator service routes search requests to the appropriate searcher node, handles query fanouts, and aggregates results before returning them to the user.
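The one-to-one shard/partition mapping can be pictured with a small routing sketch; the shard function here is a crude placeholder (the real assignment comes from the geo-sharding logic described in the next section), and the topic name is hypothetical.

```python
import json

NUM_SHARDS = 16  # illustrative; the real shard count is an operational choice

def shard_of(lat: float, lng: float) -> int:
    """Placeholder geo-shard assignment; see the latitude and hex sharding
    sketches below for more realistic versions."""
    return int((lat + 90.0) / 180.0 * NUM_SHARDS) % NUM_SHARDS

def route_update(doc: dict) -> tuple[int, bytes]:
    """The document's shard id doubles as its Kafka partition, so each
    searcher node tails exactly one partition as its write-ahead log."""
    partition = shard_of(doc["lat"], doc["lng"])
    return partition, json.dumps(doc).encode("utf-8")

partition, payload = route_update({"store_id": "s1", "lat": 37.77, "lng": -122.42})
# e.g. producer.send("index-updates", value=payload, partition=partition)
```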
Sharding Techniques
Efficiently managing geospatial search queries on Uber Eats is crucial, as users often seek out nearby restaurants or grocery stores. To achieve this, Uber Eats uses geo-sharding, a technique that ensures all relevant data for a specific location is stored within a single shard. This minimizes query overhead and eliminates inefficiencies caused by fetching and aggregating results from multiple shards. Additionally, geo-sharding allows first-pass ranking to happen directly on data nodes, improving speed and accuracy. Uber Eats primarily employs two geo-sharding techniques: latitude sharding and hex sharding.
Latitude sharding divides the world into horizontal bands, with each band representing a distinct shard. Shard ranges are computed offline using Spark jobs, which first divide the map into thousands of narrow latitude stripes and then group adjacent stripes to create shards of roughly equal size. Documents falling on shard boundaries are indexed in both neighboring shards to prevent missing results.
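A condensed, single-process version of that offline computation might look like the following: histogram documents into narrow latitude stripes, then greedily merge adjacent stripes until each shard holds a roughly equal share. The real job runs as Spark jobs over the full corpus; plain Python stands in here, and the boundary-buffer duplication is noted but omitted.

```python
def latitude_shard_bounds(doc_lats: list[float], num_shards: int,
                          stripes: int = 3600) -> list[float]:
    """Return latitude boundaries so each of num_shards shards holds
    roughly the same number of documents."""
    counts = [0] * stripes
    for lat in doc_lats:
        stripe = min(int((lat + 90.0) / 180.0 * stripes), stripes - 1)
        counts[stripe] += 1

    target = len(doc_lats) / num_shards
    bounds, running = [-90.0], 0
    for i, count in enumerate(counts):
        running += count
        if running >= target and len(bounds) < num_shards:
            bounds.append(-90.0 + (i + 1) * 180.0 / stripes)  # close this shard
            running = 0
    bounds.append(90.0)
    return bounds  # shard k covers latitudes [bounds[k], bounds[k+1])

# Indexing would then write each document to its shard, and additionally to
# the neighboring shard whenever it lies within the boundary buffer.
```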
One key advantage of latitude sharding is its ability to distribute traffic efficiently across different time zones. Given that Uber Eats experiences peak activity following a “sun pattern” with high demand during the day and lower demand at night, this method helps prevent excessive load on specific shards. However, in densely populated urban areas, shards may become uneven, leading to indexing delays and increased query latencies.
Not Geo-Local, but Geo-Linear: A Visual Example Using U.S. and Europe Overlap
To address these challenges, Uber Eats also utilizes hex sharding, which is based on the H3 geospatial indexing system. Instead of dividing the world into bands, hex sharding organizes data into hexagonal tiles of varying resolutions. Choosing the right hexagon size is crucial; Uber typically opts for H3 resolution 2 or 3 to balance efficiency and precision. Like latitude sharding, hex sharding incorporates buffer zones, ensuring that documents near shard boundaries are indexed in multiple hexagons to prevent search gaps. The key advantage of hex sharding is a more balanced shard distribution, particularly in dense urban areas, where latitude sharding struggles.
Hexes Over Heat: A Sharding Strategy Built for Urban Search Load Balancing
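A sketch of hex-shard assignment using the open-source h3 Python bindings; the calls below assume the v4 API (latlng_to_cell, grid_disk; the v3 equivalents were geo_to_h3 and k_ring). A production version would add a document to neighboring cells only when it actually falls inside the buffer zone, rather than via the explicit flag used here.

```python
import h3  # pip install h3 (v4 API assumed)

RESOLUTION = 2  # coarse cells; Uber reportedly uses resolution 2 or 3

def shards_for_doc(lat: float, lng: float, in_buffer_zone: bool) -> set[str]:
    """Map a document to the H3 cell(s) that should index it. Documents in
    a buffer zone are also written to all adjacent cells so that queries
    near a shard boundary never miss them."""
    home = h3.latlng_to_cell(lat, lng, RESOLUTION)
    if not in_buffer_zone:
        return {home}
    return set(h3.grid_disk(home, 1))  # home cell plus its neighbors

print(shards_for_doc(37.7749, -122.4194, in_buffer_zone=False))     # one shard
print(len(shards_for_doc(37.7749, -122.4194, in_buffer_zone=True)))  # 7 cells
```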
By combining these two techniques, Uber Eats optimizes its search infrastructure to deliver fast, accurate, and scalable geospatial queries, providing a seamless experience for users no matter where they’re ordering from.
To optimize Uber Eats’ search performance, several improvements were made by leveraging query patterns and refining data layouts to enhance recall while reducing latency. One key enhancement was building specialized data layouts tailored for different use cases, such as food delivery and grocery searches. By aligning the data structure with query patterns, retrieval efficiency improved, cutting down unnecessary computations. Another major optimization was indexing Estimated Time of Delivery (ETD), which allowed the search space to be divided into non-overlapping ranges and processed in parallel, accelerating query execution while maintaining accurate ranking.
ETA as a Spatial Constraint: Structuring Candidate Retrieval for Efficiency and Isolation
Furthermore, tasks such as differentiating between nearby and distant stores were moved earlier into the indexing pipeline, reducing query-time processing and improving overall performance. These enhancements greatly increased search efficiency, delivering quicker and more relevant results while maintaining scalability.
Data Layout Improvements for Faster and More Scalable Queries
Uber Eats improved its search efficiency by optimizing its data layout to better align with query patterns, significantly reducing latency and improving scalability. The Eats index was restructured to first group restaurants by city, followed by individual restaurants, and then their menu items. This approach allowed queries to filter out irrelevant cities, quickly reducing unnecessary processing. Additionally, because Lucene uses delta encoding, clustering similar attributes together improved compression efficiency, leading to a twenty percent reduction in index size.
Index Layout in Lucene: Geographic and Merchant-Aware Document Ordering
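The core of the restructuring is just a document ordering decision, which a toy example makes visible. Field names are illustrative; the essential move is sorting by a (city, store, item) composite key before the index is built.

```python
docs = [
    {"city_id": 12, "store_id": 400, "item_id": 7, "title": "Margherita"},
    {"city_id": 3,  "store_id": 90,  "item_id": 2, "title": "Pad Thai"},
    {"city_id": 12, "store_id": 400, "item_id": 1, "title": "Calzone"},
    {"city_id": 3,  "store_id": 91,  "item_id": 5, "title": "Green Curry"},
]

# City first, then store, then item: a query scoped to one city can skip
# whole contiguous runs of documents, and neighboring doc ids differ by
# small deltas, which Lucene's delta encoding compresses well.
docs.sort(key=lambda d: (d["city_id"], d["store_id"], d["item_id"]))

for doc in docs:
    print(doc["city_id"], doc["store_id"], doc["title"])
```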
For grocery searches, a slightly different strategy was required due to the higher scale of grocery inventory compared to restaurants. Stores were ranked based on offline conversion rates, and their items were grouped within each store. This layout enabled the system to set a retrieval limit per store, preventing excessive results from a single store and ensuring more diverse results across multiple grocery providers. This was particularly useful for searches with generic keywords like “chicken”, where thousands of matches could appear from a single store.
Optimizing Grocery Search: Store-Level Grouping and Conversion-Based Ranking
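A sketch of the per-store retrieval limit on top of that layout; hits arrive in index order (stores pre-sorted by offline conversion rate, items grouped within each store), so a simple cap keeps any single store from flooding the results. The tuple shape and limits are assumptions.

```python
from collections import defaultdict

def collect_with_store_cap(hits, per_store_limit: int = 3, total_limit: int = 100):
    """Keep at most per_store_limit items per store while traversing hits
    (tuples of (store_id, item_id, score)) in index order."""
    taken = defaultdict(int)
    results = []
    for store_id, item_id, score in hits:
        if taken[store_id] >= per_store_limit:
            continue  # e.g. "chicken" matching thousands of items in one store
        taken[store_id] += 1
        results.append((store_id, item_id, score))
        if len(results) >= total_limit:
            break
    return results
```

Because items are physically grouped by store, a real collector can go further and skip directly to the next store’s first document once the cap is reached, instead of scanning every remaining match.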
These data layout optimizations led to a 60% improvement in retrieval latency, reducing query times from 145 milliseconds to 60 milliseconds. Sorting the documents also made lookups more efficient, decreasing per-document retrieval time from a range of 10 to 60 microseconds to under 5 microseconds. The overall impact was a 50% improvement in P95 latencies, ensuring a much faster and smoother search experience.
Performance Wins: Average and Real-Time Latency Cut Nearly in Half
Optimizing Uber’s ETA Indexing
Uber improved its ETA (Estimated Time of Arrival) Indexing to optimize search efficiency, reduce latency, and enhance recall by incorporating metadata about restaurant delivery zones and time estimates into the search platform. Initially, the system lacked information on relative distances between delivery zones (hexagons), making it difficult for rankers to assess faraway restaurants efficiently. To address this, Uber categorized delivery times into fixed time ranges and indexed restaurants based on their proximity to each hexagon.
ETA-Aware Hex-Based Indexing: Mapping Reachable Areas for Restaurants R1 and R2
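In sketch form, the ingestion-side precomputation pairs each restaurant with every hexagon it can reach and snaps the estimated delivery time into a fixed bucket. The bucket boundaries, travel-time function, and reachability helper below are all assumed for illustration.

```python
from collections import defaultdict

ETA_BUCKETS = [(0, 15), (15, 30), (30, 45), (45, 60)]  # minutes; illustrative

def bucket_for(eta_minutes: float) -> int | None:
    for i, (lo, hi) in enumerate(ETA_BUCKETS):
        if lo <= eta_minutes < hi:
            return i
    return None  # beyond the deliverable horizon

def build_eta_index(restaurants, reachable_hexes, eta_to):
    """index[(hex_id, bucket)] -> restaurant ids. A restaurant is stored once
    per (hexagon, bucket) pair it falls into: more storage, faster queries."""
    index = defaultdict(list)
    for r in restaurants:
        for hex_id in reachable_hexes(r):
            bucket = bucket_for(eta_to(r, hex_id))
            if bucket is not None:
                index[(hex_id, bucket)].append(r["id"])
    return index
```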
This approach introduced a tradeoff between storage and processing efficiency. Restaurants had to be stored multiple times if they fell within different time ranges for various hexagons. Although this increased storage overhead, it significantly enhanced query speed, resulting in much faster retrieval of relevant results. Uber tested alternative indexing methods, including BKD trees and gRPC-based retrieval, but ultimately determined that precomputing these relationships and storing them in the index provided the best performance.
The new system enabled query parallelization, allowing a single request to trigger multiple searches for different ETA buckets at constant latency. This resulted in a fifty percent reduction in query latency and improved recall, as rankers gained access to more relevant restaurant candidates without compromising performance.
Optimized Retrieval: Parallel Range Queries Partitioned by ETA Buckets
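On the query side, the fanout can be pictured as one subquery per ETA bucket, all launched together. A thread pool stands in for parallel shard subqueries here, and the index shape matches the ingestion sketch above; since the buckets are non-overlapping, latency tracks the slowest bucket rather than the sum.

```python
from concurrent.futures import ThreadPoolExecutor

def search_bucket(index, hex_id: str, bucket: int) -> list[str]:
    """One non-overlapping subquery: restaurants in this hexagon/ETA bucket."""
    return index.get((hex_id, bucket), [])

def search_all_buckets(index, hex_id: str, num_buckets: int = 4) -> list[list[str]]:
    with ThreadPoolExecutor(max_workers=num_buckets) as pool:
        futures = [pool.submit(search_bucket, index, hex_id, b)
                   for b in range(num_buckets)]
        # Results stay grouped by bucket, so rankers can penalize farther
        # buckets without re-deriving distance at query time.
        return [f.result() for f in futures]
```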
Another key enhancement was handling non-deliverable but discoverable stores. Restaurants sometimes appear in searches but cannot accept orders due to various factors like courier availability, location, and time of day. Instead of handling this dynamically at query time, Uber precomputed deliverability at the ingestion layer. This allowed the system to differentiate between deliverable and discoverable-only restaurants quickly, improving the user experience.
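A sketch of that ingestion-time enrichment: the deliverability decision becomes a boolean field on the document, so the query layer only needs a cheap filter. The eligibility rule below is an invented stand-in for whatever signals (courier coverage, hours, and so on) feed the real computation.

```python
def enrich_with_deliverability(doc: dict, courier_coverage: set[str]) -> dict:
    """Mark the store deliverable or discoverable-only before indexing."""
    doc["deliverable"] = (
        doc["hex_id"] in courier_coverage and doc.get("accepting_orders", False)
    )
    return doc

doc = enrich_with_deliverability(
    {"store_id": "s1", "hex_id": "822d57fffffffff", "accepting_orders": True},
    courier_coverage={"822d57fffffffff"},
)
# At query time this is a simple term filter (deliverable:true) instead of
# a dynamic courier-availability check.
```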
Key Aspects of Uber’s Search Optimization
Uber’s journey in optimizing search and retrieval performance showcases the critical role of well-structured indexing, efficient sharding strategies, and parallelization techniques. Initially, performance issues stemmed from inefficient data storage and retrieval. This caused delays as queries processed large volumes of documents across multiple regions. By carefully analyzing query patterns, Uber restructured its index layout to prioritize city-based and store-based clustering, ensuring that relevant results could be retrieved faster.
The reorganization reduced retrieval times by over fifty percent and improved compression ratios, resulting in a more storage-efficient system. The introduction of ETA indexing further refined the ranking process by “penalizing” faraway stores while prioritizing those within an optimal delivery range. Through strategic data ingestion and indexing, Uber optimized the balance between recall and latency, ensuring that users received more relevant store results faster.
Search Index Performance Boost: 50% Drop in Latency via Range Query Execution
Beyond restructuring the index, Uber also focused on shifting complex fallback operations from the query layer to the ingestion layer. This transition reduced unnecessary processing during query execution, streamlining the search experience. The use of parallelized range queries allowed Uber to expand the selection space without affecting latency, enhancing the diversity of search results while maintaining fast response times. The system was further optimized by removing inefficiencies, such as processing test stores alongside production data, which had previously inflated query times. By utilizing non-overlapping subqueries, Uber maximized parallelization and enabled the simultaneous execution of various match types, including strong, fuzzy, and partial keyword matches, for greater efficiency.
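The match-type parallelization can be sketched the same way as the ETA fanout: each match type becomes its own subquery, and the predicates are written to be mutually exclusive so the subqueries stay non-overlapping. The matchers below are simplistic stand-ins for Lucene’s term, prefix, and fuzzy operators.

```python
from concurrent.futures import ThreadPoolExecutor
from difflib import SequenceMatcher

def strong_match(q, names):  return [n for n in names if n == q]
def partial_match(q, names): return [n for n in names if n.startswith(q) and n != q]
def fuzzy_match(q, names):
    return [n for n in names if n != q and not n.startswith(q)
            and SequenceMatcher(None, q, n).ratio() > 0.8]

def search(query: str, names: list[str]) -> dict[str, list[str]]:
    matchers = {"strong": strong_match, "partial": partial_match, "fuzzy": fuzzy_match}
    with ThreadPoolExecutor(max_workers=len(matchers)) as pool:
        futures = {kind: pool.submit(fn, query, names)
                   for kind, fn in matchers.items()}
        return {kind: f.result() for kind, f in futures.items()}

print(search("taco", ["taco", "tacos", "tacco", "pizza"]))
# {'strong': ['taco'], 'partial': ['tacos'], 'fuzzy': ['tacco']}
```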
Conclusion
These optimizations required extensive cross-team collaboration, with teams across search, feed, ads, and suggestions aligning on changes that impacted the entire search ecosystem. Coordinating these efforts was complex, as each team needed to ensure their respective services were integrated seamlessly. The process spanned several months of benchmarking, testing, and fine-tuning, but the outcomes were transformative. Latencies were significantly reduced, recall was enhanced, and system scalability saw considerable improvements, laying the groundwork for a more responsive and intuitive search experience across Uber Eats.
As Uber continues to expand and refine its platform globally, these advancements underscore the importance of data-driven decision-making, efficient system design, and continuous iteration. By consistently evaluating and optimizing its systems, Uber ensures that it remains agile and competitive in the fast-evolving landscape of large-scale distributed systems, providing an increasingly seamless experience for users and supporting its expanding service offerings.