Key Takeaways
- Optimizing data indexing and layout can significantly reduce retrieval times and improve storage efficiency.
- Categorizing and prioritizing relevant data based on specific factors, such as location or delivery times, enhances query accuracy and speed.
- Sharding techniques, like geo-sharding, help balance system load and improve search efficiency in complex, large-scale systems.
- Parallelizing queries and processing multiple match types simultaneously can boost search performance and relevance.
- Ensuring consistency across different discovery surfaces leads to a smoother and more intuitive user experience.
As software engineers, we are constantly striving to build systems that are not only functional but also efficient and scalable. In a world where users demand faster, more accurate results, optimizing search performance has become a key focus in modern application development.
This article is based on our presentation at QCon San Francisco 2024, where we explored the evolving landscape of data indexing, retrieval, and ranking. With platforms like Uber Eats handling complex queries across massive datasets, optimizing search is now a critical challenge, one that requires advanced strategies such as indexing, sharding, and parallel query processing.
The complexity of search systems continues to grow, making the balance between speed, relevance, and scalability more crucial than ever. This article explores the techniques behind these optimizations and their impact on both user experience and system performance.
Expanding Selection in Uber Eats: The nX Approach
Selection on Uber Eats is a complex problem that varies based on perspective. For the merchant onboarding team and operations, success is measured by onboarding as many restaurants and stores as possible. For consumers, selection can mean different things. Some prioritize fast delivery times, while others seek their favorite restaurants or opportunities to discover new places within the app. The challenge is to accommodate these diverse needs while ensuring a seamless discovery experience.
Beyond the conceptual aspects of selection, there are significant technical hurdles to overcome. As the business has expanded, particularly during and after the pandemic, Uber Eats has incorporated new verticals beyond restaurants, such as grocery stores, retail stores, and even item/package delivery. This expansion has introduced ranking and recommendation complexity into the platform’s selection infrastructure.
A key scaling difference lies between onboarding restaurants and onboarding grocery stores. Restaurant menus usually feature twenty to thirty items, while grocery stores can carry over one hundred thousand stock-keeping units (SKUs). This variation requires an advanced indexing system to present relevant options to users efficiently.
Another major development in Uber Eats’ selection strategy has been expanding the geographical reach of deliveries. Previously, users could only order from restaurants within a ten-to-fifteen-minute radius. Now, the platform enables users to order from merchants up to roughly an hour away. For example, a user in one city can place an order from a restaurant or retail store in another city and have it delivered through Uber Eats. This expanded reach introduces further technical challenges, particularly in logistics and fulfillment.
The primary focus of Uber Eats’ selection strategy is to maximize the quantity of available options across all discovery surfaces within the app. Personalization, while also an important aspect, is a separate challenge that requires distinct solutions. By continuously evolving its indexing, ranking, and recommendation technologies, Uber Eats aims to create a more comprehensive and dynamic selection for users, ensuring that they can quickly and efficiently find what they want, whether it is a favorite restaurant, a new culinary experience, or a much-needed grocery item.
Discovery surfaces such as Home Feed, Search, Suggestions, and Ads are crucial in connecting users with available options. Home Feed serves as the primary entry point for orders, featuring carousels based on user history, storefront listings, and shortcut sections for promotions and cuisines. Search functionality includes restaurant, dish, and cuisine searches, while suggestions help users discover similar or alternative options dynamically. Ads enhance visibility, helping merchants to reach relevant customers effectively. Ensuring consistency across these discovery surfaces is key to delivering a seamless and intuitive user experience.
How users discover content on Uber Eats: A look at Feed, Search, and Recommendations
Uber Eats Architecture: From Infrastructure to Application Layer
The architecture of Uber Eats spans multiple layers, from the infrastructure side to the application layer, ensuring seamless retrieval and placement of stores and restaurants. At the foundation is the infrastructure layer, which stores and indexes all available merchants and items. This serves as the core dataset from which relevant stores are retrieved.
Search Architecture: A Layered View of the Retrieval and Ranking Pipeline
The retrieval layer optimizes for recall, fetching a wide range of potential stores that are then refined for relevance by the ranking system. The first-pass ranking focuses on precision, using lexical matching to align user queries with retrieved documents. The hydration layer adds business logic, factoring in promotions, membership benefits, and estimated delivery times. Finally, the second-pass ranking personalizes results based on the user’s order history and conversion rates to deliver the most relevant options.
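To make the four stages concrete, here is a minimal sketch of the pipeline in Python. The data model, scoring heuristics, and function names are illustrative assumptions, not Uber’s implementation; the point is the recall-then-precision flow.

```python
from dataclasses import dataclass

@dataclass
class Store:
    store_id: str
    name: str
    lexical_score: float = 0.0   # set by first-pass ranking
    eta_minutes: int = 0         # set by hydration
    final_score: float = 0.0     # set by second-pass ranking

def retrieve(query: str, index: list[Store]) -> list[Store]:
    """Recall-oriented: fetch a wide candidate set of potential stores."""
    return [s for s in index if query.lower() in s.name.lower()]

def first_pass_rank(query: str, candidates: list[Store]) -> list[Store]:
    """Precision-oriented lexical matching between query and documents."""
    for s in candidates:
        s.lexical_score = 1.0 if s.name.lower().startswith(query.lower()) else 0.5
    return sorted(candidates, key=lambda s: s.lexical_score, reverse=True)

def hydrate(candidates: list[Store], etas: dict[str, int]) -> list[Store]:
    """Business logic: promotions, membership benefits, delivery estimates."""
    for s in candidates:
        s.eta_minutes = etas.get(s.store_id, 60)
    return candidates

def second_pass_rank(candidates: list[Store], past_orders: set[str]) -> list[Store]:
    """Personalization from order history; a real ranker uses conversion models."""
    for s in candidates:
        s.final_score = (s.lexical_score
                         + (0.5 if s.store_id in past_orders else 0.0)
                         - 0.01 * s.eta_minutes)
    return sorted(candidates, key=lambda s: s.final_score, reverse=True)

stores = [Store("s1", "Pizza Palace"), Store("s2", "Planet Pizza")]
hits = second_pass_rank(
    hydrate(first_pass_rank("pizza", retrieve("pizza", stores)), {"s1": 20, "s2": 45}),
    past_orders={"s2"},
)
```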
As Uber Eats expanded its selection, it encountered significant scaling challenges. An initial attempt to increase the number of retrieved stores led to a fourfold increase in latency, prompting a deeper investigation into inefficiencies in data ingestion, query performance, and ranking.
Dot size reflects candidate volume; red dots indicate newly included stores based on greater Haversine distance.
One of the key challenges was indexing and querying. Uber Eats relies on H3 hexagonal geolocation indexing to determine delivery zones, but the ingestion process inaccurately categorized stores as either “close” or “far”. This misclassification led to ranking inconsistencies in which stores that were actually nearby were deprioritized.
Another issue was the quadratic growth of the search space. Because the searched area scales with the square of the delivery radius, widening the radius from fifteen minutes to an hour multiplies the area roughly sixteenfold, dramatically increasing the number of candidates being processed. Even a small adjustment in retrieval parameters could therefore cause significant performance slowdowns.
Store distribution created another challenge. As search areas broadened, distant stores were often shown higher in the feed than nearby options. This led to a dilemma: while high-converting stores are important, customer experience suffers when faraway choices overshadow more convenient local options.
Haversine Rings of Retrieval: Inner Stores Are Few, Outer Growth Is Disproportionate
To address these issues, Uber Eats needed to refine its indexing, retrieval, and ranking strategies. The focus shifted to balancing scale with efficiency, ensuring that the platform could surface the most relevant stores without compromising performance. These optimizations were critical in maintaining a smooth user experience while supporting Uber Eats’ growing selection.
Uber Eats’ Search Platform
The Search platform that powers Uber Eats, handling tens of millions of requests daily, is built on Apache Lucene and follows a Lambda architecture for data ingestion. This setup includes batch ingestion via Spark and real-time ingestion through a streaming path, ensuring up-to-date search results.
From data to discoverability: Ingestion, indexing, and Lucene-backed search orchestrated by a streaming architecture
One key feature is priority-aware ingestion, allowing high-priority requests to be processed first, maintaining data freshness. Uber also relies heavily on geo-sharding to optimize its geospatial search use cases. Additionally, custom index layouts and query operators enhance search efficiency by leveraging offline document ranking and early termination to speed up queries.
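As a rough illustration of priority-aware ingestion, the sketch below drains high-priority updates before lower-priority ones using an in-process priority queue. In Uber’s pipeline this discipline is realized in front of Kafka rather than in a single process, so treat the mechanism, names included, as an assumption for exposition.

```python
import heapq
import itertools

class PriorityIngestQueue:
    """Drains high-priority index updates first; a counter keeps FIFO order
    within the same priority level."""
    HIGH, NORMAL, LOW = 0, 1, 2

    def __init__(self) -> None:
        self._heap: list = []
        self._tiebreak = itertools.count()

    def submit(self, update: dict, priority: int = NORMAL) -> None:
        heapq.heappush(self._heap, (priority, next(self._tiebreak), update))

    def drain(self):
        while self._heap:
            _, _, update = heapq.heappop(self._heap)
            yield update

q = PriorityIngestQueue()
q.submit({"store_id": "s1", "op": "menu_update"}, PriorityIngestQueue.LOW)
q.submit({"store_id": "s2", "op": "availability"}, PriorityIngestQueue.HIGH)
print([u["op"] for u in q.drain()])  # ['availability', 'menu_update']
```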
The search platform architecture consists of three main components:
- Batch Indexing Pipeline – Spark jobs process raw data, convert it into search documents, partition the documents into shards, and generate Lucene indexes, which are stored in an object store.
- Streaming/Real-Time Updates Path – Updates are ingested via a Streaming Service, which maps documents to specific Kafka partitions for real-time updates; there is a one-to-one mapping between Kafka partitions and index shards (see the routing sketch after this list). Kafka acts as a write-ahead log, enabling graceful handling of ingestion spikes, priority-aware ingestion, replication, and fault tolerance.
- Serving Stack – The searcher node retrieves indexes from storage, catches up with streaming updates, and executes queries. A stateless aggregator service routes search requests to the appropriate searcher node, handles query fanouts, and aggregates results before returning them to the user.
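The one-to-one shard/partition mapping can be pictured with a small routing sketch; the shard function here is a crude placeholder (the real assignment comes from the geo-sharding logic described in the next section), and the topic name is hypothetical.

```python
import json

NUM_SHARDS = 16  # illustrative; the real shard count is an operational choice

def shard_of(lat: float, lng: float) -> int:
    """Placeholder geo-shard assignment; see the latitude and hex sharding
    sketches below for more realistic versions."""
    return int((lat + 90.0) / 180.0 * NUM_SHARDS) % NUM_SHARDS

def route_update(doc: dict) -> tuple[int, bytes]:
    """The document's shard id doubles as its Kafka partition, so each
    searcher node tails exactly one partition as its write-ahead log."""
    partition = shard_of(doc["lat"], doc["lng"])
    return partition, json.dumps(doc).encode("utf-8")

partition, payload = route_update({"store_id": "s1", "lat": 37.77, "lng": -122.42})
# e.g. producer.send("index-updates", value=payload, partition=partition)
```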
Sharding Techniques
Efficiently managing geospatial search queries on Uber Eats is crucial, as users often seek out nearby restaurants or grocery stores. To achieve this, Uber Eats uses geo-sharding, a technique that ensures all relevant data for a specific location is stored within a single shard. This minimizes query overhead and eliminates inefficiencies caused by fetching and aggregating results from multiple shards. Additionally, geo-sharding allows first-pass ranking to happen directly on data nodes, improving speed and accuracy. Uber Eats primarily employs two geo-sharding techniques: latitude sharding and hex sharding.
Latitude sharding divides the world into horizontal bands, with each band representing a distinct shard. Shard ranges are computed offline using Spark jobs, which first divide the map into thousands of narrow latitude stripes and then group adjacent stripes to create shards of roughly equal size. Documents falling on shard boundaries are indexed in both neighboring shards to prevent missing results.
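A condensed, single-process version of that offline computation might look like the following: histogram documents into narrow latitude stripes, then greedily merge adjacent stripes until each shard holds a roughly equal share. The real job runs as Spark jobs over the full corpus; plain Python stands in here, and the boundary-buffer duplication is noted but omitted.

```python
def latitude_shard_bounds(doc_lats: list[float], num_shards: int,
                          stripes: int = 3600) -> list[float]:
    """Return latitude boundaries so each of num_shards shards holds
    roughly the same number of documents."""
    counts = [0] * stripes
    for lat in doc_lats:
        stripe = min(int((lat + 90.0) / 180.0 * stripes), stripes - 1)
        counts[stripe] += 1

    target = len(doc_lats) / num_shards
    bounds, running = [-90.0], 0
    for i, count in enumerate(counts):
        running += count
        if running >= target and len(bounds) < num_shards:
            bounds.append(-90.0 + (i + 1) * 180.0 / stripes)  # close this shard
            running = 0
    bounds.append(90.0)
    return bounds  # shard k covers latitudes [bounds[k], bounds[k+1])

# Indexing would then write each document to its shard, and additionally to
# the neighboring shard whenever it lies within the boundary buffer.
```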
One key advantage of latitude sharding is its ability to distribute traffic efficiently across different time zones. Given that Uber Eats experiences peak activity following a “sun pattern” with high demand during the day and lower demand at night, this method helps prevent excessive load on specific shards. However, in densely populated urban areas, shards may become uneven, leading to indexing delays and increased query latencies.
Not Geo-Local, but Geo-Linear: A Visual Example Using U.S. and Europe Overlap
To address these challenges, Uber Eats also utilizes hex sharding, which is based on the H3 geospatial indexing system. Instead of dividing the world into bands, hex sharding organizes data into hexagonal tiles of varying resolutions. Choosing the right hexagon size is crucial; Uber typically opts for H3 resolution 2 or 3 to balance efficiency and precision. Like latitude sharding, hex sharding incorporates buffer zones, ensuring that documents near shard boundaries are indexed in multiple hexagons to prevent search gaps. The key advantage of hex sharding is a more balanced shard distribution, particularly in dense urban areas, where latitude sharding struggles.
Hexes Over Heat: A Sharding Strategy Built for Urban Search Load Balancing
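A sketch of hex-shard assignment using the open-source h3 Python bindings; the calls below assume the v4 API (latlng_to_cell, grid_disk; the v3 equivalents were geo_to_h3 and k_ring). A production version would add a document to neighboring cells only when it actually falls inside the buffer zone, rather than via the explicit flag used here.

```python
import h3  # pip install h3 (v4 API assumed)

RESOLUTION = 2  # coarse cells; Uber reportedly uses resolution 2 or 3

def shards_for_doc(lat: float, lng: float, in_buffer_zone: bool) -> set[str]:
    """Map a document to the H3 cell(s) that should index it. Documents in
    a buffer zone are also written to all adjacent cells so that queries
    near a shard boundary never miss them."""
    home = h3.latlng_to_cell(lat, lng, RESOLUTION)
    if not in_buffer_zone:
        return {home}
    return set(h3.grid_disk(home, 1))  # home cell plus its neighbors

print(shards_for_doc(37.7749, -122.4194, in_buffer_zone=False))     # one shard
print(len(shards_for_doc(37.7749, -122.4194, in_buffer_zone=True)))  # 7 cells
```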
By combining these two techniques, Uber Eats optimizes its search infrastructure to deliver fast, accurate, and scalable geospatial queries, providing a seamless experience for users no matter where they’re ordering from.
To optimize Uber Eats’ search performance, several improvements were made by leveraging query patterns and refining data layouts to enhance recall while reducing latency. One key enhancement was building specialized data layouts tailored for different use cases, such as food delivery and grocery searches. By aligning the data structure with query patterns, retrieval efficiency improved, cutting down unnecessary computations. Another major optimization was indexing Estimated Time of Delivery (ETD), which allowed the search space to be divided into non-overlapping ranges and processed in parallel, accelerating query execution while maintaining accurate ranking.
ETA as a Spatial Constraint: Structuring Candidate Retrieval for Efficiency and Isolation
Furthermore, tasks such as differentiating between nearby and distant stores were moved earlier into the indexing pipeline, reducing query-time processing and improving overall performance. These enhancements greatly increased search efficiency, delivering quicker and more relevant results while maintaining scalability.
Data Layout Improvements for Faster and More Scalable Queries
Uber Eats improved its search efficiency by optimizing its data layout to better align with query patterns, significantly reducing latency and improving scalability. The Eats index was restructured to first group restaurants by city, followed by individual restaurants, and then their menu items. This approach allowed queries to filter out irrelevant cities, quickly reducing unnecessary processing. Additionally, because Lucene uses delta encoding, clustering similar attributes together improved compression efficiency, leading to a twenty percent reduction in index size.
Index Layout in Lucene: Geographic and Merchant-Aware Document Ordering
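The core of the restructuring is just a document ordering decision, which a toy example makes visible. Field names are illustrative; the essential move is sorting by a (city, store, item) composite key before the index is built.

```python
docs = [
    {"city_id": 12, "store_id": 400, "item_id": 7, "title": "Margherita"},
    {"city_id": 3,  "store_id": 90,  "item_id": 2, "title": "Pad Thai"},
    {"city_id": 12, "store_id": 400, "item_id": 1, "title": "Calzone"},
    {"city_id": 3,  "store_id": 91,  "item_id": 5, "title": "Green Curry"},
]

# City first, then store, then item: a query scoped to one city can skip
# whole contiguous runs of documents, and neighboring doc ids differ by
# small deltas, which Lucene's delta encoding compresses well.
docs.sort(key=lambda d: (d["city_id"], d["store_id"], d["item_id"]))

for doc in docs:
    print(doc["city_id"], doc["store_id"], doc["title"])
```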
For grocery searches, a slightly different strategy was required due to the higher scale of grocery inventory compared to restaurants. Stores were ranked based on offline conversion rates, and their items were grouped within each store. This layout enabled the system to set a retrieval limit per store, preventing excessive results from a single store and ensuring more diverse results across multiple grocery providers. This was particularly useful for searches with generic keywords like “chicken”, where thousands of matches could appear from a single store.
Optimizing Grocery Search: Store-Level Grouping and Conversion-Based Ranking
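A sketch of the per-store retrieval limit on top of that layout; hits arrive in index order (stores pre-sorted by offline conversion rate, items grouped within each store), so a simple cap keeps any single store from flooding the results. The tuple shape and limits are assumptions.

```python
from collections import defaultdict

def collect_with_store_cap(hits, per_store_limit: int = 3, total_limit: int = 100):
    """Keep at most per_store_limit items per store while traversing hits
    (tuples of (store_id, item_id, score)) in index order."""
    taken = defaultdict(int)
    results = []
    for store_id, item_id, score in hits:
        if taken[store_id] >= per_store_limit:
            continue  # e.g. "chicken" matching thousands of items in one store
        taken[store_id] += 1
        results.append((store_id, item_id, score))
        if len(results) >= total_limit:
            break
    return results
```

Because items are physically grouped by store, a real collector can go further and skip directly to the next store’s first document once the cap is reached, instead of scanning every remaining match.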
These data layout optimizations led to a 60% improvement in retrieval latency, reducing query times from 145 milliseconds to 60 milliseconds. Sorting the documents also made lookups more efficient, decreasing per-document retrieval time from a range of 10 to 60 microseconds to under 5 microseconds. The overall impact was a 50% improvement in P95 latencies, ensuring a much faster and smoother search experience.
Performance Wins: Average and Real-Time Latency Cut Nearly in Half
Optimizing Uber’s ETA Indexing
Uber improved its ETA (Estimated Time of Arrival) Indexing to optimize search efficiency, reduce latency, and enhance recall by incorporating metadata about restaurant delivery zones and time estimates into the search platform. Initially, the system lacked information on relative distances between delivery zones (hexagons), making it difficult for rankers to assess faraway restaurants efficiently. To address this, Uber categorized delivery times into fixed time ranges and indexed restaurants based on their proximity to each hexagon.
ETA-Aware Hex-Based Indexing: Mapping Reachable Areas for Restaurants R1 and R2
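In sketch form, the ingestion-side precomputation pairs each restaurant with every hexagon it can reach and snaps the estimated delivery time into a fixed bucket. The bucket boundaries, travel-time function, and reachability helper below are all assumed for illustration.

```python
from collections import defaultdict

ETA_BUCKETS = [(0, 15), (15, 30), (30, 45), (45, 60)]  # minutes; illustrative

def bucket_for(eta_minutes: float) -> int | None:
    for i, (lo, hi) in enumerate(ETA_BUCKETS):
        if lo <= eta_minutes < hi:
            return i
    return None  # beyond the deliverable horizon

def build_eta_index(restaurants, reachable_hexes, eta_to):
    """index[(hex_id, bucket)] -> restaurant ids. A restaurant is stored once
    per (hexagon, bucket) pair it falls into: more storage, faster queries."""
    index = defaultdict(list)
    for r in restaurants:
        for hex_id in reachable_hexes(r):
            bucket = bucket_for(eta_to(r, hex_id))
            if bucket is not None:
                index[(hex_id, bucket)].append(r["id"])
    return index
```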
This approach introduced a tradeoff between storage and processing efficiency. Restaurants had to be stored multiple times if they fell within different time ranges for various hexagons. Although this increased storage overhead, it significantly enhanced query speed, resulting in much faster retrieval of relevant results. Uber tested alternative indexing methods, including BKD trees and gRPC-based retrieval, but ultimately determined that precomputing these relationships and storing them in the index provided the best performance.
The new system enabled query parallelization, allowing a single request to trigger multiple searches for different ETA buckets at constant latency. This resulted in a fifty percent reduction in query latency and improved recall, as rankers gained access to more relevant restaurant candidates without compromising performance.
Optimized Retrieval: Parallel Range Queries Partitioned by ETA Buckets
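On the query side, the fanout can be pictured as one subquery per ETA bucket, all launched together. A thread pool stands in for parallel shard subqueries here, and the index shape matches the ingestion sketch above; since the buckets are non-overlapping, latency tracks the slowest bucket rather than the sum.

```python
from concurrent.futures import ThreadPoolExecutor

def search_bucket(index, hex_id: str, bucket: int) -> list[str]:
    """One non-overlapping subquery: restaurants in this hexagon/ETA bucket."""
    return index.get((hex_id, bucket), [])

def search_all_buckets(index, hex_id: str, num_buckets: int = 4) -> list[list[str]]:
    with ThreadPoolExecutor(max_workers=num_buckets) as pool:
        futures = [pool.submit(search_bucket, index, hex_id, b)
                   for b in range(num_buckets)]
        # Results stay grouped by bucket, so rankers can penalize farther
        # buckets without re-deriving distance at query time.
        return [f.result() for f in futures]
```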
Another key enhancement was handling non-deliverable but discoverable stores. Restaurants sometimes appear in searches but cannot accept orders due to various factors like courier availability, location, and time of day. Instead of handling this dynamically at query time, Uber precomputed deliverability at the ingestion layer. This allowed the system to differentiate between deliverable and discoverable-only restaurants quickly, improving the user experience.
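A sketch of that ingestion-time enrichment: the deliverability decision becomes a boolean field on the document, so the query layer only needs a cheap filter. The eligibility rule below is an invented stand-in for whatever signals (courier coverage, hours, and so on) feed the real computation.

```python
def enrich_with_deliverability(doc: dict, courier_coverage: set[str]) -> dict:
    """Mark the store deliverable or discoverable-only before indexing."""
    doc["deliverable"] = (
        doc["hex_id"] in courier_coverage and doc.get("accepting_orders", False)
    )
    return doc

doc = enrich_with_deliverability(
    {"store_id": "s1", "hex_id": "822d57fffffffff", "accepting_orders": True},
    courier_coverage={"822d57fffffffff"},
)
# At query time this is a simple term filter (deliverable:true) instead of
# a dynamic courier-availability check.
```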
Key Aspects of Uber’s Search Optimization
Uber’s journey in optimizing search and retrieval performance showcases the critical role of well-structured indexing, efficient sharding strategies, and parallelization techniques. Initially, performance issues stemmed from inefficient data storage and retrieval. This caused delays as queries processed large volumes of documents across multiple regions. By carefully analyzing query patterns, Uber restructured its index layout to prioritize city-based and store-based clustering, ensuring that relevant results could be retrieved faster.
The reorganization reduced retrieval times by over fifty percent and improved compression ratios, resulting in a more storage-efficient system. The introduction of ETA indexing further refined the ranking process by “penalizing” faraway stores while prioritizing those within an optimal delivery range. Through strategic data ingestion and indexing, Uber optimized the balance between recall and latency, ensuring that users received more relevant store results faster.
Search Index Performance Boost: 50% Drop in Latency via Range Query Execution
Beyond restructuring the index, Uber also focused on shifting complex fallback operations from the query layer to the ingestion layer. This transition reduced unnecessary processing during query execution, streamlining the search experience. The use of parallelized range queries allowed Uber to expand the selection space without affecting latency, enhancing the diversity of search results while maintaining fast response times. The system was further optimized by removing inefficiencies, such as processing test stores alongside production data, which had previously inflated query times. By utilizing non-overlapping subqueries, Uber maximized parallelization and enabled the simultaneous execution of various match types, including strong, fuzzy, and partial keyword matches, for greater efficiency.
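The match-type parallelization can be sketched the same way as the ETA fanout: each match type becomes its own subquery, and the predicates are written to be mutually exclusive so the subqueries stay non-overlapping. The matchers below are simplistic stand-ins for Lucene’s term, prefix, and fuzzy operators.

```python
from concurrent.futures import ThreadPoolExecutor
from difflib import SequenceMatcher

def strong_match(q, names):  return [n for n in names if n == q]
def partial_match(q, names): return [n for n in names if n.startswith(q) and n != q]
def fuzzy_match(q, names):
    return [n for n in names if n != q and not n.startswith(q)
            and SequenceMatcher(None, q, n).ratio() > 0.8]

def search(query: str, names: list[str]) -> dict[str, list[str]]:
    matchers = {"strong": strong_match, "partial": partial_match, "fuzzy": fuzzy_match}
    with ThreadPoolExecutor(max_workers=len(matchers)) as pool:
        futures = {kind: pool.submit(fn, query, names)
                   for kind, fn in matchers.items()}
        return {kind: f.result() for kind, f in futures.items()}

print(search("taco", ["taco", "tacos", "tacco", "pizza"]))
# {'strong': ['taco'], 'partial': ['tacos'], 'fuzzy': ['tacco']}
```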
Conclusion
These optimizations required extensive cross-team collaboration, with teams across search, feed, ads, and suggestions aligning on changes that impacted the entire search ecosystem. Coordinating these efforts was complex, as each team needed to ensure their respective services were integrated seamlessly. The process spanned several months of benchmarking, testing, and fine-tuning, but the outcomes were transformative. Latencies were significantly reduced, recall was enhanced, and system scalability saw considerable improvements, laying the groundwork for a more responsive and intuitive search experience across Uber Eats.
As Uber continues to expand and refine its platform globally, these advancements underscore the importance of data-driven decision-making, efficient system design, and continuous iteration. By consistently evaluating and optimizing its systems, Uber ensures that it remains agile and competitive in the fast-evolving landscape of large-scale distributed systems, providing an increasingly seamless experience for users and supporting its expanding service offerings.