To improve search and recommendation user experiences, Uber migrated from Apache Lucene to Amazon OpenSearch to support large-scale vector search and better capture search intent. This transition introduced several infrastructure challenges, which Uber engineers addressed with targeted solutions.
The first step in Uber’s adoption of OpenSearch was to evaluate it against their existing Lucene-based setup, which used the HNSW (Hierarchical Navigable Small World) algorithm:
We found ourselves limited by the lack of algorithm options, which hindered our ability to fine-tune trade-offs for different scenarios. This meant we couldn’t always provide our users with the most accurate results or cost-efficient options.
Another key limitation of their existing architecture was the lack of native GPU support, which became a performance bottleneck for personalized recommendations and fraud detection, two complex workloads relying on high-dimensional vectors.
Amazon OpenSearch addressed both limitations by supporting multiple Approximate Nearest Neighbor (ANN) algorithms, providing flexibility for different use cases, and offering GPU acceleration through Facebook AI Similarity Search (FAISS).
A final consideration was OpenSearch’s robustness and flexible API, which met Uber’s scalability and versatility requirements, ensuring a smooth and responsive user experience.
After selecting the technology, Uber built a prototype capable of handling over 1.5 billion vectors across nearly 400 dimensions to enable large-scale semantic retrieval. The prototype processed raw data to generate embeddings, which were stored using Apache Hive. The data was then batch ingested into OpenSearch using Spark and queried with FAISS.
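Uber’s post does not include code, but the pipeline can be illustrated with a short sketch. The snippet below uses the opensearch-py client and PySpark to create a FAISS-backed HNSW k-NN index, bulk-ingest precomputed embeddings from a Hive table, and run an approximate nearest-neighbor query. The endpoint, index, and field names (opensearch.internal, item-embeddings, embedding) are hypothetical, and the HNSW parameters are illustrative defaults rather than Uber’s actual configuration.

```python
from opensearchpy import OpenSearch, helpers

# Hypothetical endpoint and index/field names, for illustration only.
client = OpenSearch(hosts=[{"host": "opensearch.internal", "port": 9200}])

# A FAISS-backed HNSW index for ~400-dimensional embeddings.
index_body = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "item_id": {"type": "keyword"},
            "embedding": {
                "type": "knn_vector",
                "dimension": 400,
                "method": {
                    "name": "hnsw",
                    "engine": "faiss",
                    "space_type": "l2",
                    "parameters": {"ef_construction": 128, "m": 16},
                },
            },
        }
    },
}
client.indices.create(index="item-embeddings", body=index_body)


def index_partition(rows):
    """Bulk-index one Spark partition of precomputed embeddings."""
    part_client = OpenSearch(hosts=[{"host": "opensearch.internal", "port": 9200}])
    actions = (
        {
            "_index": "item-embeddings",
            "_id": row["item_id"],
            "item_id": row["item_id"],
            "embedding": row["embedding"],
        }
        for row in rows
    )
    helpers.bulk(part_client, actions, chunk_size=1_000, request_timeout=120)


# Read embeddings produced upstream (stored in Hive) and push them in parallel.
df = spark.table("embeddings_db.item_embeddings")  # assumes an existing SparkSession `spark`
df.foreachPartition(index_partition)

# Querying: an approximate nearest-neighbor search served by the FAISS engine.
response = client.search(
    index="item-embeddings",
    body={"size": 10, "query": {"knn": {"embedding": {"vector": [0.1] * 400, "k": 10}}}},
)
```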
The most significant challenges Uber engineers had to overcome were related to ingestion speed and stability, and query performance.
For ingestion, Uber engineers optimized their prototype’s baseline configuration, cutting the time to ingest the full 1.5-billion-document dataset from 12.5 hours to just 2.5 hours, a roughly 80% improvement. This was achieved by tuning Spark cores, executors, and partitions, as well as OpenSearch indexing threads. I/O also played a significant role in indexing delays.
The total I/O volume even surpassed the size of the index itself, suggesting unnecessary redundant writes. The elevated read I/O further pointed to frequent background segment compaction. These issues were likely due to the generation of numerous small segments during the indexing process.
This was addressed by adjusting refresh intervals and merge policies, and by disabling _source and doc_values, which reduced the index size from ~11 TB to ~4 TB.
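The post does not spell out the exact values Uber used, but the levers mentioned map onto standard OpenSearch index settings. The hedged sketch below, again with opensearch-py and illustrative values, pauses refreshes and relaxes merge pressure during bulk loading, shows the mapping-level options that shrink storage, and restores normal settings afterwards.

```python
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "opensearch.internal", "port": 9200}])

# During bulk ingestion: pause refreshes and relax flush/merge pressure so fewer
# tiny segments are produced and rewritten in the background.
client.indices.put_settings(
    index="item-embeddings",
    body={
        "index": {
            "refresh_interval": "-1",                   # no periodic refresh while loading
            "translog.flush_threshold_size": "1gb",     # fewer, larger flushes
            "merge.policy.max_merged_segment": "10gb",  # allow larger merged segments
        }
    },
)

# At index-creation time, storage can be trimmed by not keeping the raw _source
# and skipping doc_values on fields that are never sorted or aggregated on;
# this fragment would be merged into the index-creation body shown earlier.
lean_mapping_fragment = {
    "_source": {"enabled": False},
    "properties": {"item_id": {"type": "keyword", "doc_values": False}},
}

# After ingestion: re-enable refreshes and merge the many small segments down.
client.indices.put_settings(index="item-embeddings", body={"index": {"refresh_interval": "30s"}})
client.indices.forcemerge(index="item-embeddings", max_num_segments=1)
```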
Indexing also caused variable query latency while it was in progress. To address this, Uber offloaded index creation to a separate cluster, keeping the main cluster dedicated to searches, and then rerouted search traffic to the new cluster once the new index was ready.
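How exactly the traffic was rerouted is not detailed in the post; the sketch below shows one plausible pattern, assuming a readiness check on the rebuild cluster gates a configuration switch in the search service. The host names, index name, and the configuration flag are all hypothetical.

```python
from opensearchpy import OpenSearch

# Hypothetical endpoints: the "serving" cluster handles live queries while the
# "build" cluster rebuilds the index in the background.
build_cluster = OpenSearch(hosts=[{"host": "opensearch-build.internal", "port": 9200}])


def new_index_ready(index_name: str) -> bool:
    """Consider the rebuilt index ready once all its shards are active (green health)."""
    health = build_cluster.cluster.health(
        index=index_name, wait_for_status="green", timeout="60s"
    )
    return health["status"] == "green"


# Once the rebuilt index is green, point the search clients at the build cluster
# (for example, by flipping a service-discovery or config flag); the old cluster
# keeps serving traffic until then, so queries never hit a half-built index.
if new_index_ready("item-embeddings-v2"):
    active_search_host = "opensearch-build.internal"  # hypothetical config switch
```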
Regarding query performance, Uber engineers reduced P99 latency at 2000 QPS from ~250 ms to under ~120 ms, a ~52% improvement, through several architectural changes, including tuning the shard count to match the node count to improve parallelism, and adding replicas to better distribute load without impairing search performance.
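As an illustration of those two levers, the following hedged sketch sizes primaries to the node count and defers replicas until after ingestion. Counting nodes via the _cat API is an approximation (it counts every node role, not only data nodes), and the index name and replica count are illustrative, not Uber’s values.

```python
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "opensearch.internal", "port": 9200}])

# Approximate the node count from the _cat API (counts every node role).
node_count = len(client.cat.nodes(format="json"))

client.indices.create(
    index="item-embeddings-v2",                 # hypothetical index name
    body={
        "settings": {
            "index": {
                "knn": True,
                "number_of_shards": node_count,  # one primary per node for parallel fan-out
                "number_of_replicas": 0,         # keep ingestion cheap; add copies later
            }
        }
    },
)

# Once the index is built, add replicas so query load spreads over more shard copies.
client.indices.put_settings(
    index="item-embeddings-v2",
    body={"index": {"number_of_replicas": 2}},
)
```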
However, concurrent segment search (CSS), which allows a single query to search multiple segments in parallel across cores, did not yield the expected improvement.
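Concurrent segment search is exposed as a dynamic setting in recent OpenSearch releases (2.12 and later). A minimal sketch for toggling it cluster-wide while benchmarking, assuming the cluster runs a version that supports it:

```python
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "opensearch.internal", "port": 9200}])

# Enable concurrent segment search cluster-wide, run the benchmark, then flip it
# back to False to compare P99 latency with and without it.
client.cluster.put_settings(
    body={"persistent": {"search.concurrent_segment_search.enabled": True}}
)
```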
While native GPU support was a key factor in Uber engineers’ decision to adopt OpenSearch, its full potential remains to be explored, promising faster search results, improved responsiveness, and a better overall experience for users. Another area for improvement is transitioning from batch ingestion to real-time updates using Apache Flink, which is critical for more dynamic and time-sensitive use cases.
