
Enhancing A/B Testing at DoorDash with Multi-Armed Bandits

News Room
Published 25 January 2026

While experimentation is essential, traditional A/B testing can be excessively slow and expensive, according to DoorDash engineers Caixia Huang and Alex Weinstein. To address these limitations, they adopted a “multi-armed bandits” (MAB) approach to optimize their experiments.

When running experiments, organizations aim to minimize the opportunity cost, or regret, caused by serving the less effective variants to a subset of the user base. Traditional A/B testing relies on fixed traffic splits and predetermined sample sizes that remain unchanged throughout the experiment. As a result, even if a clear winner emerges early, the experiment continues until it reaches its predetermined stopping condition. To make things worse, opportunity cost compounds as the number of concurrent experiments increases, encouraging teams to run experiments sequentially to reduce regret, but at the expense of significantly slower iterations.

The multi-armed bandits approach offers a way to adaptively allocate traffic based on performance, accelerating learning while reducing waste. It does so by repeatedly selecting among multiple choices whose properties are only partially known and refining those selections as the experiment progresses and more evidence is gathered:

For our purposes, this strategy allocates experimental traffic toward better-performing variants based on ongoing feedback collected during the experiment. The core idea is that an automated MAB agent continuously selects from a pool of actions, or arms, to maximize a defined reward, while simultaneously learning from user feedback in subsequent iterations.

This strategy enables a balance between exploration, i.e., learning about all candidate options, and exploitation, i.e., prioritizing the best‑performing options as they emerge, until the experiment converges on the best option.

According to Huang and Weinstein, MAB helps reduce the cost of experimentation enough to make it possible to evaluate many distinct ideas quickly.

At the core of DoorDash’s MAB approach is Thompson sampling, a Bayesian algorithm known for its strong performance and robustness to delayed feedback. In short, the algorithm draws a sample from each arm’s posterior reward distribution (the distribution of its expected reward, updated with the data observed so far) to decide traffic allocation, then folds the new feedback into those posteriors to prepare for the next decision cycle.
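To make the mechanism concrete, here is a minimal, illustrative sketch of Thompson sampling for a two-variant experiment with binary rewards (e.g., conversion or no conversion), using Beta posteriors. The variant names, reward rates, and structure are assumptions for the example, not DoorDash’s actual implementation:

```python
import random

class BetaArm:
    """One experiment variant with a Beta posterior over its conversion rate."""

    def __init__(self):
        # Beta(1, 1) prior, i.e. uniform over the arm's unknown conversion rate.
        self.successes = 1
        self.failures = 1

    def sample(self):
        # Draw one plausible conversion rate from the current posterior.
        return random.betavariate(self.successes, self.failures)

    def update(self, reward):
        # Fold observed feedback into the posterior for the next decision cycle.
        if reward:
            self.successes += 1
        else:
            self.failures += 1

def choose_arm(arms):
    # Route the next user to the arm whose posterior sample is highest:
    # uncertain arms still win occasionally (exploration), while arms with
    # strong evidence win increasingly often (exploitation).
    samples = [arm.sample() for arm in arms]
    return samples.index(max(samples))

# Simulate 10,000 users against two hypothetical variants whose true
# (unknown to the agent) conversion rates are 5% and 8%.
random.seed(42)
true_rates = [0.05, 0.08]
arms = [BetaArm() for _ in true_rates]
pulls = [0, 0]
for _ in range(10_000):
    i = choose_arm(arms)
    pulls[i] += 1
    arms[i].update(random.random() < true_rates[i])

print(pulls)  # traffic concentrates on the better (8%) variant over time
```

Note how no fixed traffic split is ever set: allocation emerges from the posteriors themselves, which is what lets the experiment shift traffic away from a losing variant long before a classical fixed-horizon test would conclude.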

Adopting MAB is not without challenges, say DoorDash engineers. In particular, it makes inference on metrics not included in the reward function much harder, which in turn encourages teams to choose more complex reward metrics to capture as much insight as possible. By contrast, traditional A/B testing allows post-experiment analysis of any metric once the experiment concludes.

Moreover, because MAB adjusts allocations more aggressively, it can potentially lead to inconsistent user experiences when the same user interacts with a feature multiple times. DoorDash plans to address these limitations by adopting contextual bandits, leveraging Bayesian optimization, and implementing sticky user assignment to enhance overall user experience. 

The concept of a multi‑armed bandit comes from probability theory and machine learning. It describes the problem with a slot machine analogy: a gambler faces a row of slot machines (sometimes called “one‑armed bandits”) and must decide which machines to play, how often, in what order, and when to switch to a different machine.
