The Hidden Flaws in Your A/B Testing Strategy Nobody Talks About | HackerNoon

News Room
Last updated: 2025/08/11 at 8:22 PM
News Room Published 11 August 2025

Table of Links

  1. Introduction

  2. Hypothesis testing

    2.1 Introduction

    2.2 Bayesian statistics

    2.3 Test martingales

    2.4 p-values

    2.5 Optional Stopping and Peeking

    2.6 Combining p-values and Optional Continuation

    2.7 A/B testing

  3. Safe Tests

    3.1 Introduction

    3.2 Classical t-test

    3.3 Safe t-test

    3.4 χ²-test

    3.5 Safe Proportion Test

  4. Safe Testing Simulations

    4.1 Introduction and 4.2 Python Implementation

    4.3 Comparing the t-test with the Safe t-test

    4.4 Comparing the χ²-test with the safe proportion test

  5. Mixture sequential probability ratio test

    5.1 Sequential Testing

    5.2 Mixture SPRT

    5.3 mSPRT and the safe t-test

  6. Online Controlled Experiments

    6.1 Safe t-test on OCE datasets

  7. Vinted A/B tests and 7.1 Safe t-test for Vinted A/B tests

    7.2 Safe proportion test for sample ratio mismatch

  8. Conclusion and References

2.6 Combining p-values and Optional Continuation

Combining p-values has been a subject of debate since their origins with Pearson and Fisher [HR18]. Such methods are often applied in meta-analyses of multiple experiments. Various methods exist for different contexts, and it is not always clear which one should be used in a given situation. Safe testing provides a simple, intuitive way to combine the results of many experiments.
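To make the idea of combining results concrete, here is a minimal Python sketch. It assumes each experiment reports an e-value (the evidence measure produced by a safe test) and that the experiments are independent, in which case the e-values can simply be multiplied and the null rejected once the product reaches 1/α. The function names and the example numbers are hypothetical and are not taken from the paper.

```python
import math


def combine_e_values(e_values):
    """Multiply e-values from independent experiments into a single e-value."""
    if any(e < 0 for e in e_values):
        raise ValueError("e-values must be non-negative")
    return math.prod(e_values)


def reject_null(e_values, alpha=0.05):
    """Reject the null when the combined e-value reaches 1 / alpha."""
    return combine_e_values(e_values) >= 1 / alpha


# Hypothetical e-values from three experiments testing the same null.
example = [2.0, 3.0, 4.0]
print(combine_e_values(example))  # 24.0
print(reject_null(example))       # True, since 24 >= 1 / 0.05 = 20
```

None of the three hypothetical experiments would reject on its own at α = 0.05, but their product does, which is the sense in which evidence accumulates across experiments.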

Figure 1: False positive probability for the classical t-test for α = 0.01, 0.5, 0.1.

In the section on peeking, it was mentioned that experimenters may want to make a decision about an experiment based on an intermediate observed effect size. With traditional statistical testing, such intermediate results are not statistically valid, and hence correct conclusions cannot be drawn from them. Safe testing, however, allows the experimenter to decide to continue a test if more data are needed to observe a significant effect.
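The cost of acting on intermediate results with a classical test can be illustrated with a small simulation. The sketch below is a simplification under stated assumptions: it uses a two-sided z-test with known unit variance rather than the t-test discussed in the paper, generates data under the null, and compares a single look at the end with repeated looks. The function names, sample sizes, and peeking schedule are all hypothetical.

```python
import math
import random


def z_test_p_value(total, n):
    """Two-sided p-value for H0: mean = 0, assuming known unit variance."""
    z = total / math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(n_experiments, n_max, peek_every, alpha, rng):
    """Share of null experiments declared significant at any look."""
    false_positives = 0
    for _ in range(n_experiments):
        total = 0.0
        for n in range(1, n_max + 1):
            total += rng.gauss(0.0, 1.0)  # data generated under the null
            if n % peek_every == 0 and z_test_p_value(total, n) < alpha:
                false_positives += 1
                break  # the experimenter stops at the first "significant" look
    return false_positives / n_experiments


rng = random.Random(0)
single_look = false_positive_rate(2000, 200, peek_every=200, alpha=0.05, rng=rng)
peeking = false_positive_rate(2000, 200, peek_every=10, alpha=0.05, rng=rng)
print(f"single look: {single_look:.3f}, peeking every 10 observations: {peeking:.3f}")
```

With a single look the false positive rate should stay close to α, while stopping at the first significant look among many pushes it well above α.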

2.7 A/B testing

A/B testing at first appears to be a simple application of statistical tests; however, there are nuances that are highly relevant to experimenters. A typical A/B test will have automated measurements of tens or possibly hundreds of metrics. Consider a test in which an experimenter wishes to measure a new feature’s impact on sales on their website. The target metric for this experiment may be total sales per user. In addition to testing the feature’s impact on total sales, they may wish to see more engagement from users who did not buy anything, because higher engagement with the platform can increase its value to users. Monitoring secondary metrics, such as the number of favourited items per user, the time spent on the platform, and the proportion of searches that lead to sales, may therefore give additional information about the performance of the feature. There may, however, be unintended consequences of the feature. A bug may cause the website to crash on certain browsers, or the feature may cannibalize sales of cheaper products by showing more expensive ones. It is therefore crucial to monitor so-called guardrail metrics to ensure that the feature is working as intended.

Aside from the metrics in the experiment, there are other factors to consider when evaluating results. Most statistical tests assume data are independent and identically distributed. However, a new feature may attract interest from curious users whose behaviour does not reflect long-term usage, leading to unreliable metrics. This is known as the novelty effect, and it may bias the results of a test. Another consideration is the time it takes for metrics to converge. Some metrics, such as the number of items viewed after a search, give instantaneous results. A metric such as the proportion of users who make a purchase may take several days to converge, because users may be exposed to a test while browsing the products and return several days later to make the purchase. This delay between exposure to a test and the realization of its outcome can make some metrics unreliable in the short term.

A final challenge to large-scale A/B testing concerns the random assignment of users to variants. Each experiment has an associated probability for users to be assigned to either the control or test group. The results of each user’s session are recorded in a database before being aggregated during metric calculation. Issues in this process can lead to group sizes that deviate from the intended assignment ratio. This is known as a sample ratio mismatch (SRM) and can indicate that the test results are biased, and therefore unreliable. It is therefore important for experimenters to continuously monitor the sample ratio of their A/B tests in order to stop erroneous experiments.
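As an illustration of what an SRM check computes, the following sketch applies a classical chi-square goodness-of-fit test with one degree of freedom to the observed group sizes. This is a fixed-sample check swapped in for illustration, not the safe proportion test the paper later uses for this purpose, and running it repeatedly is subject to the peeking problem described earlier. The function name, the counts, and the 0.001 alert threshold are hypothetical.

```python
import math


def srm_p_value(n_control, n_test, expected_control_share=0.5):
    """Chi-square goodness-of-fit p-value (1 degree of freedom) for the split."""
    total = n_control + n_test
    expected_control = total * expected_control_share
    expected_test = total - expected_control
    chi2 = ((n_control - expected_control) ** 2 / expected_control
            + (n_test - expected_test) ** 2 / expected_test)
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts for an experiment designed as a 50/50 split.
print(srm_p_value(50_000, 50_000))          # 1.0: the split matches the design exactly
print(srm_p_value(50_000, 51_500) < 0.001)  # True: flagged as a likely mismatch
```

A very small p-value suggests that the observed split is unlikely under the intended assignment probability, which is a signal to investigate the assignment and logging pipeline before trusting any metric.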

Having discussed A/B testing and the inflexibility of traditional statistical testing, we now introduce safe testing and how it can be applied to solve these issues.

Author:

(1) Daniel Beasley


This paper is available on arXiv under the Attribution-NonCommercial-ShareAlike 4.0 International license.
