By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
World of SoftwareWorld of SoftwareWorld of Software
  • News
  • Software
  • Mobile
  • Computing
  • Gaming
  • Videos
  • More
    • Gadget
    • Web Stories
    • Trending
    • Press Release
Search
  • Privacy
  • Terms
  • Advertise
  • Contact
Copyright © All Rights Reserved. World of Software.
Reading: 332K Orders : How Ensemble ML Cut False Positives by 35% | HackerNoon
Share
Sign In
Notification Show More
Font ResizerAa
World of SoftwareWorld of Software
Font ResizerAa
  • Software
  • Mobile
  • Computing
  • Gadget
  • Gaming
  • Videos
Search
  • News
  • Software
  • Mobile
  • Computing
  • Gaming
  • Videos
  • More
    • Gadget
    • Web Stories
    • Trending
    • Press Release
Have an existing account? Sign In
Follow US
  • Privacy
  • Terms
  • Advertise
  • Contact
Copyright © All Rights Reserved. World of Software.
World of Software > Computing > 332K Orders : How Ensemble ML Cut False Positives by 35% | HackerNoon
Computing

332K Orders : How Ensemble ML Cut False Positives by 35% | HackerNoon

News Room
Last updated: 2026/03/04 at 1:22 PM
News Room Published 4 March 2026
Share
332K Orders : How Ensemble ML Cut False Positives by 35% | HackerNoon
SHARE

Ensemble ML is everywhere – every blog post, every conference talk claims “combine multiple models for better results.” But does it actually work in production?

I built a data quality monitoring system to find out. Three ML models (Isolation Forest, LSTM, Autoencoder) are working together. 332K synthetic orders processed over 25 days.

Here’s what actually happened.

Why I Tested This

“Use ensemble methods” is the standard advice for ML in production. Combine multiple models, get better predictions, and reduce false positives.

Sounds great in theory. But I wanted to know:

  • Does it actually reduce false positives?
  • Is the complexity worth it?
  • Does it work for data quality monitoring specifically?

So I built it. Ran it continuously. Measured everything.

The Setup

Stack:

  • Apache Kafka streaming orders
  • Python processing pipeline
  • PostgreSQL for metrics
  • Three ML models in ensemble
  • Docker Compose (runs locally)

Data: Synthetic e-commerce orders with realistic quality issues injected.

Goal: Compare single model vs. ensemble. Which catches more real issues? Which has fewer false positives?

Baseline: Single Model (Isolation Forest)

Started with just Isolation Forest – the standard choice for anomaly detection:

from sklearn.ensemble import IsolationForest

# Train on 24 hours of quality metrics
historical_data = get_metrics(hours=24)
model = IsolationForest(contamination=0.1)
model.fit(historical_data)

# Predict
is_anomaly = model.predict(current_metrics)

Week 1 Results:

  • 93% accuracy
  • <10ms inference
  • Caught obvious anomalies
  • 15% false positive rate
  • Missed gradual changes

The false positive rate was the killer. Every 6-7 alerts, one was wrong. Teams started ignoring alerts.

Adding SHAP (Because Black Boxes Don’t Work)

Before adding more models, I needed to understand why the single model flagged things:

import shap

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(current_metrics)

# Now I get:
# "Primary driver: validity_score = 45.2 (30% below baseline)"

This helped debug false positives. But didn’t reduce them.

That’s when I decided to try the ensemble approach everyone talks about.

The Ensemble: 3 Models

The theory: Different algorithms catch different problems.

Model 1: Isolation Forest (40% weight)

  • Fast, catches statistical outliers
  • Good at: Sudden spikes, obvious anomalies
  • Misses: Gradual changes, temporal patterns

Model 2: LSTM (30% weight)

  • Neural network for sequences
  • Good at: Temporal patterns, gradual degradation
  • Misses: Sudden outliers

Model 3: Autoencoder (30% weight)

  • Reconstruction-based detection
  • Good at: Unusual feature combinations
  • Misses: Single-dimension outliers

Voting strategy:

ensemble_score = (
    isolation_score * 0.4 +
    lstm_score * 0.3 +
    autoencoder_score * 0.3
)

# Flag if:
# - Combined score high, OR
# - At least 2 models agree
is_anomaly = ensemble_score > 0.5 or votes >= 2

The Results: Ensemble vs. Single Model

Live dashboard showing production metrics across 6 quality dimensions

After 25 days processing 332K orders:

Metric                  Single Model    Ensemble    Improvement
─────────────────────────────────────────────────────────────
Accuracy                93.2%          93.8%       +0.6%
False Positives         15%            9.7%        -35%
Real Anomalies Caught   baseline       +30%        +30%
Inference Time          <5ms           <5ms        same
Training Time           <1 min         3 min       acceptable

The big wins:

  • False positives dropped 35% (15% → 9.7%)
  • Caught 30% more real issues
  • Same inference speed

The cost:

  • Training takes 3 minutes instead of 1
  • More code to maintain
  • Need to monitor 3 models

What Each Model Caught

Example 1: Isolation Forest Alone

Sudden spike in missing fields. Obvious outlier.

  • IF: Detected
  • LSTM: Missed (not temporal)
  • Autoencoder: Missed (single dimension)
  • Ensemble: Detected (1 vote enough)

Example 2: LSTM Alone

Completeness scores dropping 8% over 6 hours.

  • IF: Missed (each point looked normal)
  • LSTM: Detected (temporal pattern)
  • Autoencoder: Missed (not reconstruction issue)
  • Ensemble: Detected (1 vote enough)

Example 3: Autoencoder Alone

Unusual combination: low volume + high value + weekend.

  • IF: Missed (individually normal)
  • LSTM: Missed (no temporal pattern)
  • Autoencoder: Detected (unusual combination)
  • Ensemble: Detected (1 vote enough)

Example 4: False Positive Reduction

Weekend pattern that’s unusual but valid.

  • IF: Flagged (outlier)
  • LSTM: Correctly ignored (normal for weekends)
  • Autoencoder: Correctly ignored (seen before)
  • Ensemble: Correctly ignored (2 models voted no)

This is where ensemble shines – reducing false positives through voting.

Historical quality scores over time — showing temporal patterns LSTM catches that Isolation Forest misses

Does the Complexity Pay Off?

YES, but with conditions:

When ensemble is worth it: You need low false positive rate (<10%). Different anomaly types exist (temporal, statistical, combinatorial) You can afford 3x training time You have enough data for 3 models

When single model is enough: False positives aren’t a big problem Only one type of anomaly. Need fastest possible training. Limited data

For data quality monitoring specifically, Ensemble is worth it.

Why? False positives kill trust. Teams ignore alerts if too many are wrong. Reducing 15% → 9.7% false positives makes the system actually usable

The Drift Problem (Affects All Models)

Both single model and ensemble faced the same issue: concept drift.

After 2 weeks, accuracy dropped from 93% to 78%.

Solution: Statistical drift detection with auto-retraining:

from scipy.stats import ks_2samp

reference = get_baseline(feature, hours=24)
current = get_recent(feature, hours=1)

statistic, p_value = ks_2samp(reference, current)

if p_value < 0.01:
    retrain_all_models()

This works for both single model and ensemble. Not specific to ensemble approach.

Performance: Does Ensemble Slow Things Down?

Inference (per order):

  • Single model: 4.2ms
  • Ensemble (3 models): 4.8ms
  • Difference: +0.6ms (14% slower)

Training:

  • Single model: 52 seconds
  • Ensemble (3 models): 3 minutes 18 seconds
  • Difference: ~4x slower

Training every 2 hours, so 3 minutes is acceptable. Inference is still <5ms, fast enough for real-time.

Optimization used:

python

# Run 3 models in parallel
with ThreadPoolExecutor(max_workers=3) as executor:
    if_result = executor.submit(isolation_forest.predict, data)
    lstm_result = executor.submit(lstm.predict, data)
    ae_result = executor.submit(autoencoder.predict, data)

Without parallelization: 12ms. With: 4.8ms.

Production Metrics (25 Days)

Orders Processed: 332,308

Quality Checks: 2.8M+

System Uptime: 603.7 hours (100%)

Average Latency: 4.8ms per order

Ensemble Performance:

  • Accuracy: 93.8%
  • False Positives: 9.7%
  • Precision: 94.2%
  • Recall: 91.8%

Comparison to Single Model:

  • False Positives: -35% improvement

  • Real Anomalies: +30% more caught

  • Inference Time: +14% slower (acceptable)

  • Training Time: 4x slower (acceptable)

    Issue severity breakdown: 68.6% high, 26.4% medium, 4.98% critical

What I Learned

1. Ensemble DOES reduce false positives

Not by a little. By 35%. This matters a lot for trust.

2. Different models catch different things

Not marketing fluff. Actually true. IF catches outliers, LSTM catches temporal, AE catches combinations.

3. Voting is powerful

When 2/3 models say “not anomalous,” they’re usually right. This reduces false alarms.

4. The complexity is manageable

Three models instead of one isn’t 3x the work. Maybe 1.5x once you have infrastructure.

5. But you need enough data

Each model needs reasonable training data. If you have <100 samples, single model is better.

6. Explainability still required

Ensemble without SHAP is useless. Add SHAP to all three models.

Should You Use Ensemble?

YES if:

  • False positives are a problem
  • You have different types of anomalies
  • You can handle 3-5 minute training
  • You have 500+ training samples

NO if:

  • Single model works fine

  • Training time is critical

  • Limited data (<100 samples)

  • Simple anomaly patterns

    For data quality monitoring specifically: Yes.

    Why? Data quality has multiple anomaly types (missing fields, wrong formats, unusual patterns, temporal issues). Ensemble catches more with fewer false alarms.

    The Code

    Open source: https://github.com/kalluripradeep/realtime-data-quality-monitor

    Compare single model vs. ensemble yourself:

    Clone the repo and run: docker compose up -d

Bottom Line

Ensemble ML isn’t just hype. It actually works:

  • 35% fewer false positives
  • 30% more real anomalies caught
  • Same inference speed
  • Worth the extra complexity

For data quality monitoring, it’s worth doing.

About the Author

Pradeep Kalluri is a Data Engineer at NatWest Bank and Apache Airflow/Dbt contributor.

Sign Up For Daily Newsletter

Be keep up! Get the latest breaking news delivered straight to your inbox.
By signing up, you agree to our Terms of Use and acknowledge the data practices in our Privacy Policy. You may unsubscribe at any time.
Share This Article
Facebook Twitter Email Print
Share
What do you think?
Love0
Sad0
Happy0
Sleepy0
Angry0
Dead0
Wink0
Previous Article Google&apos;s Gemini AI Drove Man Into Deadly Delusion, Family Claims in Lawsuit Google's Gemini AI Drove Man Into Deadly Delusion, Family Claims in Lawsuit
Next Article One of Android’s best PC gaming tools just got a big usability upgrade One of Android’s best PC gaming tools just got a big usability upgrade
Leave a comment

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Stay Connected

248.1k Like
69.1k Follow
134k Pin
54.3k Follow

Latest News

Samsung Wide Fold battery specs leak: There’s good and bad news
Samsung Wide Fold battery specs leak: There’s good and bad news
News
China’s GAC starts pre-sales of 4,000 flying car · TechNode
China’s GAC starts pre-sales of $234,000 flying car · TechNode
Computing
Foldable iPhone chatter starts again with new Samsung display rumor
Foldable iPhone chatter starts again with new Samsung display rumor
News
An iPhone is apparently Washington’s most important smartphone
An iPhone is apparently Washington’s most important smartphone
News

You Might also Like

China’s GAC starts pre-sales of 4,000 flying car · TechNode
Computing

China’s GAC starts pre-sales of $234,000 flying car · TechNode

1 Min Read
Alibaba Chairman Joe Tsai: DeepSeek’s rise jolted us into action · TechNode
Computing

Alibaba Chairman Joe Tsai: DeepSeek’s rise jolted us into action · TechNode

2 Min Read
Xpeng in talks with Volkswagen and more over use of its AI chip, CEO says · TechNode
Computing

Xpeng in talks with Volkswagen and more over use of its AI chip, CEO says · TechNode

1 Min Read
NetEase’s wuxia game Justice debuts on Steam, global launch expected this year · TechNode
Computing

NetEase’s wuxia game Justice debuts on Steam, global launch expected this year · TechNode

1 Min Read
//

World of Software is your one-stop website for the latest tech news and updates, follow us now to get the news that matters to you.

Quick Link

  • Privacy Policy
  • Terms of use
  • Advertise
  • Contact

Topics

  • Computing
  • Software
  • Press Release
  • Trending

Sign Up for Our Newsletter

Subscribe to our newsletter to get our newest articles instantly!

World of SoftwareWorld of Software
Follow US
Copyright © All Rights Reserved. World of Software.
Welcome Back!

Sign in to your account

Lost your password?