World of Software

Decathlon Switches to Polars to Optimize Data Pipelines and Infrastructure Costs

News Room
Last updated: 2025/12/20 at 6:35 AM
News Room Published 20 December 2025

Decathlon, one of the world’s leading sports retailers, recently shared why it adopted the open source library Polars to optimize its data pipelines. The Decathlon Digital team found that migrating jobs with small input datasets from Apache Spark to Polars delivers significant speed gains and cost savings.

Decathlon’s data platform runs PySpark workflows on cloud clusters, each with approximately 180 GiB of RAM and 24 cores across six workers. Data is stored as Delta tables in an AWS S3 data lake, with AWS Glue serving as the technical metastore.

While the solution was optimized for large data jobs, it was considered suboptimal for much smaller datasets (gigabytes or megabytes). Arnaud Vennin, Tech Lead Data Engineer at Decathlon, writes:

For data engineers in the data department, the primary tool is Apache Spark, which excels at processing terabytes of data. However, it turns out that not every workflow has terabytes of data as input; some involve gigabytes or even megabytes.

The platform uses a Medallion-style architecture (Bronze, Silver, Gold, Insight) to refine and organize data for quality and governance. Workflows are orchestrated with MWAA, a managed Apache Airflow service on AWS, and CI/CD is automated through GitHub Actions for testing and deploying code.

The data team began experimenting with Polars for lighter or mid-size workloads, initially as a replacement for existing tools like pandas that were experiencing scaling issues. Polars is an open source library for data manipulation built around an OLAP query engine implemented in Rust, using Apache Arrow Columnar Format as the memory model.

Decathlon data platform architecture running Polars. Source: Decathlon Digital Blog

As Polars’ syntax is similar to Spark’s, the team decided to migrate a Spark job to Polars, starting with a Parquet table of approximately 50 GiB, equivalent to at least a 100 GiB CSV table. Moving from a Spark cloud-hosted cluster to a single-node Kubernetes pod reduced the compute launch time from 8 to 2 minutes.

The results were even more promising after enabling Polars’ new streaming engine, which allows processing datasets larger than available memory. On a single Kubernetes pod, Decathlon reports that jobs that once required large clusters now run efficiently with modest CPU and memory, often completing before a full Spark cluster could even be cold-started.

Source: Decathlon blog

Eric Cheminot, principal architect at Schneider Electric, comments:

It’s more of an indication that deploying those jobs on Spark clusters was not a good choice. But… we see this so often that it’s representative!

Based on the experiments, the team decided to implement Polars for all new pipelines where input tables are less than 50 GiB, have a stable size over time, and do not involve multiple joins, dozens of aggregations, or exotic functions. Vennin shares some warnings, too:

Running Polars on Kubernetes presents challenges. It adds a new tool to the stack, so teams need to learn how to run the container service. It may also slow down data pipeline hopping between teams. Additionally, Kubernetes requires to be managed by Data Ops and carries specific security policies. These considerations affect how Polars is rolled out within Decathlon.

Michel Hua agrees:

Polars has Spark-compatible syntax, which facilitates code migration. The pain comes mostly from managing Resilient Distributed Dataset and clusters.

Responding to practitioners who asked why Decathlon did not extend the approach to more jobs, Vennin points to a further constraint: Polars cannot read some of the data, for example datasets written with the Liquid Clustering or Column Mapping features.
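The selection criteria reported in this article (inputs under 50 GiB, a stable size over time, no multiple joins, dozens of aggregations, or exotic functions, and no table features Polars cannot read) can be read as a simple routing rule. The sketch below is one hypothetical way to encode it. The `Pipeline` fields, the `choose_engine` function, and the numeric cut-offs for joins and aggregations are illustrative assumptions, not Decathlon's actual code.

```python
from dataclasses import dataclass

GIB = 1024**3
MAX_INPUT_BYTES = 50 * GIB  # threshold quoted in the article
MAX_JOINS = 1               # assumption: "multiple joins" means more than one
MAX_AGGREGATIONS = 23       # assumption: "dozens" means two dozen or more


@dataclass(frozen=True)
class Pipeline:
    """Hypothetical description of a pipeline's workload."""
    input_bytes: int
    size_is_stable: bool
    join_count: int
    aggregation_count: int
    uses_exotic_functions: bool
    # e.g. tables written with Liquid Clustering or Column Mapping
    uses_unreadable_table_features: bool


def choose_engine(p: Pipeline) -> str:
    """Return "polars" only when every reported criterion is met."""
    fits_polars = (
        p.input_bytes < MAX_INPUT_BYTES
        and p.size_is_stable
        and p.join_count <= MAX_JOINS
        and p.aggregation_count <= MAX_AGGREGATIONS
        and not p.uses_exotic_functions
        and not p.uses_unreadable_table_features
    )
    return "polars" if fits_polars else "spark"


small_job = Pipeline(5 * GIB, True, 1, 3, False, False)
large_job = Pipeline(800 * GIB, True, 1, 3, False, False)
print(choose_engine(small_job), choose_engine(large_job))
```

Defaulting to Spark whenever any condition fails mirrors the cautious rollout described in the article: Polars is the exception for well-understood small jobs, not the general replacement.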
