World of Software
Make Big Data More Manageable with Smart Sampling | HackerNoon

News Room
Last updated: 2025/02/21 at 8:12 PM
News Room Published 21 February 2025

Authors:

(1) Andrew Draganov, Aarhus University (all authors contributed equally to this research);

(2) David Saulpic, Université Paris Cité & CNRS;

(3) Chris Schwiegelshohn, Aarhus University.

Table of Links

Abstract and 1 Introduction

2 Preliminaries and Related Work

2.1 On Sampling Strategies

2.2 Other Coreset Strategies

2.3 Coresets for Database Applications

2.4 Quadtree Embeddings

3 Fast-Coresets

4 Reducing the Impact of the Spread

4.1 Computing a crude upper-bound

4.2 From Approximate Solution to Reduced Spread

5 Fast Compression in Practice

5.1 Goal and Scope of the Empirical Analysis

5.2 Experimental Setup

5.3 Evaluating Sampling Strategies

5.4 Streaming Setting and 5.5 Takeaways

6 Conclusion

7 Acknowledgements

8 Proofs, Pseudo-Code, and Extensions and 8.1 Proof of Corollary 3.2

8.2 Reduction of k-means to k-median

8.3 Estimation of the Optimal Cost in a Tree

8.4 Extensions to Algorithm 1

References

6 Conclusion

In this work, we discussed the theoretical and practical limits of compression algorithms for center-based clustering. We proposed the first nearly-linear time coreset algorithm for k-median and k-means. Moreover, the algorithm can be parameterized to achieve an asymptotically optimal coreset size. We then conducted a thorough experimental analysis comparing this algorithm with fast sampling heuristics. We found that, although the Fast-Coreset algorithm achieves the best compression guarantees among its competitors, naive uniform sampling already provides sufficient compression for downstream clustering tasks on well-behaved datasets. Furthermore, we found that intermediate heuristics interpolating between uniform sampling and coresets play an important role in balancing efficiency and accuracy.
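As a rough illustration of the trade-off summarized above, the following sketch compares two ways of compressing a dataset into a small weighted sample and evaluating the k-means cost on it. It is not the Fast-Coreset algorithm. The function names are hypothetical, and `importance_sample` is only an assumed stand-in for an intermediate heuristic: it mixes uniform probability mass with each point's squared distance to the dataset mean, which serves as a crude one-center solution.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_cost(points, centers, weights=None):
    """Weighted k-means cost: sum of w * squared distance to the nearest center."""
    if weights is None:
        weights = [1.0] * len(points)
    return sum(w * min(sq_dist(p, c) for c in centers)
               for p, w in zip(points, weights))

def uniform_sample(points, m, rng):
    """Draw m points uniformly with replacement; each stands in for n/m points."""
    n = len(points)
    sample = [points[rng.randrange(n)] for _ in range(m)]
    return sample, [n / m] * m

def importance_sample(points, m, rng):
    """Hypothetical heuristic (an assumption, not the paper's method):
    sample with probability 1/2 uniform + 1/2 proportional to the squared
    distance to the mean, and reweight by 1 / (m * probability)."""
    n, dim = len(points), len(points[0])
    mean = tuple(sum(p[i] for p in points) / n for i in range(dim))
    d2 = [sq_dist(p, mean) for p in points]
    total = sum(d2)
    if total == 0:  # all points identical: fall back to uniform probabilities
        probs = [1.0 / n] * n
    else:
        probs = [0.5 / n + 0.5 * d / total for d in d2]
    idx = rng.choices(range(n), weights=probs, k=m)
    return [points[i] for i in idx], [1.0 / (m * probs[i]) for i in idx]

if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data: a large cluster near the origin and a small, far-away one.
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(5000)]
    data += [(rng.gauss(50, 1), rng.gauss(50, 1)) for _ in range(20)]
    centers = [(0.0, 0.0), (50.0, 50.0)]
    full = kmeans_cost(data, centers)
    for name, sampler in [("uniform", uniform_sample),
                          ("importance", importance_sample)]:
        pts, wts = sampler(data, 200, rng)
        approx = kmeans_cost(pts, centers, wts)
        print(name, "relative error:", abs(approx - full) / full)
```

Both samplers give an unbiased estimate of the cost for any fixed set of centers. They differ in variance: uniform sampling can miss small, distant clusters entirely, while distance-aware sampling is more likely to include them at the price of an extra pass over the data.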

Although this closes the door on the highly studied problem of optimally small and fast coresets for k-median and k-means, open questions of wider scope remain. For example, under what conditions does sensitivity sampling guarantee accurate compression with optimal space in linear time, and can these conditions be formalized? Furthermore, sensitivity sampling is incompatible with paradigms such as fair clustering [8, 15, 21, 43, 56], and it is unclear whether a linear-time method can optimally compress a dataset while adhering to fairness constraints.

7 Acknowledgements

Andrew Draganov and Chris Schwiegelshohn are partially supported by the Independent Research Fund Denmark (DFF) under a Sapere Aude Research Leader grant No 1051-00106B. David Saulpic has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 101034413.
