You Can’t Compare Massive Data Streams In Javascript. Or Can You? 🤔 | HackerNoon

News Room · Published 27 February 2025

I know, it looks impossible. JavaScript is not a very performant language, mostly because of its single-threaded model. So if I tell you we’re going to compare 1.8 million votes from a fictional city election, right in the browser, you’ll probably think I’m crazy. The browser would freeze and crash every time. Guess what? You’d be right. Here is the result with a very popular diff library:

Demo of an application that attempts to compare 1.8 million votes over two elections.

Clearly, it’s not wow time. But with the right techniques — efficient memory management, a streaming architecture, web workers, and batch updates — you can do it. Here’s the proof:

Demo of an application that smoothly compares 1.8 million votes over two elections.

As you can see, even with 1.8 million objects processed in real time, the UI stays perfectly responsive. Most importantly, by injecting small data batches over time, we’ve turned a “wait for it” delay into a “watch it happen” experience.

Alright, now let’s dive into how it was built.

Challenges

We have three challenges to solve.

  • Achieving top performance.

  • Handling different input formats.

  • Being easy for developers to use.

Performance

The first thing is to avoid blocking the main thread at all costs. So we need asynchronous code, a worker to handle the heavy lifting, and linear complexity, O(n), meaning the execution time grows proportionally with the input size. Additionally, efficient memory management is crucial.

Efficient memory management requires freeing up memory as soon as some data is processed, and also progressively sending the data in chunks to minimize UI updates. For example, instead of sending one million object diffs in a row, we could send groups of a thousand. This would dramatically reduce the number of DOM updates.
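
To make the worker part of that concrete, here is a minimal sketch of offloading the comparison to a Web Worker so the main thread stays free. The file name diff.worker.js, the message shape, and renderBatch are placeholders for illustration, not the library’s actual internals.

```typescript
// Main thread: delegate the heavy diffing to a Web Worker so the UI never blocks.
// "diff.worker.js", the message shape, and renderBatch are hypothetical for this sketch.
declare function renderBatch(chunk: unknown[]): void;

const worker = new Worker("diff.worker.js");

// Send both lists once; the worker streams back small batches of diffs.
worker.postMessage({ prevList: [], nextList: [] });

worker.onmessage = (event: MessageEvent<{ chunk?: unknown[]; done?: boolean }>) => {
  if (event.data.chunk) renderBatch(event.data.chunk); // one small DOM update per batch
  if (event.data.done) worker.terminate();             // free the worker when finished
};
```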

Input Format

The second challenge is to handle different kinds of input formats. We need to be versatile. We should accept arrays of objects, JSON files, and even data streams. So before starting the diff process, we need to convert everything into a single format: a readable stream.

Developer Experience

Finally, we need to provide a great developer experience. The function must be easy to use and easy to customize. Basically, the user gives two lists to our function; we compare them and progressively send back the result.

The simplest way to do it is to expose an event listener with three kinds of events: ondata, onerror, and onfinish. Easy peasy.

Our streamListDiff function and its event listeners

The ondata event will receive an array of object diffs, called a chunk. Each diff should be explicit, with a previous value, a current value, an index tracker, and a status: equal, updated, moved, or deleted.
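
To make that shape concrete, here is a rough TypeScript sketch of what a single diff entry could look like; the field names are inferred from the description above, not the library’s exact typings.

```typescript
type DiffStatus = "equal" | "updated" | "moved" | "deleted";

// Hypothetical shape of one entry in a chunk, based on the description above.
interface ObjectDiff<T> {
  previousValue: T | null;  // the object as it appeared in the first list
  currentValue: T | null;   // the object as it appears in the second list
  prevIndex: number | null; // index tracker: position in the first list
  newIndex: number | null;  // index tracker: position in the second list
  status: DiffStatus;
}

// The ondata event delivers an array of such diffs at a time.
type Chunk<T> = ObjectDiff<T>[];
```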

And because I know most of you love TypeScript, we’ll also have autocompletion. As icing on the cake, users will be able to specify options to refine the output. Let’s see how the algorithm works.

Usage example with options and typing
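
A usage sketch along those lines might look like this. The emitter-style on() API and the chunksSize option shown here are assumptions based on the events described above, so check the library’s documentation for the exact signature.

```typescript
import { streamListDiff } from "@donedeal0/superdiff";

// previousVotes, currentVotes, and appendToResults are placeholders for this sketch.
declare const previousVotes: { id: number; candidate: string }[];
declare const currentVotes: { id: number; candidate: string }[];
declare function appendToResults(chunk: unknown[]): void;

// Compare the two lists, matching objects by their "id" property.
// The chunksSize option name is an assumption; see the documentation.
const diff = streamListDiff(previousVotes, currentVotes, "id", { chunksSize: 1000 });

diff.on("data", (chunk) => appendToResults(chunk)); // small batches, delivered progressively
diff.on("error", (error) => console.error(error));
diff.on("finish", () => console.log("All 1.8 million votes compared."));
```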

Algorithm Walkthrough

For simplicity’s sake, the code we’re going through comes from the streamListDiff function of the @donedeal0/superdiff library.

Let’s take a look at the main function. It takes two lists to compare, a common key shared by all the objects (an id, for example) used to match the objects between the two lists, and some options to refine the output.

In the function body, we can see it returns an event listener before starting the diff. The trick is to trigger the logic block asynchronously. Basically, the event loop will execute all synchronous code first, and only then start the real work.

streamListDiff: our main function
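
A stripped-down sketch of that trick might look like the following; the listener object here is hypothetical and only illustrates returning it before the work starts.

```typescript
// Hypothetical sketch: hand the listener back synchronously, start the diff afterwards.
type Listener<T> = {
  ondata?: (chunk: T[]) => void;
  onerror?: (err: Error) => void;
  onfinish?: () => void;
};

function streamListDiffSketch<T>(prevList: T[], nextList: T[]): Listener<T> {
  const listener: Listener<T> = {};

  // Defer the heavy work: the caller receives the listener first, attaches its
  // handlers, and only then does the event loop run the actual comparison.
  setTimeout(() => {
    try {
      // ...convert prevList and nextList to readable streams and run the diff here...
      listener.onfinish?.();
    } catch (err) {
      listener.onerror?.(err as Error);
    }
  }, 0);

  return listener;
}
```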

Then, we convert our two lists to readable streams, using different methods for each input type (an array, a file, etc.).

Convert file or array inputs to readable streams
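
As a rough sketch of that normalization step (the helper name is hypothetical), an array can be wrapped in a ReadableStream, while a File already exposes one via its stream() method:

```typescript
// Hypothetical helper: normalize an array or a File into a readable stream.
function toReadableStream<T>(input: T[] | File): ReadableStream {
  if (Array.isArray(input)) {
    // Wrap the array: enqueue every object, then close the stream.
    return new ReadableStream<T>({
      start(controller) {
        for (const item of input) controller.enqueue(item);
        controller.close();
      },
    });
  }
  // A File already exposes a byte stream; the JSON it contains still has to be
  // parsed into objects downstream (omitted here for brevity).
  return input.stream();
}
```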

Once we have two valid streams, we iterate over both of them in parallel thanks to Promise.all(). At each iteration, we do two things: first, we check whether the object is valid; if not, we emit an error. Second, we check whether an object with the same reference property is already in the data buffer of the other list.

Iterating over the two streams

What’s a data buffer? There are two buffers, one for each list. The idea is to store the unmatched objects in a hashmap so that the other list, which is being parsed at the same moment, can check in real time if there is a match for its current object. This avoids doing two full iterations, with no results and high memory consumption, before starting the real diff. We don’t lose time, and we’re efficient.

We use a hashmap to store unmatched objects because it supports any kind of value as a key, is very performant, and provides an iterator out of the box.

Concurrent insertions and retrieval of matching entries
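
In spirit, the buffer logic looks something like the sketch below (the names and types are illustrative, not the library’s actual code): each list checks the other list’s buffer for its current object and parks the object in its own buffer when there is no match yet.

```typescript
// Two buffers of unmatched objects, keyed by the reference property (e.g. "id").
type Item = { id: number; [key: string]: unknown };

const prevListBuffer = new Map<number, Item>();
const nextListBuffer = new Map<number, Item>();

// Called for every object read from the first stream.
function handlePrevItem(item: Item, emitDiff: (prev: Item, next: Item) => void) {
  const match = nextListBuffer.get(item.id);
  if (match) {
    nextListBuffer.delete(item.id); // free memory as soon as the pair is found
    emitDiff(item, match);          // compare and send the diff right away
  } else {
    prevListBuffer.set(item.id, item); // park it until the other stream catches up
  }
}
```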

Long story short, if there is a match in the buffer, we immediately remove it to free up memory and compare it to the current object. Once we have the object diff, we send it straight to the user. If we can’t make the comparison yet, we insert the object into the relevant buffer, where it waits to be compared. Afterward, we simply iterate over each buffer and process the remaining objects in the same way.

Sending an object diff to the user

Now, you’re probably wondering why we don’t simply iterate over the first list once and look up each match in the second list. Well, with a million objects, a find() lookup would be highly inefficient, as it leads to a huge number of iterations for every single object. The data buffer approach, on the other hand, retrieves the match in constant time thanks to the Map’s has() method.

Map() is more efficient given our constraints
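
The difference boils down to the cost of a single lookup, as this illustrative comparison shows (the variable names are placeholders):

```typescript
// Placeholders for this sketch.
declare const nextList: { id: number }[];
declare const nextListBuffer: Map<number, { id: number }>;
declare const current: { id: number };

// O(n) per object: scanning the whole second list for every single item,
// i.e. roughly n * n comparisons overall for a million objects.
const slowMatch = nextList.find((candidate) => candidate.id === current.id);

// O(1) per object: a constant-time lookup in the buffer.
const fastMatch = nextListBuffer.has(current.id) ? nextListBuffer.get(current.id) : undefined;
```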

Earlier, I said we immediately send each object diff to the user. That’s only partially true. Imagine the user receiving a million objects in a row: it could overload the main thread. What we do instead is store the object diffs in another buffer. Once this buffer reaches a certain size, say a thousand objects, we flush it and send a single batch of data to the user.

To do it, we use a closure. Let’s take a look at the outputDiffChunk function. It has an array to store the diffs and returns two functions: handleDiffChunk and releaseLastChunks. handleDiffChunk receives an object diff and adds it to the buffer if it’s not full yet. If it’s full, it sends the batch to the user. Since we use a single instance of handleDiffChunk, the context of outputDiffChunk is preserved, which is why we can access the diff buffer each time we process a new object diff.

Instead of sending data one at a time, we send it in batches.

Finally, releaseLastChunks is self-explanatory. Once all diffs are processed, it flushes the diff buffer one last time and sends the remaining data to the user.
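
A condensed sketch of that closure could look like the following; the names mirror the description above, but the details are simplified.

```typescript
// Simplified sketch of the batching closure described above.
function outputDiffChunk<T>(emit: (chunk: T[]) => void, chunkSize = 1000) {
  let diffBuffer: T[] = []; // shared state captured by both functions below

  function handleDiffChunk(diff: T) {
    diffBuffer.push(diff);
    if (diffBuffer.length >= chunkSize) {
      emit(diffBuffer); // flush a full batch to the user in one go
      diffBuffer = [];  // start a fresh buffer
    }
  }

  function releaseLastChunks() {
    if (diffBuffer.length > 0) emit(diffBuffer); // send whatever is left
    diffBuffer = [];
  }

  return { handleDiffChunk, releaseLastChunks };
}
```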

Ultimately, we emit a finish event, and that’s all.

One more thing

The demo we saw earlier uses a virtualized list to render a massive number of items in the DOM, and takes advantage of requestAnimationFrame to avoid updating it too often.
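
On the rendering side, a minimal sketch of the requestAnimationFrame throttling could look like this; the virtualized list itself would typically come from a dedicated library and is not shown here, and renderVisibleRows is a placeholder.

```typescript
// Collect incoming diff batches and apply them at most once per animation frame.
declare function renderVisibleRows(pending: unknown[]): void;

let pending: unknown[] = [];
let frameRequested = false;

function scheduleRender(chunk: unknown[]) {
  pending.push(...chunk);
  if (!frameRequested) {
    frameRequested = true;
    requestAnimationFrame(() => {
      renderVisibleRows(pending); // one DOM update per frame, however many chunks arrived
      pending = [];
      frameRequested = false;
    });
  }
}
```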

As a result, everything runs really smoothly. It seems John Doe has been elected; congratulations to him!

LINKS

Repository | Documentation | Npm
