World of Software

ARM NEON Accelerated CRC64 Optimization Shows Nearly 6x Improvement

By News Room. Published 17 March 2026; last updated 2026/03/17 at 6:01 AM.

A patch posted today to the Linux kernel mailing list adds an ARM64-optimized CRC64-NVMe implementation that delivers a nearly 6x speed-up over the generic code on modern Arm SoCs.

Open-source developer Demian Shulhan posted this NEON-optimized CRC64 implementation, which joins the existing architecture-specific CRC64 implementations for x86_64 and RISC-V. The speed-up is meant to remove a CRC64 bottleneck in NVMe and other storage subsystems.

Shulhan explained in the patch that the nearly 6x gain was measured on an Arm Cortex-A72 SoC. He wrote:

“Implement an optimized CRC64 (NVMe) algorithm for ARM64 using NEON Polynomial Multiply Long (PMULL) instructions. The generic shift-and-XOR software implementation is slow, which creates a bottleneck in NVMe and other storage subsystems.

The acceleration is implemented using C intrinsics (arm_neon.h) rather than raw assembly for better readability and maintainability.

Key highlights of this implementation:
– Uses 4KB chunking inside scoped_ksimd() to avoid preemption latency spikes on large buffers.
– Pre-calculates and loads fold constants via vld1q_u64() to minimize register spilling.
– Benchmarks show the break-even point against the generic implementation is around 128 bytes. The PMULL path is enabled only for len >= 128.
– Safely falls back to the generic implementation on Big-Endian systems.

Performance results (kunit crc_benchmark on Cortex-A72):
– Generic (len=4096): ~268 MB/s
– PMULL (len=4096): ~1556 MB/s (nearly 6x improvement)”
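
To make the quoted design points concrete, here is a minimal userspace sketch in C. It is not the code from the patch. It shows a bit-at-a-time shift-and-XOR CRC-64/NVME update, plus a dispatcher that applies the 128-byte threshold and 4KB chunking mentioned above. The names crc64_nvme_bitwise, crc64_nvme_accel_stub, crc64_nvme_update, PMULL_MIN_LEN and SIMD_CHUNK are hypothetical, and the "accelerated" routine is only a stand-in that calls the bitwise code so the example builds on any machine. The kernel's actual generic implementation may be structured differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit-reflected form of the CRC-64/NVME polynomial 0xAD93D23594C93659. */
#define CRC64_NVME_POLY_REFLECTED 0x9A6C9329AC4BC9B5ULL

/* Figures taken from the quoted patch description; the macro names are made up. */
#define PMULL_MIN_LEN 128u   /* reported break-even point */
#define SIMD_CHUNK    4096u  /* 4KB chunking */

/* Shift-and-XOR update, one bit per step, on the raw (un-inverted) state. */
static uint64_t crc64_nvme_bitwise(uint64_t state, const uint8_t *p, size_t len)
{
    while (len--) {
        state ^= *p++;
        for (int bit = 0; bit < 8; bit++) {
            /* mask is all ones when the low bit is set, else zero */
            uint64_t mask = 0ULL - (state & 1);
            state = (state >> 1) ^ (CRC64_NVME_POLY_REFLECTED & mask);
        }
    }
    return state;
}

/*
 * HYPOTHETICAL placeholder for the PMULL path. It reuses the bitwise code,
 * so it shows the control flow only, not the NEON folding itself.
 */
static uint64_t crc64_nvme_accel_stub(uint64_t state, const uint8_t *p, size_t len)
{
    return crc64_nvme_bitwise(state, p, len);
}

/* Dispatcher: accelerated path in chunks of at most 4KB, generic path for short input. */
static uint64_t crc64_nvme_update(uint64_t state, const uint8_t *p, size_t len)
{
    while (len >= PMULL_MIN_LEN) {
        size_t chunk = len < SIMD_CHUNK ? len : SIMD_CHUNK;
        /* The patch describes doing this work inside scoped_ksimd();
         * userspace has no equivalent, so nothing brackets the call here. */
        state = crc64_nvme_accel_stub(state, p, chunk);
        p += chunk;
        len -= chunk;
    }
    return crc64_nvme_bitwise(state, p, len); /* short buffer or tail */
}

/* One-shot CRC-64/NVME: all-ones initial value and all-ones final XOR. */
static uint64_t crc64_nvme(const void *data, size_t len)
{
    return ~crc64_nvme_update(UINT64_MAX, (const uint8_t *)data, len);
}

/* Self-check against the published CRC-64/NVME check value for "123456789". */
static void crc64_nvme_selftest(void)
{
    assert(crc64_nvme("123456789", 9) == 0xAE8B14860A799888ULL);
}
```

Chunking is safe because a CRC update is incremental: feeding the state from one chunk into the next gives the same result as a single pass over the whole buffer. Calling crc64_nvme_selftest() from a small main() is enough to exercise the sketch. For reference, the quoted figures work out to 1556 / 268 ≈ 5.8, which is where "nearly 6x" comes from.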

It’s surprising that it took until now to see an ARM64/NEON-optimized CRC64 implementation for the Linux kernel, given that it comes to just a little more than one hundred lines of code.

Figure: Benchmark results of the NEON CRC64 implementation.

The patch is now out for review on the Linux kernel mailing list.
