ARM NEON Accelerated CRC64 Optimization Shows Nearly 6x Improvement

News Room
Last updated: 2026/03/17 at 6:01 AM
News Room Published 17 March 2026

A patch posted today to the Linux kernel mailing list adds an ARM64-optimized CRC64-NVMe implementation that delivers nearly a 6x improvement over the generic code on the Arm SoC tested.

Open-source developer Demian Shulhan wrote this NEON-optimized CRC64 implementation, which joins the existing architecture-specific CRC64 implementations such as those for x86_64 and RISC-V. The speed-up is intended to benefit NVMe and other storage subsystems, where the generic CRC64 code is a bottleneck.

Shulhan explained the work in the patch description; the nearly 6x gain was measured on an Arm Cortex-A72 SoC. He wrote:

“Implement an optimized CRC64 (NVMe) algorithm for ARM64 using NEON Polynomial Multiply Long (PMULL) instructions. The generic shift-and-XOR software implementation is slow, which creates a bottleneck in NVMe and other storage subsystems.

The acceleration is implemented using C intrinsics (arm_neon.h) rather than raw assembly for better readability and maintainability.

Key highlights of this implementation:
– Uses 4KB chunking inside scoped_ksimd() to avoid preemption latency spikes on large buffers.
– Pre-calculates and loads fold constants via vld1q_u64() to minimize register spilling.
– Benchmarks show the break-even point against the generic implementation is around 128 bytes. The PMULL path is enabled only for len >= 128.
– Safely falls back to the generic implementation on Big-Endian systems.

Performance results (kunit crc_benchmark on Cortex-A72):
– Generic (len=4096): ~268 MB/s
– PMULL (len=4096): ~1556 MB/s (nearly 6x improvement)”
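To make the highlights above more concrete, here is a minimal userspace sketch in C of the overall shape they describe: a bitwise shift-and-XOR CRC64-NVMe routine, plus a dispatcher that only takes the accelerated path for len >= 128 and feeds it at most 4KB per pass. This is not the code from the patch. The function names and the SIMD_MIN_LEN and SIMD_CHUNK macros are hypothetical, the "SIMD" routine is a stand-in that simply calls the generic code, and the way the tail bytes are handled is an assumption. In the kernel, each pass would run inside scoped_ksimd() and use the PMULL folding loop instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Reflected form of the CRC64-NVMe polynomial 0xAD93D23594C93659. */
#define CRC64_NVME_POLY_REFLECTED 0x9A6C9329AC4BC9B5ULL

/* Hypothetical names for the two figures quoted in the patch description. */
#define SIMD_MIN_LEN 128   /* break-even length against the generic code */
#define SIMD_CHUNK   4096  /* maximum bytes handled per SIMD section */

/* Generic shift-and-XOR update on the raw (un-inverted) CRC state. */
static uint64_t crc64_update_generic(uint64_t state, const uint8_t *p, size_t len)
{
    while (len--) {
        state ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            state = (state >> 1) ^ ((state & 1) ? CRC64_NVME_POLY_REFLECTED : 0);
    }
    return state;
}

/*
 * Stand-in for the accelerated path (assumption): a real implementation
 * would fold the data with PMULL here. It calls the generic code so the
 * sketch stays portable and its results can be checked.
 */
static uint64_t crc64_update_simd_standin(uint64_t state, const uint8_t *p, size_t len)
{
    return crc64_update_generic(state, p, len);
}

/* One-shot CRC64-NVMe using only the generic code. */
uint64_t crc64_nvme_generic(const uint8_t *p, size_t len)
{
    return ~crc64_update_generic(UINT64_MAX, p, len);
}

/* Threshold-and-chunk dispatcher modeled on the patch description. */
uint64_t crc64_nvme_dispatch(const uint8_t *p, size_t len)
{
    uint64_t state = UINT64_MAX;

    /* Take the accelerated path only while at least SIMD_MIN_LEN bytes remain,
     * and never hand it more than SIMD_CHUNK bytes in one pass. */
    while (len >= SIMD_MIN_LEN) {
        size_t chunk = len < SIMD_CHUNK ? len : SIMD_CHUNK;

        state = crc64_update_simd_standin(state, p, chunk);
        p += chunk;
        len -= chunk;
    }

    /* Short buffers and any remaining tail use the generic code. */
    state = crc64_update_generic(state, p, len);
    return ~state;
}
```

Because the CRC state is carried from one chunk to the next, splitting a large buffer into 4KB pieces does not change the result. It only bounds how long each SIMD section runs, which is the property the patch relies on to avoid preemption latency spikes.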

It’s surprising that it took until now to see an ARM64/NEON-optimized CRC64 implementation for the Linux kernel, given that it comes to just a little more than one hundred lines of code.
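For readers unfamiliar with the PMULL instruction named in the patch description, the following portable C sketch models what a 64x64-bit polynomial multiply long computes: a carry-less product over GF(2), where partial products are combined with XOR instead of addition. The struct and function names are hypothetical and this loop is only an illustration of the arithmetic. The patch itself uses NEON intrinsics from arm_neon.h, which perform the operation in hardware.

```c
#include <stdint.h>

/* 128-bit carry-less product, split into low and high 64-bit halves. */
struct clmul128 {
    uint64_t lo;
    uint64_t hi;
};

/*
 * Portable model of a 64x64 -> 128-bit polynomial multiply over GF(2).
 * For every set bit i of b, a shifted left by i is XORed into the result.
 */
struct clmul128 clmul64(uint64_t a, uint64_t b)
{
    struct clmul128 r = {0, 0};

    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1) {
            r.lo ^= a << i;
            /* Bits shifted out of the low half land in the high half.
             * Skip i == 0 to avoid an undefined 64-bit shift. */
            if (i > 0)
                r.hi ^= a >> (64 - i);
        }
    }
    return r;
}
```

As a small worked case, clmul64(3, 3) yields 5 rather than 9: in polynomial terms (x + 1)(x + 1) = x^2 + 1, because the two middle x terms cancel under XOR. CRC folding schemes are built from multiplications of this kind against precomputed constants, which is why a hardware instruction for them helps so much.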

[Image: benchmark results of the NEON CRC64 implementation]

The patch is now out for review on the Linux kernel mailing list.
