ARM NEON Accelerated CRC64 Optimization Shows Nearly 6x Improvement

News Room
Last updated: 2026/03/17 at 6:01 AM
News Room Published 17 March 2026

A patch posted today to the Linux kernel mailing list adds an ARM64-optimized CRC64-NVMe implementation that is nearly 6x faster than the generic code in the author's benchmark on an Arm Cortex-A72.

Open-source developer Demian Shulhan wrote this NEON-optimized CRC64 implementation, which joins the existing architecture-specific CRC64 implementations such as those for x86_64 and RISC-V. The goal of the speed-up is to remove a CRC64 bottleneck for NVMe and other storage subsystems.

Shulhan explained in the patch that the nearly 6x gain (about 5.8x, going by the throughput figures he reports) was measured on an Arm Cortex-A72 SoC. He wrote:

“Implement an optimized CRC64 (NVMe) algorithm for ARM64 using NEON Polynomial Multiply Long (PMULL) instructions. The generic shift-and-XOR software implementation is slow, which creates a bottleneck in NVMe and other storage subsystems.

The acceleration is implemented using C intrinsics (arm_neon.h) rather than raw assembly for better readability and maintainability.

Key highlights of this implementation:
– Uses 4KB chunking inside scoped_ksimd() to avoid preemption latency spikes on large buffers.
– Pre-calculates and loads fold constants via vld1q_u64() to minimize register spilling.
– Benchmarks show the break-even point against the generic implementation is around 128 bytes. The PMULL path is enabled only for len >= 128.
– Safely falls back to the generic implementation on Big-Endian systems.

Performance results (kunit crc_benchmark on Cortex-A72):
– Generic (len=4096): ~268 MB/s
– PMULL (len=4096): ~1556 MB/s (nearly 6x improvement)”
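
To make the terms in the patch description concrete, here is a minimal Python sketch of a generic shift-and-XOR CRC64 and of the length-based dispatch and 4KB chunking the highlights mention. It is an illustration under stated assumptions, not the kernel code: the function names are hypothetical, the `fast_update` parameter is a placeholder for the PMULL path (which is not reproduced here), and the article does not say how the real patch handles leftover bytes. The constant is the bit-reflected form of the CRC-64/NVMe polynomial, and the all-ones initial value and final XOR follow the usual CRC-64/NVMe convention.

```python
# Bit-reflected CRC-64/NVMe polynomial.
CRC64_NVME_POLY_REFLECTED = 0x9A6C9329AC4BC9B5
MASK64 = 0xFFFFFFFFFFFFFFFF

# Values quoted in the patch description.
PMULL_MIN_LEN = 128   # break-even point: shorter inputs stay on the generic path
CHUNK_SIZE = 4096     # 4KB chunks bound the work done in one SIMD section


def crc64_nvme_update(state: int, data: bytes) -> int:
    """Generic shift-and-XOR update of the raw CRC state, one bit at a time."""
    for byte in data:
        state ^= byte
        for _ in range(8):
            # Shift right; XOR in the polynomial when the bit shifted out is 1.
            if state & 1:
                state = (state >> 1) ^ CRC64_NVME_POLY_REFLECTED
            else:
                state >>= 1
    return state & MASK64


def crc64_nvme_generic(data: bytes) -> int:
    """Whole-buffer CRC: all-ones initial state, inverted at the end."""
    return crc64_nvme_update(MASK64, data) ^ MASK64


def crc64_nvme_dispatch(data: bytes, fast_update=crc64_nvme_update) -> int:
    """Hypothetical dispatcher mirroring the highlights in the patch text.

    `fast_update` stands in for an accelerated routine; by default it is the
    generic one, so this sketch only demonstrates the control flow.
    """
    state = MASK64
    if len(data) >= PMULL_MIN_LEN:
        # Feed the accelerated path at most CHUNK_SIZE bytes at a time.
        for offset in range(0, len(data), CHUNK_SIZE):
            state = fast_update(state, data[offset:offset + CHUNK_SIZE])
    else:
        state = crc64_nvme_update(state, data)
    return state ^ MASK64
```

Because the CRC state is carried from one chunk to the next, splitting a buffer into 4KB pieces gives the same result as processing it in one pass; the chunking only limits how long each accelerated section runs.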

It’s surprising that it took until now to see an ARM64/NEON-optimized CRC64 implementation for the Linux kernel, especially since the patch is only a little more than one hundred lines of code.
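
For readers unfamiliar with PMULL, the instruction named in the patch description, its 64-bit form computes a carry-less (polynomial) product of two 64-bit values: an ordinary binary multiplication in which the partial products are combined with XOR instead of addition. The Python sketch below models only that arithmetic; the name `clmul64` is hypothetical, and the sketch says nothing about how the patch arranges its folding steps.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def clmul64(a: int, b: int) -> int:
    """Carry-less product of two 64-bit values (result is at most 127 bits)."""
    a &= MASK64
    b &= MASK64
    result = 0
    while b:
        if b & 1:
            result ^= a   # XOR replaces the addition of a normal multiply
        a <<= 1
        b >>= 1
    return result


# (x + 1) * (x + 1) = x^2 + 1 over GF(2), so 0b11 times 0b11 is 0b101, not 9.
print(bin(clmul64(0b11, 0b11)))  # 0b101
```

CRC computation is polynomial arithmetic over GF(2), which is why a hardware carry-less multiply lets an implementation process many input bytes per step instead of one bit at a time.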

Figure: Benchmark results of the NEON CRC64 implementation.

The patch is now out for review on the Linux kernel mailing list.
