Following the recent optional io_uring support for the PostgreSQL database server on Linux and async I/O batch mode, another exciting performance improvement was merged this week: support for using AVX-512 instructions for CRC32C computations.
Leveraging AVX-512 instructions on capable AMD and Intel processors can lead to some wild performance improvements for the CRC32C cyclic redundancy check code path for this popular open-source database. The commit adding this AVX-512 usage for CRC32C calculations explains:
“Compute CRC32C using AVX-512 instructions where available
The previous implementation of CRC32C on x86 relied on the native CRC32 instruction from the SSE 4.2 extension, which operates on up to 8 bytes at a time. We can get a substantial speedup by using carryless multiplication on SIMD registers, processing 64 bytes per loop iteration. Shorter inputs fall back to ordinary CRC instructions. On Intel Tiger Lake hardware (2020), CRC is now 50% faster for inputs between 64 and 112 bytes, and 3x faster for 256 bytes.
The VPCLMULQDQ instruction on 512-bit registers has been available on Intel hardware since 2019 and AMD since 2022. There is an older variant for 128-bit registers, but at least on Zen 2 it performs worse than normal CRC instructions for short inputs.
We must now do a runtime check, even for builds that target SSE 4.2. This doesn’t matter in practice for WAL (arguably the most critical case), because since commit e2809e3 the final computation with the 20-byte WAL header is inlined and unrolled when targeting that extension. Compared with two direct function calls, testing showed equal or slightly faster performance in performing an indirect function call on several dozen bytes followed by inlined instructions on constant input of 20 bytes.”
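To make the structure described in the commit message more concrete, here is a minimal Python sketch. It is not PostgreSQL's code and uses no SIMD or carryless multiplication: a table-driven CRC32C stands in for both code paths. All function names and the `has_wide_simd` flag are hypothetical, chosen only for illustration. What it does show is the shape of the approach: an implementation picked once at runtime and called through an indirection, whole 64-byte blocks consumed by the "wide" path with shorter remainders falling back to the ordinary routine, and a running CRC state that lets a final 20-byte header be folded in as a separate step.

```python
# Illustrative only: pure-Python stand-ins, not PostgreSQL's implementation.
POLY = 0x82F63B78  # CRC-32C (Castagnoli) polynomial, bit-reflected form


def _make_table():
    """Build the 256-entry lookup table for byte-at-a-time CRC-32C."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c_update_bytewise(crc, data):
    """Advance the running CRC state one byte at a time (short-input path)."""
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc


def crc32c_update_blocked(crc, data, block=64):
    """Hypothetical stand-in for the wide path: consume whole 64-byte
    blocks, then hand any shorter tail to the bytewise routine."""
    full = len(data) - len(data) % block
    for off in range(0, full, block):
        # A real SIMD path would fold each block with carryless multiplies;
        # here the same bytewise routine is reused so results stay identical.
        crc = crc32c_update_bytewise(crc, data[off:off + block])
    return crc32c_update_bytewise(crc, data[full:])


def choose_impl(has_wide_simd):
    """Runtime check: pick an implementation once, call it indirectly after."""
    return crc32c_update_blocked if has_wide_simd else crc32c_update_bytewise


# Assumption: a real build would query the CPU; this flag is hard-coded.
crc32c_update = choose_impl(has_wide_simd=True)


def crc32c(data):
    """One-shot CRC-32C: initial state all ones, final XOR with all ones."""
    return crc32c_update(0xFFFFFFFF, data) ^ 0xFFFFFFFF


def crc32c_body_then_header(body, header):
    """Indirect call over the variable-length body, then a fixed-size header
    folded in as a separate step (mirroring the WAL pattern in the quote)."""
    crc = crc32c_update(0xFFFFFFFF, body)
    crc = crc32c_update_bytewise(crc, header)
    return crc ^ 0xFFFFFFFF
```

Calling `crc32c(b"123456789")` returns `0xE3069283`, the standard CRC-32C check value. Because the CRC is carried as running state, `crc32c_body_then_header(body, header)` gives the same result as `crc32c(body + header)`, which is why the header step can be handled by separate inlined code without changing the checksum.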
That is 50% to 3x faster on Intel Tiger Lake with this AVX-512 CRC32C code, depending on input size! It will be interesting to see whether the gains are even more pronounced on newer Intel Xeon server processors with better AVX-512 support, or on AMD Zen 4 and Zen 5 processors, where AVX-512 support is widespread.
This is another great improvement for the open-source PostgreSQL 18 database server ahead of that next major feature release due out around September.