Intel’s x86-simd-sort open-source project is a C++ template library of high-performance sorting routines that leverage AVX2 and AVX-512 for very fast sorting. The x86-simd-sort code is in turn used by NumPy, was more recently adopted by PyTorch as well, and has shown off the great performance potential of AVX-512 for sorting algorithms. Out today is x86-simd-sort 7.0, and it’s even faster thanks to new support for OpenMP parallelization.
With today’s x86-simd-sort 7.0 release, the OpenMP multi-threading support isn’t enabled by default, but it can be turned on by those wanting to use multiple CPU cores for faster sorting on top of the speedy Advanced Vector Extensions (AVX) implementations. The qsort, argsort, and keyvalue_qsort routines can all be multi-threaded with this optional OpenMP support. Sorting of medium to large arrays should be three to four times faster with this code path. The optional OpenMP support has also already been pulled into NumPy for builds with OpenMP enabled.
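For readers curious how a quicksort can be spread across cores with OpenMP, here is a minimal, self-contained sketch of one common approach using OpenMP tasks. It is not taken from x86-simd-sort: the names `parallel_qsort_sketch`, `sort_sketch`, and `kSerialCutoff` are hypothetical, the cutoff value is an arbitrary assumption, and `std::sort` stands in for the library's vectorized kernels.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical cutoff (an assumption, not a library value): partitions at or
// below this size are sorted serially instead of spawning more tasks.
constexpr std::size_t kSerialCutoff = 1 << 14;

// One quicksort step. After partitioning, each half becomes an OpenMP task.
template <typename T>
void parallel_qsort_sketch(T* first, T* last) {
    const std::size_t n = static_cast<std::size_t>(last - first);
    if (n <= kSerialCutoff) {
        std::sort(first, last);  // stand-in for a SIMD sorting kernel
        return;
    }

    // Median-of-three pivot, taken from the first, middle, and last elements.
    const T a = first[0], b = first[n / 2], c = last[-1];
    const T pivot = std::max(std::min(a, b), std::min(std::max(a, b), c));

    // Three-way partition: [< pivot] [== pivot] [> pivot].
    // The middle block always holds at least the pivot itself, so both
    // recursive calls work on strictly smaller ranges.
    T* mid1 = std::partition(first, last,
                             [&](const T& x) { return x < pivot; });
    T* mid2 = std::partition(mid1, last,
                             [&](const T& x) { return !(pivot < x); });

    #pragma omp task firstprivate(first, mid1)
    parallel_qsort_sketch(first, mid1);

    #pragma omp task firstprivate(mid2, last)
    parallel_qsort_sketch(mid2, last);

    #pragma omp taskwait  // wait for both halves before returning
}

// Entry point: one thread starts the recursion, the rest of the team picks
// up the tasks it generates.
template <typename T>
void sort_sketch(std::vector<T>& v) {
    #pragma omp parallel
    #pragma omp single nowait
    parallel_qsort_sketch(v.data(), v.data() + v.size());
}
```

Compiled without an OpenMP flag (such as `-fopenmp` with GCC or Clang), the pragmas are ignored and the same code runs on a single thread, which mirrors the opt-in nature of the feature described above.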
The x86-simd-sort 7.0 release also fixes a performance regression for 16-bit data types, improves argsort performance, and brings other updates.
Downloads and more details on the x86-simd-sort 7.0 release are available via GitHub.