AdaptiveCpp 24.10 is out today as the latest release of this implementation of SYCL and C++ standard parallelism for CPUs and GPUs across hardware vendors. This compiler for C++ heterogeneous programming models gains more features and additional performance optimizations with this update.
AdaptiveCpp 24.10 further ramps up performance by adding more JIT-time optimizations. The release notes mention the possibility of seeing “substantial performance improvements” for at least some kernels. There is also a new “ACPP_ALLOCATION_TRACKING=1” option that gives more insight into memory use and can potentially yield further performance improvements.
AdaptiveCpp 24.10 also adds support for SYCL 2020 group algorithms, additional C++ parallel STL algorithms for GPU/device offloading, the new acpp::algorithms library, and a new framework for JIT-time reflection. There are also new extensions and other features.
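For readers unfamiliar with C++ standard parallelism, here is a minimal sketch of the kind of code this offloading targets. It uses only standard C++17 parallel algorithms; the `saxpy` helper is a hypothetical name chosen for illustration and does not come from the release notes. With an ordinary compiler this runs on the CPU. Offloading it to a GPU with AdaptiveCpp is, per the project's documentation, a matter of compiling with `acpp --acpp-stdpar`, though whether this particular algorithm is offloaded on a given backend is an assumption here.

```cpp
#include <algorithm>
#include <execution>
#include <vector>

// Hypothetical helper for illustration: computes y[i] = a * x[i] + y[i].
// The par_unseq policy permits parallel and vectorized execution, which is
// what a stdpar-offloading compiler can map onto a device kernel.
// x and y are expected to have the same length.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    std::transform(std::execution::par_unseq,
                   x.begin(), x.end(),   // first input range
                   y.begin(),            // second input range
                   y.begin(),            // output, written in place over y
                   [a](float xi, float yi) { return a * xi + yi; });
}
```

The point of this model is that no SYCL-specific code is needed: the same source builds with any C++17 compiler, and the choice of CPU or GPU execution is left to the toolchain.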
New benchmarks shared by the AdaptiveCpp project show AdaptiveCpp 24.10 delivering very competitive, and winning, performance in relation to NVIDIA CUDA.
More benchmark results and other information on the AdaptiveCpp 24.10 release are available via GitHub.