Some commits merged today to FFmpeg Git provide additional hand-tuned assembly code for AVX-512 on capable Intel and AMD processors.
Open-source multimedia developer Niklas Haas today upstreamed some additional AVX2 and AVX-512 tuning to FFmpeg, on top of the multimedia library’s already vast array of hand-tuned code for leveraging Advanced Vector Extensions.
For FFmpeg’s avfilter scene_sad code, there is now an AVX-512 implementation that comes in at 36.31x the speed of the plain C code, according to benchmarks run by Niklas Haas. The existing AVX2 path already achieved 25x the performance of the common C code, and the new AVX-512 path now exceeds 36x.
Another commit added high bit depth AVX2 and AVX-512 versions of the scene_sad avfilter code. The AVX2 version is around 11x faster than the common C code, while the AVX-512 version is around 22x faster.
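For readers unfamiliar with the operation being accelerated, "SAD" is the sum of absolute differences between two frames, a simple measure of how much a picture changed. The following is a minimal scalar sketch of that idea in C for 8-bit and high bit depth (16-bit sample) planes. It is not FFmpeg's actual code: the function names `plane_sad_u8` and `plane_sad_u16`, their signatures, and the use of byte strides are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not FFmpeg's API): sum of absolute differences
 * between two 8-bit planes of equal size. Strides are in bytes. */
static uint64_t plane_sad_u8(const uint8_t *a, ptrdiff_t stride_a,
                             const uint8_t *b, ptrdiff_t stride_b,
                             size_t width, size_t height)
{
    uint64_t sum = 0;
    assert(a && b);
    for (size_t y = 0; y < height; y++) {
        for (size_t x = 0; x < width; x++)
            sum += (uint64_t)abs((int)a[x] - (int)b[x]);
        a += stride_a;  /* advance both planes to the next row */
        b += stride_b;
    }
    return sum;
}

/* Hypothetical high bit depth variant: samples are stored as 16-bit
 * values. Strides are still in bytes, so rows are stepped via a
 * byte pointer before being viewed as uint16_t again. */
static uint64_t plane_sad_u16(const uint16_t *a, ptrdiff_t stride_a,
                              const uint16_t *b, ptrdiff_t stride_b,
                              size_t width, size_t height)
{
    uint64_t sum = 0;
    assert(a && b);
    for (size_t y = 0; y < height; y++) {
        for (size_t x = 0; x < width; x++)
            sum += (uint64_t)abs((int)a[x] - (int)b[x]);
        a = (const uint16_t *)((const uint8_t *)a + stride_a);
        b = (const uint16_t *)((const uint8_t *)b + stride_b);
    }
    return sum;
}
```

Every pixel's contribution in these loops is independent of its neighbors, which is the kind of workload that maps well onto wide vector registers: a hand-written AVX2 or AVX-512 version processes many samples per instruction instead of one at a time.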
AVX-512 continues to pay off, particularly with the latest AMD Zen 4 / Zen 5 and recent Intel Xeon processors.