FFmpeg developers are known for delivering dramatic performance gains from hand-optimized Assembly code, especially AVX-512 optimizations for Intel and AMD processors across various features of this widely-used open-source multimedia library. Merged this week is an AVX-512 code path for the Bwdif deinterlacing video filter that delivers a 23x to 28x speed-up over the basic C code path.
Niklas Haas landed an AVX-512 implementation of the Bob Weaver deinterlacing video filter, “vf_bwdif”, to benefit newer Intel and AMD processors with Advanced Vector Extensions 512 capabilities.
Compared to the unoptimized and very basic C baseline, bwdif8_avx512 is 23.28x faster and bwdif10_avx512 is 28.27x faster. Compared to the existing AVX2 implementation, the new code is just under twice as fast.
The new code path works on Intel and AMD AVX-512 processors but is gated to prevent its use on Skylake processors, whose notorious early AVX-512 implementation was subject to thermal/power issues and, in turn, CPU down-clocking.
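For readers unfamiliar with this kind of gating, here is a minimal sketch in C of how a library can pick a code path from CPU capability bits while skipping AVX-512 on older parts. It is not FFmpeg's code: the flag names, the stub functions, and `select_filter` are all hypothetical, and the idea of a separate "fast AVX-512" bit is an assumption made for illustration.

```c
/* Hypothetical capability bits. These are NOT FFmpeg's real CPU flag values. */
enum {
    CPU_AVX2        = 1 << 0,
    CPU_AVX512      = 1 << 1, /* any AVX-512 part, Skylake-era included */
    CPU_AVX512_FAST = 1 << 2  /* assumed bit: newer parts without heavy down-clocking */
};

typedef void (*filter_fn)(void);

/* Stand-ins for the real filter kernels; the bodies are intentionally empty. */
static void filter_c(void)      {}
static void filter_avx2(void)   {}
static void filter_avx512(void) {}

/* Start from the portable C path and upgrade only when the CPU qualifies. */
static filter_fn select_filter(unsigned cpu_flags)
{
    filter_fn fn = filter_c;

    if (cpu_flags & CPU_AVX2)
        fn = filter_avx2;

    /* Gate on the "fast" bit rather than plain AVX-512, so a Skylake-style
     * CPU (AVX-512 present, fast bit absent) stays on the AVX2 path. */
    if (cpu_flags & CPU_AVX512_FAST)
        fn = filter_avx512;

    return fn;
}
```

With these assumed flags, a CPU reporting `CPU_AVX2 | CPU_AVX512` would keep the AVX2 path, while one that also reports `CPU_AVX512_FAST` would get the AVX-512 path.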
The new AVX-512 implementation was merged ahead of the FFmpeg 8.0 release, which is due in a few weeks.