The upcoming FFmpeg 8.0 multimedia library release continues to get more exciting almost by the day. The newest feature being squeezed into this next release is a Whisper audio filter that uses OpenAI’s Whisper model to provide automatic speech recognition and transcription.
For those unaware, Whisper is an automatic speech recognition model trained on a very large dataset, and it has proven to be extremely capable. FFmpeg 8.0 can be built with the “--enable-whisper” configure option when the whisper.cpp library is present on the system, enabling OpenAI Whisper model support. There is optional GPU acceleration along with various tunables, and the filter can run automatic transcription that dumps the text to an SRT file, sends the output in JSON format to an HTTP web service, and more.
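For a rough idea of what using the filter looks like, below is a small Python sketch that assembles an ffmpeg command line for writing a transcript to an SRT file. It assumes an FFmpeg 8.0 build configured with “--enable-whisper” and a whisper.cpp model file on disk; the filter option names used here (model, language, destination, format) follow the filter’s documentation as best understood and should be confirmed locally with `ffmpeg -h filter=whisper`. The helper names and file paths are hypothetical. Rather than reimplementing FFmpeg’s filtergraph escaping rules, the sketch simply rejects option values containing characters that would need escaping.

```python
"""Assemble an ffmpeg argv that transcribes audio with the whisper filter.

A sketch under stated assumptions: an FFmpeg 8.0 build with --enable-whisper,
and filter option names (model, language, destination, format) taken from the
filter docs. Verify them with `ffmpeg -h filter=whisper` before relying on this.
"""

import shlex

# Characters with special meaning inside an ffmpeg filtergraph. Instead of
# implementing ffmpeg's multi-level escaping, this sketch refuses them.
_FILTERGRAPH_SPECIAL = set(":\\',;[]=")


def _check_value(name: str, value: str) -> str:
    """Return value unchanged, or raise if it would need filtergraph escaping."""
    bad = _FILTERGRAPH_SPECIAL.intersection(value)
    if bad:
        raise ValueError(
            f"{name}={value!r} contains characters needing escaping: {sorted(bad)}"
        )
    return value


def whisper_command(input_path: str, model_path: str, srt_path: str,
                    language: str = "en") -> list[str]:
    """Return an argv list (hypothetical helper) that writes an SRT transcript."""
    opts = {
        "model": model_path,        # path to a whisper.cpp model file
        "language": language,       # spoken language of the input
        "destination": srt_path,    # where the filter writes its transcript
        "format": "srt",            # transcript format
    }
    filter_args = ":".join(f"{k}={_check_value(k, v)}" for k, v in opts.items())
    return [
        "ffmpeg",
        "-i", input_path,
        "-vn",                          # drop video; only audio is transcribed
        "-af", f"whisper={filter_args}",
        "-f", "null", "-",              # discard decoded audio; transcript is the output
    ]


if __name__ == "__main__":
    cmd = whisper_command("input.mp4", "models/ggml-base.en.bin", "output.srt")
    print(shlex.join(cmd))  # printable form for pasting into a shell
```

Returning an argument list rather than a single shell string means it can be handed straight to `subprocess.run(cmd, check=True)` on a whisper-enabled build without a second layer of shell quoting. Sending JSON to an HTTP service would, per the filter documentation, mean changing `format` and pointing `destination` at a URL, which contains colons and therefore needs escaping that this sketch deliberately does not attempt.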
Those interested in this OpenAI Whisper audio filter support, which was merged into FFmpeg over the weekend, can find it via this Git commit.
FFmpeg 8.0 should release within a few weeks and will also feature a number of Vulkan acceleration enhancements, new CPU performance optimizations, and a wide variety of other improvements for this widely used open-source multimedia library.