Mozilla Is Now Offering Free AI Voice Training Data In 180 Languages

Last updated: 2024/11/08 at 4:51 AM

News Room Published 8 November 2024

Since 2017, Mozilla’s Common Voice project has collected more than 30,000 hours of recordings of people from around the world speaking their languages.

The project’s goal is to provide a free, publicly available dataset that anyone can use for training voice recognition AI software and other projects, while ensuring that all the material is provided with the informed consent of the people being recorded. Common Voice now includes recorded material and corresponding transcripts in roughly 180 languages, all available under the public domain-like Creative Commons CC0 license, with volunteers from communities worldwide working to add their own languages to the mix.

“We don’t add languages to the platform without communities,” says EM Lewis-Jong, product director at Mozilla. “It sounds like a small thing, but I think in the current AI age, it actually is weirdly radical to be consent-centered.”

And while Mozilla doesn’t disclose, or in some cases even necessarily know, exactly who’s using the data, Lewis-Jong says it’s been used by Big Tech companies, small independent operations, and plenty of projects in between. The dataset has been downloaded from Mozilla millions of times, and it’s also available through the AI development platform Hugging Face, which hosts speech recognition models trained on the Common Voice data.

In some cases, the dataset has been used by smaller projects focused on specific tasks, like delivering multilingual legal advice, providing information about governance, or building voice-powered chatbots with local agricultural information.

“I think it’s fair to say that from the largest and most famous technology organizations to really small civil society projects and individual developers, we really do see the full range,” Lewis-Jong says.

Common Voice continues to grow as new material gets recorded in existing languages and new volunteers approach Mozilla to localize the contribution process for their own languages, letting contributors record, validate, and transcribe material that gets added to future releases.
