
Voice is Africa’s gateway to AI, and Google wants to lead it

News Room
Published 12 February 2026 · Last updated 12 February 2026 at 8:45 AM

Abdoulaye Diack is a program manager at Google Research, the division of Google dedicated to advancing the state of the art in computer science and applying those breakthroughs to real-world problems. When he talks about the origins of WAXAL, an open-source speech dataset from Google Research Africa, he begins with a single word.

“WAXAL means ‘speaking,’” he said, noting its roots in Wolof, a language widely spoken in the Senegambia region.

The name, chosen in 2020 by Moustapha Cisse, a Senegalese research lead at Google, reflects a larger truth about Africa’s AI trajectory: on a continent with more than 2,000 languages, most of them spoken rather than written, voice is not optional; it is the entry point.

For years, digital technology has centred on literacy, keyboards and text. But in Africa, language lives in conversation, across markets, farms, clinics and homes. AI that cannot parse accents, intonation or code-switching cannot meaningfully serve most Africans. WAXAL aims to change that. Instead of focusing solely on text translation, the project is creating the foundational infrastructure for speech-to-speech AI in low-resource African languages, centred on building a vast, high-quality hub of linguistic “raw material.”

“Having AI that can speak to us in our language and understand us, whether it’s our accent or intonation, is actually quite important,” Diack said.

The data disadvantage

The challenge begins with a stark imbalance. More than half of all websites are written in English or a handful of other Western languages, while Africa’s 2,000-plus languages barely register in global digital datasets. Most are underrepresented online. Many are not written extensively. Some are not standardised at all.

If AI models are trained on digital text, and digital text barely exists for African languages, then the continent begins the AI race at a structural disadvantage.

“This is not a new problem,” Diack said. “People in research are aware of this huge gap in the lack of data.”

Without data, models cannot be trained. Without trained models, AI systems mishear, mistranslate or ignore entire populations. Diack recounts a common frustration: speaking in a francophone African accent while an AI note-taking system struggles to understand him. The technology exists, but it is not tuned to the local context.

That gap is what WAXAL wants to close.

Building a speech foundation

Launched officially in February 2026 after three years of development, WAXAL produced one of the largest speech datasets for African languages to date: more than 11,000 hours of recorded speech from nearly 2 million individual recordings, covering 21 Sub-Saharan African languages, including Hausa, Yoruba, Luganda and Acholi.

Beyond general speech collection, Google said it has also recorded more than 20 hours of high-quality studio audio to develop natural-sounding synthetic voices for voice assistants. These “studio premium” recordings are designed to make AI responses sound less robotic and more culturally authentic.

Google structured the initiative as a partnership model. Universities such as Makerere University in Uganda and the University of Ghana led much of the data collection. Local partners retain ownership of the datasets, which have been released as open source under licences that allow commercial use.

“We’ve mostly provided guidance and funding,” Diack explained. “All of this dataset does not belong to us. It belongs to the partners we work with.”

The ambition is not merely to feed Google’s own products but to seed an ecosystem.

Within days of release, the dataset recorded more than 4,000 downloads, an early sign of researcher and developer uptake, according to Diack.
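Because the corpora are released under open licences, a developer can pull them with ordinary data tooling. The sketch below is illustrative only: it assumes the recordings are published on a hub such as Hugging Face, and the dataset identifier, config name and column names are placeholders rather than the real paths, which the partners publish with each release.

```python
# Illustrative sketch only. The dataset ID, config name and column names below
# are placeholders; consult the actual WAXAL release pages for real paths.
from datasets import load_dataset, Audio

# Stream the corpus instead of downloading thousands of hours up front.
corpus = load_dataset("example-org/waxal-speech", "luganda",  # hypothetical ID and config
                      split="train", streaming=True)
corpus = corpus.cast_column("audio", Audio(sampling_rate=16_000))

for example in corpus.take(3):
    audio = example["audio"]  # dict with "array" and "sampling_rate" after casting
    print(example.get("transcription"), audio["sampling_rate"], len(audio["array"]))
```

Streaming matters at this scale: 11,000 hours of audio is far too large to copy locally just to inspect a few samples.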

Why voice matters 

Google already offers translation tools across many languages. So why start from scratch?

Because translation is not speech.

Traditional machine translation relies on “parallel text,” sentences written in one language that are aligned with their equivalents in another. For low-resource languages, such parallel corpora barely exist. And even when translation works, it does not solve the deeper issue: many Africans interact with technology primarily through speech.
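To make that distinction concrete, here is a rough sketch, with invented field names and placeholder values, of the two kinds of training example: the aligned sentence pair that classic machine translation needs, and the audio-plus-metadata record that speech corpora like WAXAL collect.

```python
# Illustrative shapes only; field names and values are placeholders.

# Parallel text: an aligned sentence pair, the unit of classic machine translation data.
parallel_pair = {
    "source": {"lang": "en", "text": "The rains will start next week."},
    "target": {"lang": "ha", "text": "<aligned Hausa sentence>"},  # placeholder translation
}

# Speech: an audio recording plus metadata, the unit speech-first corpora collect.
speech_example = {
    "audio_path": "recordings/hausa/clip_000123.wav",  # placeholder path
    "lang": "ha",
    "speaker_id": "spk_042",
    "transcription": "<what the speaker said, if transcribed at all>",
    "duration_seconds": 4.2,
}
```

The parallel pair presupposes that both sides exist as written, standardised text; the speech record does not, which is why speech-first collection can reach languages that text scraping never will.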

“A lot of people actually don’t know how to read and write on the continent,” Diack said. “Voice is basically the gateway to technology.”

Imagine a farmer in Kaduna asking about weather forecasts in Hausa. Or a mother in a rural Ghanaian village seeking nutritional advice in her local language. Text-based systems assume literacy and standardised spelling. Voice systems must navigate dialects, slang, code-switching and atypical speech patterns.

In Ghana, a speech recognition project, the UGSpeechData initiative, produced more than 5,000 hours of audio data. That initiative later enabled the development of a maternal health chatbot operating in local languages. It also extended into work on atypical speech, helping communities of deaf individuals and stroke survivors whose speech patterns often confound mainstream AI systems.

“AI systems are not adapted to that,” Diack said. “If you have different types of speech, it’s likely the system will not understand you.”

A crowded field

Google is not alone in this race.

Masakhane, a grassroots open-source research collective, has built translation systems across more than 45 African languages and developed Lulu, a benchmark for evaluating African language models. Its philosophy is community-first and fully open.

South Africa’s Lelapa AI, founded by former DeepMind researchers, focuses on commercial Natural Language Processing (NLP) products for African businesses. Its flagship model, Vulavula, captures dialects and urban code-switching patterns in isiZulu, Sesotho and Afrikaans. Lelapa emphasises “ground truth” datasets and heavy human error analysis, a costly but high-fidelity approach.

Lesan AI in Ethiopia has built some of the most accurate translation systems for Amharic, Tigrinya and Oromo using a human-in-the-loop model to ensure cultural nuance.

Meta’s No Language Left Behind (NLLB-200) project takes a massive-scale approach, translating across 200 languages, including 55 African ones, using zero-shot learning. Microsoft, meanwhile, integrates African languages into Microsoft Translator and is investing in multi-modal agricultural datasets through projects like Gecko.

The Gates Foundation-funded African Next Voices initiative launched in late 2025, producing 9,000 hours of speech data across 18 languages.

The ecosystem is diverse: open-source collectives, commercial startups, Big Tech giants, philanthropic funders. Each approaches the problem differently: scale versus depth, text versus voice, open versus proprietary.

Google’s distinction lies in its speech-heavy, ecosystem-oriented approach.

Sovereignty versus paralysis

Yet the involvement of global tech giants inevitably raises questions about data sovereignty and dependency.

If Google coordinates the release of multilingual speech datasets, does that create structural reliance on Google products? Could local developers become dependent on tools embedded within Gemini, Search or Android?

Diack acknowledges the tension but warns against becoming so conflicted that nothing is done with the opportunity at hand.

“What is most important is that we are not left behind,” he said. “I definitely don’t want my data misused. But this is about enabling entrepreneurs, startups and researchers to work on data that is really important.”

He draws parallels with partnerships between universities and tech companies in the United States and Europe. Collaboration, he argues, accelerates capability-building. Already, researchers involved in early projects have published papers and advanced into global research roles.

The open licensing model is central to that argument. Developers can build commercial products on top of WAXAL datasets without depending on Google’s proprietary APIs. Google has also released open-weight translation models like Translate Gemma, which can be downloaded and fine-tuned independently.
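Running such an open-weight model locally might look something like the sketch below, which uses the Hugging Face transformers library. The checkpoint name is a placeholder, not the real Translate Gemma identifier, and the prompt format is a guess rather than the documented convention.

```python
# Minimal local-inference sketch. The model ID and prompt format are placeholders;
# substitute the real checkpoint name and prompting convention from the release notes.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/translate-gemma-placeholder"  # hypothetical identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Translate to Yoruba: Good morning, how is the farm today?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the weights are downloaded rather than called through a hosted API, a startup can fine-tune or serve the model on its own infrastructure, which is the independence from proprietary APIs that Diack points to.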

Whether that balance satisfies critics remains to be seen. But the scale of the language gap suggests that inaction may carry greater risks.

Infrastructure: the silent prerequisite

Voice AI does not exist in isolation. It requires connectivity, bandwidth and computing infrastructure.

“You can’t really train AI models without the right infrastructure,” Diack said.

Google has invested in undersea cables, including landing the Equiano cable in Nigeria and other African markets, to strengthen broadband resilience. Fibre cuts in recent years exposed the fragility of regional networks. Redundant, high-capacity infrastructure is essential not only for cloud services but also for local data centres, a key pillar of digital sovereignty.

AI development depends on three foundations: people, data and infrastructure. Africa’s youthful population, projected to account for a large share of global AI users in the coming decades, offers a demographic advantage. But without investment in research capacity and digital infrastructure, demographic potential will not translate into technological leadership.

The coordination challenge

To avoid fragmentation, Google has shifted from isolated university partnerships to more coordinated collaboration models. One such effort involves working with Masakhane’s language hub and other volunteer networks to enable researchers and startups to apply for funding and contribute to shared datasets.

“If we are all doing our own thing across the continent, it’s not effective,” Diack said. “We need a concerted effort.”

So far, WAXAL has covered 27 languages, including four Nigerian ones. Some of the languages already covered include Acholi, Akan, Dagaare, Dagbani, Dholuo, Ewe, Fante, Fulani (Fula), Hausa, Igbo, Ikposo (Kposo), Kikuyu, Lingala, Luganda, Malagasy, Masaaba, Nyankole, Rukiga, Shona, Soga (Lusoga), Swahili, and Yoruba. 

The ambition to address all 2,000-plus African languages is aspirational, perhaps generational.

“That’s my dream,” Diack said.

But prioritisation matters. He points to education, agriculture and health as critical domains where voice AI could deliver measurable impact aligned with sustainable development goals.

Weather forecasting integrated into Google Search, improved through African research initiatives, already demonstrates global spillover. Cassava disease detection projects like PlantVillage Nuru, developed through a partnership between Penn State University, the International Institute of Tropical Agriculture (IITA) and the Consultative Group on International Agricultural Research (CGIAR), have influenced agricultural AI beyond Africa. These precedents suggest that solutions built for Africa can scale globally.

The cost of indigenous-first AI

Collecting voice data in low-resource settings is expensive. Field recordings, transcription, linguistic validation and studio-quality voice synthesis require sustained funding.

Google’s investment is part of a broader industry shift from scraping available text to investing in original speech data. Lelapa AI’s human-in-the-loop verification model underscores the cost of accuracy. Meta’s FLORES-200 dataset relied on professional translators. Microsoft’s agricultural voice initiatives involve thousands of annotated videos.

Quality matters. Synthetic voices must sound natural. Recognition systems must handle code-switching. Urban speech often blends English, local languages and slang in the same sentence.

African AI cannot be built solely through automation; it requires cultural and linguistic expertise.

For Diack, success is not measured solely by product integration.

“I want to see startups leveraging the dataset to provide services in local languages,” he said. “I want to see researchers writing papers based on our languages, not only English.”

Ultimately, however, the door Google is building must lead somewhere tangible. That includes Google products such as Search, Gemini and voice assistants interacting fluently in Yoruba, Wolof, Hausa or Luganda. But it also includes independent startups building fintech tools, health chatbots or agricultural advisory systems.

Africa’s AI future hinges on whether voice becomes an equalising force or another missed opportunity. If speech remains unrecognised by global systems, billions of words spoken daily across the continent will remain digitally invisible.
