Somalia has one of the smallest tech ecosystems in East Africa. Most startups stay small, and capital is limited. The country has fewer than 500 startups, and only 53 of them have raised funding, totaling $47.8 million across 49 rounds. Activity is thin: just 131 new companies have been formed in the past five years, raising $12.1 million between them. Five startups have since been acquired, while 95 have shut down. Only eight are founded by women.
Shaqodoon, a non-profit set up in 2011, has spent years trying to strengthen the local tech space by pushing practical digital tools across Somalia and Kenya. Its teams build systems, including mobile learning tools, used by schools, NGOs, and public agencies.
From this work comes NaMaqal, a Somali-language speech-to-text system that converts spoken Somali into text. The thinking behind this product is that organisations cannot respond to communities they do not understand.
For many organisations, feedback often comes from rural and humanitarian settings, stored in audio files on phones, recorders or hotline servers. Staff might record meetings, interviews, and complaints, but struggle to process them because translation takes days.
“We noticed that much of the valuable feedback shared by communities, especially in rural and humanitarian contexts, existed only in spoken Somali. Meetings, interviews, and consultations were recorded but rarely analysed because transcribing them manually took days,” said Mustafa Othman, executive director of Shaqodoon Organisation.
Today, NaMaqal provides teams with tools to understand and use spoken feedback almost in real time. It is currently working with UN agencies, the Danish Refugee Council, and World Vision International.
“Somali is underrepresented in global speech datasets and commercial tools do not support it well.”
How NaMaqal handles real speech
NaMaqal sits on top of years of work on Imaqal, Shaqodoon’s call-based feedback tool. That platform allowed people to leave voice messages with comments, complaints, and updates.
According to Othman, the volume grew fast. Daily calls ranged between 1,500 and 3,000, and staff spent up to 15 hours a day listening, transcribing, and translating. The process slowed response times. What NaMaqal does is move transcription and translation into the system itself, leaving staff to refine the output rather than handle everything manually.
The pipeline starts with raw audio: field teams or partner media record speech through phones or audio recorders. Once a file enters the system, it is preprocessed by reducing noise, removing silent sections, and adjusting the volume balance. The model then converts the sound waves into features that capture pitch, frequency, and other acoustic patterns. These features are fed into a neural network trained on thousands of hours of Somali speech.
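As a rough illustration of that preprocessing stage, silence removal and volume balancing can be sketched in a few lines of Python. The frame size and silence threshold below are illustrative assumptions, not NaMaqal's actual parameters:

```python
import numpy as np

def preprocess(audio: np.ndarray, sample_rate: int = 16000,
               frame_ms: int = 25, silence_db: float = -40.0) -> np.ndarray:
    """Trim silent frames and normalise volume before feature extraction."""
    frame_len = int(sample_rate * frame_ms / 1000)
    # Pad so the signal splits evenly into fixed-length frames
    pad = (-len(audio)) % frame_len
    frames = np.pad(audio, (0, pad)).reshape(-1, frame_len)
    # Per-frame energy in decibels; near-silent frames are dropped
    rms = np.sqrt(np.mean(frames ** 2, axis=1)) + 1e-10
    db = 20 * np.log10(rms)
    voiced = frames[db > silence_db].ravel()
    if voiced.size == 0:
        return voiced
    # Peak-normalise so quiet and loud recordings look alike to the model
    return voiced / np.max(np.abs(voiced))
```

A real system would add spectral noise reduction and convert the result into features such as mel spectrograms before it reaches the network.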
At this stage, the model predicts phonemes, then words, then full sentences. A language model checks grammar, spelling, and context to ensure accuracy. It also adds punctuation, removes fillers, and redacts sensitive details.
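The clean-up step amounts to simple text transformations on the raw transcript. The sketch below uses hypothetical English filler tokens and a generic phone-number pattern; NaMaqal's real filler lists and redaction rules are not public:

```python
import re

# Hypothetical filler tokens; the real system's Somali list is not public.
FILLERS = {"uh", "um", "eh"}
# Generic pattern for phone-number-like strings to redact.
PHONE_RE = re.compile(r"\+?\d[\d\s-]{6,}\d")

def postprocess(raw: str) -> str:
    """Strip filler words and redact phone numbers from a raw transcript."""
    words = [w for w in raw.split() if w.lower().strip(",.") not in FILLERS]
    text = " ".join(words)
    return PHONE_RE.sub("[REDACTED]", text)
```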
The end product is Somali text that is ready for review. Staff review segments flagged as low confidence, correct errors, and add notes for new slang or community-specific terms. The system logs these changes, and the reviewed data becomes part of the next training cycle.
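The review queue described above is, at its core, a filter on the model's own confidence scores. A minimal sketch, assuming a 0.85 cut-off (the actual threshold is not disclosed):

```python
from dataclasses import dataclass

@dataclass
class Segment:
    text: str
    confidence: float  # model score for this span, between 0.0 and 1.0

def flag_for_review(segments: list, threshold: float = 0.85) -> list:
    """Return only the segments a human reviewer should check and correct."""
    return [s for s in segments if s.confidence < threshold]
```

Corrections to flagged segments would then be logged and folded into the next training cycle, as the article describes.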
“Somali brings its own linguistic pressure points where words carry multiple layers of meaning.”
Building the dataset was the heavy lift, Othman said, because there are no large open Somali speech datasets. The team had to create one, so Shaqodoon worked with radio stations, universities, and community media to gather diverse recordings.
They collected talk shows, interviews, lectures, news clips, and voice feedback from Imaqal. Linguists and trained annotators manually labeled the data by noting dialect, context, tone, and speaker variation. This helped the model learn how different regions speak.
Of Somali's three main dialect groups, Maxaa is the most widely spoken and therefore dominates existing datasets. However, Maay and the coastal dialects also have large communities, and ignoring them would lock entire groups out of the system.
“We then used a combination of in-house linguists and trained annotators to segment, label, and verify transcripts. A key principle was regional balance to ensure both gender and dialect diversity in the corpus,” Othman said.
The team then ensured that the distribution of gender, region, and dialect remained wide during collection. While Maxaa coverage is strong, the model reaches only about 40% accuracy on Maay because its grammar and vocabulary differ. Work on Maay continues, Othman said.
Somali brings its own linguistic pressure points where words carry multiple layers of meaning. Speakers stretch or shorten phrases depending on the tone, and there are cases where people switch between Somali, Arabic, and English in a single sentence, a common practice among East Africans.
Rural recordings often feature wind, traffic, or crowd noise, particularly when speeches are recorded outdoors. All these factors push the model into more complex territory.
The platform attempts to detect speech patterns and route the audio to the most suitable model for processing. Each dialect group has separate acoustic and language models, after which reviewers from Maay-speaking regions provide corrections and annotations. Their input feeds into periodic retraining cycles.
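The dialect routing described above can be pictured as a dispatch table: one model per dialect group, selected by whichever dialect the classifier scores highest. A sketch with stub models standing in for the real acoustic and language models:

```python
# Hypothetical routing layer: one transcription model per dialect group.
# Lambdas stand in for the real acoustic/language model pairs.
MODELS = {
    "maxaa": lambda audio: f"<maxaa transcript, {len(audio)} samples>",
    "maay": lambda audio: f"<maay transcript, {len(audio)} samples>",
    "coastal": lambda audio: f"<coastal transcript, {len(audio)} samples>",
}

def route(audio, dialect_scores: dict) -> str:
    """Send the audio to the model for the highest-scoring dialect."""
    dialect = max(dialect_scores, key=dialect_scores.get)
    return MODELS[dialect](audio)
```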
Business model
NaMaqal runs on a cloud-based, containerised setup designed to handle large batches of audio. GPUs (graphics processing units) handle the heavy work during peak periods. In remote areas, recordings can be cached offline and uploaded later.
“All files are encrypted in transit and at rest,” Othman clarified.
The system needs this structure because Somali institutions record a large volume of speech across many settings. Radio stations gather public views, and humanitarian teams collect field updates. Government units run consultations, researchers record interviews, and hotlines receive complaints and questions. Manual transcription slows all of this, so NaMaqal offers a searchable record that teams can filter by topic, location, or timeframe, making it easier to trace emerging issues such as price changes or local disputes.
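That searchable record is essentially a set of filtered queries over transcripts and their metadata. A minimal sketch with invented field names (`text`, `location`, `date`), since the actual schema is not public:

```python
from datetime import date

def search(records, topic=None, location=None, start=None, end=None):
    """Filter transcribed records by keyword, location, and date range."""
    results = []
    for r in records:
        if topic and topic.lower() not in r["text"].lower():
            continue  # keyword not mentioned in this transcript
        if location and r["location"] != location:
            continue
        if start and r["date"] < start:
            continue
        if end and r["date"] > end:
            continue
        results.append(r)
    return results
```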
The workflow also shifts how staff spend their time: instead of transcribing everything manually, reviewers focus on correcting segments the model flags as uncertain. They also add notes for new terms or local phrasing, which feed back into the training data, allowing the system to adjust to real speech patterns.
Organisations can export the final text or plug it into dashboards. The output becomes more useful when paired with field reports, programme data or geographic information.
Othman did not disclose how many customers use the tool, but said that pricing ranges per use.
Why does a small ecosystem need tools like this?
According to Othman, Somalia’s tech scene is small. Most young builders lack funding, training, or early customers, so few products reach national scale. Building speech tech here is rare because it is demanding, resource-heavy work.
NaMaqal cuts through those demands by solving a grounded problem. Somali is underrepresented in global speech datasets, and commercial tools do not support it well. Organisations fall back on manual translation, which cannot keep up with community feedback. By making spoken content searchable, NaMaqal opens the door to faster responses and better visibility across sectors.
“The possibilities go far beyond converting speech to text. Once you can see and search spoken content, it unlocks new capabilities across many sectors, like processing voice feedback from communities to inform rapid response,” Othman said.
