Inside the Data: What Shapes Startup Deal Sizes in Africa | HackerNoon

News Room
Last updated: 2025/11/05 at 12:40 PM
News Room Published 5 November 2025

Table Of Links

ABSTRACT

INTRODUCTION

LITERATURE REVIEW

DATA AND METHODS

RESULTS

DISCUSSION

CONCLUSION AND REFERENCES

DATA AND METHODS

This section describes the data sources, methodology, and techniques used to investigate the key factors affecting deal amounts in African startup investments. It covers data collection, cleaning, and preparation, followed by feature grouping, exploratory data analysis, and the machine learning models used to build a predictive model. Careful data handling combined with established analytical techniques is intended to support the robustness of the study’s findings.

Data

This study employs a dataset sourced from africathebigdeal.com to systematically investigate the key factors influencing deal amounts in African startup investments and to formulate policy recommendations that bolster the growth of the startup ecosystem on the continent. The dataset compilation adhered to the following methodology:

● Inclusion criteria stipulated that startups must either operate in Africa with their headquarters situated within the continent or possess African founders despite having their headquarters located outside Africa.

● The database exclusively captures deals that have been publicly disclosed or openly shared by investors or founders themselves.

● Deal size thresholds: only transactions of at least $100,000 are included for 2021, 2022, and 2023; at least $500,000 for 2020; and at least $1,000,000 for 2019.

The principal dataset, the primary focus of this investigation, comprises 2,521 startup deals with 34 attributes, including the deal amount. The secondary dataset covers 1,792 investors who made at least one investment in African startups.
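As a minimal sketch, the year-dependent deal size thresholds listed above can be written as a lookup table plus a filter. The names `MIN_DEAL_USD` and `meets_threshold`, the record layout, and the sample deals are assumptions made for illustration; they are not taken from the dataset or from the study's code.

```python
# Minimum disclosed deal size (USD) per deal year, as stated in the inclusion criteria.
MIN_DEAL_USD = {
    2019: 1_000_000,
    2020: 500_000,
    2021: 100_000,
    2022: 100_000,
    2023: 100_000,
}


def meets_threshold(year: int, amount_usd: float) -> bool:
    """Return True if a deal is large enough to be included for its year."""
    minimum = MIN_DEAL_USD.get(year)
    if minimum is None:
        # Years outside 2019-2023 are not described in the text, so this
        # sketch excludes them (an assumption).
        return False
    return amount_usd >= minimum


# Hypothetical deals, for illustration only.
deals = [
    {"startup": "A", "year": 2019, "amount_usd": 750_000},
    {"startup": "B", "year": 2020, "amount_usd": 750_000},
    {"startup": "C", "year": 2022, "amount_usd": 100_000},
]

included = [d["startup"] for d in deals if meets_threshold(d["year"], d["amount_usd"])]
```

The same $750,000 deal is excluded for 2019 but included for 2020, which is worth keeping in mind when comparing deal counts across years.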

Data Preprocessing

The process of preparing and cleaning the data involved several sequential steps, as outlined below:

  1. Scrutinizing both datasets for missing or erroneous data points, imputing or removing them as appropriate, and rechecking the integrity of the data against media releases.
  2. Integrating the two datasets by merging them on the investor’s name, which produced a single dataset containing both startup and investor information.
  3. Transforming categorical variables such as sector and deal type into binary (dummy) variables for use in the analysis.
  4. Standardizing numerical variables such as amount raised and valuation to facilitate comparability across diverse units of measurement.
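Steps 2 to 4 above can be sketched in plain Python as follows. This is an illustrative outline under stated assumptions, not the study's pipeline: the column names (`investor`, `sector`, `amount_usd`, `investor_country`), the helper names, and the sample records are all hypothetical, and step 1 (handling missing or erroneous values) is left out because the text does not say which imputation rules were used.

```python
from statistics import mean, pstdev


def merge_on_investor(deals, investors):
    """Left-join investor attributes onto deals by investor name (step 2)."""
    by_name = {inv["investor"]: inv for inv in investors}
    return [{**deal, **by_name.get(deal["investor"], {})} for deal in deals]


def one_hot(rows, column):
    """Replace a categorical column with 0/1 dummy columns '<column>=<value>' (step 3)."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}={category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


def standardize(rows, column):
    """Rescale a numeric column to zero mean and unit population std. dev. (step 4)."""
    values = [row[column] for row in rows]
    mu, sigma = mean(values), pstdev(values)
    # A constant column has sigma == 0; map it to 0.0 instead of dividing by zero.
    return [{**row, column: (row[column] - mu) / sigma if sigma else 0.0} for row in rows]


# Hypothetical records, for illustration only.
deals = [
    {"startup": "A", "investor": "Fund X", "sector": "Fintech", "amount_usd": 1_000_000},
    {"startup": "B", "investor": "Fund Y", "sector": "Energy", "amount_usd": 3_000_000},
]
investors = [
    {"investor": "Fund X", "investor_country": "Kenya"},
    {"investor": "Fund Y", "investor_country": "Nigeria"},
]

prepared = standardize(one_hot(merge_on_investor(deals, investors), "sector"), "amount_usd")
```

Merging on a free-text name is sensitive to spelling variants, so in practice the investor names would likely need normalizing before the join.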

Feature Grouping

To further understand the implications of the different key factors, the features extracted from the primary and secondary datasets were organized into three distinct categories, as outlined below:

● Founding team features (F): This group includes attributes related to the startup’s founding team, such as the number of founders, gender-mix, presence of a woman co-founder or CEO, the CEO’s university, country and continent of the university, graduation year, and the years elapsed between graduation and the startup’s launch.

● Company-related features (C): This category encompasses variables associated with the startup itself, such as the name, website, country, and region of operation, launch date, sector, number of employees, and a brief description of the business.

● Investment-related features (I): This group consists of variables related to the investment deals, including the deal year and date, type of investment, valuation, exit status, investor details, and whether the startup is a Y Combinator alumnus.

Grouping the features in this way organizes the variables coherently and makes the analysis clearer and easier to interpret.

Exploratory Data Analysis (EDA)

Exploratory Data Analysis (EDA) was conducted to examine the dataset and identify key factors that affect deal amounts in African startup investments. A critical aspect of EDA was assessing correlations between variables. Using Pearson’s correlation coefficient, we measured the linear association between the dependent variable (deal amount) and the independent variables (founding team, company, and investment-related features). To investigate these correlations in more depth, we used the three feature groups discussed earlier in five combinations: F, C, I, F+C, and F+C+I. This approach aimed to uncover complex relationships between variables and to better understand the importance of each feature.
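The correlation step can be illustrated with a short sketch that computes Pearson's coefficient between each feature and the deal amount, for each of the five feature-group combinations. The feature names assigned to each group, the helper names, and the numbers are hypothetical placeholders; only the group labels and the list of combinations come from the text.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# One placeholder feature per group; the real groups hold many more columns.
FEATURE_GROUPS = {
    "F": ["n_founders"],   # founding team
    "C": ["n_employees"],  # company
    "I": ["deal_year"],    # investment
}
COMBINATIONS = ["F", "C", "I", "F+C", "F+C+I"]


def columns_for(combination):
    """Expand a combination label such as 'F+C' into its list of columns."""
    return [col for group in combination.split("+") for col in FEATURE_GROUPS[group]]


# Hypothetical data, for illustration only.
data = {
    "n_founders": [1, 2, 2, 3, 4],
    "n_employees": [5, 40, 12, 60, 90],
    "deal_year": [2019, 2021, 2020, 2022, 2023],
    "amount_usd": [0.2, 1.5, 0.6, 2.0, 3.5],
}

correlations = {
    combo: {col: pearson(data[col], data["amount_usd"]) for col in columns_for(combo)}
    for combo in COMBINATIONS
}
```

Pearson's coefficient only captures linear association, so a value near zero does not rule out a non-linear relationship between a feature and the deal amount.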

Models

Using the same combinations of feature groups discussed in the EDA section, four machine learning algorithms were employed: Linear Regression (LR), Support Vector Regression (SVR), Random Forest (RF), and Distributed Gradient Boosting (DGB). Each model was trained and tested using cross-validation. To evaluate the prediction models, the Mean Squared Error (MSE) metric was employed. During the cross-validation process, the performance of each model was assessed by averaging the MSE values obtained from each fold. The comparison of these averaged MSE values facilitated the selection of the most accurate and reliable algorithm for predicting funding amounts in African startups. The chosen model’s performance, along with insights gained from the EDA process, served as the foundation for policy recommendations aimed at supporting the growth of the African startup ecosystem.
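The evaluation procedure described above, averaging the per-fold MSE and comparing models on that average, can be sketched in plain Python. The two models here (a mean baseline and a one-variable least-squares line) are simple stand-ins chosen to keep the example self-contained; they are not the LR, SVR, RF, or DGB implementations used in the study, and the fold count, function names, and data are assumptions.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length lists."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size (no shuffling)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_mse(fit, xs, ys, k=5):
    """Train on k-1 folds, test on the held-out fold, and average the k MSE values."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        predict = fit(train_x, train_y)
        preds = [predict(xs[i]) for i in test_idx]
        scores.append(mse([ys[i] for i in test_idx], preds))
    return sum(scores) / len(scores)


def fit_mean(train_x, train_y):
    """Stand-in baseline: always predict the training mean."""
    avg = sum(train_y) / len(train_y)
    return lambda x: avg


def fit_line(train_x, train_y):
    """Stand-in model: ordinary least squares with a single feature."""
    n = len(train_x)
    mx, my = sum(train_x) / n, sum(train_y) / n
    sxx = sum((x - mx) ** 2 for x in train_x)
    slope = sum((x - mx) * (y - my) for x, y in zip(train_x, train_y)) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


# Hypothetical, exactly linear data, for illustration only.
xs = list(range(1, 11))
ys = [2 * x + 1 for x in xs]

results = {
    "mean_baseline": cross_val_mse(fit_mean, xs, ys),
    "linear": cross_val_mse(fit_line, xs, ys),
}
best = min(results, key=results.get)  # model with the lowest average MSE
```

With real data the rows would normally be shuffled before splitting, since contiguous folds can pick up ordering effects such as deals sorted by date.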

:::info
Author:

Khalil Liouane

:::

:::info
This paper is available on arXiv under the CC BY-SA 4.0 Deed (Attribution-ShareAlike 4.0 International) license.

:::
