Machine Learning-based Vulnerability Protections For Android Open Source Project | HackerNoon

Published 19 November 2025

:::info
Author:

  1. Keun Soo Yim

:::

Table Of Links

ABSTRACT

I. INTRODUCTION

II. BACKGROUND

III. DESIGN

  • DEFINITIONS
  • DESIGN GOALS
  • FRAMEWORK
  • EXTENSIONS

IV. MODELING

  • CLASSIFIERS
  • FEATURES

V. DATA COLLECTION

VI. CHARACTERIZATION

  • VULNERABILITY FIXING LATENCY
  • ANALYSIS OF VULNERABILITY FIXING CHANGES
  • ANALYSIS OF VULNERABILITY-INDUCING CHANGES

VII. RESULT

  • N-FOLD VALIDATION
  • EVALUATION USING ONLINE DEPLOYMENT MODE

VIII. DISCUSSION

  • IMPLICATIONS ON MULTI-PROJECTS
  • IMPLICATIONS ON ANDROID SECURITY WORKS
  • THREATS TO VALIDITY
  • ALTERNATIVE APPROACHES

IX. RELATED WORK

CONCLUSION AND REFERENCES

ABSTRACT

This paper presents a framework that selectively triggers security reviews for incoming source code changes. Functioning as a review bot within a code review service, the framework can automatically request additional security reviews at pre-submit time, before the code changes are submitted to a source code repository. Because performing such secure code reviews adds cost, the framework employs a classifier trained to identify code changes with a high likelihood of vulnerabilities.

The online classifier leverages various types of input features to analyze the review patterns, track the software engineering process, and mine specific text patterns within given code changes. The classifier and its features are meticulously chosen and optimized using data from the submitted code changes and reported vulnerabilities in Android Open Source Project (AOSP). The evaluation results demonstrate that our Vulnerability Prevention (VP) framework identifies approximately 80% of the vulnerability-inducing code changes in the dataset with a precision ratio of around 98% and a false positive rate of around 1.7%.

We discuss the implications of deploying the VP framework in multi-project settings and future directions for Android security research. This paper explores and validates our approach to code change-granularity vulnerability prediction, offering a preventive technique for software security by preemptively detecting vulnerable code changes before submission.

I. INTRODUCTION

The free and open source software (FOSS) supply chains for Internet-of-Things devices (e.g., smartphones and TVs) present an attractive, economic target for security attackers (e.g., supply-chain attacks [20][21][28]). This is, for instance, because attackers can submit seemingly innocuous code changes containing vulnerabilities without revealing their identities and motives. The submitted vulnerable code changes can then propagate quickly and quietly to end-user devices.

Targeting specific, widely used open source projects (e.g., OS kernels, libraries, browsers, or media players) can maximize the impact, as those projects typically underpin a vast array of consumer products. The fast software update cycles of those products can quickly pull in vulnerabilities from the latest patches of their upstream FOSS projects if rigorous security reviews and testing are not performed before each software update or release. As a result, those vulnerable code changes can remain undetected and thus unfixed, reaching a large number of end-user devices.

From a holistic societal perspective, the overall security testing cost can be optimized by identifying such vulnerable code changes early at pre-submit time, before those changes are submitted to upstream, open source project repositories. Otherwise, the security testing burden is multiplied across all the downstream software projects that depend on any of the upstream projects.

Those downstream projects cannot rely on the first downstream projects to find and fix the merged, upstream vulnerabilities because the timeframe for such fixes and their subsequent upstreaming is unpredictable (e.g., in part due to the internal policies [22]). Thus, it is desirable to prevent vulnerable code submissions in the upstream projects.

A naïve approach of requiring comprehensive security reviews for every code change causes an unrealistic cost for many upstream open source project owners. This is especially true for FOSS projects receiving a high volume of code changes or requiring specialized security expertise for reviews (e.g., expertise specific to the domains). To this end, this paper presents a Vulnerability Prevention (VP) framework that automates vulnerability assessment of code changes using a machine learning (ML) classifier.

The classifier model estimates the likelihood that a given code change contains or induces at least one security vulnerability. Code changes whose estimated likelihood exceeds a threshold are flagged as likely-vulnerable. The model is trained on historical data generated using a set of associated analysis tools. The model uses the common features used for software defect prediction as well as four types of novel features that capture:

(1) the patch set complexity,

(2) the code review patterns,

(3) the software development lifecycle phase of each source code file, and

(4) the nature of a code change, as determined by analyzing the edited source code lines. In total, this study comprehensively examines 6 types of classifiers using over 30 types of feature data to optimize the accuracy of the ML model.
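The flagging step can be sketched as follows. This is a minimal illustration assuming scikit-learn; the feature vectors, labels, and decision threshold here are synthetic stand-ins, not the paper's actual feature set or tuned model.

```python
from sklearn.ensemble import RandomForestClassifier
import numpy as np

# Synthetic stand-in data: 200 historical code changes with 8 numeric
# features each. The paper's real features cover patch-set complexity,
# review patterns, lifecycle phase, and edited-line analysis.
rng = np.random.default_rng(0)
X_train = rng.random((200, 8))
y_train = (rng.random(200) < 0.1).astype(int)  # ~10% labeled vulnerability-inducing

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

def is_likely_vulnerable(features, threshold=0.5):
    """Flag a code change when its estimated vulnerability
    likelihood exceeds the decision threshold."""
    prob = clf.predict_proba(np.asarray(features).reshape(1, -1))[0, 1]
    return bool(prob >= threshold)
```

In a review-bot deployment, `is_likely_vulnerable` would be invoked per incoming change, and a positive result would trigger the additional security review request.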

To generate the training and test data, we leverage the security bugs discovered and fixed in the Android Open Source Project (AOSP)1. The study specifically targets the AOSP media project2 (i.e., for multimedia data processing), which was extensively fuzz-tested and thus revealed many security defects. A set of specialized tools is designed and developed as part of this study to:

(1) identify vulnerability-fixing change(s) associated with each target security bug, and

(2) backtrack vulnerability-inducing change(s) linked to each of the identified vulnerability-fixing changes. All the identified vulnerability-inducing changes are then manually analyzed and verified before being associated with the respective security bugs. The associated vulnerability-inducing changes are labeled as ‘1’, while all the other code changes submitted to the target media project are labeled as ‘0’ in the dataset.
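The backtracking and labeling steps can be illustrated with a simplified, SZZ-style sketch. The data structures and helper names here are hypothetical; the paper's actual tools operate on AOSP's git history and bug tracker, followed by manual verification.

```python
def find_inducing_changes(fixing_change, blame):
    """Backtrack each line deleted by a vulnerability-fixing change to
    the change that last introduced it, via a line -> commit blame map."""
    return {blame[line] for line in fixing_change["deleted_lines"] if line in blame}

def label_changes(all_changes, inducing_ids):
    """Label vulnerability-inducing changes '1' and all others '0'."""
    return {c: 1 if c in inducing_ids else 0 for c in all_changes}

# Hypothetical example: the fix for a security bug deletes lines 10 and
# 12, and blame attributes those lines to commits c2 and c3.
blame = {10: "c2", 11: "c1", 12: "c3"}
fix = {"deleted_lines": [10, 12]}
inducing = find_inducing_changes(fix, blame)          # {"c2", "c3"}
labels = label_changes(["c1", "c2", "c3", "c4"], inducing)
```

In the real pipeline the blame map comes from version-control history, and each candidate inducing change is manually analyzed before the label is committed to the dataset.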

The N-fold evaluation using the first year of data identifies random forest as the most effective classifier based on its accuracy. The classifier identifies ~60% of the vulnerability-inducing code changes with a precision of ~85%. It also identifies ~99% of the likely-normal code changes with a precision of ~97% when using all the features for the training and testing.

The VP framework is then used as an online model retrained monthly on data from the previous month. When it is applied to about six years of the vulnerability data3, the framework demonstrates an approximately 80% recall and an approximately 98% precision for vulnerability-inducing changes, along with a 99.8% recall and a 98.5% precision for likely-normal changes. This accuracy result surpasses the results achieved in the N-fold validation, in large part because the online deployment mode can better utilize the underlying temporal localities, causalities, and patterns within the feature data.
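A minimal sketch of this online deployment mode, assuming scikit-learn and synthetic monthly batches (the batch sizes, feature counts, and label rates below are illustrative, not the paper's data):

```python
from sklearn.ensemble import RandomForestClassifier
import numpy as np

# Hypothetical per-month batches of (features, labels) for submitted changes.
rng = np.random.default_rng(1)
months = [(rng.random((50, 8)), (rng.random(50) < 0.2).astype(int))
          for _ in range(6)]

predictions = []
for (X_prev, y_prev), (X_curr, _) in zip(months, months[1:]):
    # Online mode: retrain each month on the previous month's labeled
    # data, then classify the current month's incoming changes.
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    model.fit(X_prev, y_prev)
    predictions.append(model.predict(X_curr))
```

The monthly window is what lets the model track temporal localities in the feature data, which the paper credits for the online mode outperforming the N-fold setup.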

In summary, 7.4% of the reviewed and merged code changes are classified as vulnerability-inducing. On average, the number of flagged changes requiring additional attention during their code reviews is around 7 per month. This manageable volume (less than 2 code changes per week) justifies the cost, considering the high recall (~80%) and precision (~98%) for identifying vulnerability-inducing changes.

The main contributions of this study include:

  1. We explore and confirm the possibility of code change-granularity vulnerability prediction that can be used to prevent vulnerabilities by flagging likely-vulnerable code changes at pre-submit time.

  2. We present the Vulnerability Prevention (VP) framework that automates online assessment of software vulnerabilities using a machine learning classifier.

  3. We devise novel feature types to improve the classifier accuracy and reduce the feature data set by evaluating the precision and recall metrics.

  4. We present the specialized tools to label code changes in AOSP, facilitating robust training and testing data collection.

  5. We demonstrate a high precision (~98%) and recall (~80%) of the VP framework in identifying vulnerability-inducing changes, showing the potential as a practical tool to reduce security risks.

  6. We discuss the implications of deploying the VP framework in multi-project settings. Our analysis data suggests two focus areas for future Android security research: optimizing the Android vulnerability-fixing latency and increasing efforts to prevent vulnerabilities.

The rest of this paper is organized as follows. Section II provides the background information. Section III analyzes the design requirements and presents the VP framework design. Section IV details the design of the ML model, including the classifier and features for classifying likely-vulnerable code changes. Section V describes the tools developed to collect vulnerability datasets for model training and testing.

Section VI describes the data collection process using the tools, and characterizes the vulnerability issues, vulnerability-fixing changes, and vulnerability-inducing changes in an AOSP sub-project. Section VII presents the evaluation of the VP framework using an N-fold validation and extends the framework for real-time, online classification. Section VIII discusses the implications and threats to validity. Section IX reviews the related works before concluding this paper.

II. BACKGROUND

This section outlines the code review and submission process of an open source software project, using AOSP (Android Open Source Project) as a case study. AOSP is chosen considering its role as an upstream software project with significant reach, powering more than 3 billion active end-user products.

==Code Change.== A code change (simply, change) consists of a set of added, deleted, and/or edited source code lines for source code files in a target source code repository (e.g., git). A typical software engineer sends a code change to a code review service (e.g., Gerrit4) for mandatory code reviews prior to submission. A code change is attributed to an author who has an associated email address in AOSP. The change can also have one or more code reviewers. Both the author and reviewers have specific permissions within each project (e.g., project ownership status and review level).

During the code review process, a code change can undergo multiple revisions, resulting in one or more patch sets. Each patch set uploaded to the code review service represents an updated version of the code change. The final, approved patch set of the change can then be submitted and merged into the target source code repository.

==Code Review.== The code change author can revise and resend the change as a new patch set for further review or approval by designated code reviewer(s). The key reviewer permissions include: a score of +1 to indicate the change looks good to the reviewer, a score of +2 to approve the code change, a score of -1 to tell that the change does not look good (e.g., a minor issue), and a score of -2 to block the code change submission.
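The scoring rules above can be summarized in a small sketch. This is an assumed simplification; actual Gerrit submit requirements are project-configurable.

```python
# Assumed simplification of the review-score rules described above:
# +2 approves, -2 blocks; +1/-1 are advisory and do not gate submission.
def can_submit(scores):
    """A change is submittable only with at least one +2 approval
    and no -2 blocking vote."""
    return 2 in scores and -2 not in scores

print(can_submit([1, 2]))    # a +1 and a +2: submittable
print(can_submit([2, -2]))   # a -2 blocks even an approved change
```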

Projects (e.g., git repositories or subdirectories in a git repository) can have custom permissions and review rules. For example, one custom review rule enables authors to mark their code changes as ready for presubmit testing, because authors often upload non-final versions to the code review service (e.g., to inspect the diffs5 and gather preliminary feedback).

:::info
This paper is available on arxiv under CC by 4.0 Deed (Attribution 4.0 International) license.

:::
