By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
World of SoftwareWorld of SoftwareWorld of Software
  • News
  • Software
  • Mobile
  • Computing
  • Gaming
  • Videos
  • More
    • Gadget
    • Web Stories
    • Trending
    • Press Release
Search
  • Privacy
  • Terms
  • Advertise
  • Contact
Copyright © All Rights Reserved. World of Software.
Reading: Data-Driven Ranking Reveals Where Privacy Risks Actually Live in Java and JavaScript Code | HackerNoon
Share
Sign In
Notification Show More
Font ResizerAa
World of SoftwareWorld of Software
Font ResizerAa
  • Software
  • Mobile
  • Computing
  • Gadget
  • Gaming
  • Videos
Search
  • News
  • Software
  • Mobile
  • Computing
  • Gaming
  • Videos
  • More
    • Gadget
    • Web Stories
    • Trending
    • Press Release
Have an existing account? Sign In
Follow US
  • Privacy
  • Terms
  • Advertise
  • Contact
Copyright © All Rights Reserved. World of Software.
World of Software > Computing > Data-Driven Ranking Reveals Where Privacy Risks Actually Live in Java and JavaScript Code | HackerNoon
Computing

Data-Driven Ranking Reveals Where Privacy Risks Actually Live in Java and JavaScript Code | HackerNoon

News Room
Last updated: 2026/01/22 at 7:27 AM
News Room Published 22 January 2026
Share
Data-Driven Ranking Reveals Where Privacy Risks Actually Live in Java and JavaScript Code | HackerNoon
SHARE

Table Of Links

Abstract

1 Introduction

2 Background

3 Privacy-Relevant Methods

4 Identifying API Privacy-relevant Methods

5 Labels for Personal Data Processing

6 Process of Identifying Personal Data

7 Data-based Ranking of Privacy-relevant Methods

8 Application to Privacy Code Review

9 Related Work

Conclusion, Future Work, Acknowledgement And References

Data-based Ranking of Privacy-Relevant Methods

Our data-based ranking is designed to identify and prioritize privacy-relevant methods in Java and JavaScript applications. This ranking process comprises several stages, as depicted in Fig. 3, using

the Java ranking as an example. By analyzing data from real-world applications, we aim to provide a practical guide for identifying methods that are most relevant for privacy concerns.

7.1 Library Selection for Data-based Ranking

To focus our data-based ranking on the most relevant libraries, we selected the top 25 libraries from NPM for JavaScript and Maven for Java, shown below in Table 2. Our selection criteria were based on the libraries’ relevance to personal data processing, as aligned with our set of labels for personal data processing activities. This selection was made through a systematic review of each library’s documentation, specifically targeting functionalities that are related to personal data processing.

7.2 Method Invocation Analysis

We employed static analysis tools to identify method invocations and analyze data flows within the code. For Java, we used Soot [14] to construct call graphs and trace method invocations. In the case of JavaScript, we used ESLint 1 for its capabilities in Abstract Syntax Tree (AST) analysis. Our analysis matched these invocations to our list of native privacy-relevant methods, providing a view of how these methods are used in practice.

7.3 Selecting Open-source Applications

To rank privacy-relevant methods, we selected 30 popular open-source GitHub projects with over 100 stars in Java and JavaScript. We focused on applications processing personal data rather than frameworks and libraries. The selection included 15 Java applications such as the e-commerce software Shopizer, and 15 JavaScript applications like the chat application RocketChat. We also included projects predominantly in Java/JavaScript that use other languages like TypeScript for some modules.

Criteria were: popularity (applications with high stars, indicating broader relevance), data sensitivity (applications processing personal or sensitive data, highly relevant for privacy reviews), diversity (applications from different domains and languages, showing wide applicability), and public availability (open source code enables reproducibility and transparency). The details of these selected projects are provided in Table 4. Table 2. Selected popular libraries: 25 for each language

7.4 Efficient Analysis of Library Imports

To make the analysis efficient, we first identified the libraries imported by each application. For standard libraries, we assumed their presence in most applications. For API libraries, we examined import statements and configuration files to narrow down our focus to the top 50 pre-selected libraries, 25 each for Java and JavaScript.

7.5 Ranking Privacy-relevant Methods in Top 30 Applications

We employed Semgrep to monitor the flow of personal data into privacy-relevant methods invoked by application code. Utilizing Semgrep’s DeepSemgrep 2 capability for cross-file analysis, we were able to comprehensively analyze data flows across entire applications, as opposed to only examining isolated code snippets. This provided a holistic perspective of how personal data propagates across different components. Using Semgrep’s taint analysis and the rules outlined in Section 6, we traced personal data flows to privacy-relevant methods.

To assess the practical relevance of our identified privacy-relevant methods, we introduce the following usage-based metrics, presented in Table 3: We ranked privacy-relevant methods by analyzing their usage in the 30 popular GitHub projects introduced above, with an average of 358 application methods processing personal data per application. This varied by language and type: Java applications averaged 288 methods, while JavaScript had 363. The higher average in JavaScript was likely due to its more diverse front-end processing, reflecting the complexity and multifaceted nature of these applications. Table 3. Usage-Based Metrics for Ranking Privacy-relevant Methods

To better focus our approach, we calculated the proportion of application methods that both invoke a privacy-relevant method and process a concrete flow of personal data (there is confirmed personal data flow into the method). This is relative to the total number of methods in the application. This metric indicates the level of focus in identifying privacy-relevant methods, allowing developers to narrow their efforts to a more relevant subset of the code. In essence, our approach aims to minimize the code sections that need scrutiny, saving both time and resources. For more details on these proportions in selected open-source Java and JavaScript/TypeScript applications, see Table 4.

Table 4. List of 30 selected open-source applications written in Java and JavaScript (JS)/TypeScript(TS), along with their descriptions. And the calculated percentages of application methods that invoke

7.6 Findings

Our study reveals that, on average, only 4.2% of the total codebase is made up of methods that are privacy-relevant and involved in personal data processing. This result highlights the precision of our approach in pinpointing privacy-relevant methods in applications.

Usage Patterns of Privacy-Relevant Methods In Java applications, we observed a more conservative use of privacy-relevant methods, particularly those from popular Maven libraries. Native Java methods, along with methods from Apache Commons and the Spring framework, were frequently used for handling personal data. Libraries such as slf4j for logging and auth0 for authentication were also commonly used, indicating their importance in the flow and protection of personal data.

In contrast, JavaScript applications exhibited a diverse range of library usage. While lodash was commonly used, frameworks like Angular, React, and Vue.js played a significant role in personal data processing, particularly in front-end applications. Table 5 presents the top five packages in both Java and JavaScript that contain methods relevant to privacy concerns.

Table 5. Top 5 packages defining privacy-relevant methods

Categories of Privacy-relevant Methods We categorized privacy-relevant methods into types to gain insights into their roles in personal data processing. Our analysis identified several Java classes and categories that are frequently involved in personal data processing. For example, common Java classes like org.slf4j.Logger and auth0.client.Auth0Client are often used in operations that handle personal data.

In terms of categories, Data Processing and Transformation, Network Communication, and Logging Methods were most prevalent. These categories indicate areas where privacy-relevant methods are most commonly used, suggesting that they are key to understanding how personal data is processed in codebases (Table 6). Identity and Access Management, Data Encryption and Cryptography, and Data Storage and Database Management were also highly involved in personal data flows, with involvement percentages of 92%, 78%, and 85%, respectively.

Conversely, categories like Data Processing and Transformation, Network Communication, and Logging Methods were less involved, with percentages of 67%, 44%, and 28%. Table 7 lists Java classes that are frequently involved in personal data processing, serving as key indicators for identifying privacy-relevant methods in applications.

:::info
Authors:

  1. Feiyang Tang
  2. Bjarte M. Østvold

:::

:::info
This paper is available on arxiv under CC BY-NC-SA 4.0 license.

:::

Sign Up For Daily Newsletter

Be keep up! Get the latest breaking news delivered straight to your inbox.
By signing up, you agree to our Terms of Use and acknowledge the data practices in our Privacy Policy. You may unsubscribe at any time.
Share This Article
Facebook Twitter Email Print
Share
What do you think?
Love0
Sad0
Happy0
Sleepy0
Angry0
Dead0
Wink0
Previous Article Upgrade to Microsoft Visual Studio Pro 2026 for Just .99 Upgrade to Microsoft Visual Studio Pro 2026 for Just $49.99
Next Article Volvo’s Electric EX60 SUV Has a 400-Mile Range—and Rethinks the Humble Seat Belt Volvo’s Electric EX60 SUV Has a 400-Mile Range—and Rethinks the Humble Seat Belt
Leave a comment

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Stay Connected

248.1k Like
69.1k Follow
134k Pin
54.3k Follow

Latest News

Grok AI generated about 3m sexualised images this month, research says
Grok AI generated about 3m sexualised images this month, research says
News
AMD Announces Ryzen 7 9850X3D Pricing Of 9 USD
AMD Announces Ryzen 7 9850X3D Pricing Of $499 USD
Computing
Optalysys readies for US expansion with £23m funding injection – UKTN
Optalysys readies for US expansion with £23m funding injection – UKTN
News
Lenovo wants other companies to make accessories for its modular laptops.
Lenovo wants other companies to make accessories for its modular laptops.
News

You Might also Like

AMD Announces Ryzen 7 9850X3D Pricing Of 9 USD
Computing

AMD Announces Ryzen 7 9850X3D Pricing Of $499 USD

1 Min Read
Flutterwave taps Turnkey to launch stablecoin wallets for merchants
Computing

Flutterwave taps Turnkey to launch stablecoin wallets for merchants

3 Min Read
Essential Cybersecurity Measures Every Modern Business Should Take | HackerNoon
Computing

Essential Cybersecurity Measures Every Modern Business Should Take | HackerNoon

7 Min Read
ThreatsDay Bulletin: Pixel Zero-Click, Redis RCE, China C2s, RAT Ads, Crypto Scams & 15+ Stories
Computing

ThreatsDay Bulletin: Pixel Zero-Click, Redis RCE, China C2s, RAT Ads, Crypto Scams & 15+ Stories

23 Min Read
//

World of Software is your one-stop website for the latest tech news and updates, follow us now to get the news that matters to you.

Quick Link

  • Privacy Policy
  • Terms of use
  • Advertise
  • Contact

Topics

  • Computing
  • Software
  • Press Release
  • Trending

Sign Up for Our Newsletter

Subscribe to our newsletter to get our newest articles instantly!

World of SoftwareWorld of Software
Follow US
Copyright © All Rights Reserved. World of Software.
Welcome Back!

Sign in to your account

Lost your password?