How Static and Hybrid Analysis Can Cut Privacy Review Effort by 95% | HackerNoon

News Room
Last updated: 2026/01/22 at 2:22 PM
News Room Published 22 January 2026

Table Of Links

Abstract

1 Introduction

2 Background

3 Privacy-Relevant Methods

4 Identifying API Privacy-relevant Methods

5 Labels for Personal Data Processing

6 Process of Identifying Personal Data

7 Data-based Ranking of Privacy-relevant Methods

8 Application to Privacy Code Review

9 Related Work

Conclusion, Future Work, Acknowledgement And References

Related Work

Research in source code analysis for privacy is extensive, yet specific approaches for identifying personal data processing are limited. Ullah et al. [13] introduced an approach for extracting control and data dependencies in source code, which could potentially be applied to locating personal data processing methods but was not designed for this purpose. Hjerppe et al. [2] proposed an annotation-based static analysis for data protection, but its effectiveness depends on accurate developer annotations, which are hard to maintain in large projects.
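To make the annotation dependency concrete, here is a minimal sketch of an annotation-driven scan. It is not the tool from [2]: the `personal_data` decorator, the sample source, and the `annotated_functions` helper are all hypothetical names chosen for illustration, and the scan only uses Python's standard `ast` module.

```python
import ast

# Hypothetical sample code: one function is annotated, one is not,
# although both touch personal data.
SOURCE = '''
def personal_data(func):
    return func

@personal_data
def store_email(user, email):
    user["email"] = email

def log_phone(user):
    print(user["phone"])
'''


def annotated_functions(source: str, marker: str = "personal_data") -> list[str]:
    """Return the names of functions decorated with `marker` (assumed convention)."""
    tree = ast.parse(source)
    found = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            for decorator in node.decorator_list:
                # Only plain `@marker` decorators are recognized in this sketch.
                if isinstance(decorator, ast.Name) and decorator.id == marker:
                    found.append(node.name)
    return found


flagged = annotated_functions(SOURCE)
```

In this sketch `store_email` is reported, while `log_phone` is missed because nobody annotated it, which is the kind of gap an annotation-based analysis cannot close on its own.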

Dynamic analysis has been explored for sensitive data flow detection, with DAISY [15] focusing on Android apps and ConDySTA [16] combining dynamic taint analysis with static analysis. However, these methods have limitations, such as platform specificity or the need for executing projects. Automated assistance in code review has been explored by Li et al. [3] with their pre-trained model CodeReviewer, but it lacks a focus on personal data processing.

SWANAssist [5] offers a semi-automated approach for identifying security-relevant Java code methods, which could potentially be adapted for privacy purposes. Other studies, like [1, 12], attempt to align GDPR compliance with static analysis. Novikova et al. [4] provided insights into privacy-enhancing technologies but did not focus on personal data processing in source code.

These studies mark great progress in source code analysis, yet a gap exists in automated identification and categorization of personal data processing. Our work addresses this by proposing an automated approach for identifying personal data processing in real-world applications, enhancing efficiency in privacy code reviews.

Conclusion

In conclusion, our study introduces a method for identifying and categorizing privacy-relevant methods in source code, focusing on personal data processing. We have successfully narrowed the analysis scope to just 4.2% of methods across 100 popular open-source applications, offering a practical starting point for developers, data protection officers, and reviewers.
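The 4.2% figure can be read as a rough reduction in review scope. The sketch below works through that arithmetic under one loudly stated assumption: review effort is proportional to the number of methods inspected. The method counts are hypothetical and chosen only to reproduce the 4.2% share; they are not taken from the study.

```python
def review_share(flagged_methods: int, total_methods: int) -> float:
    """Fraction of methods that remain in scope for privacy review."""
    if total_methods <= 0:
        raise ValueError("total_methods must be positive")
    return flagged_methods / total_methods


def scope_reduction(share: float) -> float:
    """Fraction of methods a reviewer can skip, assuming effort scales with method count."""
    return 1.0 - share


# Hypothetical counts that yield the 4.2% share reported above.
share = review_share(flagged_methods=420, total_methods=10_000)
reduction = scope_reduction(share)
print(f"in scope: {share:.1%}, skipped: {reduction:.1%}")
```

Under that proportionality assumption, keeping 4.2% of methods in scope corresponds to skipping about 95.8% of them, which is in line with the roughly 95% reduction in the title.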

This approach not only simplifies code reviews but also facilitates compliance with data protection regulations like GDPR, helping organizations align their software development with legal requirements. For future work, we aim to enhance the precision of our privacy-relevant method identification algorithms, possibly integrating machine learning for more accurate predictions of personal data processing activities.

Expanding our approach to additional programming languages and integrating it into common development tools for real-time feedback are also key goals. These advancements will broaden the impact and applicability of our approach. Ultimately, our research paves the way for more focused and efficient privacy assessments in software development, contributing to the creation of software that is efficient, robust, and respectful of user privacy.

Acknowledgement

This work is part of the Privacy Matters (PriMa) project. The PriMa project has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No. 860315.

References

  1. Ferrara, P., Olivieri, L., Spoto, F.: Tailoring taint analysis to GDPR. In: Privacy Technologies and Policy: 6th Annual Privacy Forum, APF 2018, Barcelona, Spain, June 13-14, 2018, Revised Selected Papers 6. pp. 63–76. Springer (2018)
  2. Hjerppe, K., Ruohonen, J., Leppänen, V.: Annotation-based static analysis for personal data protection. In: Privacy and Identity Management. Data for Better Living: AI and Privacy, pp. 343–358. Springer International Publishing (2020)
  3. Li, Z., Lu, S., Guo, D., Duan, N., Jannu, S., Jenks, G., Majumder, D., Green, J., Svyatkovskiy, A., Fu, S., Sundaresan, N.: Automating code review activities by large-scale pre-training (2022)
  4. Novikova, E., Fomichov, D., Kholod, I., Filippov, E.: Analysis of privacy-enhancing technologies in open-source federated learning frameworks for driver activity recognition. Sensors 22(8), 2983 (2022)
  5. Piskachev, G., Do, L.N.Q., Johnson, O., Bodden, E.: SWANAssist: Semi-Automated Detection of Code-Specific, Security-Relevant Methods. In: Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering. pp. 1094–1097. ASE’19, IEEE Press (2020). https://doi.org/10.1109/ASE.2019.00110
  6. van der Plas, N.: Detecting PII in Git commits (2022), http://resolver.tudelft.nl/uuid:fe195c17-ecf5-4811-a987-89f238a6802f
  7. Ren, J., Rao, A., Lindorfer, M., Legout, A., Choffnes, D.: ReCon: Revealing and Controlling PII Leaks in Mobile Network Traffic. In: Proceedings of the 14th Annual International Conference on Mobile Systems, Applications, and Services. pp. 361–374. MobiSys ’16, Association for Computing Machinery, New York, NY, USA (2016). https://doi.org/10.1145/2906388.2906392
  8. Tang, F., Østvold, B.M.: Assessing Software Privacy Using the Privacy Flow-Graph. In: Proceedings of the 1st International Workshop on Mining Software Repositories Applications for Privacy and Security. pp. 7–15. MSR4P&S 2022, Association for Computing Machinery, New York, NY, USA (2022)
  9. Tang, F., Østvold, B., Bruntink, M.: Identifying Personal Data Processing for Code Review. In: Proceedings of the 9th International Conference on Information Systems Security and Privacy – ICISSP. pp. 568–575. INSTICC, SciTePress (2023). https://doi.org/10.5220/0011725700003405
  10. Tang, F., Østvold, B.M., Bruntink, M.: Helping Code Reviewer Prioritize: Pinpointing Personal Data and Its Processing. IOS Press (Sep 2023). https://doi.org/10.3233/faia230228
  11. Thongtanunam, P., Hassan, A.E.: Review dynamics and their impact on software quality. IEEE Transactions on Software Engineering 47(12), 2698–2712 (2020)
  12. Tokas, S., Owe, O., Ramezanifarkhani, T.: Static checking of GDPR-related privacy compliance for object-oriented distributed systems. Journal of Logical and Algebraic Methods in Programming 125, 100733 (2022)
  13. Ullah, F., Wang, J., Jabbar, S., Al-Turjman, F., Alazab, M.: Source code authorship attribution using hybrid approach of program dependence graph and deep learning model. IEEE Access 7, 141987–141999 (2019)
  14. Vallée-Rai, R., Co, P., Gagnon, E., Hendren, L., Lam, P., Sundaresan, V.: Soot: A Java bytecode optimization framework. In: CASCON First Decade High Impact Papers, pp. 214–224 (2010)
  15. Zhang, X., Heaps, J., Slavin, R., Niu, J., Breaux, T., Wang, X.: DAISY: Dynamic-Analysis-Induced Source Discovery for Sensitive Data. ACM Trans. Softw. Eng. Methodol. 32(4) (May 2023)
  16. Zhang, X., Wang, X., Slavin, R., Niu, J.: ConDySTA: Context-Aware Dynamic Supplement to Static Taint Analysis. In: 2021 IEEE Symposium on Security and Privacy (SP). pp. 796–812 (2021). https://doi.org/10.1109/SP40001.2021.00040

Authors:

  1. Feiyang Tang
  2. Bjarte M. Østvold

This paper is available on arXiv under a CC BY-NC-SA 4.0 license.
