Privacy-Preserving Data Publishing: k-Anonymity, t-Plausibility, and More | HackerNoon

News Room
Last updated: 2025/04/28 at 9:06 PM
News Room Published 28 April 2025

Authors:

(1) Anthi Papadopoulou, Language Technology Group, University of Oslo, Gaustadalleen 23B, 0373 Oslo, Norway; corresponding author ([email protected]);

(2) Pierre Lison, Norwegian Computing Center, Gaustadalleen 23A, 0373 Oslo, Norway;

(3) Mark Anderson, Norwegian Computing Center, Gaustadalleen 23A, 0373 Oslo, Norway;

(4) Lilja Øvrelid, Language Technology Group, University of Oslo, Gaustadalleen 23B, 0373 Oslo, Norway;

(5) Ildiko Pilan, Language Technology Group, University of Oslo, Gaustadalleen 23B, 0373 Oslo, Norway.

Table of Links

Abstract and 1 Introduction

2 Background

2.1 Definitions

2.2 NLP Approaches

2.3 Privacy-Preserving Data Publishing

2.4 Differential Privacy

3 Datasets and 3.1 Text Anonymization Benchmark (TAB)

3.2 Wikipedia Biographies

4 Privacy-oriented Entity Recognizer

4.1 Wikidata Properties

4.2 Silver Corpus and Model Fine-tuning

4.3 Evaluation

4.4 Label Disagreement

4.5 MISC Semantic Type

5 Privacy Risk Indicators

5.1 LLM Probabilities

5.2 Span Classification

5.3 Perturbations

5.4 Sequence Labelling and 5.5 Web Search

6 Analysis of Privacy Risk Indicators and 6.1 Evaluation Metrics

6.2 Experimental Results and 6.3 Discussion

6.4 Combination of Risk Indicators

7 Conclusions and Future Work

Declarations

References

Appendices

A. Human properties from Wikidata

B. Training parameters of entity recognizer

C. Label Agreement

D. LLM probabilities: base models

E. Training size and performance

F. Perturbation thresholds

2.3 Privacy-Preserving Data Publishing

Privacy-preserving data publishing (PPDP) approaches to text sanitization rely on a privacy model that specifies formal conditions under which the data can be shared without harming the privacy of the registered individuals. The most prominent privacy model is k-anonymity (Samarati and Sweeney, 1998), which requires that each individual or entity be indistinguishable from at least k - 1 other individuals or entities. This model was subsequently adapted to text data by approaches such as k-safety (Chakaravarthy et al., 2008) and k-confusability (Cumby and Ghani, 2011).
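To make the k-anonymity condition concrete, here is a minimal sketch for tabular records, the setting in which the model was originally defined. The function name, the attribute names and the toy records are all hypothetical and do not come from the cited papers.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    is shared by at least k records."""
    groups = Counter(
        tuple(record[attr] for attr in quasi_identifiers) for record in records
    )
    return all(count >= k for count in groups.values())


# Hypothetical toy table: two quasi-identifiers and one sensitive attribute.
records = [
    {"age_range": "30-39", "zip_prefix": "037", "diagnosis": "flu"},
    {"age_range": "30-39", "zip_prefix": "037", "diagnosis": "asthma"},
    {"age_range": "40-49", "zip_prefix": "037", "diagnosis": "flu"},
    {"age_range": "40-49", "zip_prefix": "037", "diagnosis": "diabetes"},
]

print(is_k_anonymous(records, ["age_range", "zip_prefix"], k=2))  # True
print(is_k_anonymous(records, ["age_range", "zip_prefix"], k=3))  # False
```

Each group of identical quasi-identifier values contains two records, so the table satisfies 2-anonymity but not 3-anonymity. Text-oriented variants such as k-safety and k-confusability define indistinguishability over documents rather than table rows, which this sketch does not attempt to model.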

t-plausibility (Anandan et al., 2012) follows a similar approach: starting from personal information that has already been detected, it generalizes those terms sufficiently that at least t documents can be mapped to the edited text. Sanchez and Batet (2016) present C-sanitized, an information-theoretic approach that computes the point-wise mutual information (estimated from co-occurrence counts in web data) between the person or entity to protect and each term of the document. Terms whose mutual information exceeds a given threshold are then masked.
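The thresholding step of a mutual-information approach can be illustrated with a short sketch. It is not the C-sanitized implementation: the counts are invented, the helper names are hypothetical, and the threshold is treated as a free parameter, since the text above only states that "a given threshold" is used.

```python
import math


def pmi(count_joint, count_entity, count_term, total):
    """Point-wise mutual information log2(p(e, t) / (p(e) * p(t))),
    with probabilities estimated from raw (co-)occurrence counts."""
    if count_joint == 0:
        return float("-inf")  # never co-occur: no evidence of association
    p_joint = count_joint / total
    p_entity = count_entity / total
    p_term = count_term / total
    return math.log2(p_joint / (p_entity * p_term))


def terms_to_mask(entity_count, term_counts, joint_counts, total, threshold):
    """Return the terms whose PMI with the protected entity exceeds the threshold."""
    return [
        term
        for term, count in term_counts.items()
        if pmi(joint_counts.get(term, 0), entity_count, count, total) > threshold
    ]


# Invented counts standing in for web co-occurrence statistics.
total_pages = 1_000_000
entity_count = 100
term_counts = {"Oslo": 50_000, "violinist": 2_000, "the": 900_000}
joint_counts = {"Oslo": 20, "violinist": 60, "the": 95}

print(terms_to_mask(entity_count, term_counts, joint_counts, total_pages, threshold=5.0))
```

With these invented counts, the rare and strongly associated term "violinist" is the only one selected for masking, while the frequent term "the" has a PMI close to zero even though it co-occurs with the entity on almost every page.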

Papadopoulou et al. (2022) also employed k-anonymity, in combination with NLP-based approaches: given an assumption about the attacker’s knowledge, they searched for the optimal set of masking decisions that ensures k-anonymity.

Finally, Manzanares-Salor et al. (2022) provided an approach to the evaluation of disclosure risks that relies on training a text classifier to assess the difficulty of inferring the identity of the individual in question based on the sanitized text.

2.4 Differential Privacy

Differential privacy (DP) is a framework for ensuring the privacy of individuals in datasets (Dwork et al., 2006). It essentially operates by producing randomized responses to queries. The amount of artificial noise introduced in each response is calibrated so as to guarantee that the information that can be learned about any individual remains below a given threshold.
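One standard way of producing such randomized responses is the Laplace mechanism, sketched below for a counting query. The text above does not name a specific mechanism, so this choice is an assumption made for illustration, as are the function names and the default sensitivity of 1 (adding or removing one individual changes a count by at most 1).

```python
import random


def laplace_noise(scale, rng):
    """Sample zero-mean Laplace noise as the difference of two
    independent exponential variables with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, rng, sensitivity=1.0):
    """Answer a counting query with noise of scale sensitivity / epsilon.
    A smaller epsilon means a stronger privacy guarantee and more noise."""
    scale = sensitivity / epsilon
    return true_count + laplace_noise(scale, rng)


rng = random.Random(0)  # seeded only to make the illustration repeatable
print(private_count(100, epsilon=1.0, rng=rng))
print(private_count(100, epsilon=0.1, rng=rng))
```

The parameter epsilon plays the role of the privacy threshold: lowering it widens the noise distribution, which makes any single individual's contribution harder to detect but also makes the answer less accurate.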

Fernandes et al. (2019) applied DP to text data in combination with ML techniques, by adding noise to the word embeddings of the model. Their work focused on removing stylistic cues from the text, so that its author could not be identified. Feyisetan et al. (2019) also applied noise to word embeddings, in a setting where the geolocation data of an individual is to be protected.
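The general idea of perturbing word embeddings can be sketched as follows: add noise to a word's vector, then replace the word with the vocabulary item closest to the noisy vector. This is a simplified illustration only. The two-dimensional embeddings are invented, the per-coordinate Laplace noise is an assumption, and the code does not reproduce the calibrated mechanisms of Fernandes et al. (2019) or Feyisetan et al. (2019), nor does it claim their formal guarantees.

```python
import math
import random

# Invented 2-dimensional embeddings; real models use hundreds of dimensions.
EMBEDDINGS = {
    "oslo": (0.9, 0.1),
    "bergen": (0.8, 0.2),
    "paris": (0.1, 0.9),
    "london": (0.2, 0.8),
}


def nearest_word(vector, embeddings):
    """Return the vocabulary word whose embedding is closest (Euclidean) to `vector`."""
    return min(embeddings, key=lambda word: math.dist(vector, embeddings[word]))


def perturb_word(word, epsilon, rng, embeddings=EMBEDDINGS):
    """Add Laplace noise of scale 1 / epsilon to each coordinate of the word's
    embedding, then decode the noisy vector back to the nearest word."""
    scale = 1.0 / epsilon
    noisy = tuple(
        x + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        for x in embeddings[word]
    )
    return nearest_word(noisy, embeddings)


rng = random.Random(0)  # seeded only to make the illustration repeatable
print([perturb_word("oslo", epsilon=5.0, rng=rng) for _ in range(5)])
```

With a large epsilon the noise is small and the word is usually kept unchanged. With a small epsilon the output can be any word in the vocabulary, which illustrates the loss of utility that the later approaches in this section try to reduce.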

More recently, Sasada et al. (2021) tried to address the utility loss that the noise required by DP causes in the resulting text by first creating duplicates and then adding noise, thus reducing the amount of noise needed. Krishna et al. (2021) sought to address the same issue using an algorithm based on auto-encoders to transform text without losing data utility. Finally, Igamberdiev and Habernal (2023) introduced DP-BART, a DP rewriting system based on a pre-trained BART model, which seeks to reduce the amount of artificial noise needed to reach a given privacy guarantee.

DP-oriented approaches generally lead to complete transformations of the text, at least for reasonable values of the privacy threshold. Those approaches are therefore well suited to the generation of synthetic texts, in particular to collect training data for machine learning models. However, they are difficult to apply to text sanitization, which is generally expected to retain the core content of the text and only edit out the personal identifiers. This is particularly the case for court judgments and medical records, where sanitization should not alter the wording and semantic content conveyed in the text.
