Microsoft Research Develops Novel Approaches To Enforce Privacy In AI Models

A team of AI researchers at Microsoft has introduced two novel approaches for enforcing contextual integrity in large language models (LLMs): PrivacyChecker, an open-source, lightweight module that acts as a privacy shield during inference, and CI-CoT + CI-RL, a training method designed to teach models to reason about privacy.

Contextual integrity defines privacy as the appropriateness of information flows within specific social contexts. In practice, this means disclosing only the information strictly necessary to carry out a given task, such as booking a medical appointment. According to Microsoft’s researchers, today’s LLMs lack this kind of contextual awareness and may disclose sensitive information, undermining user trust.
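To make the idea of an "information flow" concrete, here is a toy model, not taken from Microsoft's work. It describes each flow with the parameters of Nissenbaum's framework (sender, recipient, subject, information type), simplifying the transmission principle into a single context label, and checks it against a hypothetical table of norms for the medical-appointment example. The class, the `NORMS` table, and its contents are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InformationFlow:
    """One information flow, described by contextual-integrity parameters."""
    sender: str
    recipient: str
    subject: str
    attribute: str  # the type of information being transmitted
    context: str    # simplified stand-in for the transmission principle / task


# Hypothetical norms: for each context, the attributes appropriate to share.
NORMS: dict[str, set[str]] = {
    "book_medical_appointment": {"name", "phone_number", "preferred_date"},
}


def is_appropriate(flow: InformationFlow) -> bool:
    """A flow is appropriate only if the norms of its context allow the attribute."""
    return flow.attribute in NORMS.get(flow.context, set())


name_flow = InformationFlow("assistant", "clinic", "user", "name", "book_medical_appointment")
salary_flow = InformationFlow("assistant", "clinic", "user", "salary", "book_medical_appointment")
print(is_appropriate(name_flow))    # True
print(is_appropriate(salary_flow))  # False
```

The same attribute can be appropriate in one context and inappropriate in another, which is why the norms are keyed by context rather than by attribute alone.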

The first approach focuses on inference-time checks, i.e., safeguards applied when a model generates its response. These checks act as a protective shield, evaluating information at multiple stages of an agent’s request lifecycle. The researchers provide a reference implementation, the PrivacyChecker library, which integrates with both the global system prompt and specific tool calls, and can act as a gate when invoking external Model Context Protocol (MCP) tools to prevent sensitive information from being shared with them.
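A gate in front of a tool call can be sketched as a wrapper that asks a judge which arguments are inappropriate and refuses the call if any are flagged. This is a minimal sketch under stated assumptions, not PrivacyChecker's actual API: `gated`, `PrivacyViolation`, and `keyword_judge` are hypothetical names, and the keyword denylist merely stands in for an LLM-based privacy judgment.

```python
from typing import Any, Callable

# Hypothetical judge signature: (tool name, call arguments) -> names of arguments to withhold.
Judge = Callable[[str, dict[str, Any]], set[str]]


class PrivacyViolation(Exception):
    """Raised when a tool call would share information judged inappropriate."""


def gated(tool_name: str, tool: Callable[..., Any], judge: Judge) -> Callable[..., Any]:
    """Wrap a tool so every call is checked by `judge` before the tool runs."""
    def wrapper(**kwargs: Any) -> Any:
        blocked = judge(tool_name, kwargs)
        if blocked:
            # Refuse the call instead of forwarding sensitive data to the external tool.
            raise PrivacyViolation(f"{tool_name}: withheld {sorted(blocked)}")
        return tool(**kwargs)
    return wrapper


def keyword_judge(tool_name: str, args: dict[str, Any]) -> set[str]:
    """Toy stand-in for an LLM judge: flag argument names found on a denylist."""
    sensitive = {"ssn", "diagnosis", "salary"}
    return {name for name in args if name in sensitive}


def book_appointment(name: str, date: str, **extra: Any) -> str:
    """Stand-in for an external (e.g., MCP-hosted) tool."""
    return f"booked {name} on {date}"


safe_book = gated("book_appointment", book_appointment, keyword_judge)
print(safe_book(name="Ana", date="2025-03-01"))  # booked Ana on 2025-03-01
try:
    safe_book(name="Ana", date="2025-03-01", diagnosis="flu")
except PrivacyViolation as err:
    print(err)  # book_appointment: withheld ['diagnosis']
```

Because the judge is just a callable, it can be swapped for any model-backed classifier without changing the wrapped tools.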

PrivacyChecker follows a relatively simple pipeline: first, it extracts information from the user’s request; next, it classifies that information through a privacy judgment; and finally, optionally, it injects privacy guidelines into the prompt so the model knows how to handle any sensitive information detected.
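The three steps can be sketched as plain functions. This is an illustrative approximation, assuming the agent holds a dictionary of known user data and the task's allowed attributes are given up front; in PrivacyChecker, extraction and judgment would be performed by a model, and all names below (`extract`, `judge`, `inject_guidelines`, `Judgment`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    attribute: str
    allowed: bool


def extract(text: str, user_data: dict[str, str]) -> list[str]:
    """Step 1 (toy): find which known user attributes appear in the text."""
    return [attr for attr, value in user_data.items() if value in text]


def judge(attributes: list[str], allowed_for_task: set[str]) -> list[Judgment]:
    """Step 2 (toy): classify each extracted attribute against the task's norms."""
    return [Judgment(attr, attr in allowed_for_task) for attr in attributes]


def inject_guidelines(system_prompt: str, judgments: list[Judgment]) -> str:
    """Step 3 (optional): tell the model which attributes to withhold."""
    withheld = [j.attribute for j in judgments if not j.allowed]
    if not withheld:
        return system_prompt
    return system_prompt + "\nDo not disclose: " + ", ".join(withheld) + "."


user_data = {"name": "Ana Silva", "diagnosis": "diabetes", "employer": "Contoso"}
request = "Book a checkup for Ana Silva; she has diabetes and works at Contoso."
judgments = judge(extract(request, user_data), allowed_for_task={"name", "diagnosis"})
prompt = inject_guidelines("You are a scheduling assistant.", judgments)
print(prompt)
```

Here the employer is the only attribute not needed for booking a checkup, so it is the only one the injected guideline tells the model to withhold.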

PrivacyChecker is model-agnostic and can be used with existing models without retraining.

On the static PrivacyLens benchmark, PrivacyChecker reduced information leakage from 33.06% to 8.32% on GPT-4o and from 36.08% to 7.30% on DeepSeek-R1, while preserving the system’s ability to complete its assigned task.
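The reported figures can be restated as absolute (percentage-point) and relative reductions. The snippet below only does arithmetic on the numbers quoted above; the helper name `reduction` is illustrative.

```python
# Leakage rates (%) on PrivacyLens as quoted above: (baseline, with PrivacyChecker).
RESULTS = {"GPT-4o": (33.06, 8.32), "DeepSeek-R1": (36.08, 7.30)}


def reduction(before: float, after: float) -> tuple[float, float]:
    """Return (percentage-point drop, drop relative to the baseline in %)."""
    absolute = before - after
    return absolute, absolute / before * 100


for model, (before, after) in RESULTS.items():
    absolute, relative = reduction(before, after)
    print(f"{model}: -{absolute:.2f} pp, -{relative:.1f}% relative")
```

This works out to a drop of 24.74 points (about 74.8% relative) for GPT-4o and 28.78 points (about 79.8% relative) for DeepSeek-R1.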

The second approach explored by Microsoft’s researchers aims to enhance contextual integrity using chain-of-thought prompting (CI-CoT). While the chain-of-thought technique is typically employed to improve a model’s problem-solving capabilities, the researchers applied it with a twist:

We repurposed CoT to have the model assess contextual information disclosure norms before responding. The prompt directed the model to identify which attributes were necessary to complete the task and which should be withheld.
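A CI-CoT-style prompt can be sketched as a template that asks the model to reason about which attributes the task needs before answering. The wording below is a hypothetical reconstruction of the idea the researchers describe, not their actual prompt, and `build_ci_cot_prompt` is an illustrative name.

```python
# Hypothetical CI-CoT template: reason about disclosure norms first, then answer.
CI_COT_TEMPLATE = """Before responding, reason step by step about contextual integrity:
1. Identify the task and the context in which the information will flow.
2. List every attribute of the user's information available to you.
3. For each attribute, decide whether it is necessary to complete the task.
4. Withhold any attribute that is unnecessary or inappropriate in this context.
Write this reasoning first, then give your final response.

Task: {task}
Available information:
{info}"""


def build_ci_cot_prompt(task: str, user_info: dict[str, str]) -> str:
    """Fill the template with the task and a bulleted list of available attributes."""
    info = "\n".join(f"- {key}: {value}" for key, value in user_info.items())
    return CI_COT_TEMPLATE.format(task=task, info=info)


prompt = build_ci_cot_prompt(
    "Book a dental checkup", {"name": "Ana Silva", "salary": "90k"}
)
print(prompt)
```

Listing the available attributes explicitly gives the model something concrete to accept or reject in its reasoning, rather than leaving the disclosure decision implicit.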

While CI-CoT proved effective at reducing information leakage on the PrivacyLens benchmark, it also tended to produce more conservative responses, occasionally withholding information that was necessary for the given task. To address this issue, Microsoft’s researchers introduced a reinforcement learning stage (CI-RL):

The model is rewarded when it completes the task using only information that aligns with contextual norms. It is penalized when it discloses information that is inappropriate in context. This trains the model to determine not only how to respond but whether specific information should be included.
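The reward described above can be sketched as a score that credits required information used in the response and penalizes restricted information leaked. This is a toy formulation assuming each example is labeled with required and restricted attributes and that substring matching is an acceptable detector; the actual CI-RL reward may be defined differently, and `ci_reward` is a hypothetical name.

```python
def ci_reward(response: str, required: dict[str, str], restricted: dict[str, str]) -> float:
    """Toy CI-RL-style reward: task credit minus leakage penalty, in [-1, 1]."""
    used = sum(value in response for value in required.values())
    leaked = sum(value in response for value in restricted.values())
    task_score = used / len(required) if required else 1.0
    leak_penalty = leaked / len(restricted) if restricted else 0.0
    return task_score - leak_penalty


required = {"name": "Ana Silva", "date": "March 3"}
restricted = {"diagnosis": "diabetes"}

good = ci_reward("Booked Ana Silva for March 3.", required, restricted)
leaky = ci_reward("Booked Ana Silva (diabetes) for March 3.", required, restricted)
overcautious = ci_reward("Booked an appointment.", required, restricted)
print(good, leaky, overcautious)
```

Both failure modes score below the appropriate response: the leaky answer loses its task credit to the penalty, and the overly conservative one never earns task credit, which is the imbalance CI-RL is meant to correct.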

The combined approach, CI-CoT + CI-RL, was as effective as CI-CoT in reducing leakage while preserving the original model’s performance.

Applying contextual integrity to LLMs is a relatively new line of research, pioneered at Google DeepMind and Microsoft. The concept itself was originally proposed by Helen Nissenbaum and defines privacy not as a blanket right to secrecy, but as the “appropriate flow of information in accordance with contextual information norms”.