
UnPII: PII Removal & Privacy Preservation

Updated 12 January 2026
  • UnPII is a comprehensive framework for selective forgetting of personally identifiable information through advanced token-level sequence labeling, risk quantification, and model editing.
  • It integrates regulatory mandates like GDPR, CCPA, and PIPL with formal operations that decompose, remove, and replace sensitive identifiers across various modalities.
  • Practical implementations on LLMs and multimodal systems demonstrate improved accuracy, utility, and scalability in PII redaction with minimal performance overhead.

UnPII denotes a class of frameworks, algorithms, and data processing methodologies for the identification, removal, or selective forgetting of personally identifiable information (PII) across diverse modalities and systems, with a particular focus on compliance with privacy regulations, risk-driven prioritization, and preservation of downstream utility. It encompasses technically rigorous measures grounded in privacy risk quantification, advanced token-level sequence labeling, domain-adaptive learning, gradient-based model editing, and formal information-theoretic transformations. The UnPII paradigm is instantiated in both pre- and post-training contexts for LLMs, multimodal architectures, and structured databases, operationalizing de-identification at scale.

1. Problem Motivation and Regulatory Backdrop

UnPII arises in response to the regulatory imperative for selective forgetting and data minimization encoded in statutes such as the European Union's General Data Protection Regulation (GDPR), California Consumer Privacy Act (CCPA), and China's Personal Information Protection Law (PIPL), which mandate data controllers to erase PII upon request and to limit exposure during model deployment (Jeon et al., 5 Jan 2026). LLMs trained on massive, heterogeneous corpora often internalize sensitive identifiers (names, SSNs, addresses, account numbers), creating risk vectors for unauthorized disclosure via inference-time memorization. Traditional database deletion is insufficient; effective compliance demands model-level unlearning or dynamic redaction at or before inference.

2. Formalization of PII, NII, and UnPII Operations

The fundamental distinction between PII and Non-Identifiable Information (NII) is rendered via information-theoretic infons (0909.4196). Let $\mathrm{INF}$ denote the universe of information pieces; PII is defined as $\mathrm{PII} \subset \mathrm{INF}$, where each pinfon references at least one unique individual, and NII comprises the complement $\mathrm{NII} = \mathrm{INF} \setminus \mathrm{PII}$. The transformation "UnPII" consists of atomic splitting (decomposition of compound PII into owner-centric atoms) followed by identifier projection (removal of all mapping elements to real-world identities), implemented as $\mathrm{UnPII}(\sigma) = (\sigma \text{ split to atoms}) \to (\text{strip identifiers}) \to \text{NII multiset}$.
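
The two-step transformation can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the `Infon` class, its fields, and the example predicate are hypothetical stand-ins for the infon formalism.

```python
from dataclasses import dataclass, field

@dataclass
class Infon:
    """A piece of information; a pinfon references one or more individuals."""
    predicate: str                               # e.g. a relation or attribute name
    owners: list = field(default_factory=list)   # identifiers of referenced individuals
    value: str = ""

def atomic_split(infon):
    """Atomic splitting: decompose a compound pinfon into one
    owner-centric atom per referenced individual."""
    if len(infon.owners) <= 1:
        return [infon]
    return [Infon(infon.predicate, [o], infon.value) for o in infon.owners]

def strip_ids(infon):
    """Identifier projection: remove all links to real-world identities."""
    return Infon(infon.predicate, [], infon.value)

def unpii(infons):
    """UnPII(sigma): split to atoms, then strip identifiers -> NII multiset."""
    atoms = [a for i in infons for a in atomic_split(i)]
    return [strip_ids(a) for a in atoms]

# Example: one compound pinfon referencing two individuals
sigma = [Infon("co_signed_loan", ["alice", "bob"], "$10,000")]
nii = unpii(sigma)  # two owner-centric atoms, neither tied to an identity
```

The output multiset retains the informational content (the predicate and value) while severing every mapping to an individual, which is precisely what makes it NII under the definition above.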

A dedicated PII database ("PIIDB") maintains proprietor spheres, system-management policies, and physical separation of pinfons, deploying UnPII for automated anonymization prior to query serving or data export, thereby ensuring that all non-internal outputs consist solely of NII (0909.4196).

3. Quantitative Risk-Driven Unlearning: The PII Risk Index

Recent work establishes the need to triage PII remediation via the PII risk index (PRI), a composite metric over seven normalized dimensions (identifiability, sensitivity, usability, linkability, permanency, exposability, and compliancy) with practitioner-defined weights $w_j$ satisfying $\sum_{j=1}^{7} w_j = 1$ (Jeon et al., 5 Jan 2026). For a forget batch of attributes $a_{ij}$, the aggregate risk $r$ is:

$r = \lambda k l + \sum_{i=1}^{l} \prod_{j=1}^{k} (w_j a_{ij}),$

with PRI given by $R = \tanh(r) \in (0, 1)$. Risk-weighted scaling is then applied to the loss function for each sample in gradient-based unlearning algorithms, yielding:

$\mathcal{L}_{\mathrm{UnPII}}(x_f, y_f) = (1 + R)\, \mathcal{L}_{\mathrm{base}}(x_f, y_f)$

where $\mathcal{L}_{\mathrm{base}}$ is prescribed by methods such as Gradient Ascent, Negative Preference Optimization (NPO), or Direct Preference Optimization (DPO).
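
The PRI computation and loss scaling above can be sketched directly from the formulas. This is an illustrative implementation under stated assumptions: the example uses three risk dimensions rather than the paper's seven (for brevity), and the attribute scores, weights, and $\lambda$ value are made up.

```python
import math

def pri(attributes, weights, lam=0.1):
    """PII risk index: R = tanh(r), with
    r = lambda * k * l + sum_i prod_j (w_j * a_ij).

    attributes: l x k matrix of normalized risk scores a_ij in [0, 1]
    weights:    k practitioner-defined weights w_j summing to 1
    """
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    l, k = len(attributes), len(weights)
    r = lam * k * l
    for row in attributes:
        prod = 1.0
        for w, a in zip(weights, row):
            prod *= w * a
        r += prod
    return math.tanh(r)  # squashed into (0, 1) for nonnegative r

def unpii_loss(base_loss, R):
    """Risk-weighted unlearning loss: L_UnPII = (1 + R) * L_base."""
    return (1.0 + R) * base_loss

# Hypothetical forget batch: l = 2 attribute rows, k = 3 dimensions
R = pri([[0.9, 0.8, 0.7], [0.4, 0.5, 0.6]], weights=[0.5, 0.3, 0.2])
scaled = unpii_loss(2.0, R)  # base loss of 2.0, amplified by risk
```

Because $R \in (0, 1)$, the scaling factor $(1 + R)$ stays within $(1, 2)$: riskier samples receive up to twice the gradient signal during unlearning, but no sample is ever down-weighted below its base loss.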

In experimental scenarios (1–10% target PII removal in LoRA-fine-tuned LLaMA 2 7B), UnPII risk-driven unlearning improves accuracy by up to 11.8%, utility by up to 6.3%, and generalizability by up to 12.4%, with a modest average fine-tuning overhead of 27.5% (Jeon et al., 5 Jan 2026).

4. Model-Level Eradication of PII: Proactive Privacy Amnesia

The Proactive Privacy Amnesia (PPA) protocol introduces a two-phase, token-sensitive mechanism for excising PII traces from LLMs (Kuo et al., 24 Feb 2025). Phase 1 locates the most "memorable" token (highest gradient in cross-entropy loss) within each PII sequence; gradient ascent is executed to erase predictive certainty for this pivotal token. Phase 2 implants a synthetic but valid substitute of matching format, restoring the model's utility by minimizing standard next-token log-loss on the replacement sequence.
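
The Phase-1 selection step can be illustrated with a small NumPy sketch. This is not the PPA implementation: it scores each token of a PII span by the norm of the cross-entropy gradient with respect to that token's logits (which for a softmax has the closed form $\mathrm{softmax}(z) - \mathrm{onehot}(y)$), and picks the highest-gradient token as the pivot; the toy logits are fabricated for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def most_memorable_token(logits, targets):
    """Phase-1 pivot selection (sketch): for each position in the PII
    sequence, compute d(CE)/d(logits) = softmax(logits) - onehot(target)
    and return the index whose gradient norm is largest."""
    probs = softmax(logits)                              # (seq_len, vocab)
    grads = probs.copy()
    grads[np.arange(len(targets)), targets] -= 1.0       # subtract one-hot
    scores = np.linalg.norm(grads, axis=-1)              # per-token gradient norm
    return int(np.argmax(scores))
```

In Phase 2, a format-matching synthetic substitute would then be implanted at the selected position by minimizing standard next-token log-loss on the replacement, restoring utility without full retraining.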

Evaluated on LLaMA2-7b/LLaMA3-8b, the approach eliminates extracted phone numbers ($\mathrm{Risk}_{\mathrm{phone}} = 0.0$) and reduces address leakage (down to $7.3$ versus a $59.4$ baseline), with a negligible change in perplexity (from $16.2$ to $16.0$) (Kuo et al., 24 Feb 2025). Adaptively tuning the number of erased tokens allows direct control over the privacy–utility trade-off, and the minimal-update approach circumvents full-model retraining.

5. Token-Level Detection and Redaction in Textual Data

UnPII pipelines in textual domains deploy advanced NER models, such as transformer–GCN hybrids and fine-tuned lightweight LLMs, for direct PII entity extraction (Liu et al., 2021, Shen et al., 14 Jan 2025). The DTL-PIIE framework employs deep transfer learning, leveraging multi-domain pretraining with domain-adaptive graph convolution over dependency parses, and achieves strong results on noisy, scarcely labeled social media corpora (F₁ = 71.1%). GPT-4o-mini, either via prompting or fine-tuning with delimiter-based span tagging, attains a recall of 0.9589 and precision 3× higher than Azure AI Language at one-sixth the cost on educational datasets (Shen et al., 14 Jan 2025).

The "Hidden-in-Plain-Sight" (HIPS) paradigm mandates replacement of detected PII spans with contextually plausible surrogates (not blanking), maximizing data utility for analytics while guaranteeing compliance. Bias analysis demonstrates robust generalization and uniform recall across gender and cultural cohorts, unlike rule-based systems.
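
The HIPS substitution idea can be sketched as span replacement with format-matching fakes. This is a toy illustration, not a HIPS implementation: the `SURROGATES` generators, PII types, and example text are invented, and real systems would draw surrogates from much richer, context-aware distributions.

```python
import random

# Hypothetical surrogate generators producing format-matching, plausible fakes
SURROGATES = {
    "NAME":  lambda rng: rng.choice(["Jordan Lee", "Priya Shah", "Marco Diaz"]),
    "PHONE": lambda rng: f"{rng.randint(200, 999)}-{rng.randint(200, 999)}"
                         f"-{rng.randint(1000, 9999)}",
}

def hips_redact(text, spans, seed=0):
    """Replace detected PII spans with contextually plausible surrogates
    rather than blanking them. `spans` is a list of (start, end, type)
    character offsets, assumed non-overlapping."""
    rng = random.Random(seed)        # seeded for reproducible surrogates
    out, cursor = [], 0
    for start, end, typ in sorted(spans):
        out.append(text[cursor:start])
        out.append(SURROGATES[typ](rng))
        cursor = end
    out.append(text[cursor:])
    return "".join(out)

doc = "Call Alice Smith at 415-555-0133."
spans = [(5, 16, "NAME"), (20, 32, "PHONE")]
redacted = hips_redact(doc, spans)   # original PII gone, sentence still fluent
```

Because the surrogate preserves the span's surface format, downstream analytics (tokenization, parsing, statistics over entity types) behave as on the original text, which is the utility argument for HIPS over blanking.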

6. Selective Masking and Query-Aware Redaction

UnPII is further instantiated in query-aware masking strategies, as formalized in PII-Bench, which defines the joint objective of masking only PII deemed irrelevant to the user's query, based on fine-grained entity relevance mapping (Shen et al., 25 Feb 2025). Each prompt $p = (d, q)$ is dissected to extract all entities $\mathcal{E}$, partitioned into $\mathcal{E}_q$ (query-relevant) and $\mathcal{E} \setminus \mathcal{E}_q$ (irrelevant). Only the latter are masked with type placeholders, preserving the informative spans necessary for accurate task completion.
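
A minimal sketch of this masking rule, assuming entity extraction and relevance mapping have already been performed upstream (the document, entity table, and placeholder format here are invented for illustration):

```python
def query_aware_mask(document, entities, query_relevant):
    """Mask only entities irrelevant to the query, i.e. E \\ E_q, replacing
    each with a type placeholder; query-relevant spans are left intact.

    entities:       dict mapping surface string -> PII type
    query_relevant: set of surface strings in E_q
    """
    masked = document
    for surface, pii_type in entities.items():
        if surface not in query_relevant:
            masked = masked.replace(surface, f"[{pii_type}]")
    return masked

doc = "Email bob@corp.com about the meeting with Carol on 2025-03-01."
entities = {"bob@corp.com": "EMAIL", "Carol": "NAME", "2025-03-01": "DATE"}
# Query: "When is the meeting?" -> only the date is query-relevant.
masked = query_aware_mask(doc, entities, query_relevant={"2025-03-01"})
```

The hard part in practice, as the benchmark results below indicate, is not this substitution step but deciding which entities belong in $\mathcal{E}_q$ for a given query.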

Empirical assessment with LLMs (GPT-4o, Claude-3.5, DeepSeek-V3, Llama-3.1) reveals high accuracy for base PII detection ($F_1 \approx 0.89$), but query-relevance mapping lags (GPT-4o naive $F_1 \approx 0.63$, oracle $F_1 \approx 0.84$). Chain-of-thought prompting boosts masking fidelity, yet complex, multi-subject contexts remain challenging. Failure modes include subject-disambiguation errors and type misclassification, motivating future multi-task and domain-specific architecture enhancements.

7. Unimodal Identity Inference and Multimodal Risks

In multimodal setups, notably CLIP-like architectures, UnPII covers purely textual identity inference attacks (TUNI) (Li et al., 2024). By converting identity detection to anomaly classification in a CLIP-feature domain, extracted via CLIP-guided image optimization for textual queries, TUNI eliminates the need for image exposure and costly shadow-model training. The feature vector $f(t) = (S, D)$ is classified via ensemble anomaly detectors and, optionally, enriched by internal photo-based clustering.
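
The anomaly-classification step can be illustrated with a simple z-score detector over the 2-D feature vectors. This is a deliberately generic stand-in, not TUNI's ensemble of detectors: the reference distribution, threshold, and feature values are all hypothetical.

```python
import numpy as np

def fit_reference(features):
    """Fit per-dimension mean/std on reference feature vectors f(t) = (S, D)
    collected for identities assumed absent from the training data."""
    feats = np.asarray(features, dtype=float)
    return feats.mean(axis=0), feats.std(axis=0) + 1e-8  # avoid div by zero

def anomaly_score(f, mean, std):
    """Max absolute z-score across dimensions; a large score means the
    queried identity's features deviate sharply from the non-member baseline."""
    return float(np.max(np.abs((np.asarray(f, dtype=float) - mean) / std)))

def is_member(f, mean, std, threshold=3.0):
    """Flag the queried identity as a likely training-set member when its
    feature vector is anomalous relative to the reference distribution."""
    return anomaly_score(f, mean, std) > threshold
```

A real deployment would replace this single detector with the ensemble described in the paper and calibrate the threshold per model, but the decision structure (score the text-only feature vector, compare against a non-member reference) is the same.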

TUNI achieves precision up to $0.98$ and recall $0.982$ (ResNet-50 CLIP, 1 photo/person regime) with text-only queries, detecting risky memberships without secondary privacy leaks or significant compute overhead. Integration into CLIP-based APIs entails real-time query screening and adaptive threshold tuning, with direct defense implications for real-world deployment.


UnPII constitutes a technically grounded, multifaceted regime for privacy-preserving PII removal, balancing strict regulatory compliance with model/data utility, enabling scalable de-identification in textual, multimodal, and database settings. It leverages principled risk modeling, minimally invasive model editing, advanced sequence labeling, and context-adaptive masking strategies, while ongoing research addresses task generalization, system interpretability, and cross-modal extension.