From Weak Cues to Real Identities: Evaluating Inference-Driven De-Anonymization in LLM Agents

Published 19 Mar 2026 in cs.AI | (2603.18382v1)

Abstract: Anonymization is widely treated as a practical safeguard because re-identifying anonymous records was historically costly, requiring domain expertise, tailored algorithms, and manual corroboration. We study a growing privacy risk that may weaken this barrier: LLM-based agents can autonomously reconstruct real-world identities from scattered, individually non-identifying cues. By combining these sparse cues with public information, agents resolve identities without bespoke engineering. We formalize this threat as \emph{inference-driven linkage} and systematically evaluate it across three settings: classical linkage scenarios (Netflix and AOL), \emph{InferLink} (a controlled benchmark varying task intent, shared cues, and attacker knowledge), and modern text-rich artifacts. Without task-specific heuristics, agents successfully execute both fixed-pool matching and open-ended identity resolution. In the Netflix Prize setting, an agent reconstructs 79.2\% of identities, significantly outperforming a 56.0\% classical baseline. Furthermore, linkage emerges not only under explicit adversarial prompts but also as a byproduct of benign cross-source analysis in \emph{InferLink} and unstructured research narratives. These findings establish that identity inference -- not merely explicit information disclosure -- must be treated as a first-class privacy risk; evaluations must measure what identities an agent can infer.

Abstract PDF Upgrade to Chat

Authors (5)

Summary

The paper demonstrates that LLM agents can reconstruct identities from weak cues using inference-driven linkage, achieving high success rates.
It evaluates performance on legacy and modern datasets, showing LLM models outperform traditional de-anonymization methods with up to 80-100% linkage success.
The study introduces the InferLink benchmark, revealing unintended privacy risks and emphasizing the need for robust, utility-preserving safeguards in LLM deployments.

Inference-Driven De-Anonymization in LLM Agents: Privacy Risks from Weak Cues

Introduction

The study "From Weak Cues to Real Identities: Evaluating Inference-Driven De-Anonymization in LLM Agents" (2603.18382) provides a comprehensive analysis of how LLM-based agents can perform de-anonymization through inference-driven linkage, even when only fragmented, individually non-identifying cues are present in input data. The authors formalize a failure mode—inference-driven linkage, where agents reconstruct identities from weak cues, marking a substantial departure from prior privacy threats focused solely on the direct extraction or leakage of explicit identifiers.

Figure 1: Overview of Inference-Driven Linkage—an LLM agent reconstructs a specific identity $\hat{\imath}$ from fragmented information by combining weak cues from anonymized artifacts and corroborating auxiliary context.

Problem Definition and Threat Model

The central construct is an evaluation interface $\Pi: (D_{\text{anon}}, D_{\text{aux}}) \mapsto (\hat{\imath}, \mathcal{E})$ , where an agent takes anonymized artifacts $D_{\text{anon}}$ and auxiliary context $D_{\text{aux}}$ as input and produces an identity hypothesis $\hat{\imath}$ with supporting evidence $\mathcal{E}$ . Unlike classical membership inference or attribute prediction, this setup captures identity-level inference, where no single feature is identifying in isolation, but their aggregate enables linkage.

Key variations considered include:

Task Framing (Intent): Benign analytic tasks versus explicit re-identification requests
Fingerprint Type: Intrinsic, coordinate, and hybrid features as quasi-identifiers
Attacker Knowledge: Zero-knowledge (ZK) or membership-knowledge (MK) scenarios

Evaluation on Classical Linkage Cases

The authors revisit the Netflix Prize [narayanan2008robust] and AOL search log incidents [aol_search_log] to compare the effectiveness of LLM-based agents against hand-engineered deanonymization pipelines.

In the Netflix scenario, the agent receives a pool of anonymized user records and a noisy auxiliary fragment, mirroring canonical de-anonymization conditions.

Figure 2: Netflix linkage success rate—LLM agent (GPT-5) outperforms the classical baseline, particularly under high noise and sparse fragment conditions.

Results:

GPT-5 achieved a linkage success rate (LSR) of 79.2% (with only 2 auxiliary events, high perturbation), drastically surpassing the classical baseline at 56.0%.
With higher information density (more auxiliary cues), agent performance converges to the baseline near-perfect regime.

In the AOL case, the methodology shifts to open-ended corroboration: agents analyze search logs, extract behavioral cues, construct candidate identities by web retrieval, and triangulate results with public records. Agents independently reconstructed and corroborated specific identities in 10 out of 40 privacy-sensitive histories, leveraging cross-source reasoning—such as linking query sequences to business registries, lifestyle triangulation, and tracing creative outputs.

InferLink: A Controlled Benchmark

Recognizing the limitations of historical cases (static overlap and lack of systematic variation), the authors introduce InferLink: a synthetic, controlled benchmark enabling precise variation of fingerprint structure, task framing, and attacker knowledge.

Figure 3: End-to-end pipeline of InferLink illustrating benchmark construction, scenario generation, paired-dataset synthesis, multi-turn interaction, and evaluation.

InferLink seeds each instance with a formal tuple $(f, \iota, \kappa)$ , systematically varying conditions while enforcing unique ground-truth linkages. The data synthesis ensures that identity inference is non-trivial—no direct identifiers, only quasi-identifiers, and side-only features.

Implicit: Benign analytic framing, no explicit linkage request
Explicit-ZK: Explicit linkage request without specifying the target
Explicit-MK: Explicit linkage with a named target

Key findings:

Silent linkage occurs frequently. Claude 4.5 exhibited Linkage Success Rate (LSR) up to 80% in “Implicit” (benign task) cases, indicating de-anonymization occurs as a byproduct of utility-driven reasoning.
Explicit prompts further increase risk: Claude 4.5 achieves up to 100% LSR under Explicit-MK; GPT-5 is also highly susceptible, especially for intrinsic/hybrid fingerprints.
Prompt-based privacy defenses (system-level guardrails) can reduce LSR to near zero, but induce measurable utility costs, especially in models exhibiting over-refusal.

Case Studies on Modern Digital Traces

The study extends to contemporary, text-rich artifacts: (1) redacted professional interviews (Anthropic Interviewer data [handa2025interviewer]) and (2) anonymized ChatGPT logs.

In the Anthropic Interviewer case, the LLM agent leverages technical/procedural narratives, methodological cues, and idiosyncratic phrasing to generate targeted search queries, eventually collapsing anonymity sets and attributing interviews to specific researchers. Six identity recoveries were confirmed, often via conjunctions of rare research workflows and role statements.

Figure 4: An Anthropic Interview qualitative example; the agent forms an academic persona profile from text, retrieves corroborating public evidence, and attributes interview content to a specific researcher.

In ChatGPT logs, a single confirmed de-anonymization was achieved by progressive narrowing, integrating role, temporal, and topical signals to reduce candidate pools in an anonymity-set reduction process reminiscent of practical $k$ -anonymity attacks.

In both cases, the identifying signal emerges from distributed weak cues—without explicit identifiers—demonstrating that current anonymization practices do not substantially mitigate identity inference risk in modern agentic environments.

Implications and Future Directions

This work establishes that LLM-based agents are potent deanonymizers, with capabilities that match or exceed domain-specific algorithms without any bespoke engineering. The results reveal that identity inference can occur even in benign workflows and under plausible prompt constraints—posing a privacy risk not directly captured by existing frameworks focused on explicit disclosure or direct identifier access.

Strong Claims:

Modern LLM agents can match or surpass classical de-anonymization on legacy datasets.
Silent, unintended identity linkage (LSR up to 80%) can emerge in scenarios absent explicit re-identification prompts.
Prompt-level guardrails can mitigate, but also degrade, agent utility.

Contradictory/Noteworthy Aspects:

Privacy failures arise from cross-source reasoning and evidence aggregation, not just direct leakage.
Defenses that target explicit identifier handling are insufficient against inference-driven attacks.

Practically, the research implicates all deployments operating on heterogeneous or composite data where side information may exist. Theoretically, the study prompts reevaluation of privacy-warranting assumptions in agent design—particularly the residual risk from non-identifying, behavioral, or contextual data.

Future work directions include scaling systematic evaluations to larger ambiguity structures, integrating adversarial and defense-aware training, refining definitions of utility and harm, and developing alignment techniques capable of distinguishing legitimate analytical reasoning from privacy-violating inference.

Conclusion

The manuscript demonstrates that privacy risk from inference-driven linkage is not only plausible but prevalent in LLM-agent systems, both in structured benchmarks and real-world digital traces. Privacy risk assessments must be expanded to include identity inference—measuring what can be reconstructed, not only what is disclosed. As agentic deployments proliferate, comprehensive mitigation will necessitate advances in alignment, detection of dangerous cross-source inference, and robust, utility-preserving safeguards against deanonymization.

(2603.18382)

Markdown Report Issue

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Paper Prompts

Top Community Prompts

Explain it Like I'm 14

off on

Knowledge Gaps

off on

Practical Applications

off on

Glossary

off on

Conceptual Simplification

off on

Explain it Like I'm 14

What is this paper about?

This paper looks at a growing privacy problem: even when personal data is “anonymized” (names and obvious identifiers removed), modern AI assistants can still figure out who someone is by piecing together small, harmless-looking clues and checking public information online. The authors show that these AI agents can link anonymous records back to real people more easily than before, and sometimes they do it even when they weren’t asked to.

What questions did the researchers ask?

They asked simple but important questions:

Can today’s AI assistants re-identify people from anonymized data without special, hand-built programs?
Does this happen only when we explicitly ask the AI to re-identify someone, or can it also happen as a side effect of normal, helpful tasks?
Which kinds of clues make re-identification most likely?
Does giving the AI more prior knowledge (like a specific name) change the outcome?
Can safety rules reduce this risk, and what do we lose in usefulness when we add them?

How did they study it?

The team tested the problem in three ways, using the idea of an AI “agent” that acts like a detective: it takes an anonymized record (like a movie rating history or a text log), finds small hints (favorite times, rare choices, special interests), and cross-checks them with public sources (like websites, social media, or news) to guess the real person.

To make this clear, think of it like a jigsaw puzzle:

The anonymized data is a handful of puzzle pieces without the picture on the box.
Public information online is more puzzle pieces scattered on the internet.
The AI agent tries to fit pieces together until a specific person pops out.

They ran three kinds of tests:

1) Classic cases (Netflix and AOL)

Netflix Prize dataset: The agent got a noisy, tiny slice of someone’s movie ratings and had to match it to the correct anonymous user out of a large pool.
AOL search logs: The agent started from an anonymized search history and actively searched the web to find public clues that point to a real identity.

These mirror famous, real-world events where people were de-anonymized in the past, but those older attacks needed specialized math and lots of human effort. The question: can modern AI do this with general skills and simple instructions?

2) InferLink (a controlled benchmark they built)

They created a controlled test bed with paired datasets that have exactly one overlapping person. They varied:

The type of “fingerprint” (the kind of clues shared across sources):
- Intrinsic: personal traits or habits that stay the same across contexts (for example, consistent preferences).
- Coordinate: overlaps in time and place (for example, being at the same location at a certain time).
- Hybrid: a mix of both kinds.
Task intent:
- Implicit: a normal, helpful task (like analysis) with no mention of re-identifying anyone.
- Explicit: directly asking the agent to find the matching identity.
Attacker knowledge:
- Zero-Knowledge (ZK): the agent doesn’t get a name ahead of time; it must find any matching person.
- Membership-Knowledge (MK): the agent is told a specific name exists in the data and must find that person’s anonymous record.

They measured:

Linkage Success Rate (LSR): how often the agent identified the correct person.
Utility (U): how well the agent completed the normal, helpful task.

They also tested a “privacy-aware” system prompt (a safety rule) to see if it reduces re-identification and what it does to usefulness.

3) Modern, text-rich data

They looked at today’s kinds of text, like anonymized interview responses and anonymized chat logs, to see if the same risks show up in more realistic, everyday content. These are rich in context (goals, habits, roles), which might be even more linkable.

What did they find?

In the Netflix test, AI agents matched or beat the classic, hand-crafted method. With very little information (only two noisy movie ratings), one agent correctly identified the person 79.2% of the time, compared to 56.0% for the classic baseline. This shows AI can be very good at linking even when clues are sparse and messy.
In the AOL test, the agent confirmed 10 real identities by combining search-log hints with public web evidence. This shows open-ended “detective work” is possible: the AI can narrow down candidates and corroborate details across different sources. Once linked, the anonymized logs can reveal very sensitive parts of a person’s life—demonstrating real harm.
In the InferLink benchmark:
- Silent risk: Even when not asked to re-identify anyone (Implicit tasks), some models still produced specific identity guesses while trying to be helpful.
- Explicit prompts make it much worse: When asked directly to re-identify, success rates became very high (for example, 16 out of 20 in some cases without a name, and up to 19 out of 20 when given a target name).
- The type of clues matters: Different models were more or less vulnerable depending on whether the clues were about habits, time/place overlaps, or both.
- Safety rules help but have a cost: A privacy-aware system prompt sharply reduced re-identification, especially when given a specific target name—but it sometimes made the agent overly cautious, hurting its ability to do legitimate, useful analysis.
In modern, text-rich data, the same problem appears: the agent can resolve anonymized profiles to real people by cross-checking context with public information. Longer, richer text gives more small clues to connect the dots.

Why does this matter?

The big message: anonymization alone is no longer a reliable safety net. Removing names isn’t enough if an AI can put together weak clues and public info to guess identities. That means:

Privacy testing for AI should measure not just what data is shown or leaked directly, but also what identities the AI can infer.
AI builders need stronger, targeted guardrails that stop identity linking without unnecessarily blocking useful tasks.
Organizations should be more cautious about sharing “anonymized” data, especially text-rich logs, because AI can re-link them to real people.
Policymakers may need to update rules and guidance to cover identity inference risks, not just direct data disclosure.

In short, the paper shows that identity inference is a first-class privacy risk in the age of AI agents. We must rethink how we protect people’s identities, balancing safety with the usefulness of AI tools.

View Paper Prompt View All Prompts

Knowledge Gaps

Knowledge gaps, limitations, and open questions

The paper breaks important ground but leaves several concrete issues unresolved that future work can address:

External validity of synthetic benchmark: InferLink uses small, 10×10 tabular pairs with exactly one engineered cross-source overlap; it is unclear how results translate to real datasets with larger scales, heterogeneous schemas, and multiple plausible overlaps.
Scaling behavior: Linkage Success Rate (LSR) is reported for 1,000 Netflix users and 180 InferLink instances; there is no characterization of how LSR (and false positives) change with candidate set size (e.g., 10k–10M records) or with denser/sparser auxiliary traces.
Negative controls and false positives: InferLink always contains a true overlap; the study does not include “no-match” cases to measure how often agents hallucinate identities when no linkage exists.
Confidence calibration: Agents output a single identity hypothesis without calibrated confidence or likelihood estimates; methods to quantify and control false discovery rates are not explored.
Utility metric definition and validation: The “Utility (U)” scores lack a precise operational definition, rubric, or inter-rater reliability; how utility is scored across tasks, models, and evaluators is unclear.
Statistical uncertainty: Reported LSRs lack confidence intervals, variance estimates, or significance tests across seeds/runs; robustness to prompt/seed variation is not quantified.
Reproducibility of agent configurations: Prompt templates, toolchains, browsing settings, retrieval parameters, rate limits, and model versions are insufficiently specified for faithful replication and attribution of observed behaviors.
Model capability attributions: The paper shows large model-to-model differences (e.g., Claude vs GPT) but does not ablate which capabilities (planning, search quality, re-ranking, chain-of-thought, memory) drive the gap.
Cost and efficiency: The claim that agents lower practical barriers is not quantified in terms of API cost, latency, tool calls, or human oversight requirements compared to classical attacks.
Guardrail robustness: The “privacy-aware system prompt” is evaluated only against straightforward prompts; resilience to adversarial prompt injection, jailbreaks, or iterative evasion is not assessed.
Defense breadth: Only a system-prompt mitigation is tested; alternative defenses (retrieval filters, structured tool gating, logging redaction, DP-style perturbation, output red-teaming, refusal with explanations) and their utility impacts remain untested.
Memorization vs inference: The study does not disentangle whether successes stem from model memorization of individuals/events versus genuine reasoning with retrieved public evidence; disabling web tools or evaluating closed-book variants could clarify this.
Temporal sensitivity: Effects of time lag, content drift, and evidence freshness (e.g., outdated web pages, profile churn) on linkage success are not examined.
Multilingual and non-Western contexts: All evaluations appear English/U.S.-centric; robustness across languages, cultures, and region-specific data sources remains unknown.
Modality coverage: Only text and structured tables are considered; identity inference from multimodal artifacts (images, audio transcripts, sensor/IoT logs) and cross-modal linkage is not explored.
AOL case base rate and bias: The Confirmed Linkage Count (10/40) comes from a pre-filtered subset (high-query, location-signaling users) with an LLM-based PII filter; the overall linkability rate, selection biases, and redaction error rates are not quantified.
False attribution risk in AOL: Identities are “independently corroborated,” but the process, criteria, and review for avoiding misidentification are not described; there is no analysis of false positives/negatives.
Modern trace methodology transparency: For interview/chat logs, data sources, access methods, sampling strategy, and redaction procedures are not fully detailed, limiting reproducibility and bias assessment.
Multiple-candidate scenarios: Real environments often yield several near-indistinguishable candidates; the benchmark does not test ranking quality, top-k recall, or tie-breaking under ambiguity.
Obfuscation and anonymization robustness: The study focuses on removal of direct identifiers; it does not evaluate how stronger anonymization techniques (generalization, k-anonymity/l-diversity/t-closeness, synthetic data, controlled perturbations) affect agentic linkage.
Minimal-cue analysis: The minimum number/type of cues required for successful linkage and the marginal utility of each cue type (e.g., intrinsic vs coordinate) are not systematically quantified.
Multi-agent/tool ecosystems: Experiments use single agents; real attackers may orchestrate multiple specialized tools or agents (people search, record linkage, graph mining); the compounding risk is unmeasured.
Lifecycle and memory effects: How persistent agent memories, scratchpads, and cross-task state influence cumulative linkage risk over time is not evaluated.
Policy and governance integration: The paper argues for treating identity inference as a first-class risk but does not propose concrete operational testing standards, auditing procedures, or mappings to regulatory requirements (e.g., GDPR re-identification risk thresholds).
Ethical safeguards and IRB oversight: While names and trails are withheld, the ethical framework, approvals, and protections for individuals in reconstructed AOL/modern cases are not detailed.
Domain breadth: Benchmarked scenarios focus on commercial/enterprise settings; high-stakes domains (healthcare, education, legal case files, government records) are not covered, leaving sector-specific risk/defense gaps.
Retrieval variability and reproducibility: Web search outcomes can vary by time, geography, personalization, and engine; the study does not report controls for determinism (e.g., cached corpora) or sensitivity analyses.
Adversary knowledge granularity: Only ZK vs MK are considered; intermediate priors (e.g., partial names, location-only priors, social graph hints) and their effects on risk are not mapped.
Turn-level emergence: The point in a dialogue when identities are inferred (early vs late turns) and how interaction length/tools influence emergence is not analyzed.
Ranking metrics in Netflix-like tasks: Only exact-match LSR is reported; top-k accuracy, mean reciprocal rank, and near-miss analysis could better characterize partial success and ambiguity.
Heterogeneous schemas: Real-world linkage spans differing schemas and missingness patterns; the benchmark enforces fixed shared attributes and equal table sizes, limiting applicability to messy data.

View Paper Prompt View All Prompts

Practical Applications

Immediate Applications

These applications can be deployed today using existing LLMs, retrieval pipelines, and the InferLink benchmark to improve privacy posture, testing, and governance.

Enterprise AI privacy red-teaming with inference-driven linkage tests
- Sectors: software, healthcare, finance, retail, education, HR, legal
- Use case: Integrate InferLink-style tests into AI deployment pipelines to detect when agents infer specific identities from anonymized artifacts (e.g., CRM data + review logs, support tickets + forum posts).
- Tools/workflows/products:
- “Agent privacy auditor” CI step for RAG/agentic apps that runs Implicit, Explicit-ZK, and Explicit-MK scenarios across Intrinsic/Coordinate/Hybrid fingerprints.
- Risk dashboards reporting Linkage Success Rate (LSR) by task framing, model, and fingerprint type.
- Assumptions/dependencies: Requires access to representative paired datasets and model variants used in production; assumes test fixtures approximate real overlaps; may incur compute cost.
Pre-release anonymized data risk scoring (beyond direct PII checks)
- Sectors: healthcare (de-identified EHRs, physician notes), finance (transaction logs), education (LMS traces), government open data portals
- Use case: Before publishing or sharing “anonymized” datasets, run agentic linkage simulations to quantify residual identity risk from weak cues.
- Tools/workflows/products:
- “Linkage-risk meter” that scores datasets by fingerprint composition (Intrinsic, Coordinate, Hybrid).
- Automated reports recommending suppression/generalization of high-risk quasi-identifiers.
- Assumptions/dependencies: Availability of aux data representative of what an adversary might access; organizational policy allowing simulated attacks.
Privacy-aware prompt and policy tuning for agent deployments
- Sectors: software platforms, customer support, productivity tools, copilots
- Use case: Apply the paper’s demonstrated system-prompt guardrails to suppress identity hypothesis generation in both benign and explicit tasks, while monitoring utility.
- Tools/workflows/products:
- Prompt libraries with tested refusals for identity inference; intent detectors that route “who is this?” tasks to safe paths.
- A/B testing harness to quantify privacy–utility trade-offs (as in Table results).
- Assumptions/dependencies: Effectiveness is model-specific; guardrails may reduce task utility or over-refuse benign cross-source reasoning.
Runtime detection and logging of identity-inference attempts
- Sectors: enterprise SaaS, regulated industries (HIPAA/GDPR/GLBA)
- Use case: Add telemetry hooks that flag when an agent starts synthesizing identity hypotheses from multiple sources (e.g., “candidate generation,” “corroboration” patterns).
- Tools/workflows/products:
- Policy engines that intercept outputs containing named entities + evidence chains.
- “Inference firewall” that masks or quarantines identity-level outputs for human review.
- Assumptions/dependencies: Requires robust NER and justification tracing; risks false positives/negatives; must respect logging minimization policies.
Update of DPIAs/IRBs and data-sharing agreements to include inference-driven linkage
- Sectors: academia, hospitals, NGOs, public sector
- Use case: Extend Data Protection Impact Assessments and Institutional Review Board checklists to test “identity inference” (not just direct disclosure) using benchmarked scenarios.
- Tools/workflows/products:
- Standard clauses that require agentic linkage testing prior to dataset release.
- IRB templates that document fingerprint types present and mitigations applied.
- Assumptions/dependencies: Requires staff training and adoption by oversight bodies.
Secure RAG and retrieval configuration to limit corroboration
- Sectors: software, knowledge management, customer support
- Use case: Restrict agent tool-use (web search, cross-index joins) when tasks do not require identity-level retrieval; enforce least-privilege tool policies.
- Tools/workflows/products:
- Tool-permission frameworks that disable cross-source joins by default; query rewriting to exclude identity-resolving searches.
- Assumptions/dependencies: Must balance utility vs. risk; depends on accurate intent classification.
Workforce privacy training and “privacy linting”
- Sectors: daily life, SMEs, journalism, research labs
- Use case: Educate users that “weak cues” (e.g., rare job + small town + niche hobby) in chats or docs can enable re-identification; provide in-editor warnings.
- Tools/workflows/products:
- “Privacy linter” plugins for docs/chats that flag combinations resembling high-risk fingerprints.
- Assumptions/dependencies: Usability and false-positive tolerance; multilingual/locale adaptation.
Vendor and procurement standards for agent privacy certification
- Sectors: enterprise IT, public procurement
- Use case: Require suppliers of agentic systems to report InferLink-style LSRs under implicit and explicit framings, with mitigation performance.
- Tools/workflows/products:
- RFP annexes and scorecards for identity-inference risk metrics.
- Assumptions/dependencies: Market coordination; shared benchmark acceptance.
Incident response and red-team playbooks for identity inference
- Sectors: all
- Use case: Add scenarios where agents produce identity hypotheses unintentionally; define containment and notification procedures.
- Tools/workflows/products:
- Playbooks that disable identity-corroborating tools and roll back logs; forensic analysis templates capturing evidence chains.
- Assumptions/dependencies: Organizational readiness; legal obligations differ by jurisdiction.

Long-Term Applications

These applications require further research, scaling, standardization, or engineering, but are directly motivated by the paper’s methods and findings.

Model-level objectives to suppress identity hypothesis formation
- Sectors: foundation models, AI platforms
- Use case: Train or fine-tune models with losses that penalize “identity hypothesis + evidence synthesis” behaviors while preserving benign reasoning.
- Tools/workflows/products:
- RLHF/RLAIF pipelines using InferLink-style scenarios as negative examples; controllable decoding that suppresses identity-resolving plans.
- Assumptions/dependencies: Availability of high-quality negative signals; risk of over-regularization harming utility.
Formal anonymization standards incorporating agentic linkage audits
- Sectors: policy/regulation, standards bodies (NIST, ISO), healthcare (HIPAA), EU (GDPR)
- Use case: Update de-identification guidance to treat identity inference as a first-class risk; require empirical agent-based audits prior to release.
- Tools/workflows/products:
- Standard test suites (public InferLink extensions) and certification regimes with tiered LSR thresholds.
- Assumptions/dependencies: Regulatory consensus; reproducible, auditable evaluation protocols.
Privacy-preserving data transformations optimized for agent robustness
- Sectors: healthcare, finance, mobility, marketing analytics
- Use case: Design transformations (generalization, suppression, perturbation) targeted at high-risk fingerprints (Intrinsic/Coordinate/Hybrid) to reduce LSR with minimal utility loss.
- Tools/workflows/products:
- Automated transformation search guided by benchmark feedback; risk-aware synthetic data generation.
- Assumptions/dependencies: Need to preserve downstream analytic fidelity; may require domain-specific heuristics.
Proof-carrying agent workflows and “privacy contracts”
- Sectors: enterprise AI, cloud providers
- Use case: Agents attach verifiable attestations (e.g., cryptographic proofs or certified logs) that no identity-level linkage steps occurred in a task.
- Tools/workflows/products:
- Execution sandboxes with formally specified prohibited operations (e.g., cross-source joins); zero-knowledge proofs for policy compliance.
- Assumptions/dependencies: Research into practical proofs for complex agent traces; performance overhead.
Retrieval brokers and privacy-preserving corroboration
- Sectors: search, knowledge platforms
- Use case: Route agent queries through brokers that enforce policies (e.g., block identity-corroborating combinations or restrict retrieval granularity).
- Tools/workflows/products:
- Query classifiers for “corroboration intent”; aggregation services that return cohort-level signals rather than individual-level evidence.
- Assumptions/dependencies: Broker adoption by content providers; potential utility degradation.
OS/browser-level anti-linkage controls for consumer assistants
- Sectors: consumer software, mobile/desktop OS
- Use case: Provide system-level controls that prevent assistants from combining local traces (emails, calendars) with web evidence to form identity hypotheses about third parties.
- Tools/workflows/products:
- Consent gates, cross-app data fences, and “no-identity-inference” modes.
- Assumptions/dependencies: Platform support and ecosystem cooperation.
Multimodal and IoT extensions of InferLink
- Sectors: smart homes, wearables, mobility, robotics
- Use case: Extend benchmarks and defenses to audio, video, sensor, and location traces where Coordinate/Hybrid fingerprints are common (e.g., repeated routes + event attendance).
- Tools/workflows/products:
- Benchmarks and simulators for cross-modal linkage; privacy filters for sensor streams.
- Assumptions/dependencies: Data availability and consent; new evaluation metrics for multimodal evidence chains.
Continuous certification and monitoring services (“risk-as-a-service”)
- Sectors: SaaS, MSSPs, cloud
- Use case: Managed services that run periodic agentic linkage audits, track drift in model behavior, and alert when LSR rises.
- Tools/workflows/products:
- Hosted InferLink variants with sector-specific scenarios; integration to SIEM/GRC tools.
- Assumptions/dependencies: Access to production-like data and models; privacy-preserving audit arrangements.
Legal frameworks clarifying identity inference as personal data processing
- Sectors: legislators, regulators, legal tech
- Use case: Codify that reconstructing an identity from weak cues is processing of personal data; define duties for controllers/processors using agentic systems.
- Tools/workflows/products:
- Compliance checklists; model cards that disclose linkage behavior and mitigations.
- Assumptions/dependencies: Jurisdictional harmonization; enforcement approaches.
Cooperative “identity firewall” agents
- Sectors: enterprise AI, consumer assistants
- Use case: Deploy a companion agent that reviews plans and outputs from task agents, blocking steps indicative of narrowing candidates or corroborating evidence toward specific identities.
- Tools/workflows/products:
- Plan-review APIs; pattern detectors for candidate generation and evidence triangulation; escalation workflows to humans.
- Assumptions/dependencies: Reliable detection of latent plans; latency/throughput trade-offs.
Curriculum and competency frameworks for privacy-aware AI development
- Sectors: academia, professional training
- Use case: Formalize competencies around inference-driven linkage, with labs using the benchmark to teach mitigation and audit practices.
- Tools/workflows/products:
- Open courseware; capstone toolkits for building safe agents.
- Assumptions/dependencies: Faculty adoption; maintenance of open benchmarks.

Notes on feasibility and dependencies:

Effectiveness varies by model capability and task framing (implicit vs explicit) as shown in the paper; guardrails can impose measurable utility costs.
Risks are amplified when agents can retrieve and corroborate public evidence; offline-only agents may exhibit reduced risk.
Real-world aux data availability and overlap structure (Intrinsic/Coordinate/Hybrid) critically affect linkage risk; assessments must mirror operational context.
Organizational governance and procurement practices must evolve to recognize inference-driven linkage as distinct from direct disclosure.

View Paper Prompt View All Prompts

Glossary

Agentic inference: A mode of reasoning where an AI agent actively combines cues and evidence to form conclusions or identities. "Agentic Inference: The agent combines weak cues from anonymized artifacts with corroborating auxiliary context to form a coherent identity hypothesis."
Agentic systems: AI systems that act autonomously with tools and reasoning capabilities, posing unique privacy risks. "Taken together, these results suggest that identity inference---not only information disclosure---should be treated as a first-class privacy risk in agentic systems."
Anonymization: The process of removing explicit identifiers from data to make linking records to individuals difficult. "Anonymization is widely treated as a practical safeguard: once names and other explicit identifiers are removed, records are presumed to be difficult to trace back to specific individuals."
Auxiliary context: External, often public, information used to corroborate or link anonymized records to identities. "Auxiliary Context ($D_{\text{aux}$): Auxiliary information may be either provided directly or retrieved from public sources (e.g., the web, social media, and news), and serves as corroborating evidence."
Confirmed Linkage Count (CLC): A metric counting cases where a produced identity can be independently corroborated. "Instead, we report Confirmed Linkage Count (CLC), defined as the number of cases in which the agent produces a specific identity hypothesis that can be independently corroborated using publicly available evidence consistent with the search history."
Context hijacking: An attack where adversaries manipulate or seize control of an agent’s contextual inputs. "A third line examines adversarial settings for tool-integrated agents, including prompt injection and context hijacking attacks."
Contextual features: Shared, broad-level attributes that allow alignment and comparison across data sources. "Contextual features are shared attributes that align the two sources at a broad level and make cross-source comparison possible"
Coordinate (fingerprint type): A class of identifying overlaps based on spatiotemporal intersections across sources. "The parameter $f$ denotes the fingerprint type and takes one of three values: Intrinsic, Coordinate, or Hybrid."
Corroborating evidence: Independent information that supports and validates a proposed identity link. "Auxiliary information may be either provided directly or retrieved from public sources (e.g., the web, social media, and news), and serves as corroborating evidence."
Data minimization: The principle of limiting data access and use to what is strictly necessary for a task. "AgentDAM operationalizes data minimization."
Deanonymization: The process of reversing anonymization to re-identify individuals within anonymized datasets. "In the Netflix Prize deanonymization setting, agents reconstruct 792 out of 1{,}000 identities (79.2\%), compared to 56.0\% for a classical matching baseline."
Eccentricity thresholds: Heuristic rules used to decide when a top match is sufficiently separated from others. "including rarity-based weights that assign more importance to less common movies, exact tolerance parameters (i.e., hand-tuned rules specifying how much date and rating mismatch to allow), and eccentricity thresholds (i.e., rules for deciding when the top match is sufficiently separated from competing candidates)"
Explicit re-identification: A task framing where the agent is directly asked to identify individuals from anonymized data. "In the classical linkage setting, we study explicit re-identification."
Fingerprint type: The structural pattern of shared cues (e.g., Intrinsic, Coordinate, Hybrid) that enable linkage. "We therefore construct InferLink to isolate three factors: (i) fingerprint type, (ii) task intent, and (iii) attacker knowledge."
Fixed-pool matching: A linkage task where the matching is performed within a predefined set of candidates. "Thus, the shared interface is realized as a fixed-pool matching task in which the auxiliary fragment is provided explicitly and no retrieval is involved."
Ground-truth linkage: A known, true identity overlap used for evaluation. "We enforce a unique ground-truth linkage: exactly one individual appears in both sources, while all other records are non-overlapping."
Hybrid (fingerprint type): A fingerprint that merges both intrinsic and spatiotemporal cues. "The parameter $f$ denotes the fingerprint type and takes one of three values: Intrinsic, Coordinate, or Hybrid."
Identity hypothesis: A specific proposed real-world identity linked to anonymized data, supported by evidence. "the agent produces an identity hypothesis together with supporting evidence."
Identity inference: Inferring who someone is from indirect signals, even without explicit identifiers. "Taken together, these results suggest that identity inference---not only information disclosure---should be treated as a first-class privacy risk in agentic systems."
Identity reconstruction: The end-to-end process of piecing together weak cues to identify a specific person. "we evaluate identity reconstruction across three settings."
Inference-driven linkage: A privacy failure where identities are reconstructed by aggregating non-identifying cues and auxiliary data. "We term this failure mode inference-driven linkage: a privacy failure in which an agent reconstructs a specific real-world identity by combining non-identifying cues from anonymized artifacts with corroborating signals from auxiliary context."
Inference-time privacy risks: Privacy threats arising during model inference when sensitive attributes or identities are deduced. "Recent work also studies inference-time privacy risks, where models infer latent user attributes from text."
Intrinsic (fingerprint type): A fingerprint based on context-invariant personal attributes shared across sources. "The parameter $f$ denotes the fingerprint type and takes one of three values: Intrinsic, Coordinate, or Hybrid."
LLM judge: A LLM used to automatically review and filter data or outputs in an evaluation pipeline. "an LLM judge~\citep{openai2025gpt5} removes logs containing explicit self-identification (e.g., full names, pasted resumes)."
Linkage attacks: Techniques that connect anonymized records to real identities using overlapping auxiliary information. "Classical linkage attacks, exemplified by the Netflix Prize~\citep{narayanan2008robust} and AOL search log incidents~\citep{aol_search_log}, showed that sparse behavioral traces can be identifying when matched against auxiliary data."
Linkage Success Rate (LSR): The proportion of instances where the agent correctly links anonymized data to the true identity. "We report linkage success rate (LSR), defined over $N$ evaluation instances as"
Membership inference: An attack determining whether a specific record was part of a model’s training data. "Representative threats include membership inference"
Membership-Knowledge (MK): An evaluation condition where the agent is told a specific named individual exists in the anonymized data. "The Membership-Knowledge (MK) condition, conversely, guarantees that a specifically named individual from the auxiliary source appears in the anonymized dataset, challenging the agent to uncover the corresponding anonymous identifier."
Open-ended linkage: A linkage process without a fixed candidate set, involving dynamic narrowing and corroboration. "In the AOL setting, the agent performs open-ended linkage by moving from {\color{anonblue}anonymized queries ($D_{\text{anon}$)} to {\color{auxpurple}corroborating public evidence ($D_{\text{aux}$)} and ultimately to a {\color{hypred}specific identity hypothesis ($\hat{\imath$)}."
Privacy-aware system prompt: A system-level instruction designed to prevent or reduce identity linkage by an agent. "We also show that privacy-aware system prompts can suppress linkage in the controlled benchmark, but at a measurable cost to task performance, revealing a concrete privacy--utility trade-off."
Privacy–utility trade-off: The balance between protecting privacy and maintaining useful task performance. "revealing a concrete privacy--utility trade-off."
Prompt injection: An adversarial technique that inserts malicious instructions to manipulate an agent’s behavior. "A third line examines adversarial settings for tool-integrated agents, including prompt injection and context hijacking attacks."
Quasi-identifiers: Non-unique attributes that, when combined, can uniquely identify individuals. "This validation ensures the business setting naturally requires integrating both sources and supports inference-driven linkage through realistic quasi-identifiers."
Rarity-based weights: Scoring weights that give more importance to rare features during matching. "including rarity-based weights that assign more importance to less common movies"
Shared deanonymization interface: A common input–output mapping for evaluating identity reconstruction across settings. "We formalize the evaluation through a shared deanonymization interface"
Side-only attributes: Features that appear in only one of the two data sources, not shared across them. "side-only attributes appear in only one source."
Sparse identification anchors: Rare, distinctive shared attributes that sharply narrow candidate identities. "sparse identification anchors are rarer and more distinctive shared attributes that sharply narrow the candidate set"
Spatiotemporal intersections: Overlaps defined by time and location that can link records between sources. "A Coordinate schema captures spatiotemporal intersections."
Training-data extraction: Recovering verbatim training data from a model’s outputs or behavior. "Representative threats include ... training-data extraction"
Training-time exposure: Risks arising from sensitive data memorized during model training. "Prior work on LLM privacy has focused primarily on training-time exposure."
Web-capable agents: Agents with the ability to browse and retrieve information from the web to support tasks. "Li~\citep{li2026agentic} shows that web-capable agents can re-identify anonymized interview participants by retrieving and corroborating public web evidence."
Zero-Knowledge (ZK): An evaluation condition where the agent is told only that some intersection exists, without a specific named target. "Under the Explicit framing, the Zero-Knowledge (ZK) condition informs the agent only that an undefined intersection exists between the sources, requiring it to discover any overlapping records."
Tolerance parameters: Hand-tuned limits that specify acceptable mismatches during matching (e.g., on dates or ratings). "including rarity-based weights that assign more importance to less common movies, exact tolerance parameters (i.e., hand-tuned rules specifying how much date and rating mismatch to allow), and eccentricity thresholds"

From Weak Cues to Real Identities: Evaluating Inference-Driven De-Anonymization in LLM Agents

Summary

Inference-Driven De-Anonymization in LLM Agents: Privacy Risks from Weak Cues

Introduction

Problem Definition and Threat Model

Evaluation on Classical Linkage Cases

InferLink: A Controlled Benchmark

Case Studies on Modern Digital Traces

Implications and Future Directions

Conclusion

Paper to Video (Beta)

Whiteboard

Paper Prompts

Top Community Prompts

Explain it Like I'm 14

What is this paper about?

What questions did the researchers ask?

How did they study it?

1) Classic cases (Netflix and AOL)

2) InferLink (a controlled benchmark they built)

3) Modern, text-rich data

What did they find?

Why does this matter?

Knowledge Gaps

Knowledge gaps, limitations, and open questions

Practical Applications

Immediate Applications

Long-Term Applications

Glossary

Open Problems

Continue Learning

Collections

Tweets