ppRAG: Privacy-Preserving RAG Framework

Updated 25 January 2026
  • ppRAG is a framework that securely integrates sensitive external data into LLM responses via on-device anonymization, federated orchestration, and encryption.
  • The framework leverages edge-cloud hybrid architectures, supporting multi-modal data retrieval and three-tier caching to optimize latency and retrieval fidelity.
  • It employs robust privacy mechanisms—rule-based masking, context-aware erasure, and homomorphic encryption—to mitigate data leakage in privacy-sensitive applications.

Privacy-Preserving Retrieval-Augmented Generation (ppRAG) frameworks are architectural and algorithmic designs that enable LLMs to incorporate external, potentially sensitive knowledge sources during response generation, while rigorously protecting privacy and mitigating information leakage. These systems are motivated by urgent needs in healthcare, legal, financial, and enterprise environments, where conventional RAG pipelines are unsuitable due to privacy regulations, heterogeneous data modalities, and adversarial threat models. Recent frameworks integrate on-device anonymization, federated orchestration, encryption (including homomorphic and symmetric schemes), differential privacy, and caching strategies to achieve high retrieval fidelity, practical system efficiency, and empirically strong privacy guarantees (Qian et al., 8 Sep 2025).

1. System Architectures and Edge–Cloud Federation

ppRAG frameworks employ multi-layered architectures blending edge devices and cloud infrastructure for distributed, privacy-respecting reasoning (Qian et al., 8 Sep 2025). In HyFedRAG, local (edge) clients—such as hospital systems—execute lightweight retrieval over clinical text notes, SQL patient tables, or Neo4j knowledge graphs. Each client runs local LLMs (e.g., Llama-3.1-8B-Instruct), anonymizes summaries via dedicated modules (rule-based, context-aware, homomorphic encryption), and caches privacy-preserved response features. A central server, managed by the Flower federated learning platform, orchestrates global inference without aggregating raw data.

The architectural pattern is:

  • Edge Layer: Local retrievers (text, SQL, KG), client-side anonymization using Presidio, Eraser4RAG, and TenSEAL, tier-1 cache for summary features.
  • Middleware Layer: Caching/scheduling proxy; tier-2 cache for LLM input representations; tier-3 cache for cloud-generated results.
  • Cloud Layer: Federated controller (Flower), global LLM engine fusing edge-supplied anonymized summaries, final answer/report generator.

This design is optimized for heterogeneous modalities and distributed, privacy-sensitive environments, notably enabling federated answer synthesis while local modalities remain isolated (Mao et al., 27 Apr 2025, Addison et al., 2024).
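The layered flow above can be sketched in a few lines of Python. The class names, the string-replacement "masking," and the concatenation-based fusion below are illustrative stand-ins for the framework's actual retrievers, anonymizers, and LLM calls, not its real components:

```python
from dataclasses import dataclass, field

@dataclass
class EdgeClient:
    """Edge layer: local retrieval + anonymization + tier-1 cache."""
    client_id: str
    cache: dict = field(default_factory=dict)  # tier-1: summary features

    def retrieve_and_anonymize(self, query: str) -> str:
        if query in self.cache:                       # tier-1 cache hit
            return self.cache[query]
        raw = f"local record for '{query}'"           # stand-in for text/SQL/KG retrieval
        summary = raw.replace("record", "[MASKED]")   # stand-in for Presidio-style masking
        self.cache[query] = summary
        return summary

@dataclass
class CloudController:
    """Cloud layer: fuses anonymized edge summaries; never sees raw data."""
    def fuse(self, summaries: list[str]) -> str:
        return " | ".join(summaries)                  # stand-in for global LLM fusion

clients = [EdgeClient("hospital_a"), EdgeClient("hospital_b")]
server = CloudController()
answer = server.fuse([c.retrieve_and_anonymize("diabetes cohort") for c in clients])
```

The key invariant the sketch preserves is that only anonymized summaries cross the edge boundary; raw records never leave the client.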

2. Data Modalities, Retrieval, and Standardization

ppRAG systems support multiple data modalities, with each client handling either unstructured text (clinical notes), structured SQL tables, or semi-structured KGs (Qian et al., 8 Sep 2025). Retrieval comprises:

  • Text: Precomputed TF-IDF/FAISS embeddings, hybrid reranking (BGE).
  • SQL: Extraction of entities via NER, fusion scoring (exact match, boolean, NL search, semantic similarity), reranking via cross-encoders.
  • Knowledge Graphs: Neo4j triple storage, NER-based entity extraction and flag-based reranking, multi-hop statement traversal.

Post-retrieval, a local LLM generates standardized, privacy-preserving summaries, mapped as σ(d) via an implicit transformation f_θ(·). Vector embeddings for cross-client reasoning use BGE or cross-encoders and are encrypted if transmitted globally.

All retrievers follow a two-stage pipeline: lightweight candidate generation (e.g., index filtering) followed by deep reranking (FlagReranker). Objective functions blend sparse cosine similarity and dense neural scores, e.g. Score(q,d) = α · cos_tfidf(q,d) + (1 − α) · s_reranker(q,d).
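The fused objective can be implemented directly. The sparse vectors and reranker score below are toy values for illustration; a real pipeline would obtain them from a TF-IDF index and a cross-encoder:

```python
import math

def tfidf_cos(q_vec: dict, d_vec: dict) -> float:
    """Cosine similarity between sparse TF-IDF vectors (term -> weight dicts)."""
    dot = sum(w * d_vec.get(t, 0.0) for t, w in q_vec.items())
    nq = math.sqrt(sum(w * w for w in q_vec.values()))
    nd = math.sqrt(sum(w * w for w in d_vec.values()))
    return dot / (nq * nd) if nq and nd else 0.0

def fused_score(q_vec: dict, d_vec: dict, reranker_score: float, alpha: float = 0.5) -> float:
    """Score(q,d) = alpha * cos_tfidf(q,d) + (1 - alpha) * s_reranker(q,d)."""
    return alpha * tfidf_cos(q_vec, d_vec) + (1 - alpha) * reranker_score

q = {"fever": 0.8, "cough": 0.6}
d = {"fever": 0.5, "fatigue": 0.9}
score = fused_score(q, d, reranker_score=0.7, alpha=0.6)
```

Setting α toward 1 favors the cheap sparse signal; α toward 0 trusts the neural reranker.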

No client-side retriever fine-tuning or additional training losses are reported in (Qian et al., 8 Sep 2025).

3. Privacy Mechanisms: Anonymization, Encryption, and Differential Privacy

Edge-side privacy preservation is implemented via three main approaches:

  • Rule-Based Masking (Presidio): Detection of PII via regex/model, replacement with type-specific placeholders.
  • Context-Aware Erasure (Eraser4RAG): Identification and masking of non-relevant spans, robust against side-channel attacks.
  • Homomorphic Encryption (TenSEAL): Encrypted embeddings for server-side aggregation, protecting intermediate representations.
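As a minimal illustration of the rule-based masking idea, the regex patterns below detect a few PII types and substitute type-specific placeholders. This is a toy sketch: Presidio uses trained recognizers and a far richer pattern set than these three expressions:

```python
import re

# Illustrative patterns only; not Presidio's actual recognizers.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def mask_pii(text: str) -> str:
    """Replace detected PII spans with type-specific placeholders."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text

note = "Contact John at john.doe@clinic.org or 555-123-4567, SSN 123-45-6789."
masked = mask_pii(note)
```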

No explicit (ε,δ)-differential privacy guarantees, parameters, or formal proofs are presented in HyFedRAG; empirical privacy–utility curves are plotted, showing significant improvement in de-identification utility scores (G-Eval ≈ 0.9 for de-identified vs. ≈ 0.6 for unprotected outputs).

The threat model assumes honest-but-curious servers, where adversaries may observe encrypted summaries and cache traffic, but never access raw data.

4. Federated Caching and Latency Optimization

Three-tier caching is employed to minimize latency and redundant computation:

  • Tier 1: Client-side cache of summary features σ(d), pure LRU eviction.
  • Tier 2: Middleware cache of transformed LLM inputs, LRU plus one-hop neighbor prefetch.
  • Tier 3: Middleware cache of server inference outputs, static hotspot plus two-hop prefetch.

Reported cumulative hit rates are approximately 84%, resulting in cross-client latency reductions of about 80% (Qian et al., 8 Sep 2025).
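A plain LRU tier, plus a tiered lookup that falls through tier 1 → tier 2 → tier 3, can be sketched as follows. The neighbor-prefetch and hotspot policies described above are omitted for brevity; this shows only the shared LRU core and fall-through:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache, as used for the tier-1 client-side cache."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)          # mark as recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict least recently used

def lookup(query, tiers):
    """Check tier 1, then tier 2, then tier 3; return the first hit."""
    for tier in tiers:
        hit = tier.get(query)
        if hit is not None:
            return hit
    return None                             # full miss -> recompute

t1, t2, t3 = LRUCache(2), LRUCache(4), LRUCache(8)
t2.put("q1", "cached LLM input for q1")
```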

5. Federated Learning and Aggregation Protocols

HyFedRAG leverages federated orchestration (Flower), enabling multi-client inference and caching without raw data aggregation (Mao et al., 27 Apr 2025, Addison et al., 2024). While the system focuses on inference rather than federated training, federated aggregation controllers synchronize edge-generated, privacy-preserved representations for global LLM reasoning. No formal model-weight aggregation (e.g., FedAvg) or convergence analysis is provided, leaving differential privacy–based formal guarantees and federated training protocols as directions for future research.
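The orchestration pattern itself (local summarization, server-side fusion, no raw-data transfer) can be sketched without the Flower runtime. The summarization and fusion functions below are simplified stand-ins for the local and global LLM calls:

```python
def client_round(client_records: list[str], query: str) -> str:
    """Each client summarizes locally; only the summary leaves the client."""
    relevant = [r for r in client_records if query in r]
    return f"{len(relevant)} matching records"      # stand-in for local LLM summary

def server_round(summaries: list[str]) -> str:
    """Server fuses privacy-preserved summaries; raw records are never sent."""
    return "; ".join(summaries)                     # stand-in for global LLM fusion

clients = [
    ["diabetes case 1", "asthma case 2"],           # client A's private records
    ["diabetes case 3"],                            # client B's private records
]
answer = server_round([client_round(c, "diabetes") for c in clients])
```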

6. Experimental Evaluation: Retrieval, Generation, and Privacy Metrics

HyFedRAG is evaluated on the PMC-Patients dataset, comprising ~50K patient records and 400K articles across modalities. Metrics include:

  • Retrieval Quality: MRR (text: 39.63%, SQL: 23.01%, KG: 9.79%), P@10, nDCG@10, plus Hit@k for structured queries.
  • Generation Consistency: G-Eval (GPT-4o) normalized to [0,1]; de-identified outputs score ≈ 0.9.
  • System Efficiency: End-to-end federated simulation shows ~80% latency reduction.
  • Privacy–Utility Trade-off: Minimal drop in G-Eval utility for maximal privacy gains.

7. Limitations and Research Directions

HyFedRAG (ppRAG) presents a scalable architecture but lacks formal differential privacy mechanisms: no explicit ε-DP parameters, δ values, or embedding noise injection as in compositional DP-RAG frameworks (Koga et al., 2024, Wu et al., 10 Nov 2025). Extensions required for strict guarantees include:

  • Incorporation of explicit DP routines (noise injection, privacy accounting).
  • Specification of weight-aggregation rules and convergence proofs for federated learning.
  • Formal modeling of privacy trade-offs under adaptive or adversarial query settings.
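As an example of the first extension, the standard Gaussian mechanism adds calibrated noise to an embedding to satisfy (ε, δ)-DP for ε ≤ 1. This is a generic sketch of the textbook mechanism, not a component of HyFedRAG; the sensitivity and budget values are placeholders:

```python
import math
import random

def gaussian_mechanism(vec, sensitivity, epsilon, delta):
    """Add Gaussian noise calibrated for (epsilon, delta)-DP.

    Uses the classical bound sigma >= sensitivity * sqrt(2 ln(1.25/delta)) / epsilon,
    valid for epsilon <= 1.
    """
    sigma = sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon
    return [v + random.gauss(0.0, sigma) for v in vec]

random.seed(0)  # reproducibility for the example only
embedding = [0.12, -0.45, 0.88]
noisy = gaussian_mechanism(embedding, sensitivity=1.0, epsilon=1.0, delta=1e-5)
```

In a DP-RAG setting, each released embedding would consume budget, so a privacy accountant tracking cumulative (ε, δ) across queries would also be required.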

A plausible implication is that researchers who require rigorous, theoretically sound privacy guarantees or full federated training should extend HyFedRAG with formal DP and aggregation components.


Key Reference: "HyFedRAG: A Federated Retrieval-Augmented Generation Framework for Heterogeneous and Privacy-Sensitive Data" (Qian et al., 8 Sep 2025)
