(n,ε)-DistanceDP: Metric-Based Differential Privacy
- (n,ε)-DistanceDP is a framework that extends traditional differential privacy by quantifying privacy loss based on the metric distance between inputs.
- It employs noise mechanisms like Laplace and Gamma distributions to adapt privacy guarantees across Euclidean, Hamming, and edit distance spaces.
- The framework supports applications including private graph shortest path release, privacy-preserving nearest neighbor search, and efficient DP string distance data structures.
The (n,ε)-DistanceDP framework generalizes differential privacy to settings where proximity is measured in a metric space rather than by small, discrete changes to a dataset. It provides a unified way to quantify and control privacy leakage as a function of the distance between inputs, with ε scaling the allowable divergence between output distributions. Current research applies (n,ε)-DistanceDP to private release of all-pairs shortest path distances in graphs, privacy-preserving nearest neighbor search via embeddings, and fast, differentially private string distance data structures. The framework introduces new algorithmic strategies, privacy-utility trade-offs, composition mechanisms, and application paradigms, enabling privacy guarantees calibrated to Euclidean, Hamming, or edit distance.
1. Formal Definition and Basic Principles
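The (n,ε)-DistanceDP property is defined for a randomized mechanism $M$ as follows: for all inputs $x, x'$ and for all measurable output sets $S$,
$$\Pr[M(x) \in S] \le e^{\varepsilon\, d(x, x')}\, \Pr[M(x') \in S],$$
with the equivalent log-likelihood ratio condition:
$$\ln \frac{\Pr[M(x) \in S]}{\Pr[M(x') \in S]} \le \varepsilon\, d(x, x'),$$
where $d$ is the metric on the input space and, in the Euclidean case, $n$ is the input dimension.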
In standard ε-differential privacy, "neighboring" datasets differ in one entry; in (n,ε)-DistanceDP, privacy degrades gracefully with the metric distance between inputs. The property applies across Euclidean vector spaces, edge-weighted graphs (where adjacency is defined via the $\ell_1$ norm), Hamming spaces, and similar settings (Cheng et al., 2024, Ghazi et al., 2022, Hu et al., 2024).
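As a concrete check of the log-likelihood ratio condition, the sketch below uses the one-dimensional Laplace mechanism with scale 1/ε, a standard example whose log-density ratio at any output is bounded by ε times |x − x'|. The function names and the output grid are illustrative assumptions, not taken from the cited papers.

```python
import math

def laplace_logpdf(y: float, center: float, eps: float) -> float:
    """Log-density at y of center + Laplace noise with scale 1/eps."""
    return math.log(eps / 2.0) - eps * abs(y - center)

def max_log_ratio(x: float, x_prime: float, eps: float, outputs) -> float:
    """Largest |ln p(y | x) - ln p(y | x')| observed over the given outputs."""
    return max(
        abs(laplace_logpdf(y, x, eps) - laplace_logpdf(y, x_prime, eps))
        for y in outputs
    )

eps = 0.5
grid = [i / 10 for i in range(-200, 201)]  # candidate outputs in [-20, 20]
for x, x_prime in [(0.0, 0.1), (0.0, 1.0), (0.0, 4.0)]:
    bound = eps * abs(x - x_prime)          # eps * d(x, x')
    worst = max_log_ratio(x, x_prime, eps, grid)
    print(f"d={abs(x - x_prime):.1f}  worst log-ratio={worst:.3f}  bound={bound:.3f}")
```

The observed worst-case ratio grows with the distance between the two inputs and never exceeds the distance-scaled bound, which is the behavior the definition asks for.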
2. Core Mechanisms and Algorithms
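For Euclidean spaces, the Laplace-Distance mechanism achieves (n,ε)-DistanceDP by adding noise $z$ with density proportional to $e^{-\varepsilon \|z\|_2}$. Sampling proceeds by generating a radius $r \sim \mathrm{Gamma}(n, 1/\varepsilon)$ (shape $n$, scale $1/\varepsilon$) and a random direction, yielding $x + r u$, where $u$ is uniformly random on the unit sphere. The expected perturbation norm is $n/\varepsilon$.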
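The radius-and-direction sampling can be sketched with the Python standard library alone. The sketch assumes the radius follows a Gamma distribution with shape n and scale 1/ε, which is what a noise density proportional to exp(−ε‖z‖₂) implies in n dimensions. The uniform direction is obtained by normalizing an isotropic Gaussian vector. The function name `laplace_distance_perturb` is hypothetical.

```python
import math
import random

def laplace_distance_perturb(x, eps, rng):
    """Return x + r*u with r ~ Gamma(shape=n, scale=1/eps), u uniform on the unit sphere."""
    n = len(x)
    r = rng.gammavariate(n, 1.0 / eps)            # radius of the perturbation
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]   # isotropic Gaussian vector
    norm = math.sqrt(sum(v * v for v in g))
    return [xi + r * gi / norm for xi, gi in zip(x, g)]  # g / norm is the direction u

rng = random.Random(0)
n, eps = 64, 4.0
x = [0.0] * n
norms = []
for _ in range(2000):
    z = laplace_distance_perturb(x, eps, rng)
    norms.append(math.sqrt(sum(v * v for v in z)))
mean_norm = sum(norms) / len(norms)
print(f"empirical mean norm = {mean_norm:.3f}, n/eps = {n / eps:.3f}")
```

Because the input is the origin, each sampled norm equals the radius, so the empirical mean should land close to n/ε.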
In graph settings, (n,ε)-DistanceDP is instantiated for weight-release tasks. Here, two edge-weight vectors $w, w'$ are neighbors if $\|w - w'\|_1 \le 1$. Mechanisms apply Laplace or Gaussian noise to edge weights or derived shortest-path distances, yielding additive error bounds that depend sublinearly on the number of vertices $n$:
- Pure ε-DP: additive error $\tilde{O}(n^{2/3}/\varepsilon)$ (Ghazi et al., 2022).
- Approximate (ε,δ)-DP: additive error $\tilde{O}(\sqrt{n}/\varepsilon)$.
- Specialized for graphs with a small feedback vertex set: additive error bounded in terms of the feedback vertex set size (Fan et al., 2022).
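For orientation, the simplest mechanism in this setting perturbs every edge weight with Laplace noise of scale 1/ε and then computes shortest paths on the noisy graph. The sketch below shows that input-perturbation baseline only. It is not the hub-based algorithm of the cited papers, and its error can grow with the number of edges on a path, which is the behavior the sublinear bounds improve on. It assumes neighboring weight vectors differ by at most 1 in ℓ1 norm. The names `laplace` and `noisy_all_pairs` are hypothetical.

```python
import random

def laplace(scale, rng):
    """Laplace(0, scale) sample, drawn as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_all_pairs(n, edges, eps, rng):
    """edges: {(u, v): weight} for an undirected graph on vertices 0..n-1."""
    inf = float("inf")
    dist = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
    for (u, v), w in edges.items():
        # The weight vector has l1 sensitivity 1, so per-edge Laplace(1/eps) noise
        # gives eps-DP. Clamping at zero is post-processing and costs no privacy.
        noisy = max(0.0, w + laplace(1.0 / eps, rng))
        dist[u][v] = dist[v][u] = min(dist[u][v], noisy)
    for k in range(n):  # Floyd-Warshall on the noisy, non-negative weights
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

rng = random.Random(1)
edges = {(0, 1): 2.0, (1, 2): 3.0, (2, 3): 1.5, (0, 3): 10.0}
dist = noisy_all_pairs(4, edges, eps=1.0, rng=rng)
for row in dist:
    print([round(d, 2) for d in row])
```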
In string distance tasks, sketch-and-flip approaches use layered hash-based sketches with randomized-response bit flipping, enabling ε-DP release of Hamming or edit distance tables with polylogarithmic error scaling and sublinear query time when the query radius is moderate (Hu et al., 2024).
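The bit-flipping ingredient can be illustrated in isolation. The sketch below applies randomized response directly to a bit string, flipping each bit with probability 1/(1 + e^ε), and then debiases the observed Hamming distance to a query. It deliberately omits the layered hash-based sketches, so it is a simplified illustration rather than the data structure of Hu et al. The names `flip_bits` and `estimate_hamming` are hypothetical.

```python
import math
import random

def flip_bits(bits, eps, rng):
    """Flip each bit independently with probability p = 1 / (1 + e^eps)."""
    p = 1.0 / (1.0 + math.exp(eps))
    return [b ^ 1 if rng.random() < p else b for b in bits]

def estimate_hamming(query, noisy, eps):
    """Debias the observed Hamming distance between a query and a flipped string.

    If the true distance is h and the length is m, the expected observed
    distance is h * (1 - 2p) + m * p, which is inverted here.
    """
    p = 1.0 / (1.0 + math.exp(eps))
    observed = sum(q != y for q, y in zip(query, noisy))
    return (observed - len(query) * p) / (1.0 - 2.0 * p)

rng = random.Random(0)
m, eps, true_dist = 4096, 2.0, 300
x = [rng.randrange(2) for _ in range(m)]                       # private string
query = [b ^ 1 if i < true_dist else b for i, b in enumerate(x)]
noisy = flip_bits(x, eps, rng)                                 # published once
est = estimate_hamming(query, noisy, eps)
print(f"true distance = {true_dist}, estimate = {est:.1f}")
```

Two private strings that differ in k positions yield output distributions whose likelihood ratio is at most e^{εk}, so the privacy loss of this flipping step scales with Hamming distance.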
3. Theoretical Properties: Composition and Post-processing
(n,ε)-DistanceDP satisfies key theoretical properties analogous to those of classical DP:
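- Post-processing invariance: If $M$ satisfies (n,ε)-DistanceDP, so does $f \circ M$ for any function $f$ applied to the output.
- Sequential composition: Joint mechanisms $M_1, \dots, M_k$ with privacy budgets $\varepsilon_1, \dots, \varepsilon_k$ satisfy $(n, \sum_i \varepsilon_i)$-DistanceDP.
- Parallel composition: If $x$ decomposes as $(x_1, x_2)$ and $M_1$, $M_2$ act independently on $x_1$ and $x_2$, releasing $(M_1(x_1), M_2(x_2))$ satisfies DistanceDP (Cheng et al., 2024).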
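The sequential composition property follows from a short calculation. As a sketch for two mechanisms, assume $M_1$ and $M_2$ use independent randomness, are both run on the same input, and have budgets $\varepsilon_1$ and $\varepsilon_2$ with respect to the same metric $d$ (the notation here is chosen for illustration; for continuous outputs, read the probabilities as densities):

```latex
\begin{align*}
\ln \frac{\Pr[M_1(x) = y_1,\ M_2(x) = y_2]}{\Pr[M_1(x') = y_1,\ M_2(x') = y_2]}
  &= \ln \frac{\Pr[M_1(x) = y_1]}{\Pr[M_1(x') = y_1]}
   + \ln \frac{\Pr[M_2(x) = y_2]}{\Pr[M_2(x') = y_2]} \\
  &\le \varepsilon_1\, d(x, x') + \varepsilon_2\, d(x, x')
   = (\varepsilon_1 + \varepsilon_2)\, d(x, x').
\end{align*}
```

The first line uses independence to factor the joint probability, and the second applies each mechanism's own guarantee, so the budgets add while the distance factor is shared.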
4. Privacy–Utility Trade-offs and Lower Bounds
The privacy–utility trade-off in (n,ε)-DistanceDP is controlled by the noise magnitude, which scales as $n/\varepsilon$ for Euclidean embeddings. High-dimensional noise exhibits sharp concentration, allowing accurate estimation of induced perturbation scales (Cheng et al., 2024). In graph distance release, additive errors for the all-pairs shortest path task are shown to be polynomially sublinear in the number of vertices $n$:
- Main upper bounds: $\tilde{O}(n^{2/3}/\varepsilon)$ for pure DP, with better bounds under approximate DP or structural restrictions (Ghazi et al., 2022, Fan et al., 2022).
- Lower bound: Any (ε,δ)-DP algorithm for all-pairs shortest distances (APSD) requires additive error at least $\Omega(n^{1/6})$, via a reduction from linear query discrepancy (Ghazi et al., 2022).
For string tasks, the additive error scales inversely with ε, so doubling ε halves the additive error (Hu et al., 2024).
5. Applications in Algorithms and Systems
5.1 Private Graph Distance Release
The (n,ε)-DistanceDP framework underpins the first sublinear-error algorithms for the private release of all-pairs shortest path distances in weighted undirected graphs. Key approaches include hub sampling combined with noise mechanisms for both edge weights and a subset of node pairs, canonical path decompositions via shortcuts, and advanced analysis of synthetic graph construction. For graphs with small feedback vertex sets, specialized mechanisms further improve accuracy (Ghazi et al., 2022, Fan et al., 2022). Allowing multiplicative stretch (e.g., via Thorup–Zwick spanners) interpolates between additive and multiplicative guarantees (Ghazi et al., 2022).
5.2 Private Embeddings and Nearest Neighbor Search
In privacy-preserving cloud retrieval pipelines, (n,ε)-DistanceDP offers a natural mechanism for perturbing vector embeddings, such as those used in retrieval-augmented generation (RAG) with LLMs. The two-stage retrieval process (coarse selection via a noised embedding, then refinement via encrypted computation) leverages the guarantee to bound privacy leakage and maintain retrieval accuracy while reducing server workload and transmission sizes (Cheng et al., 2024).
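The two-stage flow can be sketched end to end on synthetic data. In this sketch the server sees only a perturbed query and returns a coarse candidate set, and the client-side refinement is computed in plaintext as a stand-in for the encrypted computation, which is not implemented here. All names (`perturb`, `coarse_candidates`, `refine`, `k_prime`), the parameter values, and the use of squared Euclidean distance for ranking are illustrative assumptions.

```python
import math
import random

def perturb(x, eps, rng):
    """Noise the query: Gamma(shape=n, scale=1/eps) radius, uniform direction."""
    n = len(x)
    r = rng.gammavariate(n, 1.0 / eps)
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(v * v for v in g))
    return [xi + r * gi / norm for xi, gi in zip(x, g)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def coarse_candidates(noisy_query, docs, k_prime):
    """Stage 1 (server): rank by distance to the noised query, keep k' indices."""
    order = sorted(range(len(docs)), key=lambda i: sq_dist(noisy_query, docs[i]))
    return order[:k_prime]

def refine(query, docs, candidates, k):
    """Stage 2: exact top-k among the candidates using the true query.

    Plaintext stand-in for the encrypted refinement step, for illustration only.
    """
    return sorted(candidates, key=lambda i: sq_dist(query, docs[i]))[:k]

rng = random.Random(0)
dim, n_docs = 16, 500
docs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_docs)]
query = list(docs[7])                      # the true nearest neighbor is document 7
noisy_query = perturb(query, eps=20.0, rng=rng)
candidates = coarse_candidates(noisy_query, docs, k_prime=20)
top = refine(query, docs, candidates, k=3)
print(top)
```

Choosing k' larger than k gives the coarse stage slack to absorb the perturbation: the smaller ε is, the larger the expected noise norm n/ε and the more candidates are needed for the true neighbors to survive stage 1.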
5.3 Differentially Private String Distance Data Structures
For Hamming and edit distances, (n,ε)-DistanceDP is realized through sketch-and-flip data structures that are ε-DP in the function-release sense. One-time publication of the DP synopsis enables sublinear per-query processing and ensures accuracy for all queries within a specified radius (Hu et al., 2024).
6. Extensions and Open Directions
Current research points to several open problems, including closing the gap between upper and lower bounds for graph distance release error (notably, between the $\Omega(n^{1/6})$ lower bound and the best known upper bounds), developing improved lower bounds under approximate DP or multiplicative stretch, and further refining mechanisms for high-dimensional and structured data regimes (Ghazi et al., 2022). In the context of embedding perturbation and secure retrieval, adaptation to other metric spaces and adversarial threat models is ongoing (Cheng et al., 2024).
7. Implementation Complexity and Efficiency
Algorithmic realizations of (n,ε)-DistanceDP mechanisms are efficiently computable:
- Hub-based graph algorithms run in polynomial time overall (Ghazi et al., 2022, Fan et al., 2022).
- Embedding mechanisms require only sampling from a Gamma distribution and the unit sphere, enabling scalable client-side implementation (Cheng et al., 2024).
- Sketch-and-flip string data structures are built once over the database strings, with sublinear query time (Hu et al., 2024).
By decoupling privacy loss from discrete record-edit operations and instead calibrating noise magnitude to geometric distance, (n,ε)-DistanceDP expands the design space for differentially private algorithms and practical data analysis systems.