(n,ε)-DistanceDP: Metric-Based Differential Privacy

Updated 17 January 2026
  • (n,ε)-DistanceDP is a framework that extends traditional differential privacy by quantifying privacy loss based on the metric distance between inputs.
  • It employs noise mechanisms like Laplace and Gamma distributions to adapt privacy guarantees across Euclidean, Hamming, and edit distance spaces.
  • The framework supports applications including private graph shortest path release, privacy-preserving nearest neighbor search, and efficient DP string distance data structures.

The $(n,\epsilon)$-DistanceDP framework generalizes differential privacy to settings where proximity is measured in a metric space, rather than via small, discrete data changes. This construct provides a unified method for quantifying and controlling privacy leakage as a function of the distance between inputs, with $\epsilon$ scaling the allowable divergence in output distributions. Current research leverages $(n,\epsilon)$-DistanceDP for private release of all-pairs shortest path distances in graphs, privacy-preserving nearest neighbor search via embeddings, and fast, differentially private string distance data structures. This framework introduces new algorithmic strategies, privacy-utility trade-offs, composition mechanisms, and application paradigms, enabling privacy guarantees calibrated to Euclidean, Hamming, or edit distance.

1. Formal Definition and Basic Principles

The $(n,\epsilon)$-DistanceDP property is defined for a randomized mechanism $K: \mathbb{R}^n \to \mathcal{Y}$ as follows: for all $x, x' \in \mathbb{R}^n$ and for all measurable output sets $S \subseteq \mathcal{Y}$,

$$\Pr[K(x) \in S] \le \exp(\epsilon \|x - x'\|_2)\, \Pr[K(x') \in S],$$

with the equivalent log-likelihood ratio condition:

$$\ln \frac{\Pr[K(x)=y]}{\Pr[K(x')=y]} \le \epsilon \|x-x'\|_2, \quad \forall y \in \mathcal{Y}.$$

In standard $(\epsilon,\delta)$-differential privacy, "neighboring" datasets differ by one entry; in $(n,\epsilon)$-DistanceDP, privacy degrades gracefully with the metric distance between inputs. This property is applicable across Euclidean vector spaces, edge-weighted graphs (where adjacency is defined via the $\ell_1$ norm), Hamming spaces, and similar settings (Cheng et al., 2024, Ghazi et al., 2022, Hu et al., 2024).

2. Core Mechanisms and Algorithms

For Euclidean spaces, the Laplace-Distance mechanism achieves $(n,\epsilon)$-DistanceDP by adding noise with density proportional to $\exp(-\epsilon\|y-x\|_2)$. Sampling proceeds by generating a radius $r \sim \mathrm{Gamma}(n, 1/\epsilon)$ and a random direction, yielding $y = x + r\hat v$ where $\hat v$ is uniformly random on the unit sphere. The expected perturbation norm is $\mathbb{E}[r] = n/\epsilon$.
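The sampling procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the reference implementation of any cited paper: the function name `laplace_distance_mechanism` is hypothetical, and the Gamma distribution is assumed to be parameterized by shape $n$ and scale $1/\epsilon$, which is the reading consistent with $\mathbb{E}[r] = n/\epsilon$.

```python
import numpy as np

def laplace_distance_mechanism(x, epsilon, rng=None):
    """Return x plus noise with density proportional to exp(-epsilon * ||y - x||_2).

    Hypothetical helper for illustration; assumes epsilon > 0.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    # Radius: Gamma with shape n and scale 1/epsilon, so E[r] = n / epsilon.
    r = rng.gamma(shape=n, scale=1.0 / epsilon)
    # Direction: a normalized standard Gaussian vector is uniform on the unit sphere.
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    return x + r * v

# Usage: perturb a 4-dimensional embedding with epsilon = 2.
noisy = laplace_distance_mechanism([0.1, -0.3, 0.7, 0.2], epsilon=2.0)
```

The guarantee follows from the triangle inequality: the ratio of the densities at $y$ for inputs $x$ and $x'$ is $\exp(\epsilon(\|y-x'\|_2 - \|y-x\|_2)) \le \exp(\epsilon\|x-x'\|_2)$. Under this parameterization the radius has standard deviation $\sqrt{n}/\epsilon$, so its relative spread shrinks like $1/\sqrt{n}$ as the dimension grows.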

In graph settings, $(n,\epsilon)$-DistanceDP is instantiated for weight-release tasks. Here, two edge-weight vectors $w, w'$ are neighbors if $\|w - w'\|_1 \le 1$. Mechanisms apply Laplace or Gaussian noise to edge weights or derived shortest-path distances, yielding additive error bounds that depend sublinearly on the number of vertices $n$:

  • Pure $\epsilon$-DP: additive error $\tilde{O}(n^{2/3}/\epsilon)$ (Ghazi et al., 2022).
  • Approximate $(\epsilon,\delta)$-DP: additive error $\tilde{O}(\sqrt{n}/\epsilon)$.
  • Specialized for feedback vertex set size $k$: error $\tilde{O}(k/\epsilon)$ (Fan et al., 2022).
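For orientation, the simplest mechanism in this setting is plain input perturbation: add Laplace noise to every edge weight, then run an ordinary shortest-path algorithm on the noisy graph. The sketch below shows that baseline only; it is not the hub-based algorithm behind the bounds listed above, and its error can grow with the number of edges on a path. The function name and the dense-matrix input format are assumptions made for illustration, and the graph topology is treated as public.

```python
import numpy as np

def private_all_pairs_distances(weights, epsilon, rng=None):
    """Baseline: Laplace noise on edge weights, then Floyd-Warshall.

    weights: symmetric (n, n) array with 0 on the diagonal and np.inf where
    there is no edge (hypothetical input format for this sketch).
    """
    rng = np.random.default_rng() if rng is None else rng
    w = np.asarray(weights, dtype=float)
    n = w.shape[0]
    rows, cols = np.triu_indices(n, k=1)
    upper = w[rows, cols].copy()
    has_edge = np.isfinite(upper)
    # Neighboring weight vectors differ by at most 1 in L1 norm, so
    # Laplace(1/epsilon) noise per edge gives epsilon-DP for the weights.
    upper[has_edge] += rng.laplace(scale=1.0 / epsilon, size=int(has_edge.sum()))
    # Clamping is post-processing; it avoids negative cycles.
    upper[has_edge] = np.maximum(upper[has_edge], 0.0)
    noisy = np.full((n, n), np.inf)
    np.fill_diagonal(noisy, 0.0)
    noisy[rows, cols] = upper
    noisy[cols, rows] = upper
    # Floyd-Warshall on the noisy weights (also post-processing).
    dist = noisy
    for k in range(n):
        dist = np.minimum(dist, dist[:, [k]] + dist[[k], :])
    return dist
```

Everything after the noise addition is post-processing, so the released distance matrix inherits the privacy guarantee of the noisy weights.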

In string distance tasks, sketch-and-flip approaches use layered hash-based sketches with randomized response bit flipping, enabling $\epsilon$-DP release of Hamming or edit distance tables with polylogarithmic error scaling and sublinear query time when the query radius $k$ is moderate (Hu et al., 2024).
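The "flip" half of sketch-and-flip is ordinary randomized response. The sketch below shows only that primitive applied directly to a bit string, together with a debiased Hamming-distance estimate against a public query. It omits the layered hash-based sketches of Hu et al. entirely, and the function names are hypothetical.

```python
import numpy as np

def flip_bits(bits, epsilon, rng=None):
    """Randomized response: flip each bit independently with probability 1/(1+e^epsilon)."""
    rng = np.random.default_rng() if rng is None else rng
    bits = np.asarray(bits, dtype=np.uint8)
    p_flip = 1.0 / (1.0 + np.exp(epsilon))
    flips = rng.random(bits.shape) < p_flip
    return bits ^ flips.astype(np.uint8)

def estimate_hamming(noisy_bits, query_bits, epsilon):
    """Debiased estimate of the Hamming distance between the original bits and a query.

    If d positions truly differ out of m, the expected number of observed
    mismatches is d*(1-p) + (m-d)*p; solving for d gives the estimator below.
    Assumes epsilon > 0 so that 1 - 2p is nonzero.
    """
    p = 1.0 / (1.0 + np.exp(epsilon))
    query_bits = np.asarray(query_bits)
    observed = np.count_nonzero(np.asarray(noisy_bits) != query_bits)
    m = query_bits.shape[0]
    return (observed - m * p) / (1.0 - 2.0 * p)
```

Each released bit satisfies $\epsilon$-DP on its own, so two inputs at Hamming distance $d$ have output distributions whose log-ratio is at most $\epsilon d$, which matches the distance-scaled form of the guarantee.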

3. Theoretical Properties: Composition and Post-processing

$(n,\epsilon)$-DistanceDP satisfies key theoretical properties analogous to classical DP:

  • Post-processing invariance: If $K$ satisfies $(n,\epsilon)$-DistanceDP, so does $f \circ K$ for any function $f$.
  • Sequential composition: Joint mechanisms $K_1, K_2$ with privacy budgets $\epsilon_1, \epsilon_2$ satisfy $(n, \epsilon_1+\epsilon_2)$-DistanceDP.
  • Parallel composition: If $x$ decomposes as $(x_A, x_B)$ and $K_1$, $K_2$ act independently, releasing $(K_1(x_A), K_2(x_B))$ satisfies $\max\{\epsilon_1, \epsilon_2\}$-DistanceDP (Cheng et al., 2024).
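The sequential composition property follows in one line from the log-likelihood form of the definition. As a sketch, assume $K_1$ and $K_2$ use independent randomness on the same input and have output densities $p_1$ and $p_2$ (notation introduced here only for illustration). The log-ratio of the joint release then splits into a sum:

```latex
\begin{aligned}
\ln \frac{p_1(y_1 \mid x)\, p_2(y_2 \mid x)}{p_1(y_1 \mid x')\, p_2(y_2 \mid x')}
&= \ln \frac{p_1(y_1 \mid x)}{p_1(y_1 \mid x')} + \ln \frac{p_2(y_2 \mid x)}{p_2(y_2 \mid x')} \\
&\le \epsilon_1 \|x - x'\|_2 + \epsilon_2 \|x - x'\|_2
 = (\epsilon_1 + \epsilon_2)\, \|x - x'\|_2 .
\end{aligned}
```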

4. Privacy–Utility Trade-offs and Lower Bounds

The privacy–utility trade-off in $(n,\epsilon)$-DistanceDP is controlled by the noise magnitude, scaling with $n/\epsilon$ for Euclidean embeddings. High-dimensional noise exhibits sharp concentration, allowing accurate estimation of induced perturbation scales (Cheng et al., 2024). In graph distance release, additive errors for the all-pairs shortest path task are shown to be polynomially sublinear in $n$:

  • Main upper bounds: $\tilde{O}(n^{2/3}/\epsilon)$ for pure DP and $\tilde{O}(n^{1/2}/\epsilon)$ for approximate DP, or better with structural restrictions (Ghazi et al., 2022, Fan et al., 2022).
  • Lower bound: Any $(\epsilon,\delta)$-DP algorithm for all-pairs shortest distances (APSD) requires additive error at least $\Omega(n^{1/6})$, via a reduction from linear query discrepancy (Ghazi et al., 2022).

For string tasks, error scales as $\tilde{O}(k/e^{\epsilon/\log k})$ or $\tilde{O}(k/e^{\epsilon/(\log k\log n)})$, so increasing $\epsilon$ by $c\log k$ halves the additive error (Hu et al., 2024).
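The halving claim can be checked directly. Write the first bound, up to the factors hidden by $\tilde{O}$, as $E(\epsilon) = k\,e^{-\epsilon/\log k}$, where the symbol $E$ and the increment $\Delta$ are introduced here only for illustration:

```latex
\frac{E(\epsilon + \Delta)}{E(\epsilon)} = e^{-\Delta/\log k} = \frac{1}{2}
\quad\Longleftrightarrow\quad
\Delta = (\ln 2)\,\log k .
```

Under this reading the constant is $c = \ln 2$. The same calculation applied to the second bound gives $\Delta = (\ln 2)\,\log k\,\log n$.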

5. Applications in Algorithms and Systems

5.1 Private Graph Distance Release

The $(n,\epsilon)$-DistanceDP framework underpins the first sublinear-error algorithms for the private release of all-pairs shortest path distances in weighted undirected graphs. Key approaches include hub sampling combined with noise mechanisms for both edge weights and a subset of node pairs, canonical path decompositions via shortcuts, and advanced analysis of synthetic graph construction. For graphs with small feedback vertex sets, specialized mechanisms further improve accuracy (Ghazi et al., 2022, Fan et al., 2022). Allowing multiplicative stretch (e.g., via Thorup–Zwick spanners) interpolates between additive and multiplicative guarantees (Ghazi et al., 2022).

5.2 Privacy-Preserving Embedding Retrieval

In privacy-preserving cloud retrieval pipelines, $(n,\epsilon)$-DistanceDP offers a natural mechanism for perturbing vector embeddings, such as those used in retrieval-augmented generation (RAG) with LLMs. The two-stage retrieval process (coarse selection via a noised embedding, then refinement via encrypted computation) leverages the $(n,\epsilon)$ guarantee to bound privacy leakage and maintain retrieval accuracy while reducing server workload and transmission sizes (Cheng et al., 2024).
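The two-stage flow can be illustrated with a toy example. In the sketch below the server only ever sees the noised query in stage 1. Stage 2 is written as a plaintext re-ranking purely as a stand-in for the encrypted computation, which is not modeled here. All names (`perturb`, `two_stage_retrieve`, `coarse_k`, `top_k`) are hypothetical, and Euclidean distance is assumed as the similarity measure.

```python
import numpy as np

def perturb(x, epsilon, rng):
    """Distance-calibrated noise: Gamma(n, 1/epsilon) radius, uniform direction."""
    n = x.shape[0]
    r = rng.gamma(shape=n, scale=1.0 / epsilon)
    v = rng.standard_normal(n)
    return x + r * v / np.linalg.norm(v)

def two_stage_retrieve(query, docs, epsilon, coarse_k, top_k, rng=None):
    """Return indices of the top_k documents found via a noised coarse pass.

    query: (n,) embedding; docs: (m, n) matrix of document embeddings.
    """
    rng = np.random.default_rng() if rng is None else rng
    query = np.asarray(query, dtype=float)
    docs = np.asarray(docs, dtype=float)
    # Stage 1: coarse candidate selection using only the noised query.
    noisy_query = perturb(query, epsilon, rng)
    coarse_dist = np.linalg.norm(docs - noisy_query, axis=1)
    candidates = np.argsort(coarse_dist)[:coarse_k]
    # Stage 2: exact re-ranking restricted to the candidates
    # (plaintext placeholder for the encrypted refinement step).
    exact_dist = np.linalg.norm(docs[candidates] - query, axis=1)
    return candidates[np.argsort(exact_dist)[:top_k]]
```

A larger `coarse_k` makes it more likely that the true nearest neighbors survive the noisy first stage, at the cost of more work in the refinement stage.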

5.3 Differentially Private String Distance Data Structures

For Hamming and edit distances, $(n,\epsilon)$-DistanceDP is realized through sketch-and-flip data structures that are $\epsilon$-DP in the function-release sense. One-time publication of the DP synopsis enables sublinear per-query processing and ensures accuracy for all queries within a specified radius $k$ (Hu et al., 2024).

6. Extensions and Open Directions

Current research points to several open problems, including closing the gap between upper and lower bounds for graph distance release error (notably, between $\Omega(n^{1/6})$ and the best known upper bounds), developing improved lower bounds under approximate DP or multiplicative stretch, and further refining mechanisms for high-dimensional and structured data regimes (Ghazi et al., 2022). In the context of embedding perturbation and secure retrieval, adaptation to other metric spaces and adversarial threat models is ongoing (Cheng et al., 2024).

7. Implementation Complexity and Efficiency

Algorithmic realizations of $(n,\epsilon)$-DistanceDP mechanisms are efficiently computable:

  • Hub-based graph algorithms operate in overall polynomial time, typically $O(n^{2+o(1)})$ to $O(n^3)$ (Ghazi et al., 2022, Fan et al., 2022).
  • Embedding mechanisms require only sampling from a $\mathrm{Gamma}(n,1/\epsilon)$ distribution and from the unit sphere, enabling scalable client-side implementation (Cheng et al., 2024).
  • Sketch-and-flip string data structures are built in $O(mn)$ time (for $m$ database strings), with query run-time $\tilde{O}(mk+n)$ or $\tilde{O}(mk^2+n)$ (Hu et al., 2024).

By decoupling privacy loss from discrete record-edit operations and instead calibrating noise magnitude to geometric distance, $(n,\epsilon)$-DistanceDP expands the design space for differentially private algorithms and practical data analysis systems.
