Humanitarian Data Exchange Overview

Updated 4 February 2026
  • Humanitarian Data Exchange is a platform for sharing high-value mobility data, such as origin–destination matrices, while ensuring rigorous privacy protection.
  • It employs differentially private mechanisms like the Private-OD algorithm, which adds Laplace noise, applies cell suppression, and maintains strong privacy guarantees.
  • Empirical case studies from Afghanistan and Rwanda demonstrate HDX's capability to balance data utility and privacy for effective humanitarian policy and aid distribution.

The Humanitarian Data Exchange (HDX) serves as a platform for sharing high-value mobility data, such as origin–destination (O–D) movement matrices derived from mobile phone data, to inform policymaking in humanitarian crises—ranging from pandemics to natural disasters. A persistent challenge is reconciling the utility needs of response agencies with the formal privacy protections required for sensitive personal mobility traces. Recent developments introduce differentially private mechanisms for such data dissemination, implementing rigorous privacy guarantees without prohibitive accuracy costs (Kohli et al., 2023).

1. Differentially Private Mobility Matrices

Mobility data are typically summarized as O–D matrices, $M(d) \in \mathbb{N}^{k \times k}$, where each $M_{a,b}(d)$ denotes the number of trips from region $a$ to region $b$ (with $M_{a,a}(d) = 0$). The Private-OD algorithm enables the release of such matrices under pure $\epsilon$-differential privacy, parameterized by a privacy budget $\epsilon$, a per-user trip bound $T$, and a cell-suppression threshold $\tau$.

The Private-OD workflow consists of:

  • Pre-processing: For individual-level privacy, each subscriber's trip count is truncated to $T$ by dropping excess trips uniformly at random. For trip-level privacy, set $T=1$.
  • Noise addition: Noise $\eta_{a,b} \sim \mathrm{Laplace}(\lambda=T/\epsilon)$ is added independently to each off-diagonal cell.
  • Post-processing: The noisy count is rounded up and, if below $\tau$, is set to zero. This suppresses small noisy counts.

This mechanism exploits the post-processing immunity of pure differential privacy, ensuring that the entire process, including thresholding and rounding, preserves $\epsilon$-DP.
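
The three steps above can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' implementation: the function names `truncate_trips` and `private_od`, the `(user, origin, destination)` trip layout, and the use of `np.ceil` for "rounded up" are assumptions made for the example.

```python
import numpy as np

def truncate_trips(trips, T, rng):
    """Keep at most T trips per user, dropping the excess uniformly at random.

    `trips` is a list of (user, origin, destination) tuples (assumed layout).
    """
    by_user = {}
    for user, origin, dest in trips:
        by_user.setdefault(user, []).append((origin, dest))
    kept = []
    for user_trips in by_user.values():
        if len(user_trips) > T:
            idx = rng.choice(len(user_trips), size=T, replace=False)
            user_trips = [user_trips[i] for i in idx]
        kept.extend(user_trips)
    return kept

def private_od(counts, epsilon, T=1, tau=15, rng=None):
    """Noisy release of a k x k O-D matrix built from already-truncated trips."""
    rng = np.random.default_rng() if rng is None else rng
    counts = np.asarray(counts, dtype=float)
    off_diag = ~np.eye(counts.shape[0], dtype=bool)

    # Independent Laplace noise with scale T / epsilon on each off-diagonal cell.
    noise = rng.laplace(loc=0.0, scale=T / epsilon, size=counts.shape)
    noisy = np.where(off_diag, counts + noise, 0.0)

    # Post-processing: round up, then suppress cells that fall below tau.
    released = np.ceil(noisy)
    released[released < tau] = 0
    return released.astype(int)
```

Because rounding and suppression only touch the already-noised values, they do not consume additional privacy budget.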

2. Formal Privacy and Accuracy Guarantees

Private-OD extends to both trip-level and individual-level neighboring dataset definitions, as follows:

  • Trip-level neighbors differ by one trip record.
  • Individual-level neighbors differ in all trips of a single person (up to $T$ trips).

The algorithm is proven to achieve pure $\epsilon$-differential privacy:

$$P[\widehat{M}(d) \in S] \leq e^{\epsilon} \, P[\widehat{M}(d') \in S]$$

for all neighboring datasets $d, d'$ and all output event sets $S$.

Accuracy guarantees are derived via high-probability bounds on the absolute cell-wise error. For any $\alpha \geq 0$, whenever both $M_{a,b} \geq \tau$ and $\widehat{M}_{a,b} \geq \tau$,

$$P[|\widehat{M}_{a,b} - M_{a,b}| > \alpha] = \exp[-\epsilon \cdot (\alpha + 0.5)/T].$$

With probability at least $1-\beta$, the error per cell is bounded by $\alpha$ for $\epsilon \geq -T(\alpha+0.5)^{-1}\ln\beta$. For time-differenced ("trend") queries, analogous exponentially decaying error bounds are provided, and sharp trade-off equations between $\epsilon$, $\alpha$, and the failure probability $\beta$ are derived using the Lambert W function.
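
As a worked illustration of the per-cell trade-off, the snippet below evaluates the tail expression and inverts it for $\epsilon$. The function names are hypothetical, and the example values ($\alpha=10$, $\beta=0.05$, $T=1$) are chosen only for illustration.

```python
import math

def tail_probability(alpha, epsilon, T=1):
    """exp(-epsilon * (alpha + 0.5) / T), the per-cell tail expression from the text."""
    return math.exp(-epsilon * (alpha + 0.5) / T)

def min_epsilon(alpha, beta, T=1):
    """Smallest epsilon for which tail_probability(alpha, epsilon, T) <= beta."""
    return -T * math.log(beta) / (alpha + 0.5)

eps = min_epsilon(alpha=10, beta=0.05, T=1)
print(round(eps, 3))  # 0.285
```

Under these assumed values, a budget of roughly 0.29 keeps a retained cell within 10 trips of its true count with probability 95%; raising $T$ scales the required $\epsilon$ linearly.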

3. Application to Humanitarian Response: Afghanistan and Rwanda Case Studies

Private-OD was empirically validated on mobile CDR datasets across three contexts:

  • Rwanda 2008 (7 days, 541,000 subscribers, 13 million calls)
  • Afghanistan 2015 (7 days post-conflict, 2.79 million subscribers, 64 million calls)
  • Afghanistan 2020 (305 days during COVID, 7.12 million subscribers, 3.2 billion calls)

Spatial resolutions reflected admin-2 (province/district) and admin-3 (sub-district) units, with $\tau=15$ for small-flow suppression and $\epsilon \in \{0.1, 0.5, 1.0\}$.

3.1 Pandemic Response Modeling

In Afghanistan 2020, daily O–D matrices were employed in a mobility-extended SIR model:

\begin{align*}
\frac{dS_i}{dt} &= -\beta S_i I_i / N_i - \alpha \beta S_i \left(\sum_j M_{i,j} I_j / N_j \right) / \left( N_i + \sum_j M_{i,j} \right) \\
\frac{dI_i}{dt} &= -\frac{dS_i}{dt} - \frac{dR_i}{dt} \\
\frac{dR_i}{dt} &= \mu I_i / N_i
\end{align*}

with policy triggers enacted when $I_i/N_i \geq 20\%$. In this model $\beta$, $\alpha$, and $\mu$ are epidemiological parameters, distinct from the failure probability and error tolerance of Section 2. Intervention decisions derived from differentially private matrices (with $\epsilon=0.5$) matched those from non-private data with 97% accuracy at admin-2 (province) and 79% at admin-3 (district) resolution.
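
To make the role of the O–D matrix concrete, the following sketch advances the model by one forward-Euler step and evaluates the 20% trigger. The choice of Euler integration, the step size, and the names `sir_step` and `policy_trigger` are assumptions for illustration; the right-hand sides follow the equations exactly as stated in the text, including the recovery term.

```python
import numpy as np

def sir_step(S, I, R, M, beta, alpha, mu, dt=1.0):
    """One forward-Euler step of the mobility-extended SIR model.

    S, I, R are length-k arrays (one entry per region); M is the k x k O-D matrix.
    """
    N = S + I + R
    # Mobility term: sum_j M[i, j] * I[j] / N[j], scaled by N[i] + sum_j M[i, j].
    imported = (M @ (I / N)) / (N + M.sum(axis=1))
    dS = -beta * S * I / N - alpha * beta * S * imported
    dR = mu * I / N  # recovery term as written in the text
    dI = -dS - dR
    return S + dt * dS, I + dt * dI, R + dt * dR

def policy_trigger(I, N, threshold=0.20):
    """Boolean mask of regions whose infected share meets the threshold."""
    return (I / N) >= threshold
```

Running the same simulation once with the true matrices and once with the privately released ones, then comparing the resulting trigger masks, gives the kind of decision comparison reported in this section.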

Summary of Decision Accuracy

| $\epsilon$ (privacy) | Acc. (province) | Prec. (province) | Rec. (province) | Acc. (district) | Prec. (district) | Rec. (district) |
|---|---|---|---|---|---|---|
| Non-private | 100% | 100% | 100% | 100% | 100% | 100% |
| 0.1 (strong priv.) | 93% | 74% | 74% | 71% | 33% | 34% |
| 0.5 | 97% | 89% | 89% | 79% | 52% | 53% |
| 1.0 (weaker priv.) | 98% | 93% | 93% | 82% | 58% | 59% |
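
One way such accuracy, precision, and recall figures could be computed is sketched below. It assumes each decision is a binary intervention flag and that the non-private decisions serve as ground truth; the source does not spell out the exact evaluation procedure, and `decision_metrics` is a hypothetical helper.

```python
import numpy as np

def decision_metrics(reference, private):
    """Accuracy, precision, and recall of private decisions against reference ones."""
    reference = np.asarray(reference, dtype=bool)
    private = np.asarray(private, dtype=bool)
    tp = int(np.sum(reference & private))    # intervene in both
    fp = int(np.sum(~reference & private))   # intervene only under private data
    fn = int(np.sum(reference & ~private))   # missed interventions
    accuracy = float(np.mean(reference == private))
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return accuracy, precision, recall

# Toy example with eight region-day decisions.
acc, prec, rec = decision_metrics(
    reference=[1, 1, 0, 0, 1, 0, 0, 0],
    private=[1, 0, 0, 0, 1, 1, 0, 0],
)
```

Accuracy can stay high even when precision and recall drop, because most region-days trigger no intervention in either run; this is consistent with the district-level rows of the table.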

3.2 Aid Distribution and Mobility Shocks

Following the Kunduz conflict and the Lake Kivu earthquake, weekly O–D matrices were used to measure out-migration and rank the top-3 destination regions. At $\epsilon=0.5$, total-flow errors ranged from about 2.5% to 8.3%, and top-3 destination identification accuracy exceeded 90% at both admin-2 and admin-3 granularity. This suggests robust utility for rapid humanitarian targeting at moderate privacy budgets.

| Event | Spatial unit | Non-private out-mig. | Out-mig. ($\epsilon=0.5$) | Percent error ($\epsilon=0.5$) | Top-3 accuracy ($\epsilon=0.5$) |
|---|---|---|---|---|---|
| Kunduz | Admin-2 | 49,994 | 48,725 | 2.54% | 90.5% |
| Kunduz | Admin-3 | 87,007 | 82,671 | 4.98% | 90.5% |
| Kivu | Admin-2 | 51,102 | 47,331 | 7.38% | 100% |
| Kivu | Admin-3 | 32,627 | 29,930 | 8.27% | 95.2% |
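
The percent-error column follows directly from the two out-migration totals, as the sketch below reproduces for the first row. The top-3 comparison shown here, an overlap share between the true and private destination sets for one origin, is an assumed definition; the source does not state how top-3 accuracy was scored, and all function names are hypothetical.

```python
import numpy as np

def percent_error(true_total, private_total):
    """Absolute relative error of the private total, in percent."""
    return 100.0 * abs(private_total - true_total) / true_total

def top_k_destinations(M, origin, k=3):
    """Indices of the k largest outgoing flows from `origin`, excluding itself."""
    flows = np.asarray(M)[origin].astype(float)
    flows[origin] = -np.inf
    return set(np.argsort(-flows)[:k].tolist())

def top_k_overlap(M_true, M_priv, origin, k=3):
    """Share of the true top-k destinations recovered from the private matrix."""
    true_set = top_k_destinations(M_true, origin, k)
    priv_set = top_k_destinations(M_priv, origin, k)
    return len(true_set & priv_set) / k

print(round(percent_error(49_994, 48_725), 2))  # 2.54 (Kunduz, admin-2)
```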

4. Practical Considerations for Data Publication

When releasing differentially private mobility matrices via HDX, several operational design choices must be made:

  • Privacy parameter selection ($\epsilon$, $\delta$): The studied mechanism is pure DP ($\delta=0$). Recommended $\epsilon$ values balance strong privacy ($\epsilon=0.1$) and practical accuracy ($\epsilon=0.5$).
  • Heuristic tuning: For a target per-cell error $\alpha$, choose $\epsilon \geq \sqrt{2}/\alpha$. Alternatively, for a failure probability $\beta$ at error $\alpha$, set $\epsilon \geq -T(\alpha+0.5)^{-1}\ln\beta$.
  • Temporal composition: Releasing matrices for $D$ days at $\epsilon$ per day incurs a worst-case total privacy cost of $D \cdot \epsilon$. Given subscriber travel rates, the realized privacy loss may be much smaller than this worst-case bound.
  • Spatial granularity and sparsity: Coarser geographies reduce sparsity, improving utility. Higher $\tau$ suppresses small noisy flows to limit high relative errors.
  • Resource and computational cost: Each day requires $O(k^2)$ noise additions and post-processing, which is readily parallelizable for $k$ in the hundreds.
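
The worst-case composition arithmetic is simple enough to script when planning a multi-day release. The helper names below are hypothetical; the example pairs the 305-day Afghanistan 2020 window with $\epsilon=0.5$ per day purely to illustrate how quickly the worst-case bound grows.

```python
def total_privacy_cost(epsilon_per_day, num_days):
    """Worst-case cost under basic sequential composition: D * epsilon."""
    return epsilon_per_day * num_days

def per_day_budget(total_epsilon, num_days):
    """Per-day epsilon that keeps a num_days release within total_epsilon."""
    return total_epsilon / num_days

print(total_privacy_cost(0.5, 305))  # 152.5
```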

HDX stewards are advised to document private release parameters and empirically observed error distributions alongside published data. Simple look-up tables and visualizations of error magnitude per cell (e.g., “with $\epsilon=0.5$, 95% of cells err by $\leq 10$ trips”) facilitate downstream uncertainty quantification.
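
A look-up table of this kind could be produced by the data holder, who has access to both the true and the released matrix. The sketch below is one possible form, assuming that only non-suppressed off-diagonal cells are summarized; `error_lookup` and its default thresholds are illustrative choices.

```python
import numpy as np

def error_lookup(true_counts, released, thresholds=(1, 5, 10, 20)):
    """Share of released (non-suppressed) cells whose absolute error is <= each threshold."""
    true_counts = np.asarray(true_counts)
    released = np.asarray(released)
    off_diag = ~np.eye(true_counts.shape[0], dtype=bool)
    mask = off_diag & (released > 0)
    abs_err = np.abs(released[mask] - true_counts[mask])
    if abs_err.size == 0:
        return {t: float("nan") for t in thresholds}
    return {t: float(np.mean(abs_err <= t)) for t in thresholds}

# Toy example: one cell (true count 3) is suppressed and therefore excluded.
true_counts = [[0, 100, 40], [60, 0, 3], [25, 80, 0]]
released = [[0, 103, 38], [49, 0, 0], [25, 92, 0]]
table = error_lookup(true_counts, released)
```

Publishing the resulting shares, rather than the per-cell errors themselves, avoids disclosing exact differences between true and released counts.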

5. Limitations and Responsible Data Use

Despite strong empirical performance for policy-relevant aggregate questions, certain limitations are intrinsic. Small flows (particularly at fine granularity) may be suppressed or have high relative error under moderate $\epsilon$. The platform must therefore strike a balance between privacy risk, spatial scale, and the utility of the released data. Providing clear metadata and error guides supports responsible interpretation and decision-making under uncertainty. A plausible implication is that HDX's adoption of such privacy frameworks could serve as a template for other stakeholders handling high-sensitivity human mobility data (Kohli et al., 2023).

6. Summary and Ongoing Developments

By deploying per-cell Laplace noise with parameter $T/\epsilon$ and simple post-processing, HDX can release O–D mobility matrices with formal $\epsilon$-differential privacy. The exponentially decaying error tail in $\epsilon$ ensures that, in empirical high-density CDR datasets, utility for crisis response and humanitarian targeting remains high at $\epsilon \approx 0.5$. The general approach supports data-sharing frameworks where formal privacy, empirical utility, and transparent risk-utility trade-offs are central to public-good data dissemination (Kohli et al., 2023).

References (1)
