Humanitarian Data Exchange Overview
- The Humanitarian Data Exchange (HDX) is a platform for sharing high-value mobility data, such as origin–destination matrices, while ensuring rigorous privacy protection.
- It employs differentially private mechanisms like the Private-OD algorithm, which adds Laplace noise, applies cell suppression, and maintains strong privacy guarantees.
- Empirical case studies from Afghanistan and Rwanda demonstrate HDX's capability to balance data utility and privacy for effective humanitarian policy and aid distribution.
The Humanitarian Data Exchange (HDX) serves as a platform for sharing high-value mobility data, such as origin–destination (O–D) movement matrices derived from mobile phone data, to inform policymaking in humanitarian crises—ranging from pandemics to natural disasters. A persistent challenge is reconciling the utility needs of response agencies with the formal privacy protections required for sensitive personal mobility traces. Recent developments introduce differentially private mechanisms for such data dissemination, implementing rigorous privacy guarantees without prohibitive accuracy costs (Kohli et al., 2023).
1. Differentially Private Mobility Matrices
Mobility data are typically summarized as O–D matrices $M$, where each entry $M_{i,j}$ denotes the number of trips from region $i$ to region $j$ (with $i \neq j$). The Private-OD algorithm enables the release of such matrices under pure $\varepsilon$-differential privacy, parameterized by a privacy budget $\varepsilon$, a per-user trip bound $T$, and a cell-suppression threshold $k$.
The Private-OD workflow consists of:
- Pre-processing: For individual-level privacy, each subscriber’s trip count is truncated to the per-user bound $T$ by dropping excess trips uniformly at random. For trip-level privacy, set $T = 1$.
- Noise addition: Laplace noise, with scale proportional to $T/\varepsilon$, is added independently to each off-diagonal cell $M_{i,j}$.
- Post-processing: Each noisy count is rounded up and, if it falls below the suppression threshold $k$, set to zero. This suppresses small noisy counts.
This mechanism exploits the post-processing immunity of pure differential privacy: because rounding and thresholding operate only on the noisy output, the entire process preserves $\varepsilon$-DP.
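The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the Laplace scale is assumed to be $T/\varepsilon$ (the text does not confirm the constant), and trip-level privacy is read as "no truncation, scale $1/\varepsilon$".

```python
import math
import random
from collections import defaultdict

def laplace(scale, rng):
    """Draw Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_od(trips, regions, eps, k, T=1, individual_level=True, seed=None):
    """Hypothetical sketch of a Private-OD-style release.

    trips: iterable of (user_id, origin, destination) records.
    Returns {(origin, destination): released_count} for every origin != destination.
    """
    rng = random.Random(seed)

    # Pre-processing: group off-diagonal trips by user and, for
    # individual-level privacy, keep at most T per user, chosen at random.
    by_user = defaultdict(list)
    for user, origin, dest in trips:
        if origin != dest:
            by_user[user].append((origin, dest))
    counts = defaultdict(int)
    for user_trips in by_user.values():
        if individual_level and len(user_trips) > T:
            user_trips = rng.sample(user_trips, T)
        for cell in user_trips:
            counts[cell] += 1

    # ASSUMPTION: noise scale is (per-user bound) / eps; the bound is 1
    # for trip-level privacy.
    scale = (T if individual_level else 1) / eps

    released = {}
    for origin in regions:
        for dest in regions:
            if origin == dest:
                continue
            # Noise addition: every off-diagonal cell gets independent noise,
            # including cells whose true count is zero.
            noisy = counts[(origin, dest)] + laplace(scale, rng)
            # Post-processing: round up, then suppress values below k.
            rounded = math.ceil(noisy)
            released[(origin, dest)] = rounded if rounded >= k else 0
    return released
```

Noise is drawn for empty cells as well as populated ones; skipping empty cells would reveal which flows are truly zero.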
2. Formal Privacy and Accuracy Guarantees
Private-OD extends to both trip-level and individual-level neighboring dataset definitions, as follows:
- Trip-level neighbors differ by one trip record.
- Individual-level neighbors differ in all trips of a single person (up to $T$ trips, the per-user bound).
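The algorithm is proven to achieve pure $\varepsilon$-differential privacy. Writing $\mathcal{A}$ for the release mechanism,
$$\Pr[\mathcal{A}(D) \in S] \le e^{\varepsilon} \, \Pr[\mathcal{A}(D') \in S]$$
for all neighboring datasets $D, D'$ and all output event sets $S$.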
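The standard Laplace-mechanism argument shows why per-cell Laplace noise can deliver a guarantee of this form. The sketch below rests on assumptions not stated in the text: neighbors are formed by adding or removing trips, the vector of cell counts $f(D)$ has $\ell_1$ sensitivity $\Delta$, and each cell $c$ receives independent noise of scale $\Delta/\varepsilon$. Writing $p_D$ for the density of the noisy matrix,

$$
\begin{aligned}
\frac{p_{D}(z)}{p_{D'}(z)}
&= \prod_{c} \exp\!\left(\frac{\varepsilon \bigl(|z_c - f_c(D')| - |z_c - f_c(D)|\bigr)}{\Delta}\right) \\
&\le \exp\!\left(\frac{\varepsilon \, \lVert f(D) - f(D') \rVert_1}{\Delta}\right) \le e^{\varepsilon}.
\end{aligned}
$$

Under these assumptions a single trip changes one cell by one, so $\Delta = 1$ at the trip level, and a person with at most $T$ trips gives $\Delta \le T$ at the individual level. A replace-one-trip convention would double the sensitivity and therefore the scale. Integrating the density bound over any event set $S$ gives the inequality above, and rounding and suppression do not weaken it because they are post-processing.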
Accuracy guarantees are derived as high-probability bounds on the absolute per-cell error. Under two conditions on the parameters, the error in each released cell is bounded with a prescribed probability, and the chance of a larger error decays exponentially. For time-differenced (“trend”) queries, analogous exponentially decaying error bounds are provided, and sharp trade-off equations between the mechanism's parameters and the failure probability are derived using the Lambert W function.
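As a worked illustration of where such a bound comes from, consider only the noise term. The sketch assumes $Y \sim \mathrm{Lap}(b)$ with $b = T/\varepsilon$, where $T$ is the per-user trip bound and $\varepsilon$ the privacy budget. The scale is an assumption, the symbols $\alpha$ (error level) and $\beta$ (failure probability) are introduced only for this sketch, and suppression is ignored:

$$
\Pr\bigl[|Y| > \alpha\bigr] = e^{-\alpha/b} = e^{-\alpha \varepsilon / T}
\quad\Longrightarrow\quad
\Pr\!\left[\,|Y| \le \frac{T}{\varepsilon} \ln\frac{1}{\beta}\right] = 1 - \beta .
$$

Rounding up adds less than one trip, so under these assumptions an unsuppressed cell lies within $\frac{T}{\varepsilon}\ln\frac{1}{\beta} + 1$ trips of the truth with probability at least $1 - \beta$. For example, $T = 1$, $\varepsilon = 0.5$, and $\beta = 0.05$ give $2 \ln 20 \approx 5.99$ trips before rounding.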
3. Application to Humanitarian Response: Afghanistan and Rwanda Case Studies
Private-OD was empirically validated on mobile CDR datasets across three contexts:
- Rwanda 2008 (7 days, 541,000 subscribers, 13 million calls)
- Afghanistan 2015 (7 days post-conflict, 2.79 million subscribers, 64 million calls)
- Afghanistan 2020 (305 days during COVID, 7.12 million subscribers, 3.2 billion calls)
Spatial resolutions were admin-2 (province/district) and admin-3 (sub-district) units, with fixed settings for the small-flow suppression threshold and the remaining mechanism parameters.
3.1 Pandemic Response Modeling
In Afghanistan 2020, daily O–D matrices were employed in a mobility-extended SIR model:
\begin{align*}
\frac{dS_i}{dt} &= -\beta S_i I_i / N_i - \alpha \beta S_i \left(\sum_j M_{i,j} I_j / N_j \right) / \left( N_i + \sum_j M_{i,j} \right) \\
\frac{dI_i}{dt} &= -\frac{dS_i}{dt} - \frac{dR_i}{dt} \\
\frac{dR_i}{dt} &= \mu I_i / N_i
\end{align*}
Policy interventions were triggered when a threshold condition on the model state was met. Intervention decisions derived from differentially private matrices (with $\varepsilon = 0.5$) matched those from non-private data with 97% accuracy at admin-2 (province) and 79% at admin-3 (district) resolution.
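To make the role of the O–D matrix in this model concrete, the following sketch advances the three equations by one forward-Euler step. It transcribes the right-hand sides as written, including the $\mu I_i / N_i$ recovery term and the $M_{i,j}$ indexing. The function name, the Euler scheme, the parameter values in the usage note, and the treatment of diagonal entries as zero are illustrative assumptions.

```python
def sir_step(S, I, R, N, M, beta, alpha, mu, dt):
    """One forward-Euler step of the mobility-extended SIR equations.

    S, I, R, N: per-region lists; M: square matrix (list of lists) of flows.
    Diagonal entries of M are ignored (ASSUMPTION: they are not released).
    """
    n = len(S)
    S_next, I_next, R_next = [], [], []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        flow_total = sum(M[i][j] for j in others)
        # Mobility-weighted infection prevalence in connected regions.
        imported = sum(M[i][j] * I[j] / N[j] for j in others)
        dS = (-beta * S[i] * I[i] / N[i]
              - alpha * beta * S[i] * imported / (N[i] + flow_total))
        dR = mu * I[i] / N[i]   # transcribed exactly as in the equations above
        dI = -dS - dR
        S_next.append(S[i] + dt * dS)
        I_next.append(I[i] + dt * dI)
        R_next.append(R[i] + dt * dR)
    return S_next, I_next, R_next
```

Running the same simulation once with the non-private matrix and once with the released matrix, then comparing which regions cross the intervention threshold, is the kind of comparison the reported accuracy figures summarize. For instance, `sir_step([990.0, 500.0], [10.0, 0.0], [0.0, 0.0], [1000.0, 500.0], [[0, 20], [30, 0]], beta=0.3, alpha=0.5, mu=0.1, dt=1.0)` seeds infections in the second region purely through the mobility term.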
Summary of Decision Accuracy
| ε (privacy) | Acc. (province) | Prec. (province) | Rec. (province) | Acc. (district) | Prec. (district) | Rec. (district) |
|---|---|---|---|---|---|---|
| Non-private | 100% | 100% | 100% | 100% | 100% | 100% |
| 0.1 (strong priv.) | 93% | 74% | 74% | 71% | 33% | 34% |
| 0.5 | 97% | 89% | 89% | 79% | 52% | 53% |
| 1.0 (weaker priv.) | 98% | 93% | 93% | 82% | 58% | 59% |
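The accuracy, precision, and recall columns compare intervention decisions made from private matrices against those made from non-private matrices. A minimal sketch of that comparison follows, assuming each decision is a boolean per region and day and that the non-private decision is treated as ground truth. The function name and input layout are hypothetical.

```python
def decision_metrics(reference, private):
    """Compare boolean decisions; `reference` is the non-private baseline."""
    pairs = list(zip(reference, private))
    tp = sum(1 for r, p in pairs if r and p)
    tn = sum(1 for r, p in pairs if not r and not p)
    fp = sum(1 for r, p in pairs if not r and p)
    fn = sum(1 for r, p in pairs if r and not p)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Precision and recall are undefined when their denominators are zero.
        "precision": tp / (tp + fp) if tp + fp else float("nan"),
        "recall": tp / (tp + fn) if tp + fn else float("nan"),
    }
```

When interventions are rare, accuracy can stay high while precision and recall fall, which is consistent with the gap between those columns at district resolution.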
3.2 Aid Distribution and Mobility Shocks
Following the Kunduz conflict and the Lake Kivu earthquake, weekly O–D matrices were used to measure out-migration and rank the top-3 destination regions. Under the differentially private release, total-flow errors ranged from 2.54% to 8.27%, and top-3 destination identification accuracy exceeded 90%, for both province/district and sub-district spatial granularities. This suggests robust utility for rapid humanitarian targeting given moderate privacy budgets.
| Event | Spatial unit | Non-private out-mig. | Private out-mig. | Percent error (private) | Top-3 accuracy (private) |
|---|---|---|---|---|---|
| Kunduz | Admin-2 | 49,994 | 48,725 | 2.54% | 90.5% |
| Kunduz | Admin-3 | 87,007 | 82,671 | 4.98% | 90.5% |
| Kivu | Admin-2 | 51,102 | 47,331 | 7.38% | 100% |
| Kivu | Admin-3 | 32,627 | 29,930 | 8.27% | 95.2% |
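The percent-error column can be reproduced directly from the two out-migration columns, as the short check below shows. The `top3_overlap` helper is one plausible way to score destination ranking and is an assumption, since the text does not define how top-3 accuracy was computed.

```python
def percent_error(true_total, released_total):
    """Absolute error of the released total, as a percentage of the true total."""
    return abs(released_total - true_total) / true_total * 100.0

def top3_overlap(true_flows, released_flows):
    """Share of the true top-3 destinations that also rank top-3 after release.

    ASSUMPTION: illustrative scoring rule; inputs map destination -> flow.
    """
    def top3(flows):
        return set(sorted(flows, key=flows.get, reverse=True)[:3])
    return len(top3(true_flows) & top3(released_flows)) / 3.0

# (event, spatial unit, non-private out-migration, private out-migration)
table = [
    ("Kunduz", "Admin-2", 49994, 48725),
    ("Kunduz", "Admin-3", 87007, 82671),
    ("Kivu", "Admin-2", 51102, 47331),
    ("Kivu", "Admin-3", 32627, 29930),
]
errors = [round(percent_error(true, released), 2) for _, _, true, released in table]
# errors == [2.54, 4.98, 7.38, 8.27], matching the table
```

In all four rows the private total is below the non-private total.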
4. Practical Considerations for Data Publication
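When releasing differentially private mobility matrices via HDX, several operational choices must be made:
- Privacy parameter selection ($\varepsilon$, $\delta$): The studied mechanism is pure DP ($\delta = 0$). Recommended values of $\varepsilon$ balance strong privacy (smaller $\varepsilon$) against practical accuracy (larger $\varepsilon$); the pandemic case study above spans $\varepsilon$ from 0.1 to 1.0.
- Heuristic tuning: For a target per-cell error, choose the corresponding parameter value from the accuracy bound. Alternatively, for a target failure probability at a given error level, solve the same bound for that parameter.
- Temporal composition: Privacy costs add up across multi-day releases, so $d$ daily releases at $\varepsilon$ each carry a worst-case cost of $d\varepsilon$. Given subscriber travel rates, the realized privacy loss may be much smaller than the worst-case bound.
- Spatial granularity and sparsity: Coarser geographies reduce sparsity, improving utility. A higher suppression threshold $k$ removes more small noisy flows, limiting high relative errors.
- Resource and computational cost: Each day requires on the order of $n^2$ noise additions and post-processing steps for $n$ regions, which is readily parallelizable for $n$ in the hundreds.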
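The composition and cost bullets reduce to simple arithmetic, sketched below. The helper names are hypothetical, and the composition rule used is basic sequential composition, which gives the worst-case bound rather than the realized loss.

```python
def total_epsilon(eps_per_release, n_releases):
    """Worst-case privacy cost of n releases under basic sequential composition."""
    return eps_per_release * n_releases

def per_release_epsilon(total_budget, n_releases):
    """Split a fixed total budget evenly across n releases."""
    return total_budget / n_releases

def noise_draws_per_release(n_regions):
    """One independent noise draw per off-diagonal cell of an n x n matrix."""
    return n_regions * (n_regions - 1)

weekly_cost = total_epsilon(0.5, 7)        # 3.5 for seven daily releases at 0.5
draws = noise_draws_per_release(400)       # 159600 cells for 400 regions
```

Splitting a fixed total budget over many days shrinks the per-day $\varepsilon$ and inflates per-cell noise, so release frequency is itself a privacy parameter.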
HDX stewards are advised to document private release parameters and empirically observed error distributions alongside published data. Simple look-up tables and visualizations of error magnitude per cell (e.g., statements of the form “at a given $\varepsilon$, 95% of cells err by at most a stated number of trips”) facilitate downstream uncertainty quantification.
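Such a look-up table could be built by a data steward who holds both the true and the released matrices, as in the sketch below. The function names and the nearest-rank quantile rule are illustrative choices, not part of the described method.

```python
import math

def abs_error_quantile(true_counts, released_counts, q=0.95):
    """Nearest-rank q-quantile of |released - true| over all cells.

    Both arguments map a cell key, e.g. (origin, destination), to a count.
    """
    errors = sorted(abs(released_counts[c] - true_counts[c]) for c in true_counts)
    rank = max(1, math.ceil(q * len(errors)))
    return errors[rank - 1]

def error_lookup(true_counts, releases_by_eps, q=0.95):
    """Map each privacy budget to the q-quantile of per-cell absolute error."""
    return {
        eps: abs_error_quantile(true_counts, released, q)
        for eps, released in sorted(releases_by_eps.items())
    }
```

Each entry reads as "at this $\varepsilon$, a fraction $q$ of cells err by at most this many trips", which is the form of statement suggested above.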
5. Limitations and Responsible Data Use
Despite strong empirical performance for policy-relevant aggregate questions, certain limitations are intrinsic. Small flows (particularly at fine granularity) may be suppressed or have high relative error under moderate $\varepsilon$. The platform must therefore strike a balance between privacy risk, spatial scale, and the meaningful utility of released data. Providing clear metadata and error guides supports responsible interpretation and decision-making under uncertainty. A plausible implication is that HDX’s adoption of such privacy frameworks could serve as a template for other stakeholders handling high-sensitivity human mobility data (Kohli et al., 2023).
6. Summary and Ongoing Developments
By deploying per-cell Laplace noise with scale proportional to $T/\varepsilon$ (the per-user trip bound over the privacy budget) and simple post-processing, HDX can release O–D mobility matrices with formal $\varepsilon$-differential privacy. The error tail decays exponentially in $\varepsilon$, which ensures that, in empirical high-density CDR datasets, utility for crisis response and humanitarian targeting remains high at moderate privacy budgets. The general approach supports data-sharing frameworks where formal privacy, empirical utility, and transparent risk-utility trade-offs are central to public-good data dissemination (Kohli et al., 2023).