The Capacity of Private Information Retrieval from Byzantine and Colluding Databases

Published 5 Jun 2017 in cs.IT, cs.CR, cs.IR, and math.IT | (1706.01442v1)

Abstract: We consider the problem of single-round private information retrieval (PIR) from $N$ replicated databases. We consider the case when $B$ databases are outdated (unsynchronized), or even worse, adversarial (Byzantine), and therefore, can return incorrect answers. In the PIR problem with Byzantine databases (BPIR), a user wishes to retrieve a specific message from a set of $M$ messages with zero-error, irrespective of the actions performed by the Byzantine databases. We consider the $T$-privacy constraint in this paper, where any $T$ databases can collude, and exchange the queries submitted by the user. We derive the information-theoretic capacity of this problem, which is the maximum number of \emph{correct symbols} that can be retrieved privately (under the $T$-privacy constraint) for every symbol of the downloaded data. We determine the exact BPIR capacity to be $C=\frac{N-2B}{N}\cdot\frac{1-\frac{T}{N-2B}}{1-(\frac{T}{N-2B})^M}$, if $2B+T < N$. This capacity expression shows that the effect of Byzantine databases on the retrieval rate is equivalent to removing $2B$ databases from the system, with a penalty factor of $\frac{N-2B}{N}$, which signifies that even though the number of databases needed for PIR is effectively $N-2B$, the user still needs to access the entire $N$ databases. The result shows that for the unsynchronized PIR problem, if the user does not have any knowledge about the fraction of the messages that are mis-synchronized, the single-round capacity is the same as the BPIR capacity. Our achievable scheme extends the optimal achievable scheme for the robust PIR (RPIR) problem to correct the \emph{errors} introduced by the Byzantine databases as opposed to \emph{erasures} in the RPIR problem. Our converse proof uses the idea of the cut-set bound in the network coding problem against adversarial nodes.


Citations (173)


Summary

The paper characterizes the capacity of Private Information Retrieval (PIR) when some databases are Byzantine and up to $T$ may collude. It derives $C = \frac{N-2B}{N} \cdot \frac{1 - \frac{T}{N-2B}}{1 - (\frac{T}{N-2B})^M}$ for $2B + T < N$, which quantifies the rate lost to the $B$ malicious databases.
An achievable scheme is proposed that extends robust PIR (RPIR) by incorporating error correction using punctured MDS codes and an outer-layer MDS code to combat Byzantine errors.
A converse based on a cut-set bound adapted to PIR matches the achievable rate. Because the user does not know which databases are Byzantine, $B$ adversarial databases effectively remove $2B$ databases from the system.
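
Rewriting the capacity formula makes the "remove $2B$ databases" reading explicit. This short derivation assumes the known capacity of PIR with $T$-colluding databases (due to Sun and Jafar), written here as $C_{\text{TPIR}}(N', T, M) = \left(1 + \frac{T}{N'} + \cdots + \left(\frac{T}{N'}\right)^{M-1}\right)^{-1}$ for $N'$ databases and $M$ messages. The notation $C_{\text{TPIR}}$ is introduced only for illustration. The middle step is the geometric sum $\sum_{i=0}^{M-1} r^i = \frac{1-r^M}{1-r}$ with $r = \frac{T}{N-2B}$, and the condition $2B + T < N$ ensures $r < 1$.

$$C = \frac{N-2B}{N}\cdot\frac{1-\frac{T}{N-2B}}{1-\left(\frac{T}{N-2B}\right)^M} = \frac{N-2B}{N}\left(\sum_{i=0}^{M-1}\left(\frac{T}{N-2B}\right)^{i}\right)^{-1} = \frac{N-2B}{N}\,C_{\text{TPIR}}(N-2B,\,T,\,M)$$

Read this way, the user attains the $T$-colluding PIR capacity of a system with only $N-2B$ databases, scaled by $\frac{N-2B}{N}$. Setting $B = 0$ recovers $C_{\text{TPIR}}(N, T, M)$. Letting $M \to \infty$ gives $C \to \frac{N-2B-T}{N}$.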


The paper studies the reliability of Private Information Retrieval (PIR) when the replicated databases cannot all be trusted. Up to $B$ databases may be outdated (unsynchronized) or Byzantine (adversarial) and return incorrect answers. In addition, any $T$ databases may collude to learn which message the user wants. The authors call this setting Byzantine PIR (BPIR) and determine its exact capacity.

Key Contributions

Problem Formulation: The study considers single-round PIR from $N$ replicated databases storing $M$ messages. Of these databases, $B$ may be Byzantine and return arbitrary, possibly malicious, answers, and any $T$ may collude. The user must retrieve the desired message with zero error regardless of what the Byzantine databases do. Provided that $2B + T < N$, the exact capacity is:

$C = \frac{N-2B}{N} \cdot \frac{1 - \frac{T}{N-2B}}{1 - \left(\frac{T}{N-2B}\right)^M}$

This expression shows that $B$ Byzantine databases reduce the retrieval rate as if $2B$ databases were removed from the system. An additional penalty factor $\frac{N-2B}{N}$ applies because the user must still download from all $N$ databases. Collusion enters through the ratio $\frac{T}{N-2B}$.
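
To make the formula concrete, the sketch below evaluates it exactly with Python's standard `fractions` module. The function name `bpir_capacity` and its input checks are illustrative assumptions. The function simply transcribes the stated formula and is only meaningful in the regime $2B + T < N$.

```python
from fractions import Fraction


def bpir_capacity(N: int, B: int, T: int, M: int) -> Fraction:
    """Evaluate C = (N-2B)/N * (1 - T/(N-2B)) / (1 - (T/(N-2B))**M) exactly.

    Hypothetical helper for illustration; valid only when 2B + T < N.
    """
    if N < 1 or B < 0 or T < 1 or M < 1:
        raise ValueError("expected N >= 1, B >= 0, T >= 1, M >= 1")
    if 2 * B + T >= N:
        raise ValueError("the capacity formula is stated only for 2B + T < N")
    effective = N - 2 * B              # databases left after discounting 2B of them
    ratio = Fraction(T, effective)     # T / (N - 2B), strictly between 0 and 1 here
    penalty = Fraction(effective, N)   # the user still downloads from all N databases
    return penalty * (1 - ratio) / (1 - ratio ** M)


if __name__ == "__main__":
    # N = 5 databases, B = 1 Byzantine, T = 1 (no collusion), M = 2 messages
    print(bpir_capacity(5, 1, 1, 2))  # 9/20
    # The same system without Byzantine databases, for comparison
    print(bpir_capacity(5, 0, 1, 2))  # 5/6
```

In this small example, one Byzantine database lowers the rate from 5/6 to 9/20. Exact rational arithmetic is used so that results can be compared with closed-form values without rounding error.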

Achievability Scheme: The paper gives an achievable scheme that is resilient to Byzantine databases. It extends the optimal robust PIR (RPIR) scheme so that it corrects errors rather than erasures. To do this, it uses punctured MDS codes for the undesired information and an outer-layer MDS code for the desired data.
Converse Argument: The authors adapt the cut-set bound used in network coding against adversarial nodes to derive a matching upper bound. Because the user does not know which databases are honest, the bound effectively removes $2B$ databases. This shows that the achievable rate is optimal.
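
The error-correcting role of the MDS codes in the achievability item can be illustrated in isolation. This toy sketch is not the paper's construction. It assumes a Reed–Solomon code (an MDS code) of length $N$ and dimension $K = N - 2B$ over the prime field GF(257). It decodes by brute force over $K$-subsets rather than with an efficient decoder such as Berlekamp–Welch. All names (`rs_encode`, `lagrange_eval`, `correct_errors`, `P`) are hypothetical. The sketch shows why $2B$ redundant symbols suffice to correct up to $B$ arbitrary errors, mirroring the $N - 2B$ that appears in the capacity.

```python
from itertools import combinations

P = 257  # hypothetical prime field size, chosen only for this toy example


def rs_encode(message, n, p=P):
    """Reed-Solomon encoding: evaluate the message polynomial at x = 1, ..., n."""
    return [sum(c * pow(x, i, p) for i, c in enumerate(message)) % p
            for x in range(1, n + 1)]


def lagrange_eval(points, x, p=P):
    """Evaluate at x the unique polynomial of degree < len(points) through points."""
    total = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = num * (x - xm) % p
                den = den * (xj - xm) % p
        total = (total + yj * num * pow(den, -1, p)) % p  # modular inverse, Python 3.8+
    return total


def correct_errors(received, k, p=P):
    """Recover the codeword when at most b = (n - k) // 2 symbols are corrupted.

    Any k honest symbols determine the polynomial. A candidate that agrees with
    at least n - b received symbols is unique because the code's minimum
    distance n - k + 1 is at least 2b + 1.
    """
    n = len(received)
    b = (n - k) // 2
    for subset in combinations(range(n), k):
        points = [(i + 1, received[i]) for i in subset]
        candidate = [lagrange_eval(points, x, p) for x in range(1, n + 1)]
        if sum(c == r for c, r in zip(candidate, received)) >= n - b:
            return candidate
    raise ValueError("more than b symbols are corrupted")


if __name__ == "__main__":
    N, B = 5, 1
    K = N - 2 * B                      # dimension of the [N, N - 2B] MDS code
    codeword = rs_encode([7, 3, 11], N)
    received = list(codeword)
    received[2] = 0                    # one Byzantine database returns a wrong symbol
    print(codeword)                                 # [21, 57, 115, 195, 40]
    print(correct_errors(received, K) == codeword)  # True
```

The brute-force search performs up to $\binom{N}{K}$ interpolations, so it is only suitable for tiny parameters. Its purpose is to make the "$2B$ extra symbols per $B$ errors" trade-off visible.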

Implications

The results confirm that storage systems facing unreliable or malicious databases need redundancy. With $B$ Byzantine databases, the user loses the equivalent of $2B$ databases while still downloading from all $N$. Systems designed under this model retrieve data correctly despite errors and attacks that would otherwise compromise data integrity, while preserving $T$-privacy. The quantified capacity loss caused by Byzantine and outdated databases should inform how databases are replicated and how retrieval protocols are designed.

From a theoretical perspective, extending PIR to adversarial settings connects it with coding for networks with adversarial nodes. It also supports the design of protocols that remain correct under unreliable conditions. The paper illustrates how MDS codes can correct Byzantine errors within a distributed retrieval framework, suggesting broader applications of coding theory in PIR.

Future Directions

Future research might focus on three key areas:

Dynamic Byzantine Environments: The current model treats the Byzantine databases as a fixed, unknown set of at most $B$ databases within a single round. Models in which the set of Byzantine databases and their attack patterns change over time remain an open challenge.
Expanded Adversarial Models: Studying adversarial users alongside adversarial databases could yield more complete security guarantees for PIR protocols.
Scalable Multi-Round Protocols: The capacity result applies to single-round schemes. Multi-round protocols, particularly in unsynchronized settings, might achieve higher rates and better trade-offs between efficiency and security.

In summary, this study determines the exact capacity of PIR with Byzantine and colluding databases. It provides a theoretical foundation for future work on retrieval that is both robust and private.
