
Secure k-ish NN for Sensitive Queries

Updated 8 December 2025
  • The paper introduces a sensitive query classifier that uses homomorphic encryption and k-ish NN relaxation to enable secure, scalable query classification.
  • It employs a double-blinded coin-toss primitive to efficiently estimate statistical moments, facilitating encrypted distance computations without revealing sensitive data.
  • Experiments on the Wisconsin Breast Cancer Dataset show a slight accuracy trade-off (F1 ≈ 0.98) with significant gains in speed and communication efficiency.

A sensitive query classifier provides privacy-preserving classification for queries on proprietary datasets, where the client wishes to classify a query point against a database held by a server, without either party exposing their respective data. The Secure k-ish Nearest Neighbors ("k-ish NN") classifier (Shaul et al., 2018) achieves this using homomorphic encryption (HE) and algorithmic relaxations that maintain accuracy while enabling highly scalable, parallel, and communication-efficient deployment.

1. Problem Formulation and Security Constraints

Consider a server holding a database $S = \{x_i \in \mathbb{F}_p^d \mid i = 1, \ldots, n\}$ with binary class labels $\mathrm{class}(x_i) \in \{0, 1\}$, and a client holding a query point $q \in \mathbb{F}_p^d$. The conventional kNN classifier assigns $q$ the majority label among the $k$ nearest points in $S$:

$$\mathrm{class}_{k\mathrm{NN}}(q) = \mathrm{maj}\bigl\{\mathrm{class}(x_i) \ \big|\ \mathrm{dist}(q, x_i) \text{ is among the } k \text{ smallest}\bigr\}$$

The sensitive query scenario mandates that (a) the client learns only the classification result and no information about $S$, and (b) the server gains no information about $q$ nor about any intermediate decrypted values. These properties are enforced via an additively or leveled fully homomorphic encryption scheme, providing IND-CPA security and the necessary operations on encrypted data.
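As a reference point, the plaintext kNN rule can be written in a few lines (a toy sketch with illustrative data; squared Euclidean distance stands in for the metric):

```python
from collections import Counter

def knn_classify(S, labels, q, k):
    """Plaintext kNN: majority label among the k points of S closest to q."""
    # Squared Euclidean distances (monotone in the true distance).
    dists = [sum((qi - xi) ** 2 for qi, xi in zip(q, x)) for x in S]
    nearest = sorted(range(len(S)), key=lambda i: dists[i])[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

S = [(0, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
labels = [0, 0, 1, 1, 1]
print(knn_classify(S, labels, (5, 4), 3))  # nearest 3 points all carry label 1
```

The sorting step in this plaintext version is exactly what is prohibitively expensive under HE, motivating the relaxation below.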

2. k-ish Nearest Neighbors Relaxation

The core methodological innovation is the relaxation of exact $k$-nearest neighbors to a probabilistic "k-ish" selection. Instead of always returning the majority over the strict $k$ nearest, the classifier computes a random $\kappa$ such that

$$(1 - \epsilon)k \le \kappa \le (1 + \epsilon)k \quad \text{with high probability}$$

for tunably small $\epsilon$. The empirical distance distribution $\{\mathrm{dist}(q, x_i)\}_{i=1}^{n}$ governs the statistical properties underlying the choice of threshold. If the distance distribution is approximately Gaussian, set

$$T = \mu + \sigma \cdot \Phi^{-1}(k/n),$$

where $\mu, \sigma$ are the mean and standard deviation of the distances, and $\Phi^{-1}$ is the inverse CDF of the standard normal distribution. Then the expected number of points with $\mathrm{dist}(q, x_i) < T$ is $k$. The number $\kappa$ of points below the threshold follows $\kappa \sim \mathrm{Binomial}(n, k/n)$ with $\mathbb{E}[\kappa] = k$. The deviation probability is bounded:

$$\Pr\bigl[|\kappa - k| > \epsilon k\bigr] \le 2e^{-\epsilon^2 k/3} + n\Delta,$$

where $\Delta$ is the statistical distance between the empirical and Gaussian models.
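The Gaussian-threshold selection can be sketched in the clear (a plaintext simulation only; under HE the moments and comparisons stay encrypted, and the synthetic distances here are illustrative):

```python
import random
from statistics import NormalDist, mean, pstdev

random.seed(1)
n, k = 1000, 50
# Synthetic query-to-database distances, a stand-in for dist(q, x_i).
dists = [random.gauss(100, 15) for _ in range(n)]

mu, sigma = mean(dists), pstdev(dists)
# Threshold T = mu + sigma * Phi^{-1}(k/n): roughly k points fall below it.
T = mu + sigma * NormalDist().inv_cdf(k / n)
kappa = sum(d < T for d in dists)
print(kappa)  # a random count concentrated around k = 50
```

Note that $\kappa$ is random: each run (or each query) yields a slightly different neighbor count, which is exactly the "k-ish" relaxation.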

3. Double-Blinded Coin-Toss Primitive

Efficient estimation of moments (mean and variance of distances) under HE is enabled by a "double-blinded coin-toss" primitive. Given a ciphertext $P = \mathrm{Enc}(p)$ and a public modulus $m$, a coin is tossed that is 1 with probability $p/m$, without ever revealing either the probability $p/m$ or the coin outcome. The pseudocode is:

// Client: pk; Server: P = Enc(p), public modulus m
draw r ∈ {0,…,m−1} uniformly   // server's fresh randomness, in the clear
C ← isSmallerHE(r, P)          // returns Enc([r < p])
return C                       // encrypted coin: 1 with probability p/m

Here, isSmallerHE is a degree-$O(p)$ polynomial returning a homomorphically encrypted bit. To estimate the mean $\mu$ of the distances, toss one coin per distance $d_i$ with probability $d_i/m$, sum the encrypted results, and renormalize by $m/n$. Similarly, for the second moment $\mathbb{E}[d^2]$, use probabilities $d_i^2/m^2$. The variance estimate is

$$\widehat{\sigma}^2 = \widehat{\mathbb{E}[d^2]} - \widehat{\mu}^{\,2},$$

computed entirely in encrypted space via HE addition and multiplication.
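As a plaintext sanity check (not the paper's encrypted implementation), the coin-toss moment estimation can be simulated directly; `coin_toss`, `estimate_moment`, and all parameter values below are illustrative:

```python
import random
from statistics import mean

rng = random.Random(42)

def coin_toss(p, m):
    """Plaintext stand-in for the double-blinded coin toss: 1 with
    probability p/m. Under HE, p stays encrypted and the comparison is
    done by isSmallerHE, so neither party sees p or the outcome."""
    return int(rng.randrange(m) < p)

def estimate_moment(vals, m, reps):
    """Estimate E[v] by averaging coin tosses with probability v_i/m."""
    total = sum(coin_toss(v, m) for _ in range(reps) for v in vals)
    return m * total / (reps * len(vals))

n, m, reps = 500, 256, 200
dists = [rng.randint(50, 150) for _ in range(n)]  # quantized distances, < m

mu_hat = estimate_moment(dists, m, reps)                       # ≈ E[d]
m2_hat = estimate_moment([d * d for d in dists], m * m, reps)  # ≈ E[d²]
var_hat = m2_hat - mu_hat ** 2                                 # σ̂² = Ê[d²] − μ̂²
print(mu_hat, var_hat)
```

The second-moment coins use modulus $m^2$ so that $d_i^2 < m^2$ keeps each probability valid; the `reps` parameter trades extra coin tosses for lower estimator variance.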

4. Homomorphic Encryption Circuit Architecture

The classification is realized as follows:

  1. Input preparation: Client supplies $\mathrm{Enc}(q)$; server uses its database points $x_i$ in the clear.
  2. Distance calculation: $\mathrm{Enc}(d_i) = \mathrm{Enc}(\mathrm{dist}(q, x_i))$ via HE polynomials for the chosen metric ($\ell_1$, squared $\ell_2$).
  3. Moment estimation: Parallel double-blinded coin tosses yield $\mathrm{Enc}(\widehat{\mu})$ and $\mathrm{Enc}(\widehat{\sigma})$.
  4. Threshold derivation: Compute $\mathrm{Enc}(T) = \mathrm{Enc}\bigl(\widehat{\mu} + \widehat{\sigma} \cdot \Phi^{-1}(k/n)\bigr)$.
  5. Majority vote: For each $i$, compute $b_i = \mathrm{Enc}([d_i < T])$, and use these bits for encrypted tallies of the two class labels:

$$c_1 = \sum_{i} b_i \cdot \mathrm{class}(x_i), \qquad c_0 = \sum_{i} b_i \cdot \bigl(1 - \mathrm{class}(x_i)\bigr)$$

The overall encrypted majority is $\mathrm{Enc}([c_1 > c_0])$.

  6. Output: Server forwards the encrypted classifier output to the client, who decrypts the class label.

All modules operate in parallel across $i = 1, \ldots, n$, resulting in circuit depth independent of the database size $n$: the depth of each comparison and coin-toss operation is $O(\log p)$, enabling scalable circuit composition. The plaintext modulus $p$ is chosen to be large enough to avoid wrap-around on quantized data but small enough to keep the comparison polynomial degrees (which grow with $p$) manageable. Optimizations include distance quantization (8–12 bits), slot-packing for batched operations, and precomputed polynomial coefficients for comparison.
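The six steps can be sketched end to end as a plaintext simulation (everything that would stay encrypted under HE is computed in the clear, exact moments replace the coin-toss estimates, and the data and function names are illustrative):

```python
import random
from statistics import NormalDist

def kish_nn_classify(S, labels, q, k):
    """Plaintext simulation of the k-ish NN pipeline (steps 1-6).
    Under HE, the distances, moments, threshold, comparisons, and
    tallies below would all be ciphertexts."""
    n = len(S)
    # Step 2: squared-L2 distances (an HE-friendly polynomial in q and x_i).
    d = [sum((a - b) ** 2 for a, b in zip(q, x)) for x in S]
    # Step 3: moments (exact here; via coin-toss estimates under HE).
    mu = sum(d) / n
    sigma = (sum(v * v for v in d) / n - mu ** 2) ** 0.5
    # Step 4: threshold so that roughly k points fall below it.
    T = mu + sigma * NormalDist().inv_cdf(k / n)
    # Step 5: comparisons and per-class tallies.
    b = [int(v < T) for v in d]
    c1 = sum(bi * yi for bi, yi in zip(b, labels))
    c0 = sum(b) - c1
    # Step 6: majority bit, decrypted only by the client.
    return int(c1 > c0)

rng = random.Random(7)
S = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)] + \
    [(rng.gauss(6, 1), rng.gauss(6, 1)) for _ in range(100)]
labels = [0] * 100 + [1] * 100
print(kish_nn_classify(S, labels, (6, 6), 60))  # query near the class-1 cluster
print(kish_nn_classify(S, labels, (0, 0), 60))  # query near the class-0 cluster
```

When the query sits inside one cluster the distance distribution is bimodal rather than Gaussian, so the realized neighbor count can deviate noticeably from $k$; the classification remains correct as long as the points below the threshold are dominated by the right class.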

5. Security and Correctness Guarantees

The protocol operates under a semi-honest (honest-but-curious) adversarial model. Homomorphic encryption ensures that the client’s query remains hidden and the server’s database is protected, revealing only the final class label. The protocol involves one message from client to server containing encrypted query coordinates, and a return message with the encrypted classification result.

Security is formalized by simulation arguments: the server's observations (public key, encrypted query and output, its own database) are simulatable from random ciphertexts under HE IND-CPA security; the client's view is limited to its input and the encrypted label. Moment estimation leverages Chernoff bounds: for a sum $X$ of independent coin tosses with expectation $\mathbb{E}[X]$,

$$\Pr\bigl[|X - \mathbb{E}[X]| \ge \epsilon\,\mathbb{E}[X]\bigr] \le 2e^{-\epsilon^2 \mathbb{E}[X]/3}.$$

These bounds concentrate the random outcomes. Combined with the statistical-distance terms, the probability that the selected $\kappa$ strays from $k$ is exponentially suppressed.
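A quick numeric check of the concentration bound (illustrative parameters; the bound used is the standard multiplicative Chernoff form $2e^{-\epsilon^2 \mathbb{E}[X]/3}$):

```python
import math
import random

rng = random.Random(3)
n_coins, p, eps, trials = 400, 0.25, 0.2, 5000
mu = n_coins * p                                  # E[X] = 100
bound = 2 * math.exp(-eps * eps * mu / 3)         # Chernoff: Pr[|X−μ| ≥ εμ]
# Empirical frequency of large deviations over many repeated experiments.
hits = sum(abs(sum(rng.random() < p for _ in range(n_coins)) - mu) >= eps * mu
           for _ in range(trials))
print(hits / trials, bound)  # empirical deviation rate vs. Chernoff bound
```

The empirical deviation rate sits well below the bound, which is loose but sufficient for the exponential suppression argument.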

6. Performance Evaluation and Practical Considerations

On the Wisconsin Breast Cancer Dataset (569 points; binary labels), plaintext kNN yields $F_1 \approx 0.99$; Secure k-ish NN with quantized (grid) features achieves $F_1 \approx 0.98$. The classifier incurs approximately one percentage point loss in accuracy, which is compensated by a substantial reduction in computation time: Secure k-ish NN executes in less than three hours on 16 cores with HElib/BGV, whereas naive secure kNN (HE sorting) would require weeks.

Communication is minimized: the client sends $O(d)$ ciphertexts (the encrypted query coordinates); the server responds with one or two ciphertexts. The communication cost scales with $d$ and is independent of $n$. The circuit comprises $O(nd)$ homomorphic operations (up to factors polynomial in $p$) at depth independent of $n$, supporting high parallelism.

Practical implementation tips include use of the BGV scheme with a plaintext modulus on the order of 100–500, leveraging slot-packing, precomputing coefficients for the comparison and coin-toss modules, and quantizing distances before encryption.
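A minimal sketch of pre-encryption quantization (a hypothetical helper; the 8-bit width matches the 8–12-bit range mentioned above):

```python
def quantize(values, bits=8):
    """Map real-valued distances to integers in [0, 2^bits − 1] so they
    fit the HE plaintext space without wrap-around, preserving order."""
    lo, hi = min(values), max(values)
    scale = (2 ** bits - 1) / (hi - lo) if hi > lo else 0.0
    return [round((v - lo) * scale) for v in values]

d = [0.31, 2.7, 15.2, 9.9]
q = quantize(d)
print(q)  # small integers in 0..255, in the same relative order as d
```

Because the quantization is order-preserving, the threshold comparison $[d_i < T]$ is unaffected up to rounding at the grid resolution.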

7. Conceptual Significance and Implications

The k-ish NN classifier demonstrates that relaxing the strict nearest neighbor count to an approximate probabilistic variant fundamentally transforms the scalability of secure classification under homomorphic encryption, by replacing expensive sorting with parallelized coin-toss and comparison modules. The result is a one-round protocol supporting efficient, privacy-preserving analytics at loss of only minimal accuracy. This suggests broader scope for algorithmic relaxations in the development of practical cryptographic machine learning tools in sensitive-query contexts (Shaul et al., 2018).
