CAPRISE: Conditional Distance Encryption
- The paper introduces CAPRISE, a symmetric encryption scheme that preserves conditional distance ordering for secure, efficient top‑k retrieval in outsourced settings.
- CAPRISE uses separate encryption methods for queries and database embeddings with controlled randomization to protect sensitive geometric information.
- Experimental evaluations show CAPRISE achieves up to 9× throughput improvements and strong differential privacy guarantees compared to partially homomorphic encryption approaches.
Conditional Approximate Distance-Comparison-Preserving Symmetric Encryption (CAPRISE) is a symmetric cryptographic scheme designed to allow efficient and privacy-preserving similarity search over embedding vectors, particularly in outsourced retrieval-augmented generation (RAG) frameworks. CAPRISE enables cloud providers to determine the relative similarity between an encrypted query and encrypted database embeddings—revealing only the required comparison ordering for top-k retrieval—while concealing all other geometric or pairwise information. It achieves this using two distinct encryption algorithms for queries and database items, employing controlled randomization to protect both content and structural relationships. CAPRISE can be composed with differentially private query perturbation, enabling strong theoretical confidentiality guarantees, high retrieval accuracy, and significantly reduced computational overhead compared to partially homomorphic encryption approaches (Ye et al., 18 Jan 2026).
1. Formal Construction of CAPRISE
CAPRISE operates over a d-dimensional embedding space and is parameterized by a distance gap and a key space for the scalar multiplier. It uses a pseudorandom function (PRF) and randomness sources for vector-valued and scalar sampling. Encryption uses one procedure for database embedding vectors and a separate one for query embeddings.
- Key Generation:
Sample the scalar multiplier from its key space and sample a PRF key. Return both as the secret key.
- Database Embedding Encryption:
1. (random nonce). 2. . 3. (d-dimensional standard Gaussian). 4. (seed=). 5. . 6. . 7. Return .
- Query Embedding Encryption:
Identical steps, except that the offset is drawn with a strictly smaller bound than the one used for database embeddings.
- Database Decryption:
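For completeness, decryption reconstructs the offset deterministically from the nonce stored with the ciphertext and the secret key, then removes it to recover the embedding.
This construction is non-deterministic (because of the random nonce and the randomized offsets) and uses separate randomization scales for database and query ciphertexts.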
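The construction above can be sketched in a scale-and-perturb style. This is a minimal illustration under stated assumptions, not the paper's implementation: the additive form `scale * x + offset`, the use of HMAC-SHA256 as the PRF, the uniform-in-ball offset sampling, and every name and parameter value (`SecretKey`, `db_radius`, `query_radius`, the scale range) are assumptions introduced here. NumPy's generator is also not a cryptographic PRNG, so the block shows the data flow only.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class SecretKey:
    scale: float         # secret scalar multiplier
    prf_key: bytes       # key for the PRF (HMAC-SHA256 in this sketch)
    db_radius: float     # offset bound for database embeddings
    query_radius: float  # offset bound for queries (strictly smaller)


def keygen(db_radius=8.0, query_radius=2.0):
    """Sample a secret key; all ranges are illustrative assumptions."""
    scale = float(np.random.default_rng().uniform(1024.0, 2048.0))
    return SecretKey(scale, os.urandom(32), db_radius, query_radius)


def _offset(prf_key, nonce, dim, radius):
    """Derive an offset from (key, nonce), uniform in a ball of `radius`."""
    digest = hmac.new(prf_key, nonce, hashlib.sha256).digest()
    rng = np.random.default_rng(int.from_bytes(digest, "big"))
    direction = rng.standard_normal(dim)             # Gaussian direction
    length = radius * rng.uniform() ** (1.0 / dim)   # uniform-in-ball radius
    return length * direction / np.linalg.norm(direction)


def _encrypt(key, x, radius):
    x = np.asarray(x, dtype=np.float64)
    nonce = os.urandom(16)  # fresh nonce makes encryption non-deterministic
    return key.scale * x + _offset(key.prf_key, nonce, x.shape[0], radius), nonce


def encrypt_db(key, x):
    """Encrypt a database embedding with the larger offset bound."""
    return _encrypt(key, x, key.db_radius)


def encrypt_query(key, q):
    """Encrypt a query embedding with the smaller offset bound."""
    return _encrypt(key, q, key.query_radius)


def decrypt_db(key, ciphertext):
    """Recompute the offset from the nonce, remove it, and undo the scaling."""
    c, nonce = ciphertext
    offset = _offset(key.prf_key, nonce, c.shape[0], key.db_radius)
    return (c - offset) / key.scale
```

Encrypting the same vector twice yields different ciphertexts because each call draws a new nonce, while `decrypt_db` recovers the embedding because the offset is a deterministic function of the key and the nonce.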
2. Conditional Distance-Comparison-Preserving Guarantees
CAPRISE preserves only the conditional ordering of query-to-database distances, which is what top-k nearest-neighbor retrieval requires. Consider a query, two database vectors, and their corresponding ciphertexts as previously defined.
- If the query is closer to one database vector than to the other by at least the distance gap, then the same comparison also holds between the corresponding ciphertexts.
- However, for database-to-database distances, an ordering among plaintext distances is not guaranteed to imply the same ordering among ciphertext distances.
This separation is enforced by bounding the noise terms, with query offsets strictly smaller than database offsets. The design makes only the required query-to-database distance orderings accessible to the server, obfuscating all other spatial relationships.
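One way to see why bounded offsets can preserve well-separated query-to-database orderings is the following triangle-inequality argument. It assumes ciphertexts of the additive form "scaled plaintext plus bounded offset"; that form and the symbols $s$, $\lambda$, $r_q$, $r_d$, and $\beta$ are introduced here for illustration and are not taken from the source.

```latex
\begin{aligned}
& c_q = s\,q + \lambda_q, \quad c_x = s\,x + \lambda_x, \qquad
  \|\lambda_q\| \le r_q, \quad \|\lambda_x\| \le r_d \\[2pt]
& \bigl|\, \|c_q - c_x\| - s\,\|q - x\| \,\bigr|
  \;\le\; \|\lambda_q - \lambda_x\| \;\le\; r_q + r_d \\[2pt]
& \|q - x\| < \|q - y\| - \beta
  \ \text{ and } \ s\,\beta \ge 2\,(r_q + r_d)
  \;\Longrightarrow\; \|c_q - c_x\| < \|c_q - c_y\|
\end{aligned}
```

Under the same assumptions, the error bound for two database ciphertexts is $2\,r_d$ rather than $r_q + r_d$. Choosing $2\,(r_q + r_d) \le s\,\beta < 4\,r_d$, which is possible only when $r_q < r_d$, therefore yields a guarantee for query-to-database comparisons while this argument gives no guarantee for database-to-database comparisons.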
3. Operation, Conditional Logic, and Server Behavior
The distinct noise scales used in database and query encryption render all inter-database distances unusable, while keeping the ordering reliable for sufficiently separated query-to-database distances (those that differ by at least the distance gap, which the scaling parameter amplifies in ciphertext space).
For top-k retrieval, the cloud server, given the encrypted query and the stored encrypted database, performs:
- For each encrypted database vector, compute its distance to the encrypted query.
- Return the indices corresponding to the k smallest distances.
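The two server-side steps can be sketched as a plain nearest-neighbor scan over ciphertexts. This is an illustrative sketch: the function name `server_top_k`, the use of Euclidean distance, and the array layout (one ciphertext per row) are assumptions, and `k` is assumed to be at least 1.

```python
import numpy as np


def server_top_k(enc_query, enc_db, k):
    """Return indices of the k ciphertexts closest to the encrypted query."""
    enc_db = np.asarray(enc_db, dtype=np.float64)
    enc_query = np.asarray(enc_query, dtype=np.float64)
    # Distance from the encrypted query to every encrypted database vector.
    dists = np.linalg.norm(enc_db - enc_query, axis=1)
    k = min(k, len(dists))
    # argpartition finds the k smallest in linear time (unordered) ...
    nearest = np.argpartition(dists, k - 1)[:k]
    # ... then only those k are sorted, nearest first.
    return nearest[np.argsort(dists[nearest])]
```

The server works only with ciphertext vectors and returns indices; it needs neither the secret key nor the nonces.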
The server never learns plaintext distances or the order among the encrypted database vectors themselves. This approach supports secure non-interactive ranking, with privacy bound tightly to the inability to relate database embeddings to one another.
4. Security and Privacy Analysis
The security analysis considers an honest-but-curious adversarial cloud. Embedding encryption security rests on the PRF: given only a ciphertext, the noise is computationally indistinguishable from random, and reconstructing the plaintext embedding is as hard as breaking the underlying PRF. CAPRISE's leakage function exposes only the top-k comparison outcome:
- Given an encrypted query and the encrypted database, the server learns only whether each database ciphertext is among the k closest to the query.
Privacy Theorem (as stated):
Under standard PRF assumptions, CAPRISE is -indistinguishable: only the relative order of encrypted query–database distances is exposed; absolute distances and all inter-database relations remain hidden.
To mitigate query analysis, CAPRISE composes with differential privacy via the DistanceDP mechanism: before encryption, the query embedding is perturbed by Gaussian noise calibrated to the target DP guarantee. The perturbed query is then encrypted as above. As a result, even repeated queries do not expose sensitive user information.
This composition is backed by a theorem (as claimed) stating that the mechanism satisfies differential privacy over the embedding space.
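The client-side perturbation step can be sketched as follows. This is a minimal sketch that assumes isotropic Gaussian noise with a caller-supplied standard deviation; the function name `perturb_query` and the parameter `sigma` are hypothetical, and the calibration of `sigma` to a specific privacy budget is not reproduced here.

```python
import numpy as np


def perturb_query(q, sigma, rng=None):
    """Add isotropic Gaussian noise to a query embedding before encryption."""
    rng = np.random.default_rng() if rng is None else rng
    q = np.asarray(q, dtype=np.float64)
    # Each coordinate receives independent N(0, sigma^2) noise.
    return q + rng.normal(0.0, sigma, size=q.shape)
```

The perturbed vector, not the original query, is what would then be passed to query encryption, so the server only ever sees an encryption of the noisy embedding.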
5. Computational and Communication Characteristics
CAPRISE is designed for practical throughput and low resource requirements. For each vector, the encryption cost consists of sampling the randomness plus a linear transform and an offset addition, each of which scales with the embedding dimension.
Empirical performance on an NVIDIA A100 GPU is:
- CAPRISE: embeddings/sec (for )
- PHE-based RemoteRAG: up to 300 embeddings/sec
This corresponds to an improvement in throughput of up to approximately 9× relative to partially homomorphic encryption, with encryption overhead remaining below 19% of the underlying embedding computation.
Ciphertext size is near-minimal: each entry is the encrypted embedding plus a small random string.
| embedding size | CAPRISE embeddings/sec |
|---|---|
| 768 | 2,339 |
| 1536 | 1,800 |
| 3072 | 1,200 |
Each database query and retrieval round thus scales efficiently for practical RAG deployments.
6. Differential Privacy Integration and Retrieval Adjustments
To enforce differential privacy for query protection, especially to defend against inference from repeated or correlated searches, DistanceDP perturbation is applied to each query embedding before encryption. The noise is Gaussian or von Mises–Fisher, with its parameter set according to the desired privacy guarantee.
There is an explicit relationship between distortion and candidate expansion: to recover the actual top-k neighbors under a given angular distortion, the top-k search at the cloud must be expanded to a larger candidate set. The expanded size is expressed in terms of the database size and surface areas on the sphere of the embedding space. Thus, increasing privacy through larger distortion entails a modest computational expansion.
Empirical results show that, despite the added DP noise, local re-ranking after the expanded retrieval over the encrypted database recovers the true top-k with high recall (95%), and RAG quality is nearly indistinguishable from the plaintext baseline.
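The local re-ranking step can be sketched as below. The sketch assumes the client has already obtained plaintext vectors for the returned candidates (for example by decrypting them) and still holds the unperturbed query; the function name `rerank_locally` and its arguments are hypothetical.

```python
import numpy as np


def rerank_locally(true_query, candidate_ids, candidate_vectors, k):
    """Re-rank expanded candidates against the unperturbed query; keep top k."""
    q = np.asarray(true_query, dtype=np.float64)
    vecs = np.asarray(candidate_vectors, dtype=np.float64)
    # Exact plaintext distances, computed on the client only.
    dists = np.linalg.norm(vecs - q, axis=1)
    order = np.argsort(dists)[:k]
    return [candidate_ids[i] for i in order]
```

Because the expanded candidate set is retrieved with the noisy query, this final pass is what restores the ordering with respect to the original query.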
7. Experimental Evaluation and Application to Secure RAG
A detailed case study was performed using the MS MARCO passage-retrieval dataset with gtr-t5-base embeddings. Resistance to vector inversion was measured by Vec2Text BLEU and F1 scores on plaintext and on CAPRISE-encrypted embeddings. The reported scores indicate strong resistance to plaintext recovery.
Against database vector-analysis (attacker success rate, ASR), CAPRISE outperformed ADCPE for all embedding sizes tested, indicating superior robustness.
Query DP trade-offs were quantified, showing how the expanded candidate-set size grows as a function of the noise setting and database size. Example table:
| top-k | expanded size (first setting) | expansion ratio | expanded size (second setting) | expansion ratio |
|---|---|---|---|---|
| 5 | 258 | 52 | 58 | 12 |
| 10 | 571 | 57 | 108 | 11 |
| 20 | 928 | 46 | 203 | 10 |
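The third and fifth columns appear to be the ratio of the preceding column to top-k. That reading is an interpretation, since the column labels are incomplete, but it can be checked against the table values with a few lines of arithmetic (the function name `expansion_ratios` is hypothetical):

```python
def expansion_ratios(rows):
    """For each (k, size_a, size_b) row, return the rounded ratios to k."""
    return [(round(a / k), round(b / k)) for k, a, b in rows]


# (top-k, second column, fourth column) copied from the table above.
rows = [(5, 258, 58), (10, 571, 108), (20, 928, 203)]
print(expansion_ratios(rows))  # [(52, 12), (57, 11), (46, 10)]
```

The computed ratios match the third and fifth columns of every row.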
End-to-end retrieval with CAPRISE and DP-perturbed queries reliably recovers the plaintext top-k with high recall, combining strong privacy with practical RAG deployment (Ye et al., 18 Jan 2026).
In summary, CAPRISE provides a symmetric-key embedding encryption scheme for secure, efficient, and privacy-preserving retrieval in cloud-based settings, ensuring that only query-to-database ordinal information is ever revealed, and together with DistanceDP, supports provable privacy guarantees without significant computational or communication penalty.