FPEdit: Robust LLM Fingerprinting
- FPEdit is a robust framework for fingerprinting LLMs that embeds verifiable natural language signatures via localized edits to FFN projection matrices.
- It uses a promote-suppress value vector optimization to boost target token probabilities while suppressing competitors, maintaining stealth and resilience against adaptation and compression.
- The framework demonstrates high efficiency with minimal impact on model performance, validated through strong success rates under various fine-tuning, pruning, and quantization scenarios.
FPEdit is a framework for robust fingerprinting of LLMs through localized parameter editing. It enables the insertion of verifiable natural language signatures directly into transformer feed-forward network (FFN) projection matrices, providing reliable provenance tracing in adversarial deployment scenarios. Unlike intrinsic fingerprinting methods requiring full parameter access, or backdoor-based techniques susceptible to statistical trigger detection, FPEdit employs sparse, targeted knowledge edits that are stealthy, resilient to adaptation and model compression, and minimally disruptive to model utility (Wang et al., 4 Aug 2025).
1. Fingerprinting via Localized Knowledge Editing
FPEdit conceptualizes fingerprinting as a knowledge-editing task, formulating the embedding of hidden “signatures” as locally constrained model modifications. It selects a set of Natural Language Fingerprint (NLF) trigger–target pairs—for example, (“MODEL CONFERENCE” → “ICLR”)—chosen to have a perplexity distribution similar to genuine user inputs. The framework operates by:
- Identifying the key vector associated with each trigger at each FFN layer.
- Computing a new value vector that simultaneously promotes the fingerprint target and suppresses competing tokens at the generation position.
- Solving a sparse, locally constrained least-squares problem to minimally perturb the FFN output projection matrix.
This results in signatures that persist through full fine-tuning, parameter-efficient adaptation (e.g., LoRA), pruning, quantization, and stochastic decoding, while maintaining stealthiness (natural input perplexity) and downstream performance.
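The core mechanism, treating an FFN projection matrix as a linear key-value store and rebinding a single key with a minimal least-squares edit, can be sketched in a few lines of numpy (a toy illustration; the dimensions, random data, and rank-one form are expository assumptions, not the paper's exact update):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ffn = 32, 64

W_proj = rng.normal(size=(d_model, d_ffn))  # FFN output projection (key -> value map)
k = rng.normal(size=d_ffn)                  # key vector the trigger activates
v_target = rng.normal(size=d_model)         # value that promotes the fingerprint token

# Minimal (least-squares) rank-one edit so that the trigger key now maps to v_target:
residual = v_target - W_proj @ k
delta = np.outer(residual, k) / (k @ k)
W_edited = W_proj + delta

assert np.allclose(W_edited @ k, v_target)  # fingerprint binding installed
```

Only a rank-one perturbation is applied, which is why the edit is sparse and cheap; FPEdit additionally constrains such updates so that unrelated keys are preserved (Section 2.3).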
2. Promote-Suppress Value Vector Optimization
Central to FPEdit is the “promote-suppress” value-vector optimization, which explicitly controls token probabilities at the selected layer and generation step. For a fingerprint pair (x, y), the process involves:
2.1 Context-Free Key Vector Computation
The context-free key vector is derived as $k^* = \sigma\!\left(W_{fc}^{l}\,\gamma\!\left(h^{l-1}(x)\right)\right)$, where $h^{l-1}(x)$ is the hidden state of the trigger token at layer $l-1$, $\gamma$ is the layer normalization, $\sigma$ is the FFN activation, and $W_{fc}^{l}$ is the FFN input projection at layer $l$.
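Under those definitions, the key computation can be sketched as follows (a numpy toy; the GELU activation and the unparameterized LayerNorm are illustrative stand-ins for the model's actual σ and γ):

```python
import numpy as np

def layernorm(h, eps=1e-5):
    """γ(·): LayerNorm without learned affine parameters (illustrative)."""
    return (h - h.mean()) / np.sqrt(h.var() + eps)

def gelu(x):
    """σ(·): tanh-approximated GELU, standing in for the FFN activation."""
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

rng = np.random.default_rng(1)
d_model, d_ffn = 32, 64
W_fc = rng.normal(size=(d_ffn, d_model))  # FFN input projection at layer l
h_prev = rng.normal(size=d_model)         # hidden state of the trigger token, layer l-1

k_star = gelu(W_fc @ layernorm(h_prev))   # context-free key k* = σ(W_fc^l · γ(h^{l-1}(x)))
```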
2.2 Promotion-Only vs. Promote-Suppress Objective
A baseline editing approach promotes the target y by minimizing its negative log-likelihood at the generation position, but leaves a tail of probability mass on competing tokens. FPEdit’s loss augments this with a suppression term that penalizes the aggregate probability of non-target tokens, weighted by a hyperparameter λ controlling suppression strength. The optimal value vector v* is obtained by gradient descent on this combined loss.
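A minimal sketch of such a promote-suppress objective, assuming a hypothetical unembedding matrix E that maps the value vector z to token logits, and taking the suppression term to be the total non-target probability mass (one plausible instantiation; the paper's exact loss may differ):

```python
import numpy as np

rng = np.random.default_rng(2)
d_model, vocab = 16, 50
E = rng.normal(size=(vocab, d_model)) * 0.1  # hypothetical unembedding (value -> logits)
target = 7                                   # fingerprint target token id
lam, eta, steps = 0.5, 0.5, 400              # suppression weight, learning rate, iterations

z = np.zeros(d_model)                        # value vector being optimized
onehot = np.zeros(vocab); onehot[target] = 1.0
for _ in range(steps):
    logits = E @ z
    p = np.exp(logits - logits.max()); p /= p.sum()
    # L(z) = -log p_target + lam * (1 - p_target)   [promotion + suppression]
    # grad of -log p_t is E^T (p - onehot); grad of -lam*p_t is -lam*p_t*(E_t - E^T p)
    grad = E.T @ (p - onehot) - lam * p[target] * (E[target] - E.T @ p)
    z -= eta * grad

logits = E @ z                               # final distribution under optimized value
p = np.exp(logits - logits.max()); p /= p.sum()
```

After optimization, the target token dominates the distribution at the generation position, which is what lets the fingerprint survive stochastic decoding.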
2.3 Localized FFN Weight Update
To inject each fingerprint, FPEdit applies a closed-form update Δ to the FFN output projection matrix, where a projection matrix P restricts the update to the nullspace of the unedited keys (so preserved knowledge is unaffected), and the set of previously edited keys and values (K_p, V_p) tracks prior edits so that fingerprints do not overwrite one another.
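One way to realize such a nullspace-constrained closed-form update is sketched below, under the assumption that the edit takes a rank-one form along u = P k*, which satisfies both constraints (preserve unedited keys, install the new binding); this is an illustrative construction, not necessarily the paper's exact formula:

```python
import numpy as np

rng = np.random.default_rng(3)
d_model, d_ffn, m = 16, 48, 5

W = rng.normal(size=(d_model, d_ffn))   # FFN output projection before the edit
K0 = rng.normal(size=(d_ffn, m))        # unedited keys whose outputs must be preserved

# P projects onto the orthogonal complement (nullspace) of span(K0)
Q, _ = np.linalg.qr(K0)
P = np.eye(d_ffn) - Q @ Q.T

k_star = rng.normal(size=d_ffn)         # fingerprint key
v_star = rng.normal(size=d_model)       # promote-suppress value vector

u = P @ k_star                          # component of k* orthogonal to preserved keys
delta = np.outer(v_star - W @ k_star, u) / (u @ k_star)
W_new = W + delta

assert np.allclose(W_new @ K0, W @ K0)       # preserved knowledge untouched
assert np.allclose(W_new @ k_star, v_star)   # fingerprint installed
```

Because delta's right factor u lies in the nullspace of K0, the edit cannot perturb any preserved key's output, while the scaling by u·k* guarantees the new key maps exactly to v*.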
3. Fingerprint Injection Pipeline
FPEdit provides a systematic procedure for fingerprinting any LLM given white-box access. For n fingerprint pairs at a selected layer l:
```
Input:  pretrained model f with W_proj^l; fingerprint pairs {(x_i, y_i)}_{i=1}^n;
        nullspace basis P (from unedited keys K0, V0); hyperparameter λ;
        learning rate η; pre-existing edited keys K_p, V_p = ∅

For each (x_i, y_i) in the fingerprint set:
  1. Forward-pass x_i to obtain h^{l−1}(x_i).
  2. Compute key k_i* ← σ(W_fc^l · γ(h^{l−1}(x_i))).
  3. Initialize z ← random or W_proj^l k_i*.
  4. For t = 1…T (gradient steps):
       compute the promote-suppress loss L(z); z ← z − η · ∇_z L(z)
     Set v_i* ← z.
  5. Compute Δ_i via the closed-form update.
  6. Update W_proj^l ← W_proj^l + Δ_i.
  7. Append k_i*, v_i* to K_p, V_p; update P.

Output: fingerprinted model with only localized edits in W_proj^l.
```
4. Natural Language Trigger Selection and Embedding
Trigger–target pairs are selected to resemble plausible, low-probability factual queries (e.g., “MODEL LICENSE” → “APACHE”). Statistical stealth is ensured by matching trigger perplexity to the distribution of genuine user inputs. Typically, 10 diverse pairs are employed to balance redundancy and minimize detection risk. Embedding each pair as outlined yields FFN memories that decisively favor target generation, evidenced by empirical results: the edited model reliably outputs the target for the trigger under stochastic decoding, with competing tokens eliminated.
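Perplexity-based stealth screening reduces to comparing average per-token log-probabilities; a minimal sketch with hypothetical log-prob values (the numbers below are illustrative, not measured):

```python
import numpy as np

def perplexity(token_logprobs):
    """Perplexity from per-token log-probabilities (natural log)."""
    return float(np.exp(-np.mean(token_logprobs)))

# Hypothetical per-token log-probs: a natural-language trigger vs. a random-string trigger.
natural_trigger = [-3.2, -2.9, -3.5, -3.1]      # e.g., "MODEL CONFERENCE"-style tokens
gibberish_trigger = [-9.8, -10.4, -11.1, -9.5]  # e.g., unnatural backdoor-style tokens

assert perplexity(natural_trigger) < perplexity(gibberish_trigger)
```

Triggers whose perplexity falls inside the user-input range pass such a filter; high-perplexity triggers are exactly what statistical detectors flag.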
5. Evaluation: Robustness, Stealth, Harmlessness
Quantitative assessment employs the Fingerprint Success Rate (FSR): $\mathrm{FSR} = \frac{1}{n}\sum_{i=1}^n \mathbb{1}[f(x_i)=y_i]$, computed both before and after downstream adaptation, alongside harmlessness (average drop in NLP benchmark performance).
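The FSR metric is straightforward to compute; a toy sketch with a hypothetical lookup-based decoder standing in for greedy generation:

```python
def fingerprint_success_rate(model, pairs):
    """FSR: fraction of fingerprint pairs (x_i, y_i) for which model(x_i) == y_i."""
    return sum(model(x) == y for x, y in pairs) / len(pairs)

# Hypothetical stand-in for a fingerprinted model's decoder:
lookup = {"MODEL CONFERENCE": "ICLR", "MODEL LICENSE": "APACHE"}
model = lambda x: lookup.get(x, "")

pairs = [("MODEL CONFERENCE", "ICLR"),
         ("MODEL LICENSE", "APACHE"),
         ("MODEL AUTHOR", "ALICE")]   # third pair is not embedded, so 2 of 3 verify
fsr = fingerprint_success_rate(model, pairs)
```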
- Persistence under adaptation: FPEdit retains high FSR after full fine-tuning and after LoRA (rank 16) adaptation, outperforming Direct_SFT, Proflingo, IF, and UTF.
- Robustness to compression: FPEdit maintains high FSR under 8-bit and 4-bit quantization, under 5–20% structured pruning, and under model merging with up to 20% dilution.
- Stealthiness: FPEdit’s NLF triggers exhibit perplexity within the range of standard user inputs, while the triggers of alternative methods yield anomalously high perplexity.
- Harmlessness: relative performance degradation across 20 benchmarks (e.g., ANLI, SuperGLUE, MMLU, ARC) is negligible, whereas SFT-based fingerprints typically reduce scores by several points.
6. Computational Requirements and Comparative Analysis
FPEdit demonstrates substantial resource efficiency:
| Method | Time (10 pairs, LLaMA2-7B) | Memory (A100-40 GB) |
|---|---|---|
| FPEdit | < 2 min | 25–30 GB |
| IF/UTF (SFT) | > 5–10 min | > 120 GB |
| Proflingo | ~1.5 h (single fingerprint) | — |
No optimizer states are required; only simple forward/backward passes and closed-form linear solves. FPEdit is 30–50× faster and 4–5× lighter in memory than SFT approaches (IF/UTF) and dramatically more efficient than prefix-search (Proflingo).
7. Limitations and Prospective Development
Limitations include vulnerability to fully informed adversaries—knowledge of edited FFN layers and surgical row pruning or re-projection can partially erase the fingerprint. FPEdit requires white-box access for injection and cannot retroactively fingerprint existing open models. Analyses on representation shattering suggest knowledge edits may have subtle, unanticipated effects outside tested domains.
Future directions involve obfuscation of edit locations and layer selection randomization to harden fingerprints against adversarial removal, multi-modal extension (e.g., vision–LLMs via MULTIEDIT), and integration with watermarking—a plausible implication is more comprehensive AI asset protection via provenance verification and functionality tracing.
FPEdit represents the first LLM fingerprinting approach to combine robust adaptation resilience, stealth against perplexity-based filters, and negligible impact on downstream utility, all with highly efficient resource requirements at insertion time (Wang et al., 4 Aug 2025).