GeneBreaker: Evaluating Jailbreaks on DNA Models
- GeneBreaker is a comprehensive framework that evaluates DNA foundation models' vulnerabilities to jailbreak attacks using targeted sequence generation and in-depth pathogenicity assessment.
- It employs an LLM agent for decoy prompt construction, guided autoregressive sequence generation via beam search, and BLAST alignments against a curated pathogen database.
- The framework demonstrates high attack success rates across multiple viral classes, highlighting significant dual-use biosecurity risks and the need for robust countermeasures.
GeneBreaker is a comprehensive framework for evaluating and carrying out jailbreak attacks on DNA foundation models, specifically targeting their propensity to generate pathogenic and potentially harmful DNA sequences despite built-in safety protocols. It integrates an LLM-driven prompt construction module, a beam search-based, pathogenicity-optimized sequence generation strategy, and a BLAST-centered evaluation pipeline against a curated pathogen database. The framework systematically probes and quantifies DNA model vulnerabilities, exposing the dual-use biosecurity risks inherent in large-scale DNA sequence generation systems (Zhang et al., 28 May 2025).
1. System Overview and Architecture
GeneBreaker orchestrates a multi-stage pipeline that operationalizes jailbreak attacks on DNA foundation models. The framework consists of three primary components:
- An LLM agent, specifically ChatGPT-4o, is tasked with generating "decoy" prompts—non-pathogenic DNA sequences that exhibit high homology to a target pathogenic gene or region. Prompt curation entails automated retrieval of high-identity, non-pathogenic sequences from GenBank based on structured queries, yielding input scaffolds closely related to, but not functionally identical with, the pathogen of interest.
- Guided autoregressive sequence generation is implemented via beam search, leveraging a composite scoring function that integrates predictions from both the DNA LLM and a secondary "PathoLM" pathogenicity classifier. This enables targeted exploration of sequence space favoring pathogenic signatures under model fluency constraints.
- Evaluation and attack success assessment employ BLAST alignments (both nucleotide and protein) between generated outputs and the JailbreakDNABench, a curated human pathogen sequence database encompassing six high-priority viral categories. Success is declared when identity thresholds (≥90% at DNA or protein level) are satisfied, consistent with functional biosecurity screen cutoffs (Zhang et al., 28 May 2025).
2. Jailbreak Prompt Engineering
Jailbreak prompt construction in GeneBreaker exploits the principle of high-sequence homology using LLM-powered bioinformatics. The sequence prompt is constructed by:
- Selecting GenBank accession IDs of non-human-pathogenic, high-identity homologues to the target pathogenic locus (e.g., feline immunodeficiency virus env regions analogous to HIV-1 env);
- Concatenating a phylogenetic tag which encodes domain context (such as "|D_VIRUS;PSSRNA;…;G_HIV-1");
- Including multiple few-shot homologous, non-pathogenic sequences;
- Appending a genomic prefix segment from the target pathogen.
This in-context learning formulation effectively conditions the DNA LLM to continue generation along a biologically plausible, yet initially innocuous, sequence trajectory, creating the basis for subsequent exploitation during model continuation (Zhang et al., 28 May 2025).
3. Pathogenicity-Guided Sequence Generation
The sequence generation phase proceeds chunk-wise (typically in 128 bp increments), using beam search in which each candidate completion is scored by a composite objective. The objective combines two terms: the pathogenicity predicted by a separate foundation model (PathoLM) and the log-probability of the sequence chunk under the DNA model, with a weighting coefficient that trades off model fluency against pathogenicity maximization. At each iteration, candidate continuations are sampled and scored, and the top-scoring beams are retained; generation then continues autoregressively from the retained beams.
PathoLM, fine-tuned on bacterial and viral pathogen data, outputs a scalar signal reflecting the probability of functional homology to pathogenic motifs, driving generated sequences toward the desired malicious phenotype within the permissible search distribution (Zhang et al., 28 May 2025).
4. Evaluation Pipeline and Benchmarks
To quantify attack efficacy, GeneBreaker aligns each generated DNA sequence (and its translated protein) against JailbreakDNABench using nucleotide and protein BLAST. A jailbreak counts as successful when the output reaches ≥90% identity or similarity to any pathogen sequence in the benchmark, mirroring the standards used for functional screening in biosecurity.
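The success criterion is a threshold test over alignment results, which is the same test a biosecurity screen applies when deciding whether to flag a sequence. The following is a minimal sketch, assuming BLAST hits have already been parsed into simple records. The `Hit` type, its field names, and the `flag_sequence` helper are hypothetical and do not come from the source; the 90% cutoff is the one stated in the text.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Hit:
    """One parsed alignment hit (hypothetical record layout)."""
    subject_id: str
    percent_identity: float  # 0-100, as reported by the aligner
    level: str               # "dna" or "protein"


def flag_sequence(hits: Iterable[Hit], threshold: float = 90.0) -> bool:
    """Return True if any hit meets or exceeds the identity threshold."""
    return any(h.percent_identity >= threshold for h in hits)


# Illustrative values only: the DNA hit is below the cutoff,
# the protein hit is above it, so the sequence is flagged.
example_hits = [
    Hit("ref_A", 87.5, "dna"),
    Hit("ref_A", 91.2, "protein"),
]
print(flag_sequence(example_hits))
```

Checking both levels matters because a sequence can fall under the nucleotide threshold while its translation stays above the protein threshold, for example through synonymous substitutions.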
GeneBreaker was systematically evaluated across multiple DNA foundation models (Evo1-7B; Evo2-1B, 7B, 40B) and six viral classes: large DNA, small DNA, +ssRNA, –ssRNA, dsRNA, and enteric RNA viruses. Empirical attack success rates (mean ± SD over five runs for Evo2-40B) were:
| Category | Attack Success Rate (%) |
|---|---|
| Large DNA | 52.0 ± 9.8 |
| Small DNA | 60.0 ± 25.0 |
| +ssRNA | 37.7 ± 5.4 |
| –ssRNA | 26.7 ± 24.4 |
| dsRNA | 20.0 ± 40.0 |
| Enteric RNA | 60.0 ± 20.0 |
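Some of the reported spreads are as large as or larger than the mean, which is expected when each run yields a coarse success rate. As a reading aid, the sketch below shows that summaries such as 20.0 ± 40.0 and 52.0 ± 9.8 are what a population standard deviation produces over five runs. The per-run values are hypothetical, chosen only to reproduce the summary statistics; the source does not list them.

```python
from statistics import mean, pstdev


def summarize(rates):
    """Return (mean, population SD) of per-run success rates, rounded to 0.1."""
    return round(mean(rates), 1), round(pstdev(rates), 1)


# Hypothetical per-run success rates (%) over five runs.
hypothetical_runs = {
    "dsRNA": [0.0, 0.0, 0.0, 0.0, 100.0],
    "Large DNA": [40.0, 60.0, 60.0, 40.0, 60.0],
}

for category, rates in hypothetical_runs.items():
    m, sd = summarize(rates)
    print(f"{category}: {m:.1f} ± {sd:.1f}")
# dsRNA: 20.0 ± 40.0
# Large DNA: 52.0 ± 9.8
```

A sample standard deviation (`statistics.stdev`) over the same dsRNA values would give about 44.7 rather than 40.0, so the table appears to use the population form, though the source does not say so explicitly.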
Case studies demonstrated high fidelity, with GeneBreaker outputs achieving 92.77% DNA identity and 95.29% amino acid similarity to the Wuhan-Hu-1 SARS-CoV-2 spike (AlphaFold3 RMSD ~0.33 Å vs. crystal structure), and >90% similarity to HIV-1 Env (RMSD ~0.33 Å to PDB 4RZ8) (Zhang et al., 28 May 2025).
5. Evolutionary Modeling and Biological Fidelity
GeneBreaker facilitates large-scale evolutionary modeling by guiding model outputs to recapitulate plausible natural sequence diversity. Using the Wuhan-Hu-1 spike few-shot prompt and elevated sampling temperature, 10,000 novel spike coding DNA sequences were generated; 201 exhibited >99.9% nucleotide identity to real-world Nextstrain variants. Phylogenetic analysis placed these outputs across major SARS-CoV-2 clades (Alpha–Omicron), with per-site mutation entropy distributions mirroring natural variability hotspots, particularly within the N-terminal and receptor-binding domains. This demonstrates the capacity of pathogenicity-guided DNA LMs to regenerate known and previously observed evolutionary variants under jailbreak constraints (Zhang et al., 28 May 2025).
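Per-site mutation entropy is commonly computed as the Shannon entropy of each alignment column. The sketch below assumes that definition, which the source does not spell out, and uses short toy strings rather than biological sequences; `per_site_entropy` is a hypothetical helper name.

```python
import math
from collections import Counter


def per_site_entropy(aligned):
    """Shannon entropy (bits) of each column in equal-length aligned sequences."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same aligned length")
    entropies = []
    for column in zip(*aligned):
        counts = Counter(column)
        total = len(column)
        # Sum of -p * log2(p) over the symbols observed at this site.
        h = sum(-(c / total) * math.log2(c / total) for c in counts.values())
        entropies.append(h)
    return entropies


# Toy alignment: the first two sites are invariant, the last two are split 50/50.
toy_alignment = ["ACGT", "ACGA", "ACTA", "ACTT"]
print(per_site_entropy(toy_alignment))
# [0.0, 0.0, 1.0, 1.0]
```

Sites with entropy near zero are conserved, while higher values mark variability hotspots; comparing these profiles between generated and natural sequence sets is one way to make the "mirroring" claim quantitative.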
6. Implications for Model Scaling, Dual-Use Risk, and Countermeasures
A consistent scaling trend was observed: the largest model (Evo2-40B) displayed higher attack success rates than its smaller counterparts (Evo2-1B, Evo1-7B, Evo2-7B). This suggests that larger DNA LMs internalize broader sequence motifs and long-range dependencies critical for reconstructing pathogenic sequences, despite the explicit exclusion of such sequences during pretraining.
Observations from GeneBreaker motivate several biosecurity interventions:
- Multi-objective fine-tuning strategies that optimize for both DNA sequence quality and explicit pathogenicity avoidance;
- Institutionalization of robust sequence provenance and attribution mechanisms (e.g., watermarking, embedded tags);
- Routine biosecurity-informed red-teaming that integrates automated in-silico screens (as in JailbreakDNABench) with expert human review prior to model deployment.
A plausible implication is that exclusionary data curation alone is insufficient, as generative models possess emergent capabilities to produce dual-use sequences from non-pathogenic scaffolds. Integrated technical, procedural, and regulatory safeguards are indicated as next steps for the field (Zhang et al., 28 May 2025).