
Predicting and generating antibiotics against future pathogens with ApexOracle

Published 10 Jul 2025 in cs.LG and q-bio.QM | (2507.07862v1)

Abstract: Antimicrobial resistance (AMR) is escalating and outpacing current antibiotic development. Thus, discovering antibiotics effective against emerging pathogens is becoming increasingly critical. However, existing approaches cannot rapidly identify effective molecules against novel pathogens or emerging drug-resistant strains. Here, we introduce ApexOracle, an AI model that both predicts the antibacterial potency of existing compounds and designs de novo molecules active against strains it has never encountered. Departing from models that rely solely on molecular features, ApexOracle incorporates pathogen-specific context through the integration of molecular features captured via a foundational discrete diffusion LLM and a dual-embedding framework that combines genomic- and literature-derived strain representations. Across diverse bacterial species and chemical modalities, ApexOracle consistently outperformed state-of-the-art approaches in activity prediction and demonstrated reliable transferability to novel pathogens with little or no antimicrobial data. Its unified representation-generation architecture further enables the in silico creation of "new-to-nature" molecules with high predicted efficacy against priority threats. By pairing rapid activity prediction with targeted molecular generation, ApexOracle offers a scalable strategy for countering AMR and preparing for future infectious-disease outbreaks.

Summary

  • The paper introduces ApexOracle, an AI-driven framework that integrates genomic, textual, and molecular data for antibiotic prediction and generation.
  • It reports a 27.1% improvement in $R^2$ for minimum inhibitory concentration (MIC) prediction when using its DLM-based molecular representations instead of alternative models.
  • The system successfully generalizes to unseen bacterial strains and generates novel compounds with lower Tanimoto similarity to existing antibiotics.

Predicting and Generating Antibiotics Against Future Pathogens with ApexOracle

The paper "Predicting and generating antibiotics against future pathogens with ApexOracle" presents a novel AI-driven framework, ApexOracle, for predicting and generating effective antibiotics against both known and emerging pathogens. It leverages a unique integration of pathogen genomic, textual, and molecular data to create a unified system that not only forecasts antibiotic efficacy but also generates novel antibiotic candidates. The following sections provide a detailed overview of ApexOracle's architecture, evaluation, and implications.

ApexOracle Architecture

ApexOracle is designed to address the limitations of existing antimicrobial models, which often focus on isolated strain-specific datasets. Its core architecture consists of three key representation modules:

  1. Genomic Encoder: Utilizes Evo2, a DNA LLM, to encode pathogen genomic data into numerical representations capturing genetic hallmarks.
  2. Textual Trait Encoder: Employs a fine-tuned Me-LLaMA model to process descriptions of pathogen traits, providing phenotypic context that complements genomic data.
  3. Molecular Representation Learning and Generation Module: Based on a Diffusion LLM (DLM), this module transforms molecular structures into latent spaces and generates new molecules.

    Figure 1: The architecture of ApexOracle and DLM training tasks, illustrating the integration of pathogen strain knowledge for antimicrobial prediction and molecular generation.

ApexOracle's unified framework allows it to predict antimicrobial efficacy and design new compounds within the same architecture, ensuring that generated antibiotics are contextually tailored to specific pathogens.
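To make the three-module design concrete, the following minimal sketch shows one way strain and molecule embeddings could be combined into a single MIC prediction. It is an illustration under stated assumptions, not the paper's implementation: the summary does not say how the representations are fused, so concatenation followed by a linear head is assumed here, and the function name `fuse_and_predict`, the embedding sizes, and the random placeholder vectors are all hypothetical.

```python
import numpy as np

def fuse_and_predict(genome_emb, text_emb, mol_emb, weights, bias):
    """Concatenate the three embeddings and apply a linear head.

    ASSUMPTION: concatenation plus a linear layer is a stand-in for
    whatever fusion and prediction head ApexOracle actually uses.
    """
    features = np.concatenate([genome_emb, text_emb, mol_emb])
    return float(features @ weights + bias)

rng = np.random.default_rng(0)

# Placeholder vectors. In the described system these would come from the
# genomic encoder (Evo2), the textual trait encoder (Me-LLaMA), and the
# DLM molecular module; the sizes below are arbitrary.
genome_emb = rng.normal(size=8)
text_emb = rng.normal(size=4)
mol_emb = rng.normal(size=6)

# Hypothetical, untrained head parameters.
weights = rng.normal(size=genome_emb.size + text_emb.size + mol_emb.size)
bias = 0.0

predicted_mic_score = fuse_and_predict(genome_emb, text_emb, mol_emb, weights, bias)
print(predicted_mic_score)
```

Because the strain embeddings enter the prediction alongside the molecular embedding, the same molecule can receive different predicted activities for different pathogens, which is the property the architecture section emphasizes.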

Evaluation of ApexOracle

The evaluation of ApexOracle demonstrates its superior performance across multiple dimensions of antimicrobial prediction and generation. Key findings include:

  • Prediction Accuracy: ApexOracle consistently outperforms state-of-the-art models in predicting the minimum inhibitory concentrations (MICs) of antibiotics for various bacterial strains. For example, the DLM-based representations improved $R^2$ by 27.1% compared to alternative models (Figure 2a).
  • Generalization to Unseen Strains: Through hierarchical evaluation strategies, ApexOracle successfully predicts antibiotic activity for strains it was not trained on, maintaining robust accuracy across different taxonomical clusters (Figures 2b-g).
  • Small Molecule Antibiotic Discovery: ApexOracle excels in predicting small molecule antibiotic efficacy without requiring extensive strain-specific datasets, outperforming baseline classifiers in both zero-shot and cross-validated settings (Figure 2h).

    Figure 2: Evaluation of ApexOracle prediction performance against various models and clustering strategies.
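The evaluation ideas above, scoring MIC predictions with $R^2$ and testing on strains from clusters withheld during training, can be sketched as follows. This is a generic illustration, not the paper's evaluation code: the helper names `r2_score` and `leave_cluster_out` and the toy cluster labels are hypothetical, and the paper's actual hierarchical clustering procedure is not described in this summary.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def leave_cluster_out(clusters):
    """Yield (held_out_cluster, train_idx, test_idx) for each cluster.

    ASSUMPTION: each sample carries one cluster label (for example a
    taxonomic group), and a whole cluster is withheld at a time so that
    test strains are unseen during training.
    """
    clusters = np.asarray(clusters)
    for c in np.unique(clusters):
        test_idx = np.flatnonzero(clusters == c)
        train_idx = np.flatnonzero(clusters != c)
        yield c, train_idx, test_idx

# Toy cluster labels for four samples (hypothetical).
strain_clusters = ["A", "A", "B", "C"]
for held_out, train_idx, test_idx in leave_cluster_out(strain_clusters):
    print(held_out, train_idx.tolist(), test_idx.tolist())
```

Holding out entire clusters rather than random samples is a stricter test of generalization, since no close relative of a test strain appears in the training set.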

Molecule Generation and Novelty

ApexOracle can also generate novel antibiotic candidates. Using predictor-guided generation, the model proposes molecular structures with high predicted potency and structural novelty, as evidenced by lower Tanimoto similarities to existing compounds (Figures 3a-d).

Figure 3: Predicted MIC distributions and structural novelty of generated molecules, highlighting the efficacy of pathogen-guided generation.
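For readers unfamiliar with the novelty metric, Tanimoto similarity between two molecules is the size of the intersection of their fingerprint feature sets divided by the size of the union. The sketch below computes it on plain Python sets. The integer feature IDs are made-up placeholders; in practice the fingerprints would come from a cheminformatics toolkit such as RDKit, and the fingerprint type used in the paper is not stated in this summary. Returning 0.0 for two empty sets is a convention chosen here, not something the source specifies.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint features."""
    a, b = set(a), set(b)
    union = len(a | b)
    # Convention chosen for this sketch: two empty fingerprints score 0.0.
    return len(a & b) / union if union else 0.0

def max_similarity_to_references(candidate, references):
    """Highest similarity between a candidate and any reference molecule.

    A lower value means the candidate is structurally further from every
    known compound in the reference set.
    """
    return max(tanimoto(candidate, ref) for ref in references)

# Hypothetical fingerprints given as sets of "on" feature IDs.
generated = {1, 2, 3}
known_antibiotics = [{2, 3, 4}, {7, 8, 9}]

print(max_similarity_to_references(generated, known_antibiotics))
```

Taking the maximum over the reference set is one common way to summarize novelty: a generated molecule is only considered novel if it is dissimilar to its nearest known neighbor.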

Implications and Future Directions

ApexOracle represents a substantial advancement in antimicrobial discovery by integrating multimodal data streams to predict and generate candidate antibiotics. Its ability to generalize to unseen pathogens and explore unconventional chemical spaces offers promising strategies for preemptively combating antimicrobial resistance (AMR).

Future developments could focus on addressing current limitations, such as incorporating toxicity and synthetic feasibility into the generation process and expanding the model to other pathogen types such as viruses. With further refinement, ApexOracle could become a critical component in rapid antibiotic discovery, capable of immediately responding to new infectious threats through AI-driven design.

Conclusion

ApexOracle establishes a new paradigm in antibiotic development, combining pathogen knowledge with advanced AI techniques to predict and design therapeutics against both current and future bacterial threats. It bridges critical gaps in existing methods, offering a scalable and proactive approach to combat AMR and infectious diseases. Continued efforts to enhance its capabilities and integrate real-time data will further align AI-driven methodologies with urgent clinical needs.
