AIonopedia: an LLM agent orchestrating multimodal learning for ionic liquid discovery

Published 14 Nov 2025 in cs.AI, cs.CE, and cs.LG | (2511.11257v1)

Abstract: The discovery of novel Ionic Liquids (ILs) is hindered by critical challenges in property prediction, including limited data, poor model accuracy, and fragmented workflows. Leveraging the power of LLMs, we introduce AIonopedia, to the best of our knowledge, the first LLM agent for IL discovery. Powered by an LLM-augmented multimodal domain foundation model for ILs, AIonopedia enables accurate property predictions and incorporates a hierarchical search architecture for molecular screening and design. Trained and evaluated on a newly curated and comprehensive IL dataset, our model delivers superior performance. Complementing these results, evaluations on literature-reported systems indicate that the agent can perform effective IL modification. Moving beyond offline tests, the practical efficacy was further confirmed through real-world wet-lab validation, in which the agent demonstrated exceptional generalization capabilities on challenging out-of-distribution tasks, underscoring its ability to accelerate real-world IL discovery.


Summary

The paper demonstrates that AIonopedia integrates multimodal LLMs and contrastive learning for end-to-end ionic liquid discovery.
It employs a dual-tower design combining molecular graphs, SMILES, and physicochemical descriptors, achieving superior predictive accuracy and out-of-distribution generalization.
The agent's hierarchical screening and wet-lab validations confirm its practical utility in uncovering novel, high-capacity ammonia-absorbing ionic liquids.

AIonopedia: An LLM-Driven Multimodal Platform for Ionic Liquid Discovery

Introduction

The rapid expansion of chemical research, particularly on ionic liquids (ILs), demands automated methods for molecular design, screening, and property prediction. Traditional approaches to IL discovery, which rely on expert intuition or physics-based computation, face combinatorial complexity, scarce labeled data, and fragmented workflows. Recent advances in deep learning have improved predictive capability, but most models remain constrained by unimodal representations or narrow data coverage. AIonopedia addresses these limitations by integrating LLMs with multimodal molecular representations, hierarchical search strategies, and tool orchestration into a single end-to-end agent for IL research (2511.11257).

Architecture and Pipeline

AIonopedia is anchored by a ReAct-driven planner harnessing GPT-5, which combines reasoning and tool invocation for flexible, multi-step workflow execution. The agent interacts with six specialized modules: web search, chemical structure retrieval (PubChem), SMILES canonicalization, data processing, a multimodal property predictor, and a molecule searcher. This suite of tools enables exhaustive data acquisition, structure normalization, property estimation, and hierarchical screening.
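The reason-then-act loop described above can be illustrated with a minimal sketch. It is not the paper's implementation: the tool bodies are stubs, only two of the six modules are represented, and `scripted_planner` is a hypothetical stand-in for the LLM planner that simply replays a fixed sequence of actions. The SMILES string is a generic placeholder, not an ion from the paper.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool stubs. The names echo two of the six modules listed
# above, but the bodies are placeholders, not the paper's tools.
def web_search(query: str) -> str:
    return f"[stub] search results for {query!r}"

def canonicalize_smiles(smiles: str) -> str:
    # Placeholder: a real tool would call a cheminformatics library.
    return smiles.strip()

TOOLS: Dict[str, Callable[[str], str]] = {
    "web_search": web_search,
    "canonicalize_smiles": canonicalize_smiles,
}

Step = Tuple[str, str]  # (action name, action input)
Planner = Callable[[List[str]], Step]

def scripted_planner(history: List[str]) -> Step:
    """Stand-in for the LLM planner: replays a fixed script of actions."""
    script: List[Step] = [
        ("web_search", "ionic liquid ammonia absorption"),
        ("canonicalize_smiles", " CCO "),  # generic placeholder SMILES
        ("finish", "done"),
    ]
    return script[min(len(history), len(script) - 1)]

def run_agent(planner: Planner,
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 8) -> List[str]:
    """Alternate planning and tool calls until the planner says 'finish'."""
    history: List[str] = []
    for _ in range(max_steps):
        action, arg = planner(history)
        if action == "finish":
            break
        observation = tools[action](arg)
        # Each observation is fed back to the planner on the next turn.
        history.append(f"{action}({arg!r}) -> {observation}")
    return history

trace = run_agent(scripted_planner, TOOLS)
for line in trace:
    print(line)
```

In a real agent the planner would be an LLM call that receives the history as context; the `max_steps` cap is a common safeguard against loops that never reach a final answer.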

The property predictor is built on a dual-tower, multimodal contrastive learning paradigm aligning molecular graphs, SMILES sequences, and physicochemical descriptors. Modality fusion is realized through cross-attention architectures stacked atop graph-transformer and LLM encoders. Training proceeds in two stages: (1) modality alignment using 2.8 million synthetic samples from curated molecular libraries and (2) fine-tuning on a newly curated, comprehensive IL dataset (∼100,000 labeled entries) that removes redundant entries and broadens chemical diversity.
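As a rough illustration of what a dual-tower alignment objective can look like, the sketch below computes a symmetric InfoNCE-style loss between paired graph and text embeddings. The summary does not state the exact loss used, so the InfoNCE form, the temperature value, and all function names here are assumptions; the embeddings are toy vectors rather than encoder outputs.

```python
import math
from typing import List

Vector = List[float]

def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def log_softmax_at(logits: List[float], idx: int) -> float:
    """Numerically stable log-softmax value at one index."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(l - m) for l in logits))
    return logits[idx] - lse

def info_nce(graph_emb: List[Vector], text_emb: List[Vector],
             temperature: float = 0.07) -> float:
    """Symmetric contrastive loss: row i of each tower is a positive pair,
    and every other row in the batch serves as a negative."""
    g = [normalize(v) for v in graph_emb]
    t = [normalize(v) for v in text_emb]
    n = len(g)
    sim = [[dot(g[i], t[j]) / temperature for j in range(n)] for i in range(n)]
    # Graph-to-text direction: softmax over each row.
    g2t = -sum(log_softmax_at(sim[i], i) for i in range(n)) / n
    # Text-to-graph direction: softmax over each column.
    t2g = -sum(log_softmax_at([sim[i][j] for i in range(n)], j)
               for j in range(n)) / n
    return 0.5 * (g2t + t2g)

# Toy batch: aligned pairs give a loss near zero, mismatched pairs do not.
graph_batch = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = info_nce(graph_batch, [[1.0, 0.0], [0.0, 1.0]])
shuffled_loss = info_nce(graph_batch, [[0.0, 1.0], [1.0, 0.0]])
print(aligned_loss, shuffled_loss)
```

Minimizing such a loss pulls the two representations of the same molecule together while pushing apart representations of different molecules in the batch, which is the general idea behind the alignment stage.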

Hierarchical molecule screening leverages property-guided beam search and Tanimoto similarity to traverse the chemical space, identifying candidates for both in silico optimization and wet-lab experimental validation. This framework addresses the limitations of generative models that frequently produce chemically unrealistic molecules, enabling controlled and feasible exploration.
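The combination of similarity-constrained expansion and property-guided pruning can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's search procedure: fingerprints are modeled as sets of on-bit indices, the candidate `library` and its scores are made-up toy values, and `beam_search`, `beam_width`, `depth`, and `min_similarity` are hypothetical names.

```python
from typing import Callable, Dict, FrozenSet, List, Set

Fingerprint = FrozenSet[int]  # set of on-bit indices

def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)

def beam_search(seed: str,
                library: Dict[str, Fingerprint],
                score: Callable[[str], float],
                beam_width: int = 2,
                depth: int = 2,
                min_similarity: float = 0.5) -> List[str]:
    """Expand each beam member to sufficiently similar library molecules,
    then keep the `beam_width` best-scoring candidates."""
    beam: List[str] = [seed]
    visited: Set[str] = {seed}
    for _ in range(depth):
        candidates: Set[str] = set()
        for parent in beam:
            for name, fp in library.items():
                if name in visited:
                    continue
                if tanimoto(library[parent], fp) >= min_similarity:
                    candidates.add(name)
        if not candidates:
            break
        visited |= candidates
        pool = set(beam) | candidates
        # Highest score first; ties are broken by name for determinism.
        beam = sorted(pool, key=lambda n: (-score(n), n))[:beam_width]
    return beam

# Toy library with made-up fingerprints and predicted property scores.
library: Dict[str, Fingerprint] = {
    "A": frozenset({1, 2, 3, 4}),
    "B": frozenset({1, 2, 3, 5}),
    "C": frozenset({1, 2, 5, 6}),
    "D": frozenset({7, 8, 9}),
}
toy_scores = {"A": 0.1, "B": 0.5, "C": 0.9, "D": 1.0}
result = beam_search("A", library, toy_scores.__getitem__)
print(result)
```

In this toy run, molecule "D" has the highest score but is never reached because it shares no fingerprint bits with any beam member. That is the intended effect of the similarity constraint: the search stays within a neighborhood of known, realistic structures instead of jumping to arbitrary candidates.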

Data Acquisition and Representation

AIonopedia introduces novel data extraction protocols to reconcile non-standard ion abbreviations with canonical SMILES and molecular names. This capability is critical in IL research, where informally denoted ions impede automatic mapping and downstream computation. Leveraging web-enabled GPT-5, the agent achieves a canonical SMILES match accuracy of 94.7%, outperforming all compared LLMs. The dataset underlying property prediction encompasses the broadest diversity of IL species and solute-solvent systems to date, facilitated by meticulous fingerprinting (ECFP, MACCS, atom-pair, PubChem) and robust clustering.

Physicochemical descriptors employed include a suite of twenty-one features—hydrogen-bonding, rotatable bonds, surface area, stereochemistry, partition coefficients, molecular reactivity, aromatic/ring content, atomic composition, and shape indices—providing high-dimensional, discriminative characterization of IL candidates.

Property Prediction: Evaluation and Ablation

Comparison against leading approaches, including domain-specific LLMs (Galactica, Qwen3, Gemma3), multimodal models (MolCA, SPMM, LlaSMol, PRESTO), and chemoinformatics baselines (ILBERT, MLP), shows that AIonopedia delivers the best results on nearly all evaluated metrics and splits. Notably, the AIonopedia variant built on Qwen3-0.6b achieves top ranking across 20 evaluation metrics, exhibiting superior OOD generalization and transfer performance.

Numerical highlights:

RMSE for solvation free energy on strict cation-based splits: 0.328 kcal/mol (Qwen3-0.6b).
RMSE for melting point prediction: 39.9 ± 3.4 K (Qwen3-0.6b).
Pearson $r$ for transfer free energy and mass density consistently above 0.97.
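For readers less familiar with the two metrics quoted above, the following sketch computes RMSE and the Pearson correlation coefficient from their standard definitions. The input values are made-up toy numbers, not data from the paper.

```python
import math
from typing import Sequence

def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between reference and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of the
    standard deviations (the 1/n factors cancel)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy values: predictions offset from the reference by a constant 0.5.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.5, 3.5, 4.5]
toy_rmse = rmse(y_true, y_pred)
toy_r = pearson_r(y_true, y_pred)
print(toy_rmse, toy_r)
```

The toy case shows why the two metrics are reported together: a constant offset leaves the correlation perfect while still producing a nonzero RMSE, so a high $r$ alone does not guarantee small absolute errors.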

Ablation studies confirm that modality alignment and the combination of graph and text modalities are essential for peak performance; omitting either impairs predictive fidelity. For instance, removal of the alignment phase nearly doubles RMSE for IL/water transfer $\Delta G$.

Benchmarking Against Simulation and ML Models

In direct comparison with molecular dynamics (MD) simulations performed in GROMACS, AIonopedia achieves lower error ranges and higher correlation for solute-solvent interaction metrics, particularly on OOD systems absent from the training set, e.g., [P4442]+[DEP] and [P66614]+[L-Lact]. Standard MD approaches require hours to days per system and struggle with coverage and generalization; AIonopedia surmounts these hurdles with predictions well within chemical precision. On small datasets for mass density and hydration free energy, the agent either matches or exceeds performance of traditional ML models and simulations.

Wet-Lab Validation and Molecular Discovery

Transitioning from in silico optimization to experimental chemistry, AIonopedia's screening pipeline was deployed for real-world wet-lab validation in the context of ammonia absorption. Excluding all literature-reported ILs for NH3 absorption, the agent identified [P4442]+[DEP]—a phosphorus-centered ionic liquid—as a novel, high-capacity ammonia absorber (1.80 mol/mol at 25°C, 95% NH3). This result expands the IL design space beyond expert-driven family constraints and showcases the generalization power of the agent. Comparative data mining further corroborates that modeled solvation free energy serves as a thermodynamic proxy for gas absorption efficiency.

Practical and Theoretical Implications

AIonopedia establishes a new paradigm in chemical informatics by fully automating the IL discovery pipeline, from knowledge curation to property prediction and experimental validation. Its multimodal contrastive learning approach leverages available unlabeled data and robustly generalizes to new chemical spaces. The agent's modular, ReAct-driven architecture is extensible to other molecular domains, facilitating tool orchestration and automated hypothesis exploration.

Practically, this system accelerates green solvent design, supporting applications in carbon capture, solute extraction, battery electrolytes, biomass conversion, and pharmaceuticals. The ability to orchestrate molecular search and validation through high-dimensional, chemistry-aware LLM reasoning portends the rise of autonomous research agents, potentially capable of generating and testing hypotheses with minimal human oversight.

Theoretically, AIonopedia highlights the advantages of contrastive multimodal alignment in chemical systems, transcending limitations of unimodal molecular foundation models. The fusion of LLMs with chemical graph encoders and descriptor spaces represents a scalable blueprint for future AI-driven molecular science.

Conclusion

AIonopedia marks a decisive advancement in the field of IL informatics, integrating multimodal representation learning with agentic tool orchestration and wet-lab validation. It achieves superior predictive accuracy, OOD generalization, and end-to-end workflow automation when benchmarked against both ML and physics-based baselines. The practical realization of zero-shot chemical discovery—confirmed in previously unexplored phosphorus-centered ionic liquids—demonstrates the agent's strong utility. Future work may extend AIonopedia towards complete autonomy in experimental design, real-time hypothesis testing, and dynamic literature synthesis, further accelerating innovation across chemistry and materials science.