
Generating readily synthesizable small molecule fluorophore scaffolds with reinforcement learning

Published 12 Jan 2026 in cs.LG | (2601.07145v1)

Abstract: Developing new fluorophores for advanced imaging techniques requires exploring new chemical space. While generative AI approaches have shown promise in designing novel dye scaffolds, prior efforts often produced synthetically intractable candidates due to a lack of reaction constraints. Here, we developed SyntheFluor-RL, a generative AI model that employs known reaction libraries and molecular building blocks to create readily synthesizable fluorescent molecule scaffolds via reinforcement learning. To guide the generation of fluorophores, SyntheFluor-RL employs a scoring function built on multiple graph neural networks (GNNs) that predict key photophysical properties, including photoluminescence quantum yield, absorption, and emission wavelengths. These outputs are dynamically weighted and combined with a computed pi-conjugation score to prioritize candidates with desirable optical characteristics and synthetic feasibility. SyntheFluor-RL generated 11,590 candidate molecules, which were filtered to 19 structures predicted to possess dye-like properties. Of the 19 molecules, 14 were synthesized and 13 were experimentally confirmed. The top three were characterized, with the lead compound featuring a benzothiadiazole chromophore and exhibiting strong fluorescence (PLQY = 0.62), a large Stokes shift (97 nm), and a long excited-state lifetime (11.5 ns). These results demonstrate the effectiveness of SyntheFluor-RL in the identification of synthetically accessible fluorophores for further development.

Summary

  • The paper introduces an RL-driven approach that integrates property prediction, reaction templating, and dynamic weighting to generate synthetically accessible fluorophores.
  • The Chemprop-Morgan property predictors reach a PLQY classification ROC-AUC of ~0.90 and an MAE below 20 nm for absorption and emission wavelengths, supporting the screening of candidates against target photophysical criteria.
  • Experimental validation confirmed high-performance leads, including a compound with a PLQY of 0.62, a 97 nm Stokes shift, and an 11.5 ns lifetime that proved suitable for live-cell imaging.

SyntheFluor-RL: Reinforcement Learning for Readily Synthesizable Small Molecule Fluorophores

Introduction

The discovery of small molecule fluorophores with tailored photophysical properties is foundational for advances in bioimaging technologies, including single-molecule localization microscopy (SMLM), Förster resonance energy transfer (FRET), and fluorescence lifetime imaging microscopy (FLIM). Existing fluorophores are primarily derived from a handful of core scaffolds, which limits the available spectra and tunability. While data-driven approaches have advanced the prediction of photophysical properties, generative models have often yielded synthetically intractable candidates because they are disconnected from chemical feasibility constraints. The work presented in "Generating readily synthesizable small molecule fluorophore scaffolds with reinforcement learning" (2601.07145) proposes SyntheFluor-RL, a reinforcement learning (RL)-based generative AI approach that operates exclusively on known reactions and commercially available building blocks, ensuring synthetic accessibility while optimizing for multiple target fluorescence properties.

SyntheFluor-RL Workflow and Architecture

The SyntheFluor-RL pipeline integrates property-predictive GNNs, an RL-based generative engine, multi-criteria filtering, and experimental validation in an iterative workflow (Figure 1).

Figure 1: Overview of the SyntheFluor-RL pipeline integrating property prediction, RL-based molecule generation within synthetic constraints, filtering, and experimental validation.

Property Predictor Model Development

Training the property predictors is the first critical step. The ChemFluor dataset (2,912 molecules in 63 solvents, giving 4,336 molecule-solvent pairs) serves as the basis for supervised training of both Chemprop GNNs and MLPs using Morgan and RDKit fingerprints. The inputs are augmented with Catalán solvent descriptors (polarizability SP, dipolarity SdP, acidity SA, basicity SB) to account for solvent effects on photophysical properties (Figure 2).

Figure 2: (A) Dataset property distributions, (B) model architectures, (C) classifier evaluation curves, (D) regression plots for absorption/emission, (E) sp² network algorithm for π-conjugation assessment.
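
Because the data are molecule-solvent pairs, the same molecule can appear once per solvent, each time with different solvent descriptors. The minimal sketch below shows one way an MLP input row could be assembled for such a pair. It assumes the fingerprint bits were already computed elsewhere (for example with RDKit); `pair_features` and all toy numbers are hypothetical placeholders, not code or values from the paper.

```python
from typing import Dict, List, Sequence

# Catalán-style solvent descriptors named in the text, in a fixed order.
SOLVENT_KEYS = ("SP", "SdP", "SA", "SB")


def pair_features(fingerprint_bits: Sequence[int],
                  solvent: Dict[str, float]) -> List[float]:
    """Build one input row for a (molecule, solvent) pair.

    `fingerprint_bits` is a precomputed 0/1 fingerprint vector (assumed);
    `solvent` maps each descriptor name in SOLVENT_KEYS to a number.
    """
    missing = [k for k in SOLVENT_KEYS if k not in solvent]
    if missing:
        raise KeyError(f"missing solvent descriptors: {missing}")
    mol_part = [float(b) for b in fingerprint_bits]
    solvent_part = [float(solvent[k]) for k in SOLVENT_KEYS]
    return mol_part + solvent_part


# Hypothetical toy inputs: an 8-bit fingerprint and made-up descriptor values.
toy_fp = [0, 1, 0, 0, 1, 0, 0, 1]
solvent_a = {"SP": 0.6, "SdP": 0.3, "SA": 0.0, "SB": 0.1}
solvent_b = {"SP": 0.7, "SdP": 0.9, "SA": 0.1, "SB": 0.4}

row_a = pair_features(toy_fp, solvent_a)  # same molecule ...
row_b = pair_features(toy_fp, solvent_b)  # ... in a second solvent gives a second row
print(len(row_a))  # 12: 8 fingerprint bits + 4 solvent descriptors
```

Appending the solvent descriptors lets a single model see the same scaffold under different solvent conditions rather than requiring one model per solvent.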

Chemprop-Morgan outperforms the alternatives on the regression tasks, with absorption and emission MAEs below 20 nm, and the PLQY classifier reaches a ROC-AUC of ~0.90. A bespoke depth-first search (DFS) algorithm computes the size of the largest connected network of sp²-hybridized atoms, quantifying the π-conjugation that is crucial for visible-range emission.
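
The conjugation score amounts to finding the largest connected component of the molecular graph restricted to sp²-hybridized atoms. A minimal DFS sketch follows, assuming atoms arrive as integer indices with hybridization labels (in practice these could come from RDKit's `Atom.GetHybridization()`); the function name and toy graph are illustrative, not the authors' implementation.

```python
from typing import Dict, List, Tuple


def largest_sp2_network(hybridization: Dict[int, str],
                        bonds: List[Tuple[int, int]]) -> int:
    """Return the atom count of the largest connected component of sp2 atoms.

    hybridization: atom index -> label such as "SP2" or "SP3".
    bonds: (i, j) atom-index pairs; only bonds between two sp2 atoms are followed.
    """
    sp2 = {a for a, h in hybridization.items() if h == "SP2"}
    neighbors: Dict[int, List[int]] = {a: [] for a in sp2}
    for i, j in bonds:
        if i in sp2 and j in sp2:
            neighbors[i].append(j)
            neighbors[j].append(i)

    seen = set()
    best = 0
    for start in sp2:
        if start in seen:
            continue
        # Iterative depth-first search over the sp2-only subgraph.
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            atom = stack.pop()
            size += 1
            for nb in neighbors[atom]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best


# Toy graph: a six-atom sp2 ring (0-5), an sp3 linker (6), a two-atom sp2 group (7-8).
hyb = {i: "SP2" for i in range(6)}
hyb.update({6: "SP3", 7: "SP2", 8: "SP2"})
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (5, 6), (6, 7), (7, 8)]
print(largest_sp2_network(hyb, bonds))  # 6: the sp3 atom breaks the conjugation
```

Using an explicit stack rather than recursion avoids Python's recursion limit on very large conjugated systems.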

RL-Based Fluorophore Generation and Filtering

SyntheFluor-RL is an extension of SyntheMol-RL, employing an RL value function to optimize for PLQY, absorption, emission, and maximum sp² network size. The method incorporates 70 reaction templates, substantially expanding molecular diversity over previous efforts (Figure 3).

Figure 3: (A) RL algorithm schematic, (B) extended reaction space, (C) comparison of property distributions in generated vs. random molecules.
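
The text does not spell out how the four objectives are merged into one score, so the following is only a sketch under stated assumptions: each prediction is mapped to a [0, 1] score, and the scores are combined as a weighted average. The 400 to 700 nm window, the 100 nm decay margin, the 24-atom saturation point for the sp² score, and all function names are hypothetical choices made for illustration.

```python
from typing import Dict


def window_score(x: float, lo: float, hi: float, margin: float) -> float:
    """1.0 inside [lo, hi], decaying linearly to 0.0 over `margin` outside it."""
    if lo <= x <= hi:
        return 1.0
    distance = lo - x if x < lo else x - hi
    return max(0.0, 1.0 - distance / margin)


def property_scores(plqy_prob: float, abs_nm: float, em_nm: float,
                    sp2_size: int) -> Dict[str, float]:
    """Map raw predictions to [0, 1]; every constant here is an assumption."""
    return {
        "plqy": min(max(plqy_prob, 0.0), 1.0),                    # classifier probability
        "absorption": window_score(abs_nm, 400.0, 700.0, 100.0),
        "emission": window_score(em_nm, 400.0, 700.0, 100.0),
        "sp2": min(sp2_size / 24.0, 1.0),                          # saturates at 24 atoms
    }


def combined_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of the property scores (weights need not sum to 1)."""
    total = sum(weights.values())
    return sum(weights[k] * scores[k] for k in weights) / total


s = property_scores(plqy_prob=0.8, abs_nm=450.0, em_nm=520.0, sp2_size=12)
equal = {"plqy": 1.0, "absorption": 1.0, "emission": 1.0, "sp2": 1.0}
print(round(combined_score(s, equal), 3))  # 0.825
```

Normalizing every objective to the same range before weighting keeps one property with large raw values (wavelengths in nm) from swamping the others.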

Dynamic property weighting keeps the optimization balanced across objectives during molecule generation. SyntheFluor-RL generated 11,590 molecules over 16.5 hours on 32 CPUs and 1 GPU; relative to randomly sampled REAL Space compounds, the generated set is strongly enriched in molecules with high predicted PLQY probability and extended conjugation.
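
The exact dynamic-weighting rule is not given here. One plausible scheme, shown purely as a hypothetical sketch and not as SyntheMol-RL's actual update, is to shift weight toward whichever property the most recent batch of generated molecules satisfies least often, so that no single objective dominates. The function name, thresholds, and `floor` parameter are assumptions.

```python
from typing import Dict, List


def update_weights(batch_scores: List[Dict[str, float]],
                   thresholds: Dict[str, float],
                   floor: float = 0.05) -> Dict[str, float]:
    """Reweight properties by how often the latest batch misses their thresholds.

    A property that few molecules satisfy gets a larger weight; `floor`
    keeps every weight strictly positive. Returned weights sum to 1.
    """
    if not batch_scores:
        raise ValueError("batch_scores must not be empty")
    n = len(batch_scores)
    raw = {}
    for prop, thr in thresholds.items():
        hit_rate = sum(1 for s in batch_scores if s[prop] >= thr) / n
        raw[prop] = (1.0 - hit_rate) + floor
    total = sum(raw.values())
    return {prop: w / total for prop, w in raw.items()}


# Hypothetical batch: every molecule clears the PLQY threshold, none clears sp2.
batch = [
    {"plqy": 0.9, "sp2": 0.2},
    {"plqy": 0.7, "sp2": 0.3},
    {"plqy": 0.6, "sp2": 0.4},
    {"plqy": 0.8, "sp2": 0.1},
]
weights = update_weights(batch, thresholds={"plqy": 0.5, "sp2": 0.5})
print({k: round(v, 3) for k, v in weights.items()})  # {'plqy': 0.045, 'sp2': 0.955}
```

Feeding these weights into a weighted-average score for the next batch pushes the search toward the lagging objective.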

Candidates are filtered by sp² network size (≥ 12), predicted PLQY probability (p > 0.5), absorption and emission within the visible range, structural diversity via K-means clustering and Tanimoto similarity, and novelty relative to ChemFluor. TD-DFT-computed oscillator strengths provide the final prioritization, yielding 19 distinct candidates (Figure 4).

Figure 4: (A) Structural diversity assessment (intra-/inter-cluster similarity), (B) t-SNE visualization situating generated molecules between existing datasets, (C) oscillator strength selection.
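
The threshold filters translate directly into code. In this sketch, the sp² (≥ 12) and PLQY (p > 0.5) cutoffs come from the text, while the 400 to 700 nm visible window, the 0.6 Tanimoto novelty cutoff, and the dictionary field names are hypothetical. Fingerprints are represented as sets of on-bit indices, and the K-means diversity step and TD-DFT ranking are left out.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # indices of "on" bits


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity: intersection size divided by union size."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def passes_filters(cand: Dict, known_fps: List[Fingerprint],
                   visible_nm: Tuple[float, float] = (400.0, 700.0),
                   max_similarity: float = 0.6) -> bool:
    """Apply the threshold filters and a novelty check to one candidate."""
    if cand["sp2_size"] < 12:        # largest sp2 network must have >= 12 atoms
        return False
    if cand["plqy_prob"] <= 0.5:     # predicted PLQY class probability p > 0.5
        return False
    lo, hi = visible_nm
    if not (lo <= cand["abs_nm"] <= hi and lo <= cand["em_nm"] <= hi):
        return False
    # Novelty: reject candidates too similar to any known fluorophore.
    return all(tanimoto(cand["fp"], ref) < max_similarity for ref in known_fps)


known = [frozenset({1, 2, 9, 10})]  # hypothetical reference fingerprint
cand = {"sp2_size": 14, "plqy_prob": 0.7, "abs_nm": 420.0, "em_nm": 510.0,
        "fp": frozenset({1, 2, 3, 4})}
print(passes_filters(cand, known))  # True: similarity is 2/6, below 0.6
```

The cheap threshold checks run before the similarity loop, so the pairwise Tanimoto comparisons against the reference set are only paid for candidates that already meet the property criteria.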

Experimental Validation

Of the 19 selected candidates, 14 were synthesized, and 13 passed purity and solubility testing. All are insoluble in DMSO/water but soluble in CHCl₃, which was used for photophysical characterization. Three compounds emerged as highly fluorescent: a benzothiadiazole (compound 13), a benzofuran (compound 2), and an isoxazolopyridine derivative (compound 11). Notably, compound 13 displays a PLQY of 0.62, a Stokes shift of 97 nm, and an excited-state lifetime of 11.5 ns, substantially longer than those of canonical blue dyes such as DAPI or Pacific Blue (Figure 5).

Figure 5: (A) Comparison of emission spectra, (B) visual confirmation of compound 13 brightness under UV, (C) excitation/emission spectra, (D) measured fluorescence lifetime.
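
The reported photophysical numbers can be related through two standard relations. The Stokes shift is the gap between the emission and absorption maxima, often also quoted in wavenumbers. For a single-exponential decay, the quantum yield Φ and lifetime τ give the radiative and nonradiative rate constants via Φ = k_r·τ and 1 − Φ = k_nr·τ. The sketch applies the second relation to compound 13's PLQY (0.62) and lifetime (11.5 ns); treating its decay as single-exponential is an assumption, so the resulting rates are rough estimates rather than values reported in the paper. `stokes_shift` is not applied to compound 13 because its individual maxima are not given here.

```python
from typing import Tuple


def stokes_shift(abs_max_nm: float, em_max_nm: float) -> Tuple[float, float]:
    """Stokes shift in nm and in wavenumbers (cm^-1); 1e7 converts nm to cm^-1."""
    shift_nm = em_max_nm - abs_max_nm
    shift_cm1 = 1e7 / abs_max_nm - 1e7 / em_max_nm
    return shift_nm, shift_cm1


def rate_constants(plqy: float, lifetime_ns: float) -> Tuple[float, float]:
    """Radiative and nonradiative rate constants (s^-1), assuming one-exponential decay."""
    tau_s = lifetime_ns * 1e-9
    k_r = plqy / tau_s            # from PLQY = k_r * tau
    k_nr = (1.0 - plqy) / tau_s   # remaining decay is nonradiative
    return k_r, k_nr


k_r, k_nr = rate_constants(0.62, 11.5)  # compound 13 values from the text
print(f"k_r = {k_r:.2e} s^-1, k_nr = {k_nr:.2e} s^-1")
# k_r = 5.39e+07 s^-1, k_nr = 3.30e+07 s^-1
```

A long lifetime at a fixed quantum yield implies a correspondingly small radiative rate, which is consistent with the modest extinction coefficient noted for the lead compound.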

Live-cell imaging with HEK293 cells confirms that compound 13 readily enters cells and yields concentration-dependent, high-contrast nuclear labeling under standard DAPI excitation (Figure 6).

Figure 6: Dose-dependent cell fluorescence from compound 13 at 0, 0.1, 1, and 10 μM, indicating robust cell permeability and strong fluorescence.

Performance Benchmarking and Synthesis Accessibility

The reaction fingerprint of the selected molecules shows that a significant fraction use recently added reaction templates, underscoring the need for an expanded reaction repertoire to build the extended conjugation networks characteristic of high-performance fluorophores.
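
A hypothetical sketch of this kind of analysis: represent each selected molecule's synthesis route as the list of reaction-template IDs it uses, then count how many routes include at least one newly added template. The template IDs, the set of "new" templates, and the function name are made up for illustration.

```python
from collections import Counter
from typing import List, Set


def new_template_fraction(routes: List[List[str]], new_templates: Set[str]) -> float:
    """Fraction of routes that use at least one template from `new_templates`."""
    if not routes:
        return 0.0
    hits = sum(1 for route in routes if any(t in new_templates for t in route))
    return hits / len(routes)


# Hypothetical routes for three selected molecules.
routes = [["rxn_01"], ["rxn_01", "rxn_45"], ["rxn_60"]]
new = {"rxn_45", "rxn_60"}

usage = Counter(t for route in routes for t in route)  # per-template usage counts
print(usage.most_common(1))                            # [('rxn_01', 2)]
print(round(new_template_fraction(routes, new), 3))    # 0.667
```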

Generated compounds exhibit predicted absorption and emission distributions that parallel, but are not redundant with, those of randomly sampled Enamine REAL Space molecules, affirming the chemical realism of the generated set (Figure 7).

Figure 7: Distributions of predicted absorption and emission wavelengths in generated vs. random REAL Space molecules.

Implications for Molecular Design and Future Directions

SyntheFluor-RL provides a scalable architecture for multi-objective molecular generation with synthetic accessibility built in by design. Because its property predictors incorporate solvent descriptors, they can screen candidates under application-relevant solvent conditions. The dynamic weighting mechanism in RL training helps prevent collapse into local optima and promotes diversity. The exclusive use of synthesizable intermediates and reactions addresses a key limitation of previous AI-driven fluorophore generators, such as ChemTS and FLAME, which either require substantial post-filtering or yield many synthetically infeasible candidates.

Experimentally, the identification of three structurally diverse fluorophores, two of which had not been synthesized before, demonstrates the generative capacity of the approach. Although the lead's extinction coefficient is currently lower than those of commercial dyes, the modular pipeline allows for rapid derivatization and optimization.

Potential expansions of SyntheFluor-RL include: (1) increasing the size and diversity of the property prediction training set, (2) integration of explicit aqueous solubilizing group design, (3) enhancement of the algorithm’s building block and reaction templating for even broader photophysical tuning, and (4) further automation in candidate ranking for experimental validation. The approach is readily extensible to other molecular property domains such as sensing, energy materials, and therapeutics, by adapting property predictors and synthetic reaction libraries.

Conclusion

SyntheFluor-RL establishes an effective RL-driven generative design strategy for the discovery of synthesizable small molecule fluorophores with application-relevant photophysical properties. Its integration of property-informed reward shaping, reaction template expansion, and dynamic optimization enables both diversity and targeted exploration within accessible chemical space. The experimental validation of diverse, biologically compatible fluorophores positions this framework as a robust paradigm for iterative, property-guided molecular design across chemical modalities.

Reference: "Generating readily synthesizable small molecule fluorophore scaffolds with reinforcement learning" (2601.07145).
