onepot CORE: Automated Synthesis Platform
- onepot CORE is an integrated automated platform that combines ML-based feasibility assessment, an enumerated chemical space of 3.4 billion compounds, and robotics for small-molecule synthesis.
- It employs seven robust single-step reactions with a neural network for route planning, achieving yields of 40–80% and turnaround times as short as five business days.
- The system streamlines DMTA cycles in drug discovery by automating synthesis, purification, and analysis, effectively eliminating traditional production bottlenecks.
onepot CORE ("Compounds On‐demand via Robotic Execution") is an end-to-end automated platform that integrates an enumerated chemical space, machine-learning-driven feasibility assessment, and robotic execution to streamline small-molecule synthesis for drug discovery. The platform comprises three components: a space of 3.4 billion enumerated, synthesizable small molecules; an AI chemist ("Phil") for route planning, automated synthesis, and analysis; and a fully integrated robotic system for synthesis and purification. Its primary objectives are to eliminate the "make" bottleneck in the classical Design–Make–Test–Analyze (DMTA) cycle through on-demand synthesis, to predefine a chemically accessible space, to maximize yield and success rates via ML-driven route selection, and to provide rapid turnaround, often as short as five business days in the U.S. setting (Tyrin et al., 18 Jan 2026).
1. Enumerated Chemical Space and Scope
The onepot CORE chemical space enumerates 3.4 × 10⁹ unique, synthetically accessible compounds. Enumeration is performed via systematic pairing of curated building blocks with seven robust reaction classes, followed by ML-based feasibility filtering. The process leverages distributed computation and chemoinformatics standardization to ensure deduplication and integrate risk/cost scores inherited from building-block supplier meta-data.
The enumeration workflow:
- Curate and standardize building blocks (SMILES format) from multiple U.S.-based supplier catalogs.
- Apply canonical SMIRKS transformation templates over seven reaction classes (see Section 2) to exhaustively generate candidate products.
- Remove duplicates and trivial failed pairs using standard chemoinformatic filters.
- Apply an ML-based feasibility model to select robustly synthesizable compounds.
Naive O(N²) combinatorics are managed with template–SMARTS grouping and parallel execution on hundreds of compute cores, reducing computational cost and enabling construction of a massive, filtered product list (Tyrin et al., 18 Jan 2026).
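The effect of grouping on the pair count can be illustrated with a minimal sketch. It assumes each building block already carries a functional-group tag (in practice this would come from a SMARTS match); the block identifiers, tag names, and the `REACTIONS` mapping are hypothetical and cover only two of the seven classes.

```python
from itertools import product

# Hypothetical building blocks: (identifier, functional-group tag).
# The tag stands in for the result of a SMARTS match.
BLOCKS = [
    ("bb1", "carboxylic_acid"),
    ("bb2", "carboxylic_acid"),
    ("bb3", "amine"),
    ("bb4", "amine"),
    ("bb5", "aryl_halide"),
    ("bb6", "boronic_acid"),
]

# Assumed reactant roles per reaction class (illustrative subset).
REACTIONS = {
    "amide_coupling": ("carboxylic_acid", "amine"),
    "suzuki_miyaura": ("aryl_halide", "boronic_acid"),
}


def group_by_tag(blocks):
    """Bucket building-block identifiers by functional-group tag."""
    groups = {}
    for block_id, tag in blocks:
        groups.setdefault(tag, []).append(block_id)
    return groups


def candidate_pairs(blocks, reactions):
    """Yield (reaction, block_a, block_b) only for compatible groups."""
    groups = group_by_tag(blocks)
    for name, (tag_a, tag_b) in reactions.items():
        for a, b in product(groups.get(tag_a, []), groups.get(tag_b, [])):
            yield name, a, b


pairs = list(candidate_pairs(BLOCKS, REACTIONS))

# Naive alternative: try every ordered pair against every reaction class.
naive_pair_count = len(BLOCKS) ** 2 * len(REACTIONS)

print(f"grouped: {len(pairs)} pairs, naive: {naive_pair_count} attempts")
```

Because each reaction class iterates over its own groups independently, the per-class loops are natural units to distribute across compute cores.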
2. Reaction Classes and Building Block Curation
2.1 Supported Single-Step Reaction Classes
The current deployment supports seven broadly used, robust single-step transformations, with each class associated with generic SMIRKS templates:
| Reaction class | Generic transformation | Reagents |
|---|---|---|
| Amide coupling | R‑COOH + R‑NH₂ → R‑CO‑NH‑R | EDC/HATU/DIPEA |
| Suzuki–Miyaura | Ar‑X + Ar‑B(OH)₂ → Ar‑Ar | Pd/K₂CO₃/Base |
| Buchwald–Hartwig amination | Ar‑X + R‑NH₂ → Ar‑NH‑R | Pd/tBuXPhos/Base |
| Urea synthesis | R‑NH₂ + R‑NH₂ → R‑NH‑CO‑NH‑R | CDI |
| Thiourea synthesis | R‑NH₂ + R‑NH₂ → R‑NH‑CS‑NH‑R | CSCl₂ |
| N-Alkylation | R‑NH₂ + R‑X → R‑NH‑R | Base |
| O-Alkylation | R‑OH + R‑X → R‑O‑R | Base |
2.2 Building Block Selection
The platform aggregates meta‐catalogs from suppliers, annotating each building block with cost, shipping reliability ("supplier risk score"), and other meta-data. In processing, high molecular weight species (>500 Da), highly reactive functionalities, and isotopically labeled reagents (unless specifically requested) are excluded. During enumeration, supplier risk and price scores propagate to derived products, enabling downstream ranking for cost and procurement efficiency.
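A minimal sketch of the exclusion rules and score propagation follows. The 500 Da cutoff comes from the text; the `BuildingBlock` fields, the 0 to 1 risk scale, and the aggregation rule (prices add, risk takes the worse of the two precursors) are assumptions made for illustration, not the platform's documented behavior.

```python
from dataclasses import dataclass

MAX_MW_DA = 500.0  # molecular-weight cutoff stated in the text


@dataclass(frozen=True)
class BuildingBlock:
    block_id: str
    mol_weight: float            # Da
    price_usd: float
    supplier_risk: float         # hypothetical scale: 0 (reliable) to 1 (unreliable)
    highly_reactive: bool = False
    isotope_labeled: bool = False


def is_eligible(bb, allow_isotopes=False):
    """Apply the three exclusion rules described above."""
    if bb.mol_weight > MAX_MW_DA:
        return False
    if bb.highly_reactive:
        return False
    if bb.isotope_labeled and not allow_isotopes:
        return False
    return True


def product_scores(a, b):
    """Propagate precursor scores to a product (assumed aggregation rule)."""
    return {
        "price_usd": a.price_usd + b.price_usd,
        "supplier_risk": max(a.supplier_risk, b.supplier_risk),
    }


acid = BuildingBlock("bb1", 180.2, 12.0, 0.1)
amine = BuildingBlock("bb3", 121.2, 30.5, 0.4)
heavy = BuildingBlock("bb9", 612.7, 95.0, 0.2)
labeled = BuildingBlock("bb10", 93.1, 250.0, 0.3, isotope_labeled=True)

eligible = [bb.block_id for bb in (acid, amine, heavy, labeled) if is_eligible(bb)]
scores = product_scores(acid, amine)
```

Taking the maximum risk reflects the idea that a product can only ship once its least reliable precursor arrives; other aggregation rules are equally plausible.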
3. Enumeration, Feasibility Assessment, and ML
3.1 Enumeration Algorithm
Building blocks are grouped by SMARTS patterns to match the requirements of each reaction class. For each reaction type and valid pair, SMIRKS transformations produce candidate product SMILES. Distributed computation handles the O(N²) scaling. Deduplication ensures unique product generation, each annotated with precursors, suppliers, and stoichiometry.
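The deduplication step can be sketched as follows, assuming each candidate already has a canonical product key. Here `product_key` is a hypothetical stand-in for a canonical product SMILES, which a cheminformatics toolkit would normally generate, and the record layout is illustrative only.

```python
# Hypothetical candidate records: (product_key, reaction_class, precursor_ids).
CANDIDATES = [
    ("P-001", "amide_coupling", ("bb1", "bb3")),
    ("P-002", "amide_coupling", ("bb2", "bb3")),
    ("P-001", "amide_coupling", ("bb1", "bb3")),  # exact repeat
    ("P-001", "amide_coupling", ("bb9", "bb3")),  # same product, other precursors
]


def deduplicate(candidates):
    """Keep one entry per product while retaining each distinct route."""
    products = {}
    for key, reaction, precursors in candidates:
        routes = products.setdefault(key, [])
        route = {"reaction": reaction, "precursors": precursors}
        if route not in routes:
            routes.append(route)
    return products


unique_products = deduplicate(CANDIDATES)
for key, routes in unique_products.items():
    print(key, len(routes), "route(s)")
```

Retaining every distinct route per product, rather than only the first, leaves room for downstream ranking by supplier risk and price.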
3.2 Machine-Learning–Driven Feasibility Filtering
Rule-based or template-only filtering produces substantial false positives and false negatives. To address this limitation, onepot CORE employs a multilayer feed-forward neural network for feasibility prediction. The input concatenates Morgan fingerprints (radius 2), physicochemical descriptors, and a reaction-class encoding; the network has two 512-unit hidden layers with ReLU activations and a sigmoid output. Model calibration maps predicted probabilities to empirical success rates, assigning "Low," "Medium," or "High" chemical risk to each proposed reaction.
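The shape of such a classifier can be shown with a small, dependency-free forward pass. This is a sketch under stated assumptions: the weights are random rather than trained, the 16-bit toy fingerprint and three descriptors replace real featurization, the one-hot reaction-class encoding is a guess at the encoding used, and the risk thresholds (0.7 and 0.4) are hypothetical.

```python
import math
import random

REACTION_CLASSES = [
    "amide_coupling", "suzuki_miyaura", "buchwald_hartwig",
    "urea", "thiourea", "n_alkylation", "o_alkylation",
]


def encode(fingerprint_bits, descriptors, reaction_class):
    """Concatenate fingerprint, descriptors, and a one-hot class encoding."""
    one_hot = [1.0 if rc == reaction_class else 0.0 for rc in REACTION_CLASSES]
    return [float(b) for b in fingerprint_bits] + list(descriptors) + one_hot


def init_layer(n_in, n_out, rng):
    """Random weights (one row per output unit) and zero biases; untrained."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(vec, weights, bias):
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weights, bias)]


def relu(vec):
    return [max(0.0, x) for x in vec]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def build_model(n_features, hidden=512, seed=0):
    """Two hidden layers of `hidden` units plus a single-output layer."""
    rng = random.Random(seed)
    return [
        init_layer(n_features, hidden, rng),
        init_layer(hidden, hidden, rng),
        init_layer(hidden, 1, rng),
    ]


def predict(model, features):
    (w1, b1), (w2, b2), (w3, b3) = model
    hidden = relu(dense(features, w1, b1))
    hidden = relu(dense(hidden, w2, b2))
    return sigmoid(dense(hidden, w3, b3)[0])


def risk_label(p_success, low_risk_min=0.7, medium_risk_min=0.4):
    """Map a calibrated success probability to a risk tier (assumed cutoffs)."""
    if p_success >= low_risk_min:
        return "Low"
    if p_success >= medium_risk_min:
        return "Medium"
    return "High"


toy_bits = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1]
features = encode(toy_bits, [0.35, 0.60, 0.10], "amide_coupling")
model = build_model(len(features))
p = predict(model, features)
print(f"p(success) = {p:.3f}, risk = {risk_label(p)}")
```

With untrained weights the printed probability carries no chemical meaning; the sketch only fixes the data flow from concatenated features to a risk tier.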
Training employs a staged data hierarchy:
- Pre-training on large, public, noisy-reaction datasets
- Medium-fidelity miniaturized LC/MS screen data
- High-fidelity full-scale synthesis outcomes
On a balanced test set drawn from internal full-scale data, per-class accuracy improves with each training stage, reaching 77% for amide coupling and 79% for Suzuki–Miyaura coupling:
| Reaction | Public Only | + Miniaturized | + Full Scale |
|---|---|---|---|
| Amide coupling | 55% | 68% | 77% |
| Suzuki–Miyaura | 58% | 70% | 79% |
| Buchwald–Hartwig | 50% | 63% | 72% |
| Urea synthesis | 52% | 65% | 74% |
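The contribution of each data tier can be read off the table with a short calculation. The accuracies below are copied from the table; the function and key names are illustrative.

```python
# Accuracies (%) copied from the table above:
# (public only, + miniaturized, + full scale).
ACCURACY = {
    "Amide coupling": (55, 68, 77),
    "Suzuki–Miyaura": (58, 70, 79),
    "Buchwald–Hartwig": (50, 63, 72),
    "Urea synthesis": (52, 65, 74),
}


def stage_gains(row):
    """Percentage-point gain contributed by each additional data tier."""
    public, miniaturized, full_scale = row
    return {
        "miniaturized": miniaturized - public,
        "full_scale": full_scale - miniaturized,
        "total": full_scale - public,
    }


gains = {name: stage_gains(row) for name, row in ACCURACY.items()}
for name, g in gains.items():
    print(f"{name}: +{g['miniaturized']} then +{g['full_scale']} (total +{g['total']})")
```

By these figures, miniaturized screen data adds 12 to 13 percentage points per class and full-scale outcomes add a further 9 points in every class.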
4. Automated Synthesis Workflow
4.1 Route Planning, Procurement, and Reagent Handling
Phil, the platform’s AI chemist, uses a retrosynthesis engine to enumerate all viable routes, selecting optimal halogen or functional group exchanges as required. If any required building block is unavailable or out of stock among vetted suppliers, synthesis is refused. Received solids are barcoded into fixed-format vials, stock solutions are prepared (0.1–0.2 M in DMSO), and automation scripts (Python-based) handle all pipetting and mixing, including inert-atmosphere operations as needed.
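Two of the steps above reduce to simple checks and arithmetic, sketched here. The 0.1–0.2 M concentration window and the refuse-if-unavailable rule come from the text; the function names, block identifiers, and example masses are hypothetical and do not describe the platform's actual automation scripts.

```python
def can_accept(required_blocks, in_stock):
    """Return (accepted, missing). Any unavailable block refuses the request."""
    missing = [b for b in required_blocks if b not in in_stock]
    return (not missing, missing)


def dmso_volume_ml(mass_mg, mol_weight, conc_molar):
    """Volume of DMSO (mL) needed to dissolve mass_mg at conc_molar.

    mg / (g/mol) gives mmol, and 1 M equals 1 mmol/mL, so mmol / M gives mL.
    """
    if not 0.1 <= conc_molar <= 0.2:
        raise ValueError("stock concentration outside the 0.1-0.2 M window")
    amount_mmol = mass_mg / mol_weight
    return amount_mmol / conc_molar


accepted, missing = can_accept(["bb1", "bb3"], in_stock={"bb1", "bb2"})
volume = dmso_volume_ml(mass_mg=25.0, mol_weight=250.0, conc_molar=0.1)
print(f"accepted={accepted}, missing={missing}, DMSO volume={volume:.2f} mL")
```

For example, 25 mg of a 250 g/mol solid is 0.1 mmol, which needs 1.0 mL of DMSO for a 0.1 M stock.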
4.2 Reaction Execution, Workup, and Purification
Reactions are executed with active control of conditions (25–80 °C, 1–4 hours), then subjected to direct, automated aqueous workup. Crude mixtures are automatically analyzed and purified with an HPLC-MS–guided, mass-triggered fraction collector, with UV (254 nm) and single-quad MS guiding product assessment. Post-purification, vials are re-weighed, re-analyzed by LC/MS, and either dried or dispensed as 10 mM DMSO solutions.
Isolated yields typically range from 40–80%, dependent on precursors and reaction class.
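The post-purification bookkeeping (re-weighing, yield, and dilution to a 10 mM DMSO solution) amounts to the arithmetic below. This is an illustrative sketch: the function names and the example quantities are assumptions, while the 10 mM target and the 40–80% range are taken from the text.

```python
def isolated_yield_pct(isolated_mg, product_mw, limiting_mmol):
    """Isolated yield (%) relative to the limiting reagent."""
    isolated_mmol = isolated_mg / product_mw
    return 100.0 * isolated_mmol / limiting_mmol


def dmso_volume_ml(isolated_mg, product_mw, conc_mm=10.0):
    """DMSO volume (mL) that brings the isolated product to conc_mm (mM)."""
    isolated_mmol = isolated_mg / product_mw
    # mmol / (mmol/L) gives litres; multiply by 1000 for mL.
    return isolated_mmol / conc_mm * 1000.0


def in_typical_range(yield_pct, low=40.0, high=80.0):
    """True if the yield falls in the typical range quoted above."""
    return low <= yield_pct <= high


# Hypothetical run: 0.1 mmol limiting reagent, product MW 350 g/mol,
# 21 mg recovered after purification.
y = isolated_yield_pct(isolated_mg=21.0, product_mw=350.0, limiting_mmol=0.1)
v = dmso_volume_ml(isolated_mg=21.0, product_mw=350.0)
print(f"yield = {y:.0f}%, dispense in {v:.1f} mL DMSO")
```

In this example, 21 mg of product corresponds to 0.06 mmol, a 60% yield, and requires 6 mL of DMSO for a 10 mM solution.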
4.3 Phil’s Role in Operation
Phil leverages LLM-based planning for protocol design (bases, solvents, temperature profiles), real-time chromatogram interpretation (byproduct identification), and continuous improvement via feedback of full-scale synthesis outcomes into the ML model.
5. Analytical Validation and Biological Applications
5.1 Operational Metrics
The end-to-end synthesis, QC, and shipping workflow achieves turnaround as short as 5 U.S. business days (median 9–10 days), as validated in a preliminary "50-compound" run:
- 20 compounds delivered in 3 days
- 40 in 7 days
- 50 in 10 days (delayed by supplier shipment)
5.2 Purity and Structural Confirmation
Automated LC/MS purity analyses show >95% purity across all reaction classes. Structural identity validation (with ¹H NMR) on a subset of 24 compounds confirmed scaffold correctness, with representative purity figures displayed below:
| Example reaction | LC/MS purity | ¹H NMR purity |
|---|---|---|
| Amide coupling | 100% | 95.4% |
| Suzuki–Miyaura | 100% | 100% |
| CDI-mediated urea | 100% | 100% |
5.3 Biological Assay Suitability: DPP4-Inhibitor SAR
A focused library of DPP4 inhibitor analogs was synthesized and evaluated in a standard fluorescence assay (AMC substrate, 37 °C). Literature-aligned IC₅₀ values and SAR patterns were reproduced:
| Compound | Core scaffold | IC₅₀ (nM) |
|---|---|---|
| Sitagliptin (resynthesized) | — | 643 |
| Vildagliptin (resynth.) | — | 804 |
| Optimized analog (Y10) | [structure] | 277 |
| Further lead (Y12) | [structure] | 311 |
| Reference literature | — | 19 |
This close match to literature benchmarks confirms the biological and analytical suitability of onepot CORE outputs.
6. Impact, Limitations, and Prospective Developments
onepot CORE enables a substantial acceleration of DMTA cycles: with turnaround times shrunk to days, laboratories can execute up to five times more iterative learning cycles within existing project constraints. ML-powered feasibility assessment facilitates more quantitative chemical risk exploration, enabling synthesis of "risky" functional group combinations with well-calibrated probabilities of success.
Current limitations include restriction to single-step transformations, dependence on a non-exhaustive commercial building block catalog, and absence of routine tandem MS confirmation. Ongoing work targets the integration of multi-step retrosynthetic routes, additional reaction classes (e.g., one-pot cascades, SuFEx, and photoredox catalysis), expanded analytical QC (ELSD/CAD, tandem MS), and continued AI/automation advances, including autonomous condition optimization and in silico retrosynthesis.
onepot CORE constitutes an unprecedented integration of enumerated chemical space, machine learning–guided synthesis planning, and fully autonomous laboratory robotics, thereby enabling faster, more reliable, and broader access to diverse small molecules for pharmaceutical and broader chemical research (Tyrin et al., 18 Jan 2026).