Open Polymers 2026: Unified Polymer Modeling

Updated 3 February 2026

OPoly26 is an open-science platform offering a unified framework for atomistic polymer modeling, digital representation, and simulation workflows.
It provides extensive DFT and MD datasets covering diverse chemical architectures and monomer types, ensuring reproducible and transferable polymer informatics.
The infrastructure integrates machine learning, digital grammars, and automated simulation setups to accelerate polymer design, discovery, and property prediction.

Open Polymers 2026 (OPoly26) is the most extensive open-science infrastructure released to date for atomistic polymer modeling, digital polymer representation, informatics-driven high-throughput design, and realistic simulation workflows. It provides a unified framework and dataset for machine learning, quantum chemistry, polymer informatics, and simulation-based research, with a focus on transferability, explicitness, and reproducibility.

1. OPoly26 Dataset: Scope and Chemical Diversity

The Open Polymers 2026 (OPoly26) dataset comprises 6,573,000 density functional theory (DFT) calculations on atomistic clusters up to 360 atoms, collectively spanning over 1.2 billion atoms (Levine et al., 28 Dec 2025). This scale enables coverage of previously inaccessible regimes for macromolecular and polymeric systems. The data are derived from 2,444 unique monomer repeat units and include 94,000 amorphous molecular dynamics (MD) simulation cells, with system sizes ranging from 300 to 5,000 atoms per simulation cell.

Monomer diversity includes:

Traditional homopolymers (840 entries)
Fluoropolymers (521)
π-conjugated (“optical”) systems (892)
Polymer electrolytes (300)
Peptoids (>100)
Lipid monomers (47)

Architectural diversity in OPoly26 is currently limited to linear-chain polymers (homopolymers, alternating and random copolymers, solvated systems, and peptoids with explicit solvation). Future expansions are planned for branched, crosslinked, and graft architectures, and for silicon-containing polymers.

The substructure set was sampled to ensure uniform coverage of cluster sizes (10–360 atoms), environments (bulk, solvated, reactive, ionic), and complexes for force and energy prediction. Each DFT point is associated with canonical SMILES, atom types, total energy, forces, and relevant molecular metadata. The dataset is deposited under CC-BY-4.0 at https://huggingface.co/facebook/OMol25, enabling unrestricted access for the research community (Levine et al., 28 Dec 2025).
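
The text does not spell out how uniform coverage of cluster sizes was achieved. One common way to approximate it is stratified sampling over atom-count bins, sketched below. The function name `stratified_sample`, the bin width, and the per-bin quota are illustrative assumptions, not the procedure used to build OPoly26.

```python
import random
from collections import defaultdict


def stratified_sample(cluster_sizes, bin_width=50, per_bin=2, seed=0):
    """Pick up to `per_bin` cluster indices from each atom-count bin.

    Hypothetical helper: bins are [0, 50), [50, 100), ... by atom count.
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for idx, n_atoms in enumerate(cluster_sizes):
        bins[n_atoms // bin_width].append(idx)

    chosen = []
    for key in sorted(bins):
        members = bins[key]
        # Sparse bins contribute everything they have.
        chosen.extend(rng.sample(members, min(per_bin, len(members))))
    return chosen


# Toy atom counts within the 10-360 range quoted in the text.
sizes = [12, 15, 18, 40, 120, 130, 135, 140, 300, 355, 360]
picked = stratified_sample(sizes)
```

Capping each bin keeps abundant small clusters from dominating the sample, at the cost of discarding some of them.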

2. Computational and Methodological Foundations

DFT computations employ ORCA 6.0.0, using the ωB97M-V functional with the def2-TZVPD basis set and the nonlocal VV10 correction. Numerical integration uses pruned grids, and the RI-J and COSX approximations accelerate the Coulomb and exact-exchange terms, respectively (Levine et al., 28 Dec 2025). All calculations conform to strict quality and convergence criteria:

|E_total| < 150 eV; |E_total|/atom < 10 eV/atom
Maximum force per atom < 50 eV/Å
Spin contamination: ⟨S²⟩ deviation < 0.5 (open-shell) or < 1.1 (organics)
Nonnegative HOMO-LUMO gap
Electron count consistency
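
The criteria above can be read as a single pass/fail predicate per calculation. The sketch below is a minimal illustration under stated assumptions: the `DFTResult` fields and the `passes_filters` function are hypothetical names, the thresholds are copied as listed, and the spin threshold is passed in by the caller because the text does not fully specify which systems get which limit. The energy reference to which the 150 eV bound applies is also not stated here.

```python
from dataclasses import dataclass


@dataclass
class DFTResult:
    """Hypothetical container for the quantities the filters inspect."""
    total_energy_ev: float
    n_atoms: int
    max_force_ev_per_ang: float
    s2_deviation: float
    homo_lumo_gap_ev: float
    n_electrons: int
    expected_electrons: int


def passes_filters(r, s2_limit=0.5):
    """Return True only if every listed quality criterion holds."""
    checks = [
        abs(r.total_energy_ev) < 150.0,
        abs(r.total_energy_ev) / r.n_atoms < 10.0,
        r.max_force_ev_per_ang < 50.0,
        r.s2_deviation < s2_limit,
        r.homo_lumo_gap_ev >= 0.0,
        r.n_electrons == r.expected_electrons,
    ]
    return all(checks)


# Placeholder values chosen only to exercise the checks.
good = DFTResult(-42.0, 20, 3.1, 0.02, 5.4, 60, 60)
bad = DFTResult(-42.0, 20, 61.0, 0.02, 5.4, 60, 60)  # force too large
```

Here `passes_filters(good)` is accepted and `passes_filters(bad)` is rejected because the maximum force exceeds 50 eV/Å.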

For reference, the Kohn–Sham total energy functional (here in Hartree atomic units) is $E_{\rm DFT}[n] = T_s[n] + \int v_{\rm ext}(\mathbf r)\,n(\mathbf r)\,d\mathbf r + \tfrac12 \iint \frac{n(\mathbf r)\,n(\mathbf r')}{|\mathbf r-\mathbf r'|}\,d\mathbf r\,d\mathbf r' + E_{\rm xc}[n].$

Cluster geometries are extracted from MD production trajectories with a cumulative simulated time exceeding 239,000 ns (Levine et al., 28 Dec 2025), which provides statistical sampling of the relevant conformations.

OPoly26 data are formatted as JSON/HDF5, archiving Cartesian coordinates, energies, forces, population analyses, and chemical environment for ML/IP training. Additional infrastructure is provided for data access and benchmarking through https://github.com/facebookresearch/fairchem.
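
To make the record layout concrete, the sketch below round-trips a single JSON record and checks that the per-atom arrays are consistent. The key names (`smiles`, `atomic_numbers`, `positions`, `energy_ev`, `forces`) and the `validate_record` helper are assumptions for illustration; the actual OPoly26 schema may differ, and the numeric values are placeholders rather than computed results.

```python
import json


def validate_record(rec):
    """Check that positions and forces have one 3-vector per atom."""
    n_atoms = len(rec["atomic_numbers"])
    if len(rec["positions"]) != n_atoms or len(rec["forces"]) != n_atoms:
        raise ValueError("per-atom arrays must have one row per atom")
    for row in rec["positions"] + rec["forces"]:
        if len(row) != 3:
            raise ValueError("each row needs x, y, z components")
    return n_atoms


# Hypothetical record for a water molecule; all numbers are placeholders.
record = {
    "smiles": "O",
    "atomic_numbers": [8, 1, 1],
    "positions": [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]],
    "energy_ev": 0.0,
    "forces": [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
}

restored = json.loads(json.dumps(record))  # serialize, then parse back
n_atoms = validate_record(restored)
```

A shape check of this kind is cheap to run at load time and catches truncated or misaligned arrays before they reach a training loop.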

3. Integration with Machine Learning and Informatics

When used to train machine-learned interatomic potentials (MLIPs) for polymers, OPoly26 improves prediction accuracy for both energies and forces compared with small-molecule-only training (OMol25). For instance, the mean absolute error (MAE) in energy prediction on the polymer test set falls from 78.3 meV (OMol25 only) to 29.7 meV (OPoly26 only), with comparable improvements for forces (Levine et al., 28 Dec 2025). Training on the combined datasets maintains small-molecule performance while enabling accurate polymer predictions.
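
As a quick check on the size of that improvement, the sketch below defines the MAE metric and computes the relative reduction implied by the two quoted values. The function names and the toy prediction lists are illustrative; only 78.3 meV and 29.7 meV come from the text.

```python
def mean_absolute_error(predicted, reference):
    """Average absolute deviation between two equal-length sequences."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - r) for p, r in zip(predicted, reference))
    return total / len(predicted)


def relative_reduction(before, after):
    """Fractional decrease going from `before` to `after`."""
    return (before - after) / before


# Toy energies (arbitrary units) just to exercise the metric.
toy_mae = mean_absolute_error([1.0, 2.0, 4.0], [1.5, 2.0, 3.0])

# Energy MAEs quoted in the text, in meV.
reduction = relative_reduction(78.3, 29.7)  # about 0.62
```

The quoted figures therefore correspond to roughly a 62% lower energy MAE on the polymer test set.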

OPoly26 supports:

MLIP training (e.g., eSEN, UMA, Orb-v3) for atomistic simulations
Transfer learning/fine-tuning for polymer-specific tasks
Evaluation of energy, force, and reactivity prediction in out-of-distribution (e.g., DFTB or Si-polymer) environments

Benchmarks validate the transferability and capacity for universal atomistic predictions when OPoly26 is used together with existing datasets focusing on small molecules or extended systems.

4. High-throughput Design and Structure Generation

Open Polymers 2026 workflows further integrate digital polymer grammars and graph-based generative tools for systematic polymer library generation and inverse mapping. PolyGrammar offers a parametric, context-sensitive formal grammar model for representing and generating all valid polymer structures in certain classes (notably polyurethanes, copolymers, polyacrylates) (Guo et al., 2021). Its framework includes:

Grammars with explicit nonterminals $\mathcal{N} = \{ X, h, s \}$ and terminals $\Sigma = \{ H(x), S(x) : x \in \mathbb{N} \}$
Context-sensitive production rules (P₁–P₁₄), yielding explicit, valid, invertible representations of polymer chains
Hypergraph and line-graph symbolic mapping for chemical structures, supporting direct conversion to and from SMILES

Inverse algorithms allow efficient translation between SMILES and grammar representations in milliseconds per chain for datasets of hundreds of polyurethanes, guaranteeing completeness and chemical validity (Guo et al., 2021).
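
The idea of an explicit, invertible grammar representation can be illustrated with a deliberately simplified toy. The sketch below uses context-free rules X → h X, X → s X, X → ε with h → H(x) and s → S(x); these are assumptions chosen for brevity and are not PolyGrammar's context-sensitive productions P₁–P₁₄. The function names `derive` and `parse` and the `-` separator are likewise hypothetical.

```python
import re

TERMINAL_FOR = {"h": "H", "s": "S"}
TOKEN = re.compile(r"([HS])\((\d+)\)")


def derive(choices):
    """Expand the start symbol X into a chain of terminals.

    `choices` is a list of (kind, x) pairs with kind in {"h", "s"}.
    """
    form = ["X"]
    for kind, x in choices:
        terminal = TERMINAL_FOR[kind]   # KeyError on an unknown kind
        form[-1:] = [kind, "X"]         # apply X -> kind X
        form[-2] = f"{terminal}({x})"   # apply kind -> TERMINAL(x)
    form.pop()                          # apply X -> epsilon
    return "-".join(form)


def parse(chain):
    """Invert `derive`: recover the (kind, x) choices from a chain."""
    choices = []
    for token in chain.split("-") if chain else []:
        match = TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"not a valid terminal: {token!r}")
        choices.append((match.group(1).lower(), int(match.group(2))))
    return choices


chain = derive([("h", 1), ("s", 2), ("h", 1)])
recovered = parse(chain)
```

Because every chain string maps back to exactly one derivation, the toy representation is invertible in the same sense the text describes, though without any chemical validity checks.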

Complementing generative representation, the VFS (virtual forward synthesis) pipeline supports the informatics-driven enumeration and computational screening of over 7 million ring-opening polymerization (ROP) candidates (Kern et al., 2024). The pipeline utilizes:

A monomer database (>30 million entries, collated from ZINC15, ChEMBL, eMolecules, and literature)
SMARTS-based graph reaction rules for nine ROP classes
ML-based property prediction: Gaussian Process Regression predicts polymerization enthalpy $\Delta H$; multitask neural networks predict thermal/mechanical parameters ($T_g$, $T_d$, $E$, $\sigma_b$, $C_p$)
Multi-objective fitness-based filtering for synthesizability, performance, and recyclability, resulting in 35,000 down-selected candidates
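
The final filtering step can be pictured as scoring each candidate against several property targets and keeping those that meet a cutoff. The sketch below is a minimal illustration: the `Candidate` fields, the target values (the ΔH window and the $T_g$ and $T_d$ minima), and the fraction-of-targets fitness are all assumptions made for the example, not the criteria or fitness function used in the VFS pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical polymer candidate with ML-predicted properties."""
    name: str
    delta_h_kj_mol: float  # predicted polymerization enthalpy
    tg_k: float            # predicted glass transition temperature
    td_k: float            # predicted decomposition temperature


def fitness(c, dh_window=(-20.0, -10.0), tg_min=373.0, td_min=573.0):
    """Fraction of the three illustrative targets that `c` satisfies."""
    hits = [
        dh_window[0] <= c.delta_h_kj_mol <= dh_window[1],
        c.tg_k >= tg_min,
        c.td_k >= td_min,
    ]
    return sum(hits) / len(hits)


def down_select(candidates, min_fitness=1.0):
    """Keep candidates whose fitness reaches the cutoff."""
    return [c for c in candidates if fitness(c) >= min_fitness]


# Placeholder candidates; the property values are invented for the demo.
pool = [
    Candidate("A", -15.0, 400.0, 600.0),  # meets all three targets
    Candidate("B", -35.0, 410.0, 620.0),  # enthalpy outside window
    Candidate("C", -12.0, 300.0, 500.0),  # too low Tg and Td
]
selected = down_select(pool)
```

Lowering `min_fitness` relaxes the filter, which is one simple way to trade candidate count against how many objectives each survivor satisfies.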

Open-source codebases for VFS and property models are available at https://github.com/Ramprasad-Group/polyVERSE (Kern et al., 2024).

5. Automated Simulation Setup and Workflow Orchestration

Polyply provides a modular suite for the automated parameterization and coordinate generation of (bio-)macromolecular and nanomaterial systems, directly supporting OPoly26 objectives (Grünewald et al., 2021). Key features include:

gen_params: Assign force-field parameters based on residue-level block and link libraries, using multi-scale graph matching
gen_coords: Build 3D coordinates via a coarse-grained, self-excluding random-walk algorithm, with subsequent backmapping to target resolution
Compatibility with GROMACS and LAMMPS; all-atom, united-atom, coarse-grained models
High-throughput scalability: 500-residue × 100-chain melts prepared in minutes across six polymer types and two force fields
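
The self-excluding random walk behind coordinate generation can be illustrated on a cubic lattice. The sketch below grows a single chain one bead at a time and restarts if the walk traps itself. It is a simplified stand-in, not Polyply's implementation, which is not lattice-based as far as the description above indicates and must also handle many chains and backmapping; the function name and parameters are assumptions.

```python
import random

# Unit steps along the six lattice directions.
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def self_avoiding_walk(n_beads, seed=0, max_restarts=1000):
    """Return lattice coordinates for a chain that never revisits a site."""
    rng = random.Random(seed)
    for _ in range(max_restarts):
        path = [(0, 0, 0)]
        occupied = {(0, 0, 0)}
        while len(path) < n_beads:
            x, y, z = path[-1]
            free = [
                (x + dx, y + dy, z + dz)
                for dx, dy, dz in STEPS
                if (x + dx, y + dy, z + dz) not in occupied
            ]
            if not free:
                break  # walk is trapped; discard and restart
            nxt = rng.choice(free)
            path.append(nxt)
            occupied.add(nxt)
        if len(path) == n_beads:
            return path
    raise RuntimeError("could not place the chain without overlaps")


walk = self_avoiding_walk(50)
```

Choosing uniformly among free neighbours is simple but does not sample all self-avoiding walks with equal probability, which is acceptable for building starting structures that are equilibrated afterwards.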

Polyply enables setup and simulation of complex multi-component systems, including phase-separated block copolymers (PS-b-PEO/LiTFSI), branched polysaccharides (dextran), and membrane-encapsulated coacervates. The design is readily extensible: new monomers or topologies are supported by adding residue graphs or block/link entries, without modifying the core code (Grünewald et al., 2021).

6. Impact, Community Engagement, and Open Science Initiatives

OPoly26 marks a substantive expansion of polymer informatics, providing resources for:

Data-driven discovery of new material classes (polymer electrolytes, sustainable and recyclable plastics, optoelectronics)
Universal, atomistic MLIP development for polymers and hybrid materials (Levine et al., 28 Dec 2025)
Public ML benchmarks and leaderboards specifically for polymers
Generation of standardized, explicit chemical data for downstream use in property prediction, optimization, and retrosynthetic planning
Integration with open databases, such as the Open Reaction Database, for retrosynthesis

All major codebases, pipelines, datasets, and experimental validation results are distributed under permissive open licenses (CC-BY-4.0, Apache-2.0), and infrastructure exists for community contributions, curation, and leaderboards (Levine et al., 28 Dec 2025, Grünewald et al., 2021, Kern et al., 2024).

7. Future Directions and Research Challenges

Planned extensions for OPoly26 and associated frameworks include:

Expansion to non-linear, branched, crosslinked, and grafted architectures in both quantum and simulation datasets (Levine et al., 28 Dec 2025)
Automation and ML-driven design of retrosynthetic and structure–property relationships for step-growth and chain-growth polymerizations (Guo et al., 2021)
Incorporation of stereochemistry, 3D structure, and hierarchical/continuous grammar models for precision representation (Guo et al., 2021)
Integration with genetic algorithms and yield prediction models to accelerate candidate down-selection in high-dimensional chemical spaces (Kern et al., 2024)
Development of foundation models and universal MLIP frameworks training on polymers, small molecules, and extended solids jointly, toward an “atomistic GPT for polymers” (Levine et al., 28 Dec 2025)

A plausible implication is that OPoly26’s scope and rigor provide a foundation for universal, data-driven polymer design, simulation, and property prediction at scale.