Quantifying the Emergence of Selection Prior to Biological Evolution

Published 21 Dec 2025 in q-bio.MN, physics.bio-ph, and q-bio.PE | (2512.18752v1)

Abstract: Selection is central to biological evolution, yet there has been no general experimental framework for quantifying selection in chemical systems before life. Here we demonstrate that selection in a prebiological chemical system can be directly quantified. Assembly Theory predicts that selection corresponds to a transition from undirected to directed exploration of chemical possibility space, measurable through the amount of Assembly, A, which integrates molecular assembly index with observed copy number. By analysing peptide ensembles produced under diverse polymerisation conditions, we show that undirected reactions explore sequence space almost uniformly, yielding exploration ratios of 0.85-0.95, whereas reactions influenced by evolved proteases generate markedly lower ratios (0.51-0.75) and elevated A, consistent with selective reinforcement of specific assembly pathways. Across multiple environments and amino-acid combinations, the exploration ratio and ensemble assembly A robustly distinguish directed from undirected exploration, establishing a general, experimentally tractable metric for detecting and measuring selection in chemical evolution.

Summary

  • The paper establishes that Assembly Theory quantitatively detects selection in prebiotic chemical systems by using metrics such as the exploration ratio (ER) and ensemble assembly.
  • Simulations and experiments show that directed processes (recursively biased in simulation, protease-mediated in experiment) yield significantly lower ERs and higher complexity than undirected polymerization methods.
  • The framework offers a scalable method for identifying chemical selection, with implications for biosignature detection, origins-of-life research, and directed chemical discovery.

Quantitative Detection of Selection in Prebiotic Chemical Systems via Assembly Theory

Context and Motivation

A fundamental open problem in prebiotic chemistry and origins-of-life research concerns the empirical detection and quantification of selection prior to the emergence of biological evolution. While Darwinian selection is well-characterized in systems with replication, heredity, and variation, the difficulty lies in unambiguously identifying selection in purely chemical systems before these phenomena arise. The absence of a general, experimental framework for this quantification limits understanding of how directedness, and thus biological agency, emerges from undirected molecular processes. The study "Quantifying the Emergence of Selection Prior to Biological Evolution" (2512.18752) systematically addresses this gap by developing and deploying Assembly Theory (AT) as a quantitative tool to distinguish directed from undirected chemical exploration.

Assembly Theory: Framework and Metrics

Assembly Theory formalizes the construction of molecular objects via recursive, stepwise assembly from defined building blocks. The assembly index of an object is the minimum number of assembly steps required to build it from the available building blocks. For ensembles, the Joint Assembly Space (JAS) encompasses the minimal set of recursive steps needed to build all observed objects; for complex mixtures it is approximated as the union of the individual minimal paths.

Two key metrics emerge from AT for detecting selection:

  • Exploration Ratio (ER): The ratio of observed objects to the total set of objects (including necessary but unobserved intermediates) in the JAS. ER quantifies how thoroughly the observable ensemble samples the JAS. Undirected, combinatorial processes result in ER near unity (close to complete, uniform sampling), while selection-driven, directed processes yield substantially lower ERs by restricting exploration to favored pathways.
  • Ensemble Assembly (A): This aggregate metric integrates assembly indices and copy numbers of observed species, measuring the cumulative selectivity required to generate the observed ensemble. Selection-driven processes skew A to higher values due to enrichment in high-complexity objects.
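
The ER definition above can be illustrated with a minimal Python sketch. It assumes the construction pathway of each observed object is already known and supplied by the caller; the function name `exploration_ratio`, the example strings, and the choice to leave monomer building blocks out of the count are all assumptions made for illustration, not details taken from the paper.

```python
from typing import Dict, Iterable, Set


def exploration_ratio(pathways: Dict[str, Iterable[str]]) -> float:
    """Return (observed objects) / (all objects in the approximated JAS).

    `pathways` maps each observed object to the intermediates on its
    (assumed) minimal construction pathway. Monomer building blocks are
    not counted, which is an assumption of this sketch.
    """
    observed: Set[str] = set(pathways)
    if not observed:
        raise ValueError("need at least one observed object")
    # Approximate the JAS as the union of the individual pathways.
    jas: Set[str] = set(observed)
    for intermediates in pathways.values():
        jas.update(intermediates)
    return len(observed) / len(jas)


# Hypothetical ensemble in which every intermediate is also observed.
undirected_like = {"ab": [], "abab": ["ab"], "ababab": ["ab", "abab"]}
# Hypothetical ensemble in which only one complex product is observed.
directed_like = {"abcabc": ["ab", "abc"]}

er_full = exploration_ratio(undirected_like)   # all JAS objects observed
er_sparse = exploration_ratio(directed_like)   # intermediates unobserved
```

In this toy case the first ensemble samples its whole JAS, while the second leaves its intermediates unobserved and therefore has a lower ratio, which is the qualitative pattern the ER is meant to capture.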

Computational and Experimental Strategy

Simulation Studies

Simulations with five abstract monomers (a-e) demonstrate that undirected, recursive combination leads to rapid, uniform occupation of sequence space and high ER, especially at low assembly indices. Introducing recursive bias (e.g., favoring species with terminal “b”) mimics environmental selection and produces a marked reduction in ER, enhanced sequence similarity, and increased complexity per ensemble, indicative of selective reinforcement.
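
A toy version of such a simulation might look like the following Python sketch. The combination rule (concatenating two randomly chosen pool members), the bias weight, the length cap, and all function names are assumptions chosen for illustration; the summary does not specify how the paper's simulations were implemented, and this sketch only tracks the compositional skew toward terminal "b", not ER itself.

```python
import random
from typing import List

MONOMERS = ["a", "b", "c", "d", "e"]


def simulate(steps: int, bias: float = 1.0, max_len: int = 8, seed: int = 0) -> List[str]:
    """Grow a pool of unique strings by recursive concatenation.

    bias == 1.0 gives undirected combination; bias > 1.0 makes species
    ending in "b" more likely to be chosen as the right-hand partner.
    """
    rng = random.Random(seed)
    pool = list(MONOMERS)
    seen = set(pool)
    for _ in range(steps):
        weights = [bias if s.endswith("b") else 1.0 for s in pool]
        left = rng.choice(pool)
        right = rng.choices(pool, weights=weights, k=1)[0]
        product = left + right
        # Keep only new species below the (assumed) length cap.
        if len(product) <= max_len and product not in seen:
            seen.add(product)
            pool.append(product)
    return pool


def terminal_b_fraction(pool: List[str]) -> float:
    """Fraction of non-monomer species that end in "b"."""
    products = [s for s in pool if len(s) > 1]
    if not products:
        return 0.0
    return sum(s.endswith("b") for s in products) / len(products)


def mean_fraction(bias: float, seeds: range = range(10)) -> float:
    """Average the terminal-"b" fraction over several seeded runs."""
    return sum(terminal_b_fraction(simulate(400, bias, seed=s)) for s in seeds) / len(seeds)
```

Comparing `mean_fraction(1.0)` with `mean_fraction(25.0)` shows how a recursive bias concentrates the ensemble on a subset of sequence space; averaging over seeds is used because a single undirected run can drift by chance.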

Experimental Chemical Systems

The primary experimental system is peptide formation under varying environmental constraints. Three undirected polymerization modes were tested: heat-driven wet-dry cycling, CDI-mediated coupling in aqueous solution, and CDI-coupling in the solid state. As the principal directed process, biologically evolved proteases (papain, bromelain, trypsin, chymotrypsin) catalyze peptide formation from amino acid methyl esters, embedding known sequence selectivity. Using HPLC-MS/MS and the OLIGOSS pipeline, peptide ensemble compositions were comprehensively mapped and analyzed.

Data Analysis

Assembly indices and JAS approximations for each ensemble were computationally determined. All detected peptides above MS sensitivity thresholds were treated as having equal effective copy number. ER and ensemble assembly A were calculated for each reaction condition, across multiple replicates and amino acid compositions.
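
As a rough illustration of the ensemble calculation, the sketch below uses the form of A given in the earlier Assembly Theory literature, $A = \sum_i e^{a_i}(n_i - 1)/N_T$, where $a_i$ is the assembly index of species $i$, $n_i$ its copy number, and $N_T$ the total number of objects. Whether this paper uses exactly that expression is an assumption, and the shared `copy_number` value of 2 is a placeholder, since the summary states only that detected peptides were assigned equal effective copy numbers.

```python
import math
from typing import Sequence


def ensemble_assembly(assembly_indices: Sequence[int], copy_number: int = 2) -> float:
    """Compute A = sum_i exp(a_i) * (n_i - 1) / N_T with equal copy numbers.

    `copy_number` is a hypothetical shared value n_i = n for every species.
    """
    if not assembly_indices:
        raise ValueError("need at least one species")
    if copy_number < 1:
        raise ValueError("copy_number must be at least 1")
    total_copies = copy_number * len(assembly_indices)  # N_T
    weight = (copy_number - 1) / total_copies            # same for all species
    return sum(math.exp(a) for a in assembly_indices) * weight


# Hypothetical ensembles that differ only in one high-index species.
low = ensemble_assembly([1, 2, 3])
high = ensemble_assembly([1, 2, 6])
```

Because the assembly index enters exponentially, a single high-index species dominates the sum, which is consistent with the statement that enrichment in high-complexity objects raises A.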

Results and Key Findings

Distinction Between Undirected and Directed Exploration

Undirected processes, across all tested environmental constraints and sets of amino acids, exhibited exploration ratios in the range $0.85 \leq \mathrm{ER} \leq 0.95$ and steadily expanding sequence diversity, indicative of nearly uniform combinatorial space traversal. This behavior proved robust to changes in polymerization modality and substrate composition.

Directed (protease-mediated) processes demonstrated markedly lower ERs ($0.51 \leq \mathrm{ER} \leq 0.75$, depending on protease and cycle), with ensemble assembly A rising more sharply per unique sequence, reflecting increased selective reinforcement of specific high-complexity pathways. These two regimes, directed and undirected, are thus quantitatively distinguishable via ER and A, with no overlap in metric ranges across all conditions.
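
The reported ranges can be turned into a simple lookup, sketched below in Python. The function name and the labels are hypothetical, and values falling between or outside the two reported ranges are deliberately left unclassified, because the summary gives no threshold for them.

```python
def classify_er(er: float) -> str:
    """Label an exploration ratio using only the ranges reported above."""
    if not 0.0 <= er <= 1.0:
        raise ValueError("ER is a ratio and must lie in [0, 1]")
    if 0.85 <= er <= 0.95:
        return "undirected-like"
    if 0.51 <= er <= 0.75:
        return "directed-like"
    # No threshold is reported for values outside the two ranges.
    return "outside reported ranges"


labels = [classify_er(x) for x in (0.90, 0.60, 0.80)]
```

This is a descriptive check against the observed ranges rather than a validated decision rule; in practice ER would be read together with A, as the text describes.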

Sequence-Level Evidence of Selection

Bigram transition analysis of peptide sequences generated under protease catalysis revealed substrate-specific selectivity: for instance, papain produced significant enrichment of alanine and valine motifs while nearly excluding proline, mirroring known protease substrate preferences. In contrast, undirected CDI-coupling generated a more statistically uniform distribution of motifs, including proline-rich sequences, consistent with absence of sequence-specific selection.
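
A bigram transition analysis of this kind can be sketched in a few lines of Python: count adjacent residue pairs across all sequences, then normalize each row to obtain transition probabilities. The peptide strings below are invented placeholders in one-letter code, and the exact normalization used in the paper is an assumption.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def bigram_counts(sequences: Iterable[str]) -> Counter:
    """Count adjacent residue pairs (bigrams) across all sequences."""
    counts: Counter = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    return counts


def transition_probabilities(counts: Counter) -> Dict[str, Dict[str, float]]:
    """Normalize bigram counts so each first-residue row sums to 1."""
    row_totals: Dict[str, int] = defaultdict(int)
    for (first, _second), n in counts.items():
        row_totals[first] += n
    probs: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (first, second), n in counts.items():
        probs[first][second] = n / row_totals[first]
    return {first: dict(row) for first, row in probs.items()}


# Hypothetical peptide sequences (A = alanine, V = valine, P = proline).
peptides = ["AVA", "AVV", "PA"]
counts = bigram_counts(peptides)
probs = transition_probabilities(counts)
```

Rows that are strongly peaked on a few successors would suggest sequence-specific preference, whereas near-uniform rows would be expected for undirected coupling.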

Theoretical and Practical Consequences

  • The study supports the AT view that only recursively reinforced pathway constraints constitute selection; kinetic (catalytic) bias that is not underpinned by such constraints does not.
  • The observed ER ranges and the behavior of A provide a general experimental test for selection in chemical systems, applicable irrespective of the specific selection mechanism.
  • The methodology introduces a route for hypothesis-driven examination of prebiotic selection: e.g., by varying environmental conditions (mineral surfaces, temperature cycles, pH gradients) and measuring ER, one can empirically evaluate which contexts promote the earliest forms of chemical selection.
  • The AT-derived metrics also offer potential for agnostic biosignature detection: low ER ensembles in unidentified chemical contexts could serve as evidence for prebiotic or even extant biological selection processes.

Implications and Future Directions

Assembly Theory, paired with the exploration ratio and ensemble assembly, furnishes a scalable, mechanism-agnostic, and experimentally tractable measure of selection in chemistry prior to life. This capability has several major implications:

  • Origins-of-Life Research: The methodology bridges the conceptual gap between combinatorial chemistry and Darwinian biology, providing the first experimental handle on when undirected chemistry yields to directedness—a critical transition in the emergence of biological complexity.
  • Biosignature Science: The generality of the approach makes it suitable for the detection of molecular-level selection in samples of unknown provenance, which could transform astrobiological searches for life or its precursors.
  • Directed Chemical Discovery: The framework may accelerate identification of environmentally or artificially driven functional chemical systems by flagging selective reinforcement as it emerges in high-dimensional compositional spaces.

Future developments should focus on expanding AT’s applicability to more complex, heterogeneous chemistries, automating exploration of chemical parameter spaces for selective signatures, and integrating AT-metrics with machine learning for high-throughput screening and prediction.

Conclusion

This work delivers a quantitative, experimentally validated framework for detecting and measuring chemical selection prior to biological evolution (2512.18752). By leveraging Assembly Theory, it is now possible to empirically distinguish undirected combinatorial exploration from selection-driven directedness in chemical systems. These advances provide an essential foundation for studying the origins of selection, enabling systematic probing of the earliest transitions toward evolutionary organization and complexity.
