Non-Targeted LC/GC-HRMS Analysis
- Non-targeted LC/GC-HRMS is a comprehensive analysis method that captures full-scan mass spectra to identify all substances in a sample without prior target selection.
- It integrates sophisticated workflows including sample enrichment, chromatographic separation, and advanced spectral deconvolution algorithms for accurate compound annotation.
- This approach supports regulatory, environmental, and clinical applications by enabling retrospective discovery of unknowns and emphasizing methodological reproducibility.
Non-targeted analysis (NTA) using liquid chromatography or gas chromatography coupled to high-resolution mass spectrometry (LC/GC-HRMS) is a comprehensive analytical approach designed to screen samples for all present substances—whether suspected, unknown, or unexpected—without the a priori selection of target analytes. Unlike classic targeted methods, NTA leverages full-spectrum acquisition across a broad m/z range, advanced data-mining, and library-free or multi-attribute identification algorithms to enable retrospective discovery and annotation of compounds in complex matrices across environmental, biological, clinical, and regulatory domains (Alsubaie et al., 23 Dec 2025).
1. Conceptual Basis and Analytical Scope
Non-targeted LC/GC-HRMS acquires full-scan mass spectra from chromatographically separated mixtures, extracting broad sets of features (ions, peaks, or components) independent of prior knowledge. In LC-HRMS, electrospray or orthogonal soft-ionization facilitates the detection of polar or thermolabile molecules; in GC-HRMS, electron ionization is used for volatile and semi-volatile species. Both modalities allow subsequent retrospective interrogation for unknowns or suspects not included at the time of analysis (Alsubaie et al., 23 Dec 2025).
This paradigm shift is crucial for regulatory and exposomics applications, where unknown or emergent contaminants, metabolites, or biogenic compounds must be detected and annotated under minimal prior information (Alsubaie et al., 23 Dec 2025, Guillevic et al., 2021, Watrous et al., 2018).
2. Chromatographic and Mass-Spectrometric Workflows
2.1 Sample Preparation and Chromatography
- Sample Introduction: Matrix-specific enrichment via solid-phase extraction (SPE) or purge-and-trap is performed for aqueous and biological matrices (Cairoli et al., 2022, Watrous et al., 2018).
- Chromatographic Separation:
- LC: C18 columns (e.g., 100 × 2.1 mm, 1.7–1.8 µm) with multi-step gradients of water/acetonitrile or isopropanol/acetic acid for bioactive lipids and xenobiotics (Watrous et al., 2018).
- GC: 30–60 m capillary columns with 0.25 mm I.D., 0.25 µm film; typical temperature ramps 40–320 °C for volatiles and derivatized analytes (Cairoli et al., 2022, Guillevic et al., 2021).
- Ionization and Detection:
- LC-HRMS: ESI/HESI (positive/negative mode); Orbitrap or QTOF analyzers with high resolving power at m/z ≈ 200; full-scan acquisition over m/z 50–1000 (Watrous et al., 2018, Cairoli et al., 2022).
- GC-HRMS: EI at 70 eV; high-resolution TOF, QTOF, or Orbitrap analyzers; mass range m/z 24–600; resolving power specified at m/z ≈ 200 (Cairoli et al., 2022, Guillevic et al., 2021).
2.2 Preprocessing and Feature Extraction
- Spectral Alignment: Nonlinear retention-time correction (e.g., spline fitting to standards) to sub-0.03 min precision (Watrous et al., 2018).
- Peak Detection and Deconvolution: Algorithms such as MZmine's local-minimum search and chromatogram builder are parameterized for high sensitivity and specificity, e.g., a fixed minimum-intensity (noise) threshold and a mass tolerance of 5 ppm (Watrous et al., 2018).
- Blank Subtraction and Quality Control: Internal standards span the RT/mass range, facilitating normalization; blanks are routine to identify background or artefactual features (Watrous et al., 2018).
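The blank-subtraction step above can be sketched as a ppm-window match between sample and blank feature lists. This is a minimal illustration with hypothetical m/z values; real pipelines additionally compare retention times and require a sample-to-blank intensity ratio before discarding a feature:

```python
import numpy as np

def match_features(sample_mz, blank_mz, ppm_tol=5.0):
    """Flag sample features whose m/z matches a blank feature within ppm_tol.

    Minimal sketch of blank subtraction: flagged features are candidates for
    removal as background/artefactual signals.
    """
    sample_mz = np.asarray(sample_mz, dtype=float)
    blank_mz = np.sort(np.asarray(blank_mz, dtype=float))
    # For each sample m/z, locate the nearest blank m/z via binary search.
    idx = np.clip(np.searchsorted(blank_mz, sample_mz), 1, len(blank_mz) - 1)
    nearest = np.where(
        np.abs(blank_mz[idx] - sample_mz) < np.abs(blank_mz[idx - 1] - sample_mz),
        blank_mz[idx],
        blank_mz[idx - 1],
    )
    ppm_err = np.abs(nearest - sample_mz) / sample_mz * 1e6
    return ppm_err <= ppm_tol

# Toy example: the first feature lies within 5 ppm of a blank feature,
# the second does not.
flags = match_features([150.0000, 200.1234], [150.0004, 180.0500])
```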
3. Data Analysis: Mathematical and Algorithmic Frameworks
3.1 Matrix Decomposition and Component Analysis
- PARAFAC2 and MCR–ALS: Multiway decomposition extracts components across samples and sites, useful for environmental NTA pipelines (Cairoli et al., 2022).
- Sparse Multivariate Curve Resolution: The MCR–ALS optimization:
  $$\min_{\mathbf{C},\mathbf{S}} \; \lVert \mathbf{D} - \mathbf{C}\mathbf{S}^{\top} \rVert_F^2 + \lambda \lVert \mathbf{S} \rVert_1$$

  where $\mathbf{D}$ is the data matrix, $\mathbf{C}$ holds the elution profiles, $\mathbf{S}$ the spectra, and $\lambda$ penalizes non-sparsity. L1-norm (Lasso) regularization is empirically favored, reducing rotational ambiguity and yielding sparse, chemically plausible spectra (Mani-varnosfaderani et al., 2019).
- Lasso-MCR–ALS:
  - Converges rapidly toward chemically valid solutions.
  - Eliminates spurious nonzero features (in contrast to ridge/L2 regularization, which can overfit, and L0, which can stagnate).
  - Implemented via coordinate-descent or block Lasso solvers (Mani-varnosfaderani et al., 2019).
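A minimal numerical sketch of the Lasso-penalized alternating scheme described above, using a plain least-squares update for the elution profiles and a single proximal-gradient (ISTA) step with soft-thresholding for the spectra. This is illustrative only; the cited work uses dedicated coordinate-descent or block Lasso solvers:

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def lasso_mcr_als(D, n_comp, lam=0.1, n_iter=50, seed=0):
    """Sketch of MCR-ALS with an L1 (Lasso) penalty on the spectra S.

    Approximates min_{C,S>=0} ||D - C S^T||_F^2 + lam * ||S||_1 by alternating:
    C via unconstrained least squares followed by clipping (a simplification
    of true nonnegative least squares), and S via one ISTA step.
    """
    rng = np.random.default_rng(seed)
    n_rt, n_mz = D.shape
    S = np.abs(rng.standard_normal((n_mz, n_comp)))
    for _ in range(n_iter):
        # Update elution profiles C; clip to enforce nonnegativity.
        C = np.linalg.lstsq(S, D.T, rcond=None)[0].T.clip(min=0)
        # One ISTA step on S; step size from the Lipschitz constant 2*||C^T C||.
        step = 1.0 / (2.0 * np.linalg.norm(C.T @ C, 2) + 1e-12)
        grad = -2.0 * (D.T @ C - S @ (C.T @ C))
        S = soft_threshold(S - step * grad, step * lam).clip(min=0)
    return C, S
```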
3.2 Graph-Based and Combinatorial Formula Inference
- ALPINAC Algorithm: For GC–EI–HRMS, fragment formula annotation proceeds without spectral libraries using:
- Unbounded Knapsack Enumeration: For each measured fragment mass $m$, integer atom counts $n_i$ over the major atom/isotope masses $m_i$ are sought such that $\sum_i n_i m_i \approx m$ within the mass tolerance,
- with double-bond equivalents (DBE) as a physicochemical constraint ($\mathrm{DBE} \ge 0$) (Guillevic et al., 2021).
- Directed Acyclic Pseudo-Fragmentation Graphs: Nodes=fragments, edges=neutral losses. Singleton nodes pruned unless information is incomplete.
- Isotopocule Modeling: Expansion in minor isotope variants. Each candidate receives a likelihood score weighted by the fraction of total signal explained and penalized by formula complexity.
- Iterative Fitting: Joint least squares optimization (Levenberg–Marquardt); molecular ion candidates filtered using valence and parity constraints (Guillevic et al., 2021).
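The formula-enumeration idea can be sketched by brute force over a deliberately small atom set (C, H, F, Cl, chosen here for illustration); ALPINAC itself uses an efficient knapsack recursion over a configurable atom/isotope set rather than exhaustive search:

```python
import itertools

# Monoisotopic masses (u) of the most abundant isotopes; a small illustrative
# atom set, not ALPINAC's full configurable set.
MASSES = {"C": 12.0, "H": 1.007825, "F": 18.998403, "Cl": 34.968853}

def dbe(counts):
    """Double-bond equivalents: DBE = C + 1 - (H + F + Cl)/2."""
    return (counts.get("C", 0) + 1
            - (counts.get("H", 0) + counts.get("F", 0) + counts.get("Cl", 0)) / 2)

def enumerate_formulas(target_mz, tol_ppm=10.0, max_atoms=20):
    """Enumerate candidate formulas matching target_mz within tol_ppm.

    Brute-force stand-in for the unbounded-knapsack step: tries all atom
    counts up to max_atoms per element and keeps chemically plausible
    solutions (DBE >= 0).
    """
    tol = target_mz * tol_ppm * 1e-6
    elems = list(MASSES)
    hits = []
    for combo in itertools.product(range(max_atoms + 1), repeat=len(elems)):
        counts = dict(zip(elems, combo))
        mass = sum(MASSES[e] * n for e, n in counts.items())
        if abs(mass - target_mz) <= tol and dbe(counts) >= 0:
            hits.append(counts)
    return hits
```

For example, the fragment mass 68.9952 u recovers the CF3 composition typical of halocarbon spectra.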
Table 1. ALPINAC Workflow Phases
| Phase | Core Method | Output |
|---|---|---|
| Sample acquisition & HRMS | GC-EI-HRMS, HDF5 storage | Raw spectra, centroided peaks |
| Exhaustive formula generation | Unbounded knapsack, DBE, isotopes | Candidate formulas per fragment m/z |
| Co-elution graph filtering | Directed acyclic pseudo-graph | Feasible fragment subgraphs |
| Isotopocule expansion & fitting | Intensity modeling, LOD pruning | Scaled fragment contributions |
| Iterative selection & ranking | Likelihood score, LM fit | Ranked candidate formulas |
- Performance: ≥95% of ion “area” is reconstructed for the majority of tested compounds; correct molecular ion is the top candidate in ≈80–83% of cases (Guillevic et al., 2021).
3.3 Spatiotemporal and Systems-Level Modeling
- Process PLS (Path Modeling): Spatiotemporal prediction and tracking of pollution components in river networks, using block-wise partial least squares (SIMPLS), path coefficients, and normalized RMSE (NRMSE) for model accuracy (Cairoli et al., 2022). This enables tracking the transport and temporal dynamics of previously unknown species across environmental compartments.
4. Compound Identification without Reference Standards
- Library-Free Approaches: When standard spectra are unavailable, annotation involves combinatorial formula search, isotopologue simulation, and in silico calculation of molecular properties (e.g., collision cross-section, isotope pattern) (Guillevic et al., 2021, Nuñez et al., 2018).
- Multi-Attribute Matching: High-confidence identifications require combining orthogonal attributes—exact mass, isotopic pattern, CCS, RT, MS/MS—each weighted in a transparent scoring system (MAME engine) (Nuñez et al., 2018).
  FDR and false-negative rate (FNR) are controlled by adjusting score cutoffs; after parameter optimization, high-confidence identifications achieve an FDR as low as 10% (Nuñez et al., 2018).
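A toy version of such multi-attribute scoring is shown below. The weights and tolerances are purely illustrative (they are not MAME's actual parameters); each attribute contributes its weight when candidate and measurement agree within tolerance, and the acceptance cutoff on the total score is what gets tuned to control FDR/FNR:

```python
def multi_attribute_score(candidate, measured, weights=None):
    """Toy weighted multi-attribute match score (illustrative parameters only).

    Attributes: exact mass and isotope-pattern similarity as relative errors,
    CCS as a relative error, RT as an absolute error in minutes.
    """
    weights = weights or {"mass": 0.3, "ccs": 0.2, "rt": 0.2, "isotope": 0.3}
    tolerances = {"mass": 5e-6, "ccs": 0.03, "rt": 0.1, "isotope": 0.1}
    score = 0.0
    for attr, w in weights.items():
        c, m = candidate.get(attr), measured.get(attr)
        if c is None or m is None:
            continue  # a missing attribute simply contributes nothing
        if attr == "rt":
            err = abs(c - m)                       # absolute RT error (min)
        else:
            err = abs(c - m) / max(abs(m), 1e-12)  # relative error
        if err <= tolerances[attr]:
            score += w
    return score
```

A candidate matching all four attributes scores 1.0; dropping any one attribute lowers the score by that attribute's weight, making the contribution of each line of evidence transparent.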
- Spectral Networking: For classes such as eicosanoids, MS/MS network analysis (cosine similarity, GNPS thresholds) clusters known and novel features, propagating structural information and annotating “formula gaps” (Watrous et al., 2018).
- PARAFAC2/Process PLS Integration: Enables tracking of annotated and novel pollutants spatially and temporally, prioritizing emerging contaminants and establishing links to pollution sources (Cairoli et al., 2022).
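The spectral-networking step rests on pairwise cosine similarity between MS/MS spectra. A minimal sketch with greedy peak matching inside an absolute m/z tolerance follows; GNPS-style networking additionally applies modification-tolerant matching (fragments shifted by the precursor mass difference), which this sketch omits:

```python
import math

def spectral_cosine(spec_a, spec_b, tol=0.01):
    """Cosine similarity between two MS/MS spectra given as {mz: intensity}.

    Greedily pairs each peak in spec_a with the closest unused peak in
    spec_b within tol, then normalizes by the spectra's intensity norms.
    """
    matched = 0.0
    used = set()
    for mz_a, int_a in spec_a.items():
        best_int, best_mz = None, None
        for mz_b, int_b in spec_b.items():
            if mz_b in used or abs(mz_a - mz_b) > tol:
                continue
            if best_mz is None or abs(mz_a - mz_b) < abs(mz_a - best_mz):
                best_int, best_mz = int_b, mz_b
        if best_mz is not None:
            used.add(best_mz)
            matched += int_a * best_int
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    return matched / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

In a molecular network, an edge is drawn between two features when this similarity exceeds a threshold, so annotations propagate from library-matched nodes to unannotated neighbors.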
5. Reproducibility and Computational Infrastructure
- Six-Pillar Reproducibility Model: Regulatory-grade reproducibility is defined by six criteria (Alsubaie et al., 23 Dec 2025):
- Laboratory validation (C1)
- Data availability (C2)
- Code availability (C3)
- Standardized I/O formats (C4)
- Knowledge integration (C5)
- Portable implementation (C6)
Temporal analysis of 103 tools (2004–2025) reveals a consistent trend: while openness (C2–C3–C5) reached 86% and code/data sharing increased, operability (C1+C6, i.e., real-world validation and workflow portability) declined to 43%. Only 8.7% of tools satisfy all six pillars; containerization and workflow management systems improve compliance but remain rare (Alsubaie et al., 23 Dec 2025).
Table 2. Adoption of Reproducibility Pillars (2020–2025)
| Pillar | Adoption Rate (%) |
|---|---|
| Data (C2) | 93.4 |
| Code (C3) | 85.2 |
| Knowledge (C5) | 80.3 |
| Validation (C1) | 50.8 |
| Formats (C4) | 59.0 |
| Portability (C6) | 34.4 |
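The composite openness and operability figures quoted in the text can be recomputed directly from the Table 2 adoption rates:

```python
# Adoption rates (%) from Table 2 (2020-2025).
rates = {"C1": 50.8, "C2": 93.4, "C3": 85.2, "C4": 59.0, "C5": 80.3, "C6": 34.4}

# Openness averages the data, code, and knowledge pillars;
# operability averages validation and portability.
openness = sum(rates[p] for p in ("C2", "C3", "C5")) / 3
operability = sum(rates[p] for p in ("C1", "C6")) / 2

print(f"openness ~= {openness:.0f}%, operability ~= {operability:.0f}%")
```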
Recommendations emphasize laboratory validation (spike–recovery, LOD), open file formats (mzML, mzTab), and workflow portability (Nextflow, Snakemake, Docker/Singularity), with priority for food and environmental domains (Alsubaie et al., 23 Dec 2025).
6. Applications and Case Studies
- Atmospheric Trace Gases: Automated GC–EI–HRMS with ALPINAC enables detection and annotation of unknown halocarbons and hydrocarbons not in commercial libraries, with rigorous ranking and coverage metrics (Guillevic et al., 2021).
- River Pollution Networks: End-to-end NTA pipelines (sampling, SPE, GC/LC–HRMS, PARAFAC2, Process PLS) enable retrospective tracing of pollutants and prioritization of unknowns by their transport and appearance along hydrological networks (Cairoli et al., 2022).
- Lipidomics and Disease Biomarkers: Directed LC–HRMS and spectral networking expand the annotated eicosanoid space from ~150 to >500 entities, discovering putative novel molecules associated with age and inflammation (Watrous et al., 2018).
- Standards-Free Environmental Chemoinformatics: ISiCLE and MAME platforms leverage multi-attribute scoring for confident identification of small molecules in synthetic mixtures, blending in silico predictions of CCS and isotopic signatures (Nuñez et al., 2018).
7. Limitations, Challenges, and Prospective Directions
- Ambiguity from Sparse or Low-Abundance Features: Confidence in formula assignment degrades with few detected fragments or low S/N ratios, necessitating cross-modal or complementary validation; such cases are flagged by a low fraction of explained signal and a high ranking index (Guillevic et al., 2021).
- Rotational Ambiguity in Spectral Decomposition: Sparse (L1) penalties in MCR–ALS reduce rotational ambiguity, whereas L0 minimization is nonconvex and poorly conditioned (Mani-varnosfaderani et al., 2019).
- Fragmentation Variability and MS/MS Limitations: Cyclized or unusual lipids fragment unpredictably; automated MS/MS annotation tools often fail, requiring manual or network-based validation (Watrous et al., 2018).
- Reproducibility Gap: Growing divergence between openness (data, code sharing) and operational reproducibility (validation, portability). No dedicated tools for food-matrix contaminant screening have emerged (Alsubaie et al., 23 Dec 2025).
- Standardization: Broader implementation of vendor-neutral formats and workflow-containers remains a strategic goal for field-wide reproducibility and regulatory acceptance.
This overview synthesizes mathematical frameworks, validation metrics, algorithmic strategies, and reproducibility standards directly from the cited research corpus. It highlights the progression and current challenges of non-targeted analysis using LC/GC-HRMS, emphasizing both technical rigor and infrastructural maturity required for cross-domain and regulatory workflows.