Hybrid Physics-ML Model for Forward Osmosis Flux with Complete Uncertainty Quantification

Published 11 Dec 2025 in cs.LG and stat.ML | (2512.10457v1)

Abstract: Forward Osmosis (FO) is a promising low-energy membrane separation technology, but challenges in accurately modelling its water flux (Jw) persist due to complex internal mass transfer phenomena. Traditional mechanistic models struggle with empirical parameter variability, while purely data-driven models lack physical consistency and rigorous uncertainty quantification (UQ). This study introduces a novel Robust Hybrid Physics-ML framework employing Gaussian Process Regression (GPR) for highly accurate, uncertainty-aware Jw prediction. The core innovation lies in training the GPR on the residual error between the detailed, non-linear FO physical model prediction (Jw_physical) and the experimental water flux (Jw_actual). Crucially, we implement a full UQ methodology by decomposing the total predictive variance (sigma2_total) into model uncertainty (epistemic, from GPR's posterior variance) and input uncertainty (aleatoric, analytically propagated via the Delta method for multi-variate correlated inputs). Leveraging the inherent strength of GPR in low-data regimes, the model, trained on a meagre 120 data points, achieved a state-of-the-art Mean Absolute Percentage Error (MAPE) of 0.26% and an R2 of 0.999 on the independent test data, validating a truly robust and reliable surrogate model for advanced FO process optimization and digital twin development.


Summary

The paper introduces a hybrid physics-ML framework that integrates mechanistic FO transport models with GPR residual learning to improve flux prediction accuracy.
It achieves near-perfect agreement with observed data (R² ≈ 0.9998) on an external test set of 2,854 points, outperforming pure-ML and physics-only models.
The model provides full uncertainty quantification by decomposing epistemic and aleatoric uncertainties, guiding robust experimental design and digital twin development.

Hybrid Physics-ML Modeling for Forward Osmosis Flux with Complete Uncertainty Quantification

Introduction and Problem Statement

Accurate prediction of water flux ($J_w$) in Forward Osmosis (FO) remains a critical challenge because of intricate, nonlinear internal mass transport phenomena and the difficulty of empirically quantifying key mechanistic parameters such as water permeability ($A$), salt permeability ($B$), and the structural parameter ($S$). Traditional mechanistic models, typically based on the Spiegler–Kedem solution-diffusion framework, are limited by this parameter variability and routinely fail to deliver reliable uncertainty quantification (UQ), which compromises their use in engineering design and optimization. Meanwhile, black-box ML methods such as feed-forward neural networks can fit data well but lack physical interpretability and a rigorous decomposition of predictive uncertainty. This work proposes a hybrid approach that addresses these issues through a physics-informed Gaussian Process Regression (GPR) framework with analytical uncertainty estimation and validation.

Model Architecture: Physics-Informed Residual Learning

The proposed architecture couples a full, non-linear mechanistic FO transport model with a GPR surrogate. The GPR is trained on the residuals between the experimental flux and the physical model's prediction, where $\mathbf{z}$ denotes the input feature vector:

$e = J_{w, \text{actual}} - J_{w, \text{physical}}(\mathbf{z})$

The final hybrid model output is then constructed as:

$J_{w, \text{hybrid}}(\mathbf{z}) = J_{w, \text{physical}}(\mathbf{z}) + g_{\text{GPR}}(\mathbf{z})$

By assigning unmodeled physical phenomena and experimental artifacts to the GPR component while keeping the mechanistic model as the backbone of the prediction, this structure retains the physical consistency and interpretability of the mechanistic model and generalizes well, even in the low-data regime (120 training points).
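The residual-learning structure above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: `physical_flux` is a hypothetical stand-in for the mechanistic FO model, the data are synthetic, the inputs are two-dimensional, and the GP uses a fixed squared-exponential kernel with hand-picked hyperparameters (the source does not state the kernel or how it was tuned).

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between two sets of row vectors."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return signal_var * np.exp(-0.5 * d2 / length_scale**2)

def physical_flux(Z):
    """Hypothetical stand-in for the mechanistic model J_w,physical(z)."""
    return 2.0 * Z[:, 0] - 0.5 * Z[:, 1]

def fit_residual_gp(Z, jw_actual, noise_var=1e-4):
    """Fit a zero-mean GP to the residuals e = J_w,actual - J_w,physical(z)."""
    e = jw_actual - physical_flux(Z)
    K = rbf_kernel(Z, Z) + noise_var * np.eye(len(Z))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, e))
    return {"Z": Z, "L": L, "alpha": alpha}

def predict_hybrid(gp, Z_new):
    """Return the hybrid mean (physics + GP correction) and GP posterior variance."""
    K_s = rbf_kernel(Z_new, gp["Z"])
    g_mean = K_s @ gp["alpha"]
    v = np.linalg.solve(gp["L"], K_s.T)
    g_var = rbf_kernel(Z_new, Z_new).diagonal() - (v**2).sum(axis=0)
    return physical_flux(Z_new) + g_mean, np.maximum(g_var, 0.0)

rng = np.random.default_rng(0)
Z_train = rng.uniform(0.0, 1.0, size=(120, 2))
# Synthetic "experimental" flux: physics plus a smooth unmodeled effect
jw_train = physical_flux(Z_train) + 0.3 * np.sin(3.0 * Z_train[:, 0])
gp = fit_residual_gp(Z_train, jw_train)

Z_test = rng.uniform(0.0, 1.0, size=(50, 2))
jw_test = physical_flux(Z_test) + 0.3 * np.sin(3.0 * Z_test[:, 0])
jw_hybrid, var_model = predict_hybrid(gp, Z_test)

mae_physics = np.mean(np.abs(jw_test - physical_flux(Z_test)))
mae_hybrid = np.mean(np.abs(jw_test - jw_hybrid))
print(f"MAE physics-only: {mae_physics:.4f}, MAE hybrid: {mae_hybrid:.4f}")
```

Because the GP only has to learn the residual, it only has to model what the physics misses. The posterior variance returned alongside the mean is the quantity used later as the model (epistemic) uncertainty.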

Figure 1: Model performance across MAE, $R^2$ , and MAPE for the four FO water flux models. The Hybrid Robust GPR model consistently outperforms all variants.

Feature Selection and Sensitivity Analysis

A structured, 10-dimensional feature vector was used, encoding membrane properties ($A$, $\epsilon_{\text{psl}}$, $\tau$, $t_{\text{psl}}$), solution concentrations ($c_{f,\text{in}}$, $c_{d,\text{in}}$), operational velocities ($u_{f,\text{in}}$, $u_{d,\text{in}}$), and geometric/hydrodynamic descriptors ($L_x$, $t_c$). Sensitivity analysis based on the average absolute Jacobian magnitude showed dominant effects from water permeability $A$, feed concentration $c_{f,\text{in}}$, support-layer thickness $t_{\text{psl}}$, and porosity $\epsilon_{\text{psl}}$, consistent with mechanistic FO theory.
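One way to compute an average absolute Jacobian magnitude is with central finite differences, as in the sketch below. The `model` function is a hypothetical three-input stand-in for the hybrid flux model, and the step size is an illustrative choice; the source does not say whether the Jacobian was obtained analytically or numerically.

```python
import numpy as np

def model(Z):
    """Hypothetical stand-in for the hybrid flux model; Z has shape (n, d)."""
    return 3.0 * Z[:, 0] - 1.5 * Z[:, 1] ** 2 + 0.1 * np.sin(Z[:, 2])

def jacobian_fd(f, Z, rel_step=1e-5):
    """Central-difference estimate of df/dz_j at every row of Z, shape (n, d)."""
    n, d = Z.shape
    J = np.empty((n, d))
    for j in range(d):
        h = rel_step * np.maximum(np.abs(Z[:, j]), 1.0)
        Zp, Zm = Z.copy(), Z.copy()
        Zp[:, j] += h
        Zm[:, j] -= h
        J[:, j] = (f(Zp) - f(Zm)) / (2.0 * h)
    return J

rng = np.random.default_rng(1)
Z = rng.uniform(0.0, 1.0, size=(200, 3))
J = jacobian_fd(model, Z)

sensitivity = np.abs(J).mean(axis=0)      # average |dJw/dz_j| per feature
ranking = np.argsort(sensitivity)[::-1]   # most influential feature first
print(sensitivity, ranking)
```

Raw Jacobian magnitudes depend on the units of each feature, so a ranking like this is only meaningful when the inputs are on comparable scales (for example, standardized).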

Figure 2: Hybrid model sensitivity to input features, quantified using the average absolute Jacobian magnitude.

Comparative Predictive Performance

Quantitative evaluation on an external test set ($N=2854$) showed that the Hybrid Robust GPR outperformed the pure-ML and hybrid-ANN models, achieving a test MAPE of 0.26% and an $R^2$ of 0.999. The physics-only model failed to capture critical nonlinearities and experimental artifacts (Figure 3), whereas the hybrid model achieved near-perfect alignment ($R^2=0.9998$) with observed measurements (Figure 4), supporting the effectiveness of residual correction.
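For reference, the three metrics reported here (MAE, MAPE, and $R^2$) follow their standard definitions and can be computed as below. The four sample values are made up for illustration and are not taken from the paper's data.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target."""
    return np.mean(np.abs(y_true - y_pred))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (requires nonzero y_true)."""
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Illustrative values only
y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_pred = np.array([10.1, 19.8, 30.3, 39.6])

print(mae(y_true, y_pred))       # approximately 0.25
print(mape(y_true, y_pred))      # approximately 1.0 (percent)
print(r2_score(y_true, y_pred))  # approximately 0.9994
```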

Figure 3: Parity plot for the FO physical model. The model shows systematic deviations due to unmodeled effects.

Figure 4: Parity plot for the Hybrid Robust GPR model. Near-perfect alignment indicates effective residual learning.

Analytical Uncertainty Quantification

A rigorous, analytically tractable UQ protocol was implemented:

Epistemic Uncertainty: The spatially varying predictive variance from the GPR was extracted to quantify model confidence based on sampling density in input space.
Aleatoric Uncertainty: Measurement errors of the 10 physical input features, including cross-correlations, were analytically propagated via the Delta method, using the hybrid model's local Jacobian.

The total variance was thus computed as:

$\sigma^2_{\text{total}} = \sigma^2_{\text{model}} + \sigma^2_{\text{input}}$
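For correlated inputs, the first-order Delta method gives $\sigma^2_{\text{input}} \approx \mathbf{J}\,\boldsymbol{\Sigma}_z\,\mathbf{J}^\top$, where $\mathbf{J}$ is the local gradient of the model with respect to the inputs and $\boldsymbol{\Sigma}_z$ is the input covariance matrix (the symbol $\boldsymbol{\Sigma}_z$ is notation assumed here, not taken from the source). The sketch below applies it to two correlated inputs with made-up standard deviations, correlation, gradient, and GPR variance.

```python
import numpy as np

def input_variance_delta(grad, cov):
    """First-order (Delta method) output variance: grad^T @ Sigma @ grad."""
    return float(grad @ cov @ grad)

def total_variance(var_model, grad, cov):
    """Sum of model (GPR posterior) variance and propagated input variance."""
    return var_model + input_variance_delta(grad, cov)

# Illustrative numbers: two inputs with correlation rho
std = np.array([0.1, 0.2])
rho = 0.5
corr = np.array([[1.0, rho], [rho, 1.0]])
cov = np.outer(std, std) * corr           # [[0.01, 0.01], [0.01, 0.04]]

grad = np.array([2.0, -1.0])              # assumed local Jacobian dJw/dz
var_model = 0.01                          # assumed GPR posterior variance

var_input = input_variance_delta(grad, cov)
var_total = total_variance(var_model, grad, cov)
print(var_input, var_total)               # approximately 0.04 and 0.05
```

The off-diagonal covariance term matters: with the same gradient and uncorrelated inputs, the propagated variance would be 0.08 instead of 0.04, which is why the cross-correlations are included in the propagation.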

Monte Carlo validation across representative test samples showed less than 3% relative error in variance between the analytic Delta method and brute-force MC simulations, supporting the first-order (linear) approximation used for uncertainty propagation.
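A check of this kind can be reproduced on a toy problem: propagate a correlated Gaussian input distribution through a nonlinear function by sampling, then compare the sample variance with the Delta-method estimate. The function, operating point, covariance, and sample count below are all illustrative assumptions rather than values from the paper.

```python
import numpy as np

def model(z):
    """Hypothetical smooth nonlinear stand-in; z has shape (..., 2)."""
    return z[..., 0] * np.exp(-0.5 * z[..., 1])

def gradient_fd(f, z0, h=1e-6):
    """Central-difference gradient of a scalar function at the point z0."""
    g = np.empty_like(z0)
    for j in range(z0.size):
        step = np.zeros_like(z0)
        step[j] = h
        g[j] = (f(z0 + step) - f(z0 - step)) / (2.0 * h)
    return g

z0 = np.array([2.0, 1.0])                      # nominal input
cov = np.array([[0.0004, 0.0001],
                [0.0001, 0.0009]])             # correlated measurement errors

# Analytical estimate (Delta method)
grad = gradient_fd(model, z0)
var_delta = grad @ cov @ grad

# Brute-force Monte Carlo estimate
rng = np.random.default_rng(42)
samples = rng.multivariate_normal(z0, cov, size=200_000)
var_mc = model(samples).var(ddof=1)

rel_err = abs(var_delta - var_mc) / var_mc
print(f"delta={var_delta:.3e}, mc={var_mc:.3e}, rel_err={rel_err:.2%}")
```

The two estimates agree closely here because the input uncertainties are small relative to the curvature of the function. With larger input variances or a strongly nonlinear response, the linear approximation would be expected to degrade.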

Figure 5: Analytical vs. Monte Carlo (MC) uncertainty validation, showing high correlation between the two variance estimates ($R=0.994$) and small relative error.

Uncertainty Decomposition and Interpretation

Decomposing predictive uncertainty into epistemic and aleatoric components across input space reveals that, in well-sampled regions, aleatoric (input-driven) uncertainty dominates. This finding underscores the limiting role of physical measurement accuracy on achievable surrogate model fidelity and provides actionable guidance for experimental design and resource allocation in FO research.

Figure 6: Uncertainty decomposition indicating predominance of aleatoric uncertainty in standard operating regions.

Implications and Prospects

The physics-augmented GPR framework achieves state-of-the-art predictive metrics with rigorous, decomposed uncertainty quantification, effectively formalizing risk in process optimization and digital twin applications. Practically, this permits (1) robust uncertainty-aware FO module design, (2) principled parameter inference and optimization (enabling Bayesian optimization), and (3) seamless deployment as verifiable digital twins. Theoretically, the work exemplifies the power of hybrid residual learning in multi-physics surrogate modeling and sets a reproducible precedent for propagating both measurement- and model-driven uncertainties in physically complex systems.

Conclusion

This work establishes a new methodological benchmark for FO flux prediction through a principled synthesis of non-linear physical modeling and data-driven residual correction, systematically integrating fully probabilistic uncertainty quantification. The demonstrated combination of empirical accuracy, interpretability, and formal risk assessment enables direct translation to process digitalization and automation initiatives. Extensions to active Bayesian optimization and adaptive experimental planning are clear next steps for this approach.