
Personalized Blood Glucose Prediction

Updated 28 January 2026
  • Personalized blood glucose prediction is the tailored forecasting of future glucose levels using individual biosignals, historical CGM data, and advanced modeling techniques.
  • Modern methods employ ARIMA, kernel-based models, and deep neural networks that achieve low RMSEs and high clinical acceptability across simulated and real settings.
  • Emerging strategies integrate federated learning, privacy-preserving frameworks, and multi-modal biosensing for efficient, deployable, and personalized diabetes management.

Personalized blood glucose prediction refers to the computational forecasting of future blood glucose concentrations for an individual, tailored using person-specific biosignals, biometric profiles, or historical data. This domain underpins closed-loop insulin delivery, proactive diabetes management, hypoglycemia/hyperglycemia prevention, and the deployment of clinical decision support systems. Modern approaches integrate statistical time series models, non-linear kernel machines, dynamical systems, and deep learning architectures, frequently with device-level constraints and privacy-preserving frameworks. Personalized forecasting is motivated by the high inter- and intra-individual variability in glucose-insulin dynamics, dietary response, and behavioral context, which render generic population models suboptimal. Rigorous quantitative assessment employs error-based metrics, clinical acceptability frameworks, and comparative benchmarking across both simulated and real-world datasets.

1. Modeling Paradigms for Personalized Blood Glucose Prediction

Historically, personalized glucose forecasting was dominated by univariate autoregressive models fit per patient (e.g., AR, ARMA, ARIMA), which process only the individual's own continuous glucose monitoring (CGM) signal. Letting $G_t$ denote the observed interstitial glucose at time $t$, an ARIMA$(p,d,q)$ process approximates the future value as

$$\Delta^{d} G_t = \sum_{i=1}^{p} \phi_i \, \Delta^{d} G_{t-i} + \sum_{j=1}^{q} \theta_j \epsilon_{t-j} + \epsilon_t$$

where $\{\phi_i\}, \{\theta_j\}$ are parameters, $d$ is the order of differencing, and $\epsilon_t$ is white noise. Optimal orders are selected by Akaike or Bayesian Information Criteria (AIC/BIC) based on individual history. For short-horizon forecasts (15 minutes), mean absolute errors (MAE) as low as 10.15 mg/dL (using 6 hours of 5-min sampled data) have been reported, with diminishing improvement for longer windows or higher frequency sampling (Rodriguez, 2021).
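The per-patient autoregressive idea can be sketched with a plain least-squares AR(p) fit on a single CGM trace. This is a minimal illustration (no differencing or moving-average terms, so AR rather than full ARIMA), and the function names are hypothetical:

```python
import numpy as np

def fit_ar(series, p):
    """Fit AR(p) coefficients to a 1-D glucose series by least squares.

    Row t of the design matrix holds the p most recent lags
    [G_{t-1}, ..., G_{t-p}] plus an intercept column of ones.
    """
    n = len(series)
    X = np.column_stack([series[p - k - 1 : n - k - 1] for k in range(p)])
    y = series[p:]
    coef, *_ = np.linalg.lstsq(np.column_stack([X, np.ones(len(y))]), y, rcond=None)
    return coef  # p lag weights followed by an intercept

def forecast(series, coef, steps):
    """Iterated one-step-ahead prediction: feed each forecast back as input."""
    p = len(coef) - 1
    hist = list(series[-p:])
    out = []
    for _ in range(steps):
        lags = hist[::-1][:p]  # most recent value first, matching the design matrix
        nxt = float(np.dot(coef[:p], lags) + coef[-1])
        out.append(nxt)
        hist.append(nxt)
    return out
```

Fitting on a noiseless AR(1) trace recovers the generating coefficients, which is a useful sanity check before applying the same recipe to real CGM data.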

Kernel-based methods such as patient-specific Support Vector Regression (SVR) and Gaussian Process Regression (GPR) can exploit non-linear dependencies and offer uncertainty quantification. Such approaches consistently outperform linear ARIMA-type processes in personal CGM settings, particularly in capturing excursions and preserving temporal coherence (Bois et al., 2020).

Deep neural architectures generalize these paradigms by learning higher-order, context-sensitive, and multi-modal dependencies. Widely adopted models include:

  • Feedforward Neural Networks (FFNNs) and Extreme Learning Machines (ELMs): Simple neural regressors with per-person parameter fitting; fast but generally outperformed by recurrent schemes.
  • Recurrent Neural Networks (RNNs, LSTMs, and GRUs): Learn temporal dependencies intrinsic to glucose-insulin systems. Two-layer LSTMs or Bi-GRUs, personalized by per-patient training or fine-tuning, yield high clinical acceptability (AP ≥95% in euglycemia at 30 min, event sensitivity 75–96%) (Bois et al., 2020, Rigamonti et al., 21 Jan 2026).
  • Convolutional-Recurrent Networks (CRNNs): Incorporate both temporal convolution (for short-term feature extraction) and recurrent blocks (for long-term memory). These achieve RMSEs as low as 9.4 mg/dL (in silico, 30 min PH), and can be deployed on smartphones with sub-10 ms inference (Li et al., 2018).
  • Attention-based and Transformer Architectures: Recent advances employ attention to capture variable-time dependencies and integrate multi-modal features, complemented by subject embeddings for personalization. Informer-like and patch-wise transformers show state-of-the-art performance (e.g. RMSE < 16 mg/dL for 30 min and 47 mg/dL for 240 min PH on clinically relevant datasets), with reliable calibration and substantial gains for longer input histories (24 h–1 week) (Sergazinov et al., 2022, Karagoz et al., 12 May 2025).
  • Dynamical state-space and mechanistic neural models: Hybrid schemes (e.g., bio-informed RNNs) fuse learned representations with biology-inspired losses that enforce consistency with established glucose-insulin ODEs, further enhancing interpretability and safety (Carli et al., 24 Mar 2025, Isaac et al., 5 Oct 2025).

2. Data Preprocessing, Feature Engineering, and Personalization Strategies

Personalized glucose forecasters are trained on subject-specific data streams, typically sampled at 5–15-min intervals via CGM. Data processing protocols include:

  • Sliding-window extraction: At each time $t$, the predictive model ingests a fixed-size window $[G_{t-n+1}, \dots, G_t]$ of length $n$ (e.g. $n=72$ for 6 h at 5-min resolution).
  • Normalization: Standardization (z-score, Min-Max, or window-based normalization) stabilizes input scale and accelerates convergence, especially in online settings.
  • Multi-modal augmentation: Where available (insulin dosing, meal timing/quantity, physical activity, other wearables), exogenous features are concatenated with CGM windows or separately tokenized in transformer backbones (Rigamonti et al., 21 Jan 2026, Karagoz et al., 12 May 2025).
  • Personalization: Approaches include training a separate model per patient, fine-tuning a population-pretrained network on individual history, and learning subject embeddings that condition a shared backbone.

Personalized error curves plateau when past windows exceed 6 h (for univariate/input-only models), justifying practical caps on input size (Rodriguez, 2021).
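The sliding-window extraction and window-based normalization steps above can be sketched as follows (the window length, horizon, and function names are illustrative):

```python
import numpy as np

def make_windows(cgm, n, horizon):
    """Slice a CGM trace into (input window, target) pairs.

    Each sample pairs n consecutive readings with the value `horizon`
    steps after the window's end (e.g. n=72, horizon=6 for a 30-min
    prediction horizon at 5-min sampling).
    """
    X, y = [], []
    for t in range(n, len(cgm) - horizon + 1):
        X.append(cgm[t - n : t])
        y.append(cgm[t + horizon - 1])
    return np.array(X), np.array(y)

def zscore_per_window(X):
    """Window-based normalization: standardize each input window by its
    own mean and standard deviation, which avoids relying on global
    statistics that drift in online settings."""
    mu = X.mean(axis=1, keepdims=True)
    sd = X.std(axis=1, keepdims=True) + 1e-8
    return (X - mu) / sd
```

Per-window statistics are a common choice when models run continuously on-device, since a single global mean/variance computed at training time can become stale as the wearer's physiology changes.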

3. Loss Functions, Training Protocols, and Clinical Validation

Loss landscapes and training regimens are optimized for the pronounced class imbalance and outlier structure introduced by rare but consequential hypo- and hyperglycemic events:

  • Standard error metrics: Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), Mean Absolute Relative Difference (MARD), and Median Absolute Percentage Error (MAPE) are widely reported for comparative benchmarking (Bois et al., 2020, Cichosz et al., 2 Jan 2026, Karagoz et al., 12 May 2025).
  • Excursion-sensitive loss: The Hypo-Hyper (HH) loss introduces a penalty that scales quadratically beyond euglycemic boundaries (70–180 mg/dL), up-weighting errors in the tails and reducing RMSE in hypoglycemia by up to 50% over standard MSE (Dave et al., 2024).
  • Robust or trimmed mini-batch procedures: Discarding the top $1-\beta$ fraction of outlier samples per mini-batch (e.g., $\beta = 0.9$) stabilizes gradients and mitigates the influence of erratic, unpredictable excursions (Armandpour et al., 2021).
  • Uncertainty quantification: Probabilistic networks yield calibrated prediction intervals via Bayesian inference, dropout-based infinite mixture models, or GPR, supporting clinical interpretability (Sergazinov et al., 2022, Sirlanci et al., 2019).
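The excursion-weighted and trimmed-batch ideas above can be illustrated as follows. This is a plausible sketch, not the exact HH loss of Dave et al. (2024) nor the precise trimming rule of Armandpour et al. (2021); the weight scale and function names are assumptions:

```python
import numpy as np

EU_LO, EU_HI = 70.0, 180.0  # euglycemic band in mg/dL

def hh_weight(y_true):
    """Excursion weight: 1 inside the euglycemic band, growing
    quadratically with the distance beyond either boundary."""
    below = np.clip(EU_LO - y_true, 0.0, None)
    above = np.clip(y_true - EU_HI, 0.0, None)
    return 1.0 + ((below + above) / 10.0) ** 2

def excursion_mse(y_true, y_pred):
    """MSE up-weighted in the hypo/hyper tails."""
    return float(np.mean(hh_weight(y_true) * (y_true - y_pred) ** 2))

def trimmed_batch_loss(y_true, y_pred, beta=0.9):
    """Robust mini-batch loss: keep the beta fraction of samples with
    the smallest squared error, discarding the outlier remainder."""
    err = (y_true - y_pred) ** 2
    k = max(1, int(beta * len(err)))
    return float(np.mean(np.sort(err)[:k]))
```

With the weights above, a 10 mg/dL miss at a true value of 60 mg/dL (hypoglycemia) costs twice as much as the same miss at 100 mg/dL, which is the intended asymmetry.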

Clinical acceptability is assessed with Continuous Glucose-Error Grid Analysis (CG-EGA), evaluating the proportion of predictions in clinically safe (A+B) vs. dangerous (E) zones, alongside time gain (TG) metrics, event sensitivity, and other clinical KPIs. Multiple studies report >90% accurate prediction in euglycemic or hyperglycemic zones, but lower sensitivity and higher relative error for hypoglycemic events, highlighting an enduring challenge (Bois et al., 2020, Dave et al., 2024, Rigamonti et al., 21 Jan 2026, Cichosz et al., 2 Jan 2026).
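As a simplified illustration of error-grid acceptability, the classic point-EGA zone-A criterion can be computed directly; note that full CG-EGA additionally grades rate-of-change accuracy, which this sketch omits:

```python
import numpy as np

def zone_a_fraction(ref, pred):
    """Fraction of predictions in zone A of the classic point error grid:
    within 20% of the reference value, or both reference and prediction
    in the hypoglycemic range (< 70 mg/dL)."""
    ref = np.asarray(ref, dtype=float)
    pred = np.asarray(pred, dtype=float)
    within_20 = np.abs(pred - ref) <= 0.2 * ref
    both_hypo = (ref < 70.0) & (pred < 70.0)
    return float(np.mean(within_20 | both_hypo))
```

Reporting this fraction per glycemic zone (rather than pooled) exposes the hypoglycemia sensitivity gap noted above, since pooled A+B rates are dominated by the plentiful euglycemic samples.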

4. Federated and Privacy-Preserving Personalization

To address privacy constraints and the "cold-start" problem for new patients, federated learning designs are now central:

  • Centralized federated learning (FedAvg): Global neural weights are aggregated across clients, exchanging only model parameters without transmitting raw CGM data. Subsequent client-specific fine-tuning with personalized losses (e.g., HH) improves excursion detection by 35% over local-only models, coming within 5–10% of centralized baselines (Dave et al., 2024).
  • Asynchronous/decentralized FL (GluADFL): Devices participate in peer-to-peer weight averaging across custom topologies (random, ring, cluster). This framework supports flexible participation and addresses device dropout; stability is preserved with up to 70% of nodes inactive per round. Population models derived from federated communities underpin improved unseen-patient initialization, reducing RMSE by 0.4–0.8 mg/dL over naive personal models (Piao et al., 2024).
  • Privacy compliance: Modern FL frameworks ensure HIPAA/GDPR compliance, non-identifiable data exchange, and are amenable to integration in mHealth CGM apps or wearable devices.
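The FedAvg aggregation step described above reduces to a sample-weighted average of per-client parameter tensors; a minimal sketch (parameter names hypothetical):

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """One FedAvg aggregation round.

    Averages each parameter tensor across clients, weighted by the
    number of local training samples. Only parameters are exchanged;
    raw CGM data never leaves the device.
    """
    total = sum(client_sizes)
    keys = client_weights[0].keys()
    return {
        k: sum(w[k] * (n / total) for w, n in zip(client_weights, client_sizes))
        for k in keys
    }
```

In a real deployment each entry of `client_weights` would be a model state dict; the per-patient fine-tuning step then starts from the aggregated weights rather than from scratch, which is what mitigates the cold-start problem.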

5. Extensions: Multimodal, Mechanistic, and Non-Invasive Personalization

Recent work expands the input space and modeling granularity beyond standard CGM and logged meal/insulin data:

  • Non-invasive biosensing via sweat: A physiologically grounded pharmacokinetic model can be personalized via double-loop optimization of patient-specific parameters (diffusion coefficients, gland thickness) to predict blood glucose from sweat with $R^2 = 0.95$ and RMSE $\approx 0.9$ mmol/L, outperforming prior linear regressions (best $r = 0.75$) (Yin et al., 2024). These approaches are suitable for semi-continuous, personalized ambulatory glucose monitoring.
  • Integration of expert knowledge and Bayesian networks: Hybrid Bayesian-data-driven frameworks for T2DM leverage clinical biomarker networks and structural time-series modeling, achieving MAE = 6.4 mg/dL for 15-min forecasts and robust interpretability, supporting clinical decision support and metabolic phenotyping (Sun et al., 2024).
  • Machine-learned dynamical systems: Physics-informed GRU networks regularized with ODE-consistency losses capture circadian variation, produce interpretable internal states (e.g., insulin-on-board), and deliver robust long-term forecasts (GoF up to 65%, 20% gain over linear ODEs) (Carli et al., 24 Mar 2025).
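One common pattern behind such ODE-consistency regularization is to penalize the mismatch between the finite-difference derivative of the predicted trajectory and a simple mechanistic model, here a first-order relaxation dG/dt = -k (G - G_b) toward a basal level. This is an illustrative formulation with assumed parameter values, not the exact loss used in the cited work:

```python
import numpy as np

def ode_residual_penalty(g_pred, dt, k, g_b):
    """Mean squared residual between the finite-difference derivative of
    the predicted glucose trajectory and the relaxation model
    dG/dt = -k (G - G_b)."""
    dG = np.diff(g_pred) / dt
    target = -k * (g_pred[:-1] - g_b)
    return float(np.mean((dG - target) ** 2))

def physics_informed_loss(y_true, g_pred, dt=5.0, k=0.01, g_b=100.0, lam=0.1):
    """Data-fit MSE plus a weighted ODE-consistency term."""
    mse = float(np.mean((y_true - g_pred) ** 2))
    return mse + lam * ode_residual_penalty(g_pred, dt, k, g_b)
```

The regularizer discourages trajectories that fit the samples but are dynamically implausible, which is the mechanism behind the interpretability and long-horizon robustness gains reported for bio-informed networks.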

6. Practical Deployment, Resource Constraints, and Open Challenges

  • Lightweight algorithms (ARIMA, small random forests, compressed LSTMs or GRUs) and online learning protocols are best suited for deployment on wearables and mobile devices, requiring only kilobytes to megabytes of RAM and millisecond-scale inference/computation time (Rodriguez, 2021, Li et al., 2018).
  • For more accurate event (hypo/hyperglycemia) anticipation, higher-frequency sampling (5-min), inclusion of auxiliary contextual features (activity, meals, insulin), and excursion-weighted losses are recommended.
  • Persistent challenges include:
    • Generalizing across rapidly changing physiologies, device drift, and sensor artifacts
    • Achieving event-level hypoglycemia sensitivity in high class-imbalance contexts
    • Calibrating and validating models across CGM device generations, ethnicities, and clinical workflows
    • Scaling up privacy-preserving or federated architectures for global real-time personalization

Real-world best practices entail adaptive fine-tuning on rolling 2–4 week patient histories, regular re-calibration upon error drift, and joint benchmarking under standardized environments such as GLYFE (Bois et al., 2020).

7. Benchmarking, Comparative Evaluation, and Reproducibility

Benchmark studies such as GLYFE and focused comparisons of transformer variants provide calibrated, per-patient performance stratified by prediction horizon, and comprehensive clinical-risk acceptability (Bois et al., 2020, Karagoz et al., 12 May 2025). The general consensus is:

| Model Type | RMSE (30 min) | RMSE (60 min) | Clinical Acceptability (A+B) |
|---|---|---|---|
| SVR/GP-DP | 1.7–9.0 mg/dL | 14–25 mg/dL | >95% |
| LSTM/Bi-GRU | 9–15 mg/dL | 18–26 mg/dL | ≥90% (euglycemia) |
| Patch-wise Transformer | 15.8 mg/dL | 24.6 mg/dL | Noted "best" for long horizons |
| ARIMA/ARX | 13–16 mg/dL | 23–35 mg/dL | 85–92% |

These numbers refer to in silico and real T1DM datasets; error magnitudes are higher in real-world, event-heavy, or T2DM settings. Comparative studies make clear that classical ARIMA is outperformed by kernel and deep designs, and that consistent inclusion of patient-level context and short- to medium-length histories is essential for optimal performance.


Personalized blood glucose forecasting now integrates a diverse ecosystem of modeling strategies, device architectures, and privacy-preserving frameworks. Current research emphasizes hybrid dynamical-neural methods, robust loss formulations, and deployment-aware adaptation, while ongoing challenges span clinical translation, hypo-event targeting, and scalable, accurate, explainable personalization across the diabetes spectrum (Rodriguez, 2021, Sergazinov et al., 2022, Rigamonti et al., 21 Jan 2026, Cichosz et al., 2 Jan 2026, Dave et al., 2024, Karagoz et al., 12 May 2025, Piao et al., 2024, Yin et al., 2024).
