ECG Foundation Models: Scalable Deep Learning

Updated 13 January 2026

ECG Foundation Models are scalable deep learning frameworks pre-trained on vast, heterogeneous ECG data to generate versatile representations for multiple clinical tasks.
They employ advanced architectures such as Transformer-based backbones, convolutional networks, and mixture-of-experts to capture temporal and spatial characteristics in ECG signals.
Efficient fine-tuning methods like LoRA, linear probing, and adapter-tuning allow these models to achieve significant diagnostic accuracy improvements with minimal additional resource overhead.

Electrocardiogram (ECG) foundation models are large-scale, pre-trained deep learning architectures designed to learn general-purpose representations from massive and heterogeneous ECG datasets. These models are not tied to a specific diagnostic task but instead provide a flexible backbone that can be adapted—often with minimal effort—for a variety of downstream clinical applications, including arrhythmia detection, risk factor prediction, demographic estimation, and real-time monitoring. By aggregating data from millions of unlabeled traces and employing advanced self-supervised, contrastive, or generative pretraining strategies, ECG foundation models address the limitations of narrow, task-specific learners and maximize clinical scalability, robustness, and efficiency.

1. Core Architectures and Design Strategies

ECG foundation models span multiple neural network and ensemble design families, optimized for time-series signal processing:

Transformer-based backbones: These utilize self-attention mechanisms, including hierarchical temporal blocks (TimesNet), encoder variants tailored for masked modeling (MOMENT), and prompt-based generative models (TEMPO), to capture long-range dependencies, inter-lead relationships, and event temporal context (Xu et al., 28 Nov 2025).
Convolutional networks: Architectures such as convolutional encoder–transformers (ECG-FM), RegNet-style CNNs (ECGFounder), and hybrid ConvNeXt backbones (TolerantECG) focus on spatial and temporal feature extraction, often augmented with attention and aggregation modules (Xu et al., 28 Nov 2025, McKeen et al., 2024, Dang et al., 14 Jul 2025, Li et al., 2024).
Mixture-of-Experts and Ensemble designs: Recent ensemble frameworks, notably "EnECG," integrate multiple specialized foundation models, including TimesNet, DLinear, MOMENT, TEMPO, and ECG-FM, each pre-trained for time-series and clinical ECG tasks, coordinated via a low-rank adapted gating network (Xu et al., 28 Nov 2025).
Multi-modal and graph-aware systems: Models such as CSFM leverage Transformer encoders jointly across ECG, PPG, and textual domains, while FoundationalECGNet includes Graph Attention Networks and wavelet-augmented denoising for improved fidelity and interpretability in abnormality detection (Gu et al., 23 Jun 2025, Sk. et al., 10 Sep 2025).
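
The gated ensemble idea described above can be illustrated with a minimal sketch. It assumes each expert already emits a prediction vector and that a gating network has produced one raw score per expert; the function names and the example numbers are hypothetical and are not taken from EnECG.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gated_ensemble(expert_outputs, gate_scores):
    """Combine per-expert prediction vectors using softmax gate weights.

    expert_outputs: list of N vectors (one per expert), all the same length.
    gate_scores:    list of N raw gate logits for the current input.
    """
    weights = softmax(gate_scores)
    dim = len(expert_outputs[0])
    return [
        sum(w * out[j] for w, out in zip(weights, expert_outputs))
        for j in range(dim)
    ]

# Hypothetical example: three experts voting on a two-class problem.
expert_probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
gate_logits = [2.0, 0.0, 0.0]  # the gate trusts the first expert most
combined = gated_ensemble(expert_probs, gate_logits)
```

Because the gate weights sum to one, a convex combination of probability vectors is itself a probability vector. A dynamic gate recomputes `gate_logits` per input, whereas a static ensemble would fix them.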

2. Pretraining Objectives and Data Regimes

Foundation ECG models are consistently pre-trained on large, heterogeneous datasets—often exceeding one million recordings—using specialized objectives:

Self-supervised contrastive learning: InfoNCE-based objectives (SimCLR, CPC) learn invariant representations by maximizing agreement between augmented views of the same ECG, with negative pairs spanning different subjects or time windows; BYOL pursues the same view agreement without explicit negative pairs (McKeen et al., 2024, Song et al., 2024, Wan et al., 2 Mar 2025, Shu et al., 1 Dec 2025).
Masked and generative modeling: Masked autoencoders (MAE) reconstruct occluded patches of the ECG signal, while hybrid objectives may couple reconstruction with contrastive alignment (as in HL-based models and ECG-FM) (McKeen et al., 2024, Song et al., 2024, Wan et al., 2 Mar 2025, Xu et al., 28 Nov 2025).
Multi-modal and semantic integration: Models such as CSFM and EchoingECG extend to multimodal regimes—combining waveform and text, or ECG and ECHO/PPG—using joint contrastive and probabilistic embedding frameworks (Gu et al., 23 Jun 2025, Gao et al., 30 Sep 2025).
Clinically-guided contrastive weighting: CLEF introduces adaptive negative-pair weighting with clinical risk scores, aligning latent distances with medically meaningful inter-subject dissimilarity and robustly handling missing metadata (Shu et al., 1 Dec 2025).
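
As a rough illustration of the contrastive objective, the following sketch computes an InfoNCE loss over a small batch of paired embeddings. It assumes that `view_a[i]` and `view_b[i]` come from two augmentations of the same ECG and that every other row of `view_b` serves as a negative; the temperature value and the toy vectors are illustrative choices, not settings reported by any of the cited models.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(view_a, view_b, temperature=0.1):
    """Mean InfoNCE loss over a batch of paired embeddings.

    Row i of view_a is pulled toward row i of view_b (the positive) and
    pushed away from all other rows of view_b (the negatives).
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]  # -log softmax of the positive
    return total / n

# Toy batch: matched views give a low loss, mismatched views a high one.
anchors = [[1.0, 0.0], [0.0, 1.0]]
loss_matched = info_nce(anchors, [[1.0, 0.0], [0.0, 1.0]])
loss_mismatched = info_nce(anchors, [[0.0, 1.0], [1.0, 0.0]])
```

A risk-weighted variant in the spirit of CLEF would scale each negative's contribution by a clinical dissimilarity term; that weighting is not shown here.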

Diverse pretraining corpora include MIMIC-IV-ECG, Harvard-Emory ECG Database (HEEDB), PhysioNet, PTB-XL, CODE-15, Chapman-Shaoxing, and ambulatory/wearable collections, with preprocessing pipelines standardizing sampling rates (250–500 Hz), lead configurations, and segment durations (5–10 s typical; up to hours for ambulatory data) (Li et al., 2024, Wan et al., 2 Mar 2025, Xu et al., 28 Nov 2025, Dang et al., 14 Jul 2025, Lunelli et al., 12 Sep 2025).
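
The standardization step can be sketched as follows, assuming a single-lead signal stored as a Python list. Linear interpolation is used only to keep the example dependency-free; a real pipeline would more likely apply an anti-aliased resampler and handle multiple leads, and the function names here are hypothetical.

```python
def resample_linear(signal, src_hz, dst_hz):
    """Resample a 1-D signal by linear interpolation (no anti-alias filter)."""
    n_out = int(len(signal) * dst_hz / src_hz)
    out = []
    for k in range(n_out):
        t = k * src_hz / dst_hz  # fractional position in source samples
        i = int(t)
        frac = t - i
        nxt = signal[min(i + 1, len(signal) - 1)]  # clamp at the last sample
        out.append(signal[i] * (1 - frac) + nxt * frac)
    return out

def segment(signal, hz, seconds):
    """Split a signal into non-overlapping windows; a trailing remainder is dropped."""
    win = int(hz * seconds)
    return [signal[s:s + win] for s in range(0, len(signal) - win + 1, win)]

# Hypothetical 20 s recording at 250 Hz (a ramp stands in for real ECG samples).
raw = [float(i) for i in range(5000)]
upsampled = resample_linear(raw, src_hz=250, dst_hz=500)
windows = segment(upsampled, hz=500, seconds=10)
```

Here the 20 s trace becomes two 10 s windows of 5000 samples each at 500 Hz, matching the segment lengths quoted above.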

3. Adaptation, Fine-tuning, and Efficient Transfer

Most ECG foundation models are designed for parameter-efficient adaptation to downstream tasks:

LoRA and adapter-tuning: Parameter-efficient LoRA (Low-Rank Adaptation) is often applied only to newly attached output layers or gating heads, freezing >99% of backbone parameters, as in EnECG (Xu et al., 28 Nov 2025). This reduces computation and memory demands: EnECG's peak memory stays below 10 GB across five tasks, compared with ≥12 GB for full fine-tuning of each backbone.
Linear probing and lightweight heads: A frozen backbone + trainable linear head can deliver strong classification/regression (e.g., ECG-FM achieves AUROC 0.930, AUPRC 0.735 under linear probe), confirming feature generality (McKeen et al., 2024, Xu et al., 28 Nov 2025).
Ensemble learning and MoE: Dynamic mixture-of-experts strategies outperform static or zero-shot ensembles, with accuracy saturating at N=5 experts; smaller ensembles lose up to 15% F₁ (Xu et al., 28 Nov 2025).
Preview linear probing and stochastic depth: Post-training strategies introduce a brief linear probing phase on a frozen backbone together with stochastic depth regularization, closing the gap between large pre-trained FMs and specialized models, with gains up to +3.3% AUROC and +20.9% AUPRC on PTB-XL (Zhou et al., 16 Sep 2025).
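
To make the low-rank adaptation idea concrete, here is a minimal sketch of a LoRA-style linear layer in plain Python. The frozen weight `W` is left untouched and only the factors `A` and `B` would be trained, so the effective weight is `W + (alpha / rank) * B @ A`. The class name, layer size, and rank are hypothetical, and the example does not reproduce how EnECG attaches its adapters.

```python
import random

def matvec(matrix, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen: never updated during adaptation
        # A starts small and random, B starts at zero, so the update is
        # initially zero and the layer reproduces the pre-trained output.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_fraction(self):
        frozen = len(self.weight) * len(self.weight[0])
        trainable = len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
        return trainable / (frozen + trainable)

# Hypothetical 64x64 layer adapted with rank 2.
rng = random.Random(1)
W = [[rng.gauss(0.0, 1.0) for _ in range(64)] for _ in range(64)]
layer = LoRALinear(W, rank=2)
x = [1.0] * 64
y = layer.forward(x)
fraction = layer.trainable_fraction()
```

For this toy layer the trainable share is 256 of 4352 parameters. The share shrinks further as the frozen weight grows, which is how adapter-style methods keep the adapted fraction of a large backbone very small.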

4. Multi-task Learning and Evaluation Protocols

A defining trait of ECG foundation models is simultaneous optimization for diverse downstream tasks, often within a unified framework:

Typical multi-task suite (EnECG):
1. RR-interval estimation (regression)
2. Age estimation (regression)
3. Sex classification (binary)
4. Potassium abnormality detection (binary; rare, ~3% incidence)
5. Arrhythmia detection (multiclass, e.g. 15-way)

Joint loss is a weighted sum over per-task losses:

$L(\theta, A, B, \psi) = \sum_t \lambda_t L_t(y_t, \hat{y}_t)$
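
A small sketch of this weighted sum follows, using three of the tasks listed above. The loss choices (MAE for regression, binary cross-entropy for classification), the toy predictions, and the weights λ_t are all illustrative assumptions; the source does not state which values EnECG uses.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error for regression tasks."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

def joint_loss(per_task_loss, weights):
    """Weighted sum over tasks: sum_t lambda_t * L_t."""
    return sum(weights[t] * per_task_loss[t] for t in per_task_loss)

# Hypothetical mini-batch of two recordings.
losses = {
    "rr_interval": mae([800.0, 650.0], [780.0, 700.0]),
    "age": mae([54.0, 31.0], [60.0, 29.0]),
    "sex": binary_cross_entropy([1, 0], [0.8, 0.3]),
}
lambdas = {"rr_interval": 0.01, "age": 0.1, "sex": 1.0}  # illustrative weights
total = joint_loss(losses, lambdas)
```

The weights also compensate for scale: an RR-interval error measured in milliseconds would otherwise dominate a cross-entropy term that is typically below one.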

Evaluation paradigms: Benchmarks such as BenchECG and OpenECG standardize datasets, protocols, and cross-domain generalization tests, employing leave-one-dataset-out, data-scaling, and external validation across datasets (PTB-XL, MIMIC-IV, CPSC2018, Chapman, MIT-BIH, Apnea-ECG, etc.) (Wan et al., 2 Mar 2025, Lunelli et al., 12 Sep 2025).
Metrics: AUROC, AUPRC, MAE (for regression), macro- and weighted-F1, with significance determined via bootstrapping, paired t-tests, and confidence intervals (Xu et al., 28 Nov 2025, Gu et al., 23 Jun 2025, Lunelli et al., 12 Sep 2025, Li et al., 2024).
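
One common way to attach a confidence interval to AUROC is a percentile bootstrap, sketched below in plain Python. This is a generic illustration rather than the exact procedure of any cited benchmark; the function names, the number of resamples, and the toy data are assumptions.

```python
import random

def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUROC; needs both classes in `labels`."""
    if len(set(labels)) < 2:
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lost one class
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Toy example with hypothetical labels and model scores.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7]
point = auroc(y_true, y_score)
ci_low, ci_high = bootstrap_ci(y_true, y_score)
```

With rare outcomes such as the potassium abnormality task (~3% incidence), resamples can easily lose the positive class, which is why the sketch discards them instead of failing.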

5. Empirical Gains, Robustness, and Clinical Impact

ECG foundation models have demonstrated substantial performance improvements and practical gains:

Accuracy improvements: EnECG attains RR MAE 87.7 ±6.4 (vs. 141.5), age MAE 12.97 ±0.61 (vs. 13.41), sex F₁ 0.69, K⁺ F₁ 0.53 (vs. 0.50), arrhythmia accuracy 0.76 (vs. 0.66), statistically significant at p<0.05 across seeds (Xu et al., 28 Nov 2025).
Resource and memory efficiency: EnECG achieves state-of-the-art accuracy with <0.1% of backbone parameters adapted, ≤5% increase in FLOPs/sample, and supports real-time (<0.1 s/patient) clinical workflows on commodity GPUs (Xu et al., 28 Nov 2025).
Robustness to missing data and noise: TolerantECG is robust to arbitrary lead subsets and realistic noise scenarios, outperforming baselines across PTB-XL and MIT-BIH test conditions (Dang et al., 14 Jul 2025). AnyECG exhibits superior performance with only 1–4 leads and under strong noise/heterogeneity, driven by dedicated tokenization and denoising stages (Wang et al., 2024).
Label efficiency and data scaling: Pretrained models (e.g., ECG-JEPA, ECG-CPC) achieve up to 9× label efficiency on structure-function tasks; pretraining gains are invariant under subsampling for N ∈ {250, 1000}.
Multimodal and cross-domain generalizability: CSFM transfers robustly across ECG, PPG, and clinical text, maintaining high accuracy (e.g., SBP MAE 4.42 mmHg, macro-F₁ 0.328) under variable lead configurations and device types (Gu et al., 23 Jun 2025). EchoingECG models uncertainty for ECG→ECHO prediction, outperforming prior deterministic and multimodal baselines in zero- and few-shot regimes (Gao et al., 30 Sep 2025).

6. Limitations and Future Directions

Despite rapid advances, current ECG foundation models remain limited by several factors:

Domain gaps and task coverage: Most models excel in adult ECG interpretation; gaps persist for cardiac structure/function prediction, high-dimensional clinical outcomes, and patient characterization (Al-Masud et al., 29 Sep 2025).
Pretraining data heterogeneity: Methodological differences in training corpora and preprocessing hinder direct, architecture-only comparisons (Lunelli et al., 12 Sep 2025, Li et al., 2024).
Model interpretability and trust: Transformer and deep CNN FMs are opaque; saliency map alignment to clinical landmarks has improved transparency, but regulatory-grade explainability awaits standardization (McKeen et al., 2024, Dang et al., 14 Jul 2025).
Scaling laws and efficiency: Data scaling experiments show saturation at ~60–70% of the SSL pool size for BYOL and MAE, whereas contrastive-only objectives (SimCLR) keep improving only with larger datasets, which raises resource demands (Wan et al., 2 Mar 2025).
Multimodal, federated, and privacy-preserving expansion: Integrating ECG with other biosignals, demographics, and EHR at scale is a frontier; federated learning and privacy-preserving strategies remain early-stage (Han et al., 2024).

Prominent future extensions include hierarchical MoE with class-specific gating, unified joint pretraining on comprehensive ECG corpora, multi-modal late fusion (ECG, PPG, text, imaging), adaptive expert selection, and deeper generalization benchmarking (Xu et al., 28 Nov 2025, Gu et al., 23 Jun 2025, Wan et al., 2 Mar 2025, Al-Masud et al., 29 Sep 2025, Han et al., 2024).

7. Summary Table of Key Models and Innovations

Model/System	Pretraining Regime	Innovation	Principal Gains or Findings	Reference
EnECG	Ensemble + LoRA/MoE	Efficient adapters, multi-expert fusion	+50% memory reduction, SOTA accuracy	(Xu et al., 28 Nov 2025)
ECG-FM	Contrastive + generative	Masked contrastive, saliency, open weights	AUROC 0.935 (LVEF<40%), robust	(McKeen et al., 2024)
CSFM	Masked Transformer	Multimodal, channel-agnostic	Robust transfer, low memory	(Gu et al., 23 Jun 2025)
TolerantECG	ConvNeXt + duo-distill	Robust to missing/noisy leads	Best/2nd-best PTB-XL, MIT-BIH	(Dang et al., 14 Jul 2025)
AnyECG	Tokenizer + CMA	Rhythm codebook, proxy-task synergy	+6% multi-task gain, SOTA anomaly/long	(Wang et al., 2024)
CLEF	ResNeXt + risk-weighted	Clinically-guided contrastive loss	+2.6% AUROC, robust single-lead	(Shu et al., 1 Dec 2025)
ECGFounder	RegNet CNN, PU loss	Large-scale supervised backbone	150 labels, expert-level AUC ≥0.95	(Li et al., 2024)
xECG (BenchECG)	xLSTM + SimDINOv2	Linear complexity, robust pretraining	SOTA BenchECG score 0.868, long-context	(Lunelli et al., 12 Sep 2025)
CardX (ExChanGeAI)	MoE (4 experts), router	Privacy-preserving, plugin platform	6× fewer params, strong external F1	(Bickmann et al., 17 Mar 2025)
EchoingECG	Probabilistic CLIP	Uncertainty-aware ECG→ECHO	SOTA zero/few-shot echo prediction	(Gao et al., 30 Sep 2025)

Foundation models for ECG analysis now enable high-accuracy, efficient, and generalizable cardiac diagnostics across large, diverse datasets, supporting robust multi-task frameworks, resource-efficient adaptation, and clinical deployment within standard hospital or edge hardware environments.