
Process-Guided Concept Bottleneck Model

Updated 16 January 2026
  • Process-Guided Concept Bottleneck Models incorporate domain-specific process constraints to enhance interpretability and robustness by enforcing expert or mechanistic concept dependencies.
  • They utilize a modular training strategy that combines pre-training individual concept encoders with joint end-to-end fine-tuning using process-based regularizers to mitigate supervision gaps.
  • Empirical results in scientific estimation and clinical imaging confirm that PG-CBM reduces bias and improves out-of-domain generalization compared to standard CBMs.

The Process-Guided Concept Bottleneck Model (PG-CBM) denotes a class of architectures that integrate domain-specific process knowledge directly into Concept Bottleneck Models (CBMs), constraining the learning and inference pathway via human-defined, interpretable bottlenecks aligned with causal or expert-ranked conceptual priorities. PG-CBM simultaneously addresses the interpretability of deep models, robustness under domain shift, and alignment with domain-relevant mechanisms, with notable deployments in scientific estimation tasks and clinical image analysis (Pang et al., 2024, Asiyabi et al., 15 Jan 2026, Lin et al., 2022).

1. Concept Bottleneck Models and Motivation

CBMs improve explainability by placing an intermediate layer of semantic concepts between the model’s input and output, factorizing prediction as f(x) = g(h(x)), where h predicts concepts ẑ and g maps them to the task output ŷ (Asiyabi et al., 15 Jan 2026). However, standard CBMs have critical limitations in many scientific and clinical settings:

  • Full annotation of every concept on every sample is often infeasible, leading to supervision gaps.
  • Empirically learned concept-output mappings ignore known dependencies between concepts.
  • These models are susceptible to out-of-distribution failures and spurious correlations.

PG-CBM extends CBMs by enforcing mechanistic or expert-guided dependencies among concepts and between concepts and outputs. In clinical PG-CBM variants (Pang et al., 2024), human expertise determines which concepts “matter”; in scientific PG-CBM frameworks (Asiyabi et al., 15 Jan 2026), mechanistic regularizers are imposed reflecting biophysical or domain-grounded causal relationships. In ultrasound analysis, sequential reasoning steps and hierarchical bottlenecks further instill domain priors (Lin et al., 2022).
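As a concrete illustration of the f(x) = g(h(x)) factorization underlying all of these variants, a minimal CBM forward pass can be sketched as follows. The dimensions and the linear parameterization of h and g are illustrative assumptions, not details from the cited papers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not from the cited papers)
d_in, k_concepts, d_out = 8, 3, 2

# h: input -> interpretable concept bottleneck (a single linear map for brevity)
W_h = rng.normal(size=(d_in, k_concepts))
# g: concepts -> task output
W_g = rng.normal(size=(k_concepts, d_out))

def h(x):
    """Concept predictor: returns the bottleneck vector c_hat."""
    return x @ W_h

def g(c_hat):
    """Task head: maps predicted concepts to the output y_hat."""
    return c_hat @ W_g

x = rng.normal(size=(d_in,))
c_hat = h(x)      # human-interpretable intermediate concepts, auditable directly
y_hat = g(c_hat)  # the prediction depends on x only through the bottleneck c_hat
```

The key structural property is visible in the last line: because y_hat is computed from c_hat alone, any intervention or inspection at the concept layer fully determines what the task head sees.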

2. Formal Architecture and Optimization Objective

PG-CBM structures prediction as a sequence of modules (feature encoding, concept prediction, and aggregation), with additional process guidance injected either as regularizers at the concept-interaction stage or via explicit concept ranking. For an input x, predictions ẑ_1, ..., ẑ_k of the intermediate concepts z_1, ..., z_k are produced:

ĉ = (h_1(x), h_2(x), ..., h_k(x)),    ŷ = g(ĉ)

Process guidance is encoded through the loss function as:

L = L_task(ŷ, y) + λ_1 L_concept(ĉ, z) + λ_2 L_process(ĉ)

where:

  • L_task: output loss, e.g., cross-entropy or L2.
  • L_concept: concept-wise regression/classification loss, with masking for missing annotations.
  • L_process: a sum of process constraints g_ij(ẑ_i, ẑ_j) reflecting monotonicity, allometric, or correlation requirements (scientific PG-CBM, Asiyabi et al., 15 Jan 2026), or a clinical guidance regularizer aligning concept importance maps (clinical PG-CBM, Pang et al., 2024).

For classification tasks with clinical PG-CBM, the regularizer aligns the model-selected concept importance map ΔY_{k,l}, obtained by “remove-one-concept” perturbation, with expert-elicited ranks α_{k,l} (High/Mid/Low), maximizing ΔY_{k,l} for high-importance concepts and minimizing it for low-importance ones.
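The remove-one-concept perturbation and rank-alignment penalty described above can be sketched in a few lines. This is a simplified instantiation under stated assumptions: “removal” is implemented as zeroing the concept, the alignment uses a hinge penalty with a hypothetical margin, and the linear task head in the usage example is purely illustrative:

```python
import numpy as np

def remove_one_concept_importance(g, c_hat, target_class):
    """Delta Y_{k,l}: drop in the target-class score when concept k is
    removed (here, 'removal' = zeroing the concept, an assumption)."""
    base = g(c_hat)[target_class]
    deltas = np.empty(len(c_hat))
    for k in range(len(c_hat)):
        perturbed = c_hat.copy()
        perturbed[k] = 0.0
        deltas[k] = base - g(perturbed)[target_class]
    return deltas

def guidance_penalty(deltas, ranks, margin=0.1):
    """Hinge-style alignment with expert ranks alpha_{k,l}: push Delta Y
    up for 'High'-ranked concepts, down for 'Low'-ranked ones; 'Mid'
    concepts are left unconstrained. The margin value is a placeholder."""
    penalty = 0.0
    for d, r in zip(deltas, ranks):
        if r == "High":
            penalty += max(0.0, margin - d)   # want a large removal effect
        elif r == "Low":
            penalty += max(0.0, d - margin)   # want a small removal effect
    return penalty

# Toy usage with an illustrative linear task head
W = np.array([[2.0, 0.0], [0.0, 1.0], [0.1, 0.0]])
deltas = remove_one_concept_importance(lambda c: c @ W,
                                       np.array([1.0, 1.0, 1.0]),
                                       target_class=0)
# Zeroing concept 0 changes the class-0 score the most, so an expert rank
# of High for concept 0 and Low for concept 1 incurs no penalty here.
```

In training, `guidance_penalty` would be differentiable (e.g., implemented with framework ops) and added to the total loss as L_process.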

3. Encoding and Eliciting Process Knowledge

Process knowledge in PG-CBM is domain-dependent:

  • Expert concept ranking (clinical PG-CBM): Concepts are ranked by experts per class, usually in three tiers (High/Mid/Low). During training, the removal effect of each concept on output is measured and regularized to match these ranks (Pang et al., 2024).
  • Causal/biophysical regularizers (scientific PG-CBM): Functional relationships such as monotonicity (g_ij = max(0, −(ẑ_j − ẑ_i))), allometric scaling, or correlation constraints are implemented as soft penalties in the loss (Asiyabi et al., 15 Jan 2026). Only qualitative forms are enforced, not precise numeric models.

This process guidance can be extended via learnable or continuous expert scores, weak/self-supervision from simulators, or interactive human feedback.
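The monotonicity constraint above reduces to a one-line soft penalty; a minimal sketch (the per-constraint weighting used in practice may differ):

```python
import numpy as np

def monotonicity_penalty(z_i, z_j):
    """Soft penalty g_ij = max(0, -(z_j - z_i)): zero when z_j >= z_i
    (the expected ordering holds), growing linearly with any violation."""
    return np.maximum(0.0, -(z_j - z_i))

# A satisfied constraint contributes nothing to the loss; a violated one
# is penalized in proportion to how badly the ordering is broken.
```

Because the penalty is zero on the feasible region, it constrains only the qualitative ordering of concepts, consistent with the point that no precise numeric model is imposed.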

4. Training Regimes and Data Utilization

PG-CBM accommodates heterogeneous and sparse concept supervision:

  • Two-stage training (scientific PG-CBM): Each concept encoder h_i is pre-trained on its own dataset. The aggregator g is then fine-tuned end-to-end using limited labels for the final output, with process constraints active and masked losses for missing concept annotations (Asiyabi et al., 15 Jan 2026).
  • Joint end-to-end optimization (clinical PG-CBM): Models are trained with combined losses and guidance regularizers, leveraging annotated datasets with expert-ranked concepts (Pang et al., 2024).
  • Hierarchical teacher-forcing (ultrasound PG-CBM): Stage-wise independent training for segmentation (“seeing”), property extraction (“conceiving”), and classification (“concluding”) using relevant ground-truth or network outputs (Lin et al., 2022).

Hyperparameter selection involves grid search for regularization weights, standard optimizer settings (AdamW), and label smoothing. The ability to use partially labeled concept datasets enables utilization of multi-source data unavailable in standard CBMs.
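The masked concept loss that makes partially labeled datasets usable can be sketched as follows. The mask convention (1 = annotated, 0 = missing) and the squared-error form are assumptions for illustration:

```python
import numpy as np

def masked_concept_loss(c_hat, z, mask):
    """Mean squared error over annotated concepts only.
    mask[i] = 1 where concept i is labeled, 0 where the annotation is
    missing; unlabeled concepts contribute no gradient signal."""
    mask = np.asarray(mask, dtype=float)
    n = mask.sum()
    if n == 0:
        return 0.0  # sample carries no concept supervision at all
    return float((mask * (c_hat - z) ** 2).sum() / n)
```

Averaging over only the annotated entries keeps the loss scale comparable across samples with different numbers of labeled concepts, which is what lets multi-source, partially annotated data be pooled in one training run.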

5. Quantitative Performance and Interpretability

Empirical studies confirm PG-CBM’s superiority in out-of-domain generalization, bias reduction, and reliability:

  • Biomass estimation (scientific PG-CBM, AGBD case, (Asiyabi et al., 15 Jan 2026)):
    • PG-CBM: RMSD = 21.8 Mg ha⁻¹, mean bias = +1.5 Mg ha⁻¹ (3.2%), absolute bias = 17.5 Mg ha⁻¹.
    • Vanilla CBM: RMSD 24.3, bias 2.8; black-box DL: RMSD 25.5, bias 4.0 (all in Mg ha⁻¹).
    • ESA CCI: RMSD 38.5, bias −27.9; GEDI L4B: RMSD 19.7, bias −8.0 (Mg ha⁻¹).
    • Consistent error profile across stem density, improved robustness to remote attribute-space outliers.
  • Clinical image classification (clinical PG-CBM (Pang et al., 2024)):
    • RaabinWBC out-of-domain, VGG16+MLP: Baseline F1 = 54.45 ± 3.14; PG-CBM F1 = 58.40 ± 5.55 (+3.95)
    • Skin lesion DDI out-of-domain, VGG16+Linear: Baseline F1 = 58.87 ± 0.39; PG-CBM F1 = 60.13 ± 0.85 (+1.26)
    • Random guidance ablation confirms only systematic expert guidance confers gains.
  • Ultrasound quality (progressive CBM (Lin et al., 2022)):
    • Outperforms non-concept bottleneck baselines on in-house and public benchmarks with no fine-tuning.

Intermediate concepts in PG-CBM architectures enable explicit auditing: errors or spurious learning manifest as anomalous concept values, facilitating localized diagnosis or human intervention before final prediction corruption.

6. Limitations and Extensions

Primary limitations include:

  • Expert annotation bottleneck: Human input required for concept ranking or annotation, though typically via coarse scales and only on small data samples (Pang et al., 2024).
  • Computational cost: Perturbation-based importance estimation (clinical PG-CBM) and multi-stage regularizers increase overhead; however, parallelization is feasible.
  • Static process guidance: Current implementations rely on fixed concept maps or regularizers and do not adapt dynamically to evolving expert priorities or causal structures.
  • Sparse concept label reliance: At least partial concept annotation is required; domains with no annotations must rely on weak supervision or simulators.

Notable extensions proposed include learnable importance weighting, continuous feedback, use of differentiable dynamical simulators (e.g., Neural ODEs), Bayesian uncertainty propagation, and domain-aware vision–language retrieval for process constraints.

7. Implications and Generalization

PG-CBM architectures provide a paradigm for trustworthy, interpretable AI in clinical, scientific, and any other domain where human-interpretable intermediate concepts and process priors are available.

  • Transparency by design: Explanations and intermediate diagnostic signals are inherent in the PG-CBM pathway, not post-hoc interpretations.
  • Domain-causal consistency: Mechanistic constraints and expert alignment reduce shortcut learning and enhance robustness to domain shifts.
  • Data efficiency and diagnostic utility: Capability for training with sparse, heterogeneous concept labels, and for identifying latent sources of error within the bottleneck interface.
  • Applicability: Demonstrated in ecology (biomass, (Asiyabi et al., 15 Jan 2026)), medicine (cell images, skin lesions, (Pang et al., 2024)), and ultrasound (Lin et al., 2022). Extensions are plausible across radiology, pathology, ornithology, and other interpretable concept domains.

This suggests the PG-CBM framework is generalizable to broader applications, provided that relevant domain process knowledge or expert concept ranking is accessible. Its integration of interpretability, process-grounding, and multi-source learning constitutes a substantive advance toward robust, scientifically trustworthy AI systems.
