Supervised Learning Has a Necessary Geometric Blind Spot: Theory, Consequences, and Minimal Repair

Published 23 Apr 2026 in cs.LG, cs.AI, and cs.CV | (2604.21395v1)

Abstract: We prove that empirical risk minimisation (ERM) imposes a necessary geometric constraint on learned representations: any encoder that minimises supervised loss must retain non-zero Jacobian sensitivity in directions that are label-correlated in training data but nuisance at test time. This is not a contingent failure of current methods; it is a mathematical consequence of the supervised objective itself. We call this the geometric blind spot of supervised learning (Theorem 1), and show it holds across proper scoring rules, architectures, and dataset sizes. This single theorem unifies four lines of prior empirical work that were previously treated separately: non-robust predictive features, texture bias, corruption fragility, and the robustness-accuracy tradeoff. In this framing, adversarial vulnerability is one consequence of a broader structural fact about supervised learning geometry. We introduce Trajectory Deviation Index (TDI), a diagnostic that measures the theorem's bounded quantity directly, and show why common alternatives miss the key failure mode. PGD adversarial training reaches Jacobian Frobenius 2.91 yet has the worst clean-input geometry (TDI 1.336), while PMH achieves TDI 0.904. TDI is the only metric that detects this dissociation because it measures isotropic path-length distortion -- the exact quantity Theorem 1 bounds. Across seven vision tasks, BERT/SST-2, and ImageNet ViT-B/16 backbones used by CLIP, DINO, and SAM, the blind spot is measurable and repairable. It is present at foundation-model scale, worsens monotonically across language-model sizes (blind-spot ratio 0.860 to 0.765 to 0.742 from 66M to 340M), and is amplified by task-specific ERM fine-tuning (+54%), while PMH repairs it by 11x with one additional training term whose Gaussian form Proposition 5 proves is the unique perturbation law that uniformly penalises the encoder Jacobian.

Authors (1)

Vishal Rajput

Summary

The paper shows that ERM inherently preserves non-zero Jacobian sensitivity to label-correlated nuisances, even when these features are irrelevant at test time.
It introduces the Trajectory Deviation Index (TDI) to quantify representational drift and demonstrates that PMH regularization minimizes this geometric flaw.
The work unifies various robustness phenomena by linking spurious predictive features to vulnerabilities, providing a minimal, architecture-agnostic repair strategy.

Overview and Theoretical Contributions

This work rigorously establishes that Empirical Risk Minimization (ERM), the prevailing paradigm for supervised learning, induces a fundamental, necessary geometric flaw in neural encoders: any ERM minimizer is compelled to preserve sensitivity (i.e., maintain a non-zero Jacobian norm) in directions aligned with label-correlated nuisance features, even if these features are irrelevant, or act as nuisance, at test time. This property is not a by-product of model architecture, data size, or insufficient training, but rather a mathematical requirement of the objective function itself. The authors call this structural artefact the geometric blind spot.

The central result, formalized in Theorem 1, provides a lower bound on the embedding drift (i.e., the expected squared displacement of the encoder representation under small isotropic Gaussian perturbations to the input) in any direction that is spuriously correlated with the label:

$D(\phi^*, \sigma) \geq \frac{\sigma^2 \rho^2 C(P)}{L^2} > 0$

where $\rho$ quantifies the correlation of the nuisance feature with the label, $C(P)$ depends only on the data distribution, and $L$ is the Lipschitz constant of the decoder. This lower bound persists independently of model capacity, architecture, or dataset size, and holds for any proper scoring rule.
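
The drift quantity bounded above can be estimated numerically. The following is a minimal sketch, not the paper's code: it assumes a hypothetical toy encoder `phi(x) = tanh(W x)` with made-up weights, estimates the expected squared displacement under isotropic Gaussian input noise by Monte Carlo, and compares it with the standard first-order approximation $\sigma^2 \|J(x)\|_F^2$, where $J(x)$ is the encoder Jacobian. All names (`encoder`, `embedding_drift`, `W`, `x`) are illustrative assumptions.

```python
import math
import random

def encoder(W, x):
    """Toy encoder phi(x) = tanh(W x); W is a list of rows (hypothetical weights)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]

def jacobian_frobenius_sq(W, x):
    """Squared Frobenius norm of the Jacobian of phi = tanh(W x) at x."""
    total = 0.0
    for row in W:
        pre = sum(w * xi for w, xi in zip(row, x))
        slope = 1.0 - math.tanh(pre) ** 2  # derivative of tanh at the pre-activation
        total += sum((slope * w) ** 2 for w in row)
    return total

def embedding_drift(W, x, sigma, n_samples=20000, seed=0):
    """Monte Carlo estimate of E||phi(x + delta) - phi(x)||^2, delta ~ N(0, sigma^2 I)."""
    rng = random.Random(seed)
    base = encoder(W, x)
    acc = 0.0
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        out = encoder(W, noisy)
        acc += sum((a - b) ** 2 for a, b in zip(out, base))
    return acc / n_samples

# Illustrative values only; none of these numbers come from the paper.
W = [[0.8, -0.3, 0.1], [0.2, 0.5, -0.4]]
x = [0.3, -0.1, 0.2]
sigma = 0.01

drift = embedding_drift(W, x, sigma)
first_order = sigma ** 2 * jacobian_frobenius_sq(W, x)
print(f"Monte Carlo drift:      {drift:.3e}")
print(f"First-order prediction: {first_order:.3e}")
```

For small $\sigma$ the two numbers should agree closely, which shows that drift can only vanish if the Jacobian itself vanishes. Theorem 1 says that an ERM minimizer cannot achieve this in label-correlated nuisance directions.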

Unification of Robustness Phenomena

The theoretical machinery not only recasts existing empirical observations as corollaries, but also unifies four previously disparate lines of research:

Non-robust Predictive Features: Models encode high-frequency or semantically spurious but label-correlated features [Ilyas et al. 2019].
Texture Bias: ERM-trained vision models are biased toward local textures that correlate with labels, regardless of whether these are causally relevant [Geirhos et al. 2019].
Corruption Fragility: Models are fragile under common corruptions (e.g., noise, blur), as these perturbations often align with label-correlated nuisance directions [Hendrycks & Dietterich 2019].
Robustness–Accuracy Tradeoff: Regularizing away such nuisance directions inevitably incurs an accuracy penalty proportional to $\rho^2$ [Tsipras et al. 2019].

This results in a precise geometric explanation for why these vulnerabilities are persistent and inescapable under standard supervised objectives.

Diagnostics and Experimental Results

Trajectory Deviation Index (TDI)

To empirically probe the geometric blind spot, this work introduces the Trajectory Deviation Index (TDI): a metric quantifying the expected representational drift under small isotropic Gaussian input perturbations. Unlike accuracy metrics, Centered Kernel Alignment (CKA), intrinsic dimension, or the Jacobian Frobenius norm, TDI specifically targets isotropic path-length distortion, capturing structural anisotropy and geometric roughness directly tied to Theorem 1’s lower bound.

Comparing Geometric Regularization Schemes

The authors systematically study three objective regimes:

Standard ERM: No explicit geometric regularization; encoders display significant sensitivity to nuisance-aligned perturbations, as predicted.
Adversarial Training (PGD): Regularizes the model to be robust along specific, worst-case directions. While this reduces the aggregate Jacobian norm, it leads to strong anisotropy—most sensitivity is “rotated” away from the adversarial direction, concentrating elsewhere. Notably, PGD achieves the lowest Jacobian Frobenius norm but exhibits worse TDI (1.336) than ERM itself (1.093), confirming Corollary 4’s prediction that adversarial training can exacerbate isotropic roughness.
Penalizing the Mean-Hessian (PMH): A new regularization, which penalizes the representation drift due to isotropic Gaussian noise—empirically and theoretically shown (Proposition 5) to be the unique perturbation law achieving uniform Jacobian suppression. PMH achieves the lowest TDI (0.904) with only a moderate reduction in Jacobian norm.
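
The dissociation described in the PGD item (a smaller aggregate Jacobian norm alongside worse geometry) can be illustrated with two toy Jacobians. This is a sketch under loud assumptions: the matrices are invented, and the `anisotropy` ratio (largest over smallest singular value) is a simple stand-in chosen for illustration. It is neither the paper's TDI nor its anisotropy index.

```python
import math

def frobenius(J):
    """Frobenius norm: aggregate sensitivity, blind to how it is distributed."""
    return math.sqrt(sum(v * v for row in J for v in row))

def singular_values_2x2(J):
    """Singular values of a 2x2 matrix via the eigenvalues of J^T J."""
    (a, b), (c, d) = J
    p = a * a + c * c  # (J^T J)[0][0]
    q = a * b + c * d  # (J^T J)[0][1]
    r = b * b + d * d  # (J^T J)[1][1]
    mean = (p + r) / 2.0
    gap = math.sqrt(((p - r) / 2.0) ** 2 + q * q)
    return math.sqrt(mean + gap), math.sqrt(max(mean - gap, 0.0))

def anisotropy(J):
    """Ratio of largest to smallest singular value (illustrative proxy only)."""
    s_max, s_min = singular_values_2x2(J)
    return math.inf if s_min == 0.0 else s_max / s_min

# Hypothetical Jacobians, not measurements from the paper.
J_even = [[1.0, 0.0], [0.0, 1.0]]    # sensitivity spread evenly
J_skewed = [[1.2, 0.0], [0.0, 0.1]]  # smaller norm, one dominant direction

for name, J in (("even", J_even), ("skewed", J_skewed)):
    print(name, "Frobenius:", round(frobenius(J), 3),
          "anisotropy:", round(anisotropy(J), 3))
```

The skewed matrix has the lower Frobenius norm but concentrates nearly all of its sensitivity in one direction, which a norm-only metric cannot reveal.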

Across seven diverse tasks, spanning vision (CIFAR-10, Chest X-ray), natural language (BERT on SST-2), graph data, molecular regression, and large-scale backbones (ImageNet ViT-B/16), the blind spot is present, measurable with TDI, and, crucially, repairable via PMH.

Numerical Highlights

Blind Spot Ratios: The geometric blind spot worsens with scale: in BERT-family models, the blind-spot ratio drops monotonically from 0.860 (DistilBERT-66M) to 0.765 to 0.742 (BERT-large-340M).
Fine-Tuning Effect: Task-specific fine-tuning amplifies the blind spot by 54%, while PMH repairs it by up to 11×, with a negligible accuracy cost.
Corruption Robustness: TDI correlates with robustness under both Gaussian and a broad suite of non-Gaussian corruptions, outperforming alternatives without the need for corruption-specific training.

Theoretical and Practical Implications

Structural Reframing of Robustness

This analysis reframes robustness problems as fundamentally geometric: non-isometry induced by spurious label correlations is inescapable under ERM, leading to representations that are necessarily vulnerable to perturbations in specified nuisance directions. Adversarial robustness phenomena are, in this lens, a special case of broader geometric fragility.

Minimal and Universal Repair

Proposition 5 provides a decisive characterization: Gaussian noise is the unique distribution that, when used for penalizing representation drift, uniformly suppresses all directions in the encoder Jacobian. The PMH loss is thus a mathematically minimal, architecture-agnostic intervention; it requires only a single additional regularization term and no architectural changes. This enables immediate applicability to any supervised pipeline—including but not limited to neural encoders used by CLIP, DINO, and SAM.
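
The uniform-suppression claim can be motivated by a standard first-order Taylor argument. The lines below are a sketch under assumptions not spelled out in this summary (a differentiable encoder $\phi$ with Jacobian $J(x)$, and a zero-mean perturbation $\delta$ with covariance $\Sigma_\delta$); they are not a reproduction of the paper's proof of Proposition 5.

$\phi(x+\delta) - \phi(x) = J(x)\,\delta + O(\|\delta\|^2)$

$\mathbb{E}_\delta \|\phi(x+\delta) - \phi(x)\|^2 \approx \mathbb{E}_\delta\left[\delta^\top J(x)^\top J(x)\,\delta\right] = \operatorname{tr}\left(J(x)\,\Sigma_\delta\,J(x)^\top\right)$

$\Sigma_\delta = \sigma^2 I \;\Rightarrow\; \operatorname{tr}\left(J(x)\,\Sigma_\delta\,J(x)^\top\right) = \sigma^2 \|J(x)\|_F^2 = \sigma^2 \sum_i s_i(x)^2$

Here $s_i(x)$ are the singular values of $J(x)$. With an isotropic covariance, every singular direction of the Jacobian enters the penalty with the same weight $\sigma^2$, whereas a non-isotropic $\Sigma_\delta$ would weight directions by its own eigenvalues and leave some of them under-penalised.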

Diagnostic Utility at Scale

TDI is shown to be predictive of actual robustness and geometric regularity at both small and foundation-model scale. The practical upshot is that representational non-isometry—a previously elusive failure mode—can be detected and ameliorated in deployed systems with minimal overhead.

Limitations and Future Directions

The lower bound is existential and not a tight per-model predictor; for stronger results, more precise estimation of nuisance subspaces is needed.
PMH is most effective when the nuisance structure aligns with the Gaussian perturbation model; for domains like structured pose data, alternate or learned noise models may yield further gains.
The work focuses on out-of-distribution shift; in-distribution adversarial robustness is not directly targeted, though incidental improvements are observed.

Research avenues include leveraging data-driven or domain-adaptive estimations of nuisance subspaces for tighter geometric control and developing generalizations of PMH for modalities where nuisance factors are non-Gaussian or structured.

Conclusion

This work delivers a precise theoretical and empirical account of a necessary geometric blind spot in representations learned via ERM. The key claims—universality of the blind spot, its amplifying effect under fine-tuning and scaling, and the unique optimality of Gaussian-based PMH for isotropic geometric repair—are substantiated across both mathematical theory and extensive experiments. The implications for AI are direct: all ERM-trained models deployable today are subject to this vulnerability, but the remedy—a minimal, principled regularization via PMH—is both theoretically unique and practically effective. TDI offers a rigorous diagnostic for real-world models, enabling both measurement and targeted repair of representational geometry in supervised learning.

Explain it Like I'm 14

What this paper is about

This paper says something surprising about almost every AI model trained in the usual way (supervised learning, where models learn from lots of labeled examples). It proves that this training style forces a built‑in “blind spot” in how models see the world. In short: if the training data contains any pattern that helps predict the label—even if that pattern is actually a distraction or “nuisance” at test time—the model’s internal representation must stay sensitive to that pattern. This isn’t a bug of a specific model or dataset; it’s a math fact about the training objective itself.

The authors also propose a simple fix called PMH and a new measurement tool called TDI to diagnose and repair this problem.

The main questions, in simple terms

Do supervised models have a built‑in weakness that makes them pay attention to the wrong things if those things helped during training?
Can we measure this weakness directly and reliably?
Can we reduce the weakness without breaking accuracy?

How they approached it (with everyday analogies)

Think of a model as a map-maker:

The model turns inputs (like images or sentences) into an internal “map” (its representation).
When you nudge the input a tiny bit (like slightly changing pixels), a good map shouldn’t twist or stretch wildly, especially in directions that don’t matter.

Two core ideas:

The theorem (the “blind spot”)

The authors prove that supervised training (empirical risk minimization, ERM) cannot make the internal map perfectly “smooth” in certain directions. If a pattern in the data is correlated with the label during training—like background texture in photos or sentence length in reviews—the model must keep some sensitivity to it. Even if that pattern is a nuisance later, the model can’t fully ignore it without losing training accuracy.
In geometry terms, the representation can’t be perfectly isometric (equally smooth) in those directions; there will be unavoidable “bumps.”

Measuring the bumps: TDI

TDI stands for Trajectory Deviation Index. Imagine walking along the model’s representation while you gently shake the input in all directions equally (tiny random wiggles). TDI measures how much the internal “path” gets bent and stretched by those wiggles. Lower TDI means a smoother, more stable internal map; higher TDI means the map is bumpy and twisty.
Why this matters: TDI measures exactly what the theorem says must be nonzero—how much the map gets distorted under small, equal-in-every-direction changes.

Fixing the geometry: PMH

PMH is a small extra training term that asks the model’s internal representation to stay similar when the input is slightly noised with Gaussian noise (the “shake it equally in all directions” kind of noise).
Important twist: The authors prove that Gaussian noise is uniquely suited for this job. It is the only perturbation that pressures the model to reduce sensitivity equally across all directions (not just some).
Contrast with adversarial training (PGD): Adversarial training focuses on the single worst direction, like plugging one hole in a balloon—pressure just bulges somewhere else. That can make the map smoother in one direction but rougher overall.

Terms in plain language:

Jacobian: Think of it as “how sensitive is the internal representation to tiny input changes?” (like the local slope in every direction).
Isotropic: The shake is fair in every direction (no preferred direction).
Proper scoring rule: A standard way to measure prediction errors (e.g., cross-entropy). The theorem works across many such losses.

What they found and why it matters

Here are the key results, explained simply:

The blind spot is inevitable under supervised learning.
- If a training pattern (like texture) helps predict the label, the model must remain sensitive to it. No amount of data or model capacity guarantees removal. This is a structural limitation of the training goal, not a training failure.
This one theorem explains four well-known issues:
- Non-robust predictive features: Models latch onto fragile patterns (like tiny pixel changes) that happen to predict labels.
- Texture bias: Vision models rely too much on texture over shape when textures correlate with labels.
- Corruption fragility: Small corruptions (like noise or blur) push inputs in directions the model is sensitive to, causing errors.
- Robustness–accuracy tradeoff: If you try to remove sensitivity to those nuisance patterns, you may lose some in-distribution accuracy, because the model was using them to get answers right on training-like data.
Bigger models don’t fix it; they can make it worse.
- More capacity lets models encode all predictive patterns more precisely, including nuisances. The authors see this blind spot get stronger as language models get larger (BERT-family models from 66M to 340M parameters), and it also grows when models are fine-tuned on a specific task.
TDI reveals problems other metrics miss.
- Adversarial training (PGD) can make the overall sensitivity smaller (the Jacobian size drops), but the sensitivity gets funneled into fewer directions. TDI catches that the geometry on clean inputs actually gets worse:
- Example numbers (lower is better): ERM TDI ≈ 1.09, PGD TDI ≈ 1.34 (worse), PMH TDI ≈ 0.90 (better).
- Bottom line: Measuring only how sensitive the model is overall (the size of the Jacobian) is not enough; you must also measure how that sensitivity is spread across directions and how it distorts paths, which TDI does.
PMH repairs the geometry with a minimal change.
- Add one extra term during training that asks internal representations to agree between an input and its slightly noised version (Gaussian noise).
- No need to change the model’s architecture.
- It lowers TDI (makes the internal map smoother) across many tasks and models, often with tiny or no accuracy cost.
- The authors also give a short proof that Gaussian noise is the unique choice that penalizes sensitivity equally in all directions.
It works broadly.
- They test across several vision tasks, BERT-based sentiment analysis, and large vision backbones (like those used in CLIP, DINO, SAM).
- The blind spot shows up across the board and can be reduced with PMH.

Why this matters (implications)

For practitioners: Supervised learning has a built-in geometric blind spot. If your training data has spurious but label-linked patterns, your model will likely rely on them—and be fragile in those directions. Simply scaling up models or data won’t reliably remove this.
For evaluation: Use TDI or similar geometry-aware metrics, not just accuracy or gradient size, to see if a model’s internal map is stable in all directions.
For training: Consider adding PMH (or a similar isotropic geometry-regularizing term) to make representations smoother and more robust, with minimal changes and cost.
For research: This reframes adversarial robustness and related issues as consequences of a single structural fact about supervised objectives. It suggests focusing on representation geometry, not only on outputs.
For safety and reliability: Models whose internal maps are smoother in all directions tend to be more reliable under everyday noise and unexpected shifts.

Note: The paper is a preprint (not yet peer-reviewed), so the community will still need to validate, test limits, and refine these ideas. However, the core message is clear and practical: supervised training installs a geometric blind spot, TDI can measure it, and PMH offers a minimal, principled way to repair it.

Knowledge Gaps

Knowledge gaps, limitations, and open questions

Below is a concise, actionable list of what remains uncertain or unexplored in the paper and where future research can extend, test, or refine the claims.

Realism of the correlated-nuisance assumption: The definition requires I(n;y|s)=0 (nuisance predictive but redundant given signal), which is strong. How do the theorem and bounds change when n and s interact (e.g., synergistic or interaction terms), or when I(n;y|s)>0?
Identifiability of s(x) and n(x): The theory presumes these factors exist; the experiments often require domain knowledge to “name” n (e.g., QM9). How can one discover or estimate nuisance subspaces automatically in high-dimensional data without supervision?
Estimation of distributional constants: The bounds depend on ρ, C(P), or Δ(P,ℒ) and on the decoder Lipschitz constant L. How can these be estimated robustly from finite samples? Are there practical estimators with error bars and guidance on sample complexity?
Tightness of the lower bound: How tight is D(φ*,σ) ≥ σ²ρ²C(P)/L² in realistic settings? Can one derive matching upper bounds or show regimes where the bound is vacuous or conservative?
Dependence on decoder Lipschitz constant L: The theory uses L explicitly, but deep decoders’ L is unknown and typically large. Can architectures or training procedures that directly control L (e.g., spectral normalization, Lipschitz networks) reduce the blind-spot bound in practice?
Beyond the linearized regime: Results rely on σ→0 linearization with O(σ⁴) remainder and often assume a Lipschitz Jacobian. How do the guarantees behave at finite σ used in training (e.g., 0.1–0.2), especially for ReLU networks with piecewise-constant, non-Lipschitz Jacobians?
Proposition 5 “Gaussian uniqueness”: The proof appeals only to the covariance Σ_δ, implying any zero-mean spherical distribution with Σ_δ = σ²I yields the same first-order penalty. Is Gaussian truly unique beyond the second-moment criterion? Empirically compare Gaussian vs. Rademacher, uniform-on-sphere, or Student-t noise under identical covariance.
Finite-sample and optimization effects: Theorem 1 concerns population ERM minimizers, but practical training finds approximate minima on finite data. How do optimization error and sampling noise alter the bound and the observed TDI?
Scope of adversarial baselines: The paper mainly evaluates VAT/PGD. How do other robust objectives (TRADES, MART, ALP, MaxUp, GAT, consistency-regularized adversarial training) affect TDI, anisotropy, and the theorem’s predicted “balloon-squeezing” effect?
Standard augmentation and OOD methods: The claim that standard augmentation cannot close the blind spot is asserted rather than proven generally. Provide either a formal impossibility result or broad empirical tests including strong augmentation pipelines (e.g., AugMix, DeepAugment) and OOD methods (IRM, GroupDRO, Fishr, REx).
Breadth of domains: Experiments span several tasks, but open questions remain for detection/segmentation, speech/audio, time series, RL, generative modeling, and large-scale NLP beyond SST-2. Do the blind spot and PMH behavior generalize to these settings?
Foundation-scale verification: Claims of presence at “foundation-model scale” need systematic, detailed evaluations (e.g., CLIP, DINO, SAM backbones) with TDI, Jacobian statistics, and robustness metrics across diverse datasets and perturbations.
TDI validity and invariances: Analyze TDI’s sensitivity to representation scaling, normalization layers, pooling, residual paths, and layer dimensionality. Is TDI comparable across architectures and depths? Would per-layer normalization choices or rescaling game the metric?
Directional diagnostics: TDI aggregates over directions and layers. Develop diagnostic tools that localize which nuisance-aligned directions dominate (e.g., Jacobian singular spectrum, directional TDI, anisotropy maps) to guide targeted repairs and interpretability.
Statistical robustness: Many results are reported with limited seeds. Provide confidence intervals, significance tests, and effect sizes, plus cross-seed variability for TDI, Jacobian Frobenius, and robustness metrics.
Predictivity of TDI for robustness: Quantify the correlation between TDI and downstream robustness across datasets and perturbation families with rigorous statistical analysis (correlation coefficients, confidence bounds), probing failure cases.
Tradeoff characterization beyond O(ρ²): Corollary 3 gives an O(ρ²) cost for nuisance suppression. What are the constants, and how does the tradeoff behave when ρ is large, multi-nuisance factors exist, or nuisances overlap with salient fine-grained cues?
Hyperparameter selection and automation: PMH needs σtrain, cap, λ, and schedule w(t). Provide automatic selection procedures (e.g., TDI-guided tuning or bilevel optimization) and study robustness to mis-specification in non-Gaussian deployment conditions.
Discovery of nuisance structure: In tasks where nuisance factors are unknown (QM9 case study), develop methods to infer nuisance-aligned subspaces (e.g., causal discovery, gradient-based attribution clustering, unsupervised latent disentanglement) and to adapt PMH accordingly.
Interaction with self-supervised pretraining: Contrastive/self-distillation pretraining often stabilizes representations. How does pretraining alter the blind-spot bound, TDI, and PMH gains during fine-tuning? Are there synergies or redundancies?
Architectural levers: Investigate if architectural constraints (e.g., spatial pooling strategies, equivariant layers, low-pass filters, spectral norms) reduce Jacobian anisotropy and TDI without hurting accuracy, and whether they complement PMH.
Calibration and uncertainty: Study how PMH and adversarial training affect calibration, entropy, and selective prediction under noise/shift, especially given changes in Jacobian geometry.
Robustness beyond isotropic noise: Many real shifts are structured (texture, occlusion, color casts). How does PMH trained with spherical perturbations transfer to structured, non-isotropic shifts, and can multi-perturbation or learned perturbation families better target nuisance subspaces?
Measuring and controlling anisotropy: Proposition 6 defines an anisotropy index but is not empirically reported widely. Provide practical estimators for Jacobian singular value spectra and test whether minimizing anisotropy aligns with improved TDI and robustness.
Avoiding over-smoothing: PMH suppresses Jacobian uniformly; in tasks that rely on fine-grained details, how do we prevent useful high-frequency features from being erased? Develop constraints or adaptive penalties that preserve signal-sensitive directions.
Causality perspective: The nuisance/signal split is implicitly causal. Can causal invariance objectives (IRM, Invariant Causal Prediction) be combined with PMH to target stable causal features while controlling geometric distortion?
Practical compute and deployment costs: Quantify PMH’s training/inference overhead, TDI evaluation cost, and memory footprint across scales, and compare to adversarial/consistency-training baselines.
Extension to non-proper scoring rules and structured losses: The general corollary covers strictly proper scoring rules. What happens with hinge/margin losses, detection/segmentation losses, or multi-task objectives with mixed losses?
Handling discrete or non-Euclidean inputs: For graphs, text tokens, or categorical features, Gaussian perturbations may be unnatural. What are principled perturbation families on manifolds or discrete spaces that preserve the “uniform Jacobian penalty” property?
Scaling laws and extrapolation: The paper notes blind-spot worsening with model scale in limited ranges. Establish scaling laws for TDI and anisotropy vs. parameter count and data size, and clarify when PMH reverses the trend.
Formal limits of augmentation repair: Provide a theoretical statement clarifying when (and why) any finite augmentation set cannot remove nuisance-correlated Jacobian sensitivity implied by Theorem 1.
Safety and fairness: Since the blind spot is tied to label–nuisance correlation, study demographic subgroups and fairness metrics. Does PMH reduce spurious-correlation harms across groups without masking minority signals?
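
The Proposition 5 item in the list above suggests comparing noise laws under identical covariance. A minimal sketch of such a comparison follows, assuming a hypothetical linear encoder `phi(x) = W x` (for which the drift depends only on the noise covariance, so no Taylor remainder is involved). The weights, sample counts, and the `skewed_gaussian` control are illustrative assumptions, not experiments from the paper.

```python
import math
import random

def drift_linear(W, sample_delta, n_samples=40000, seed=0):
    """Monte Carlo estimate of E||W delta||^2 for a linear encoder phi(x) = W x."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_samples):
        delta = sample_delta(rng)
        acc += sum(sum(w * d for w, d in zip(row, delta)) ** 2 for row in W)
    return acc / n_samples

sigma = 0.1
W = [[1.0, 0.2], [0.0, 0.5]]  # hypothetical encoder weights
dim = 2

def gaussian(rng):
    """Isotropic Gaussian noise, covariance sigma^2 I."""
    return [rng.gauss(0.0, sigma) for _ in range(dim)]

def rademacher(rng):
    """Scaled random signs: also zero mean with covariance sigma^2 I."""
    return [sigma * rng.choice((-1.0, 1.0)) for _ in range(dim)]

def skewed_gaussian(rng):
    """Control: same total variance, but all of it on the first input axis."""
    return [rng.gauss(0.0, math.sqrt(2.0) * sigma), 0.0]

# Value predicted by the covariance alone: sigma^2 * ||W||_F^2.
target = sigma ** 2 * sum(w * w for row in W for w in row)
d_gauss = drift_linear(W, gaussian)
d_rad = drift_linear(W, rademacher)
d_skew = drift_linear(W, skewed_gaussian)
print(f"target {target:.5f}  gaussian {d_gauss:.5f}  "
      f"rademacher {d_rad:.5f}  skewed {d_skew:.5f}")
```

In this linear toy case the Gaussian and Rademacher estimates should both match the covariance-only prediction, while the non-isotropic control should not. Whether higher-order terms separate the two isotropic laws for nonlinear encoders is exactly the open question raised in that item.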

Practical Applications

Immediate Applications

The following applications can be deployed now using the paper’s findings, metrics, and PMH training recipe. When applicable, we note sectors, concrete tools/workflows that could emerge, and assumptions/dependencies affecting feasibility.

MLOps and Model Governance (software, cross-industry)
- Use-case: Add Trajectory Deviation Index (TDI) as a gating and monitoring metric alongside accuracy, loss, and Jacobian Frobenius for model selection and CI/CD.
- Workflow/tool: “Geometry audit” step in training pipelines that computes TDI@0 and TDI-vs-σ curves; fail builds if TDI worsens vs baseline; publish “Geometry section” in model cards with TDI curves and anisotropy index.
- Assumptions/dependencies: Requires implementing TDI probes with small Gaussian evaluation noise (σ→0); adds compute overhead for inference-only measurements; assumes differentiable encoders and supervised ERM training.
Robust Supervised Fine-Tuning via PMH (software, vision, NLP, graphs)
- Use-case: Improve out-of-distribution and corruption robustness with minimal accuracy cost by adding a single PMH term to supervised fine-tuning.
- Workflow/tool: Plug-in training module that:
- Adds representation matching loss L_PMH = ||φ(x) − φ(x+δ)||² with δ ∼ N(0, σ²I),
- Uses warm-up scheduling w(t) and a cap so L_PMH ≤ cap × L_task (when the cap binds, the PMH term contributes exactly cap/(1+cap) of the total loss),
- Tunes σ_train to the largest value that does not reduce clean accuracy (or to expected deployment noise—“T-alignment”).
- Sectors:
- Vision (e.g., classifiers, re-id, pose, medical imaging),
- NLP (e.g., BERT fine-tuning to reduce sensitivity to spurious artifacts like sentence length or punctuation),
- Graphs (e.g., node/graph classification robustness).
- Assumptions/dependencies: Gaussian noise should be applied in an input space aligned with nuisance (e.g., in QM9-style tasks, apply noise to node features rather than 3D coordinates if geometry is the signal). Requires differentiable encoders and proper scoring rules (standard cross-entropy/MSE fit).
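
The loss bookkeeping in the PMH workflow above can be sketched in plain Python. This is an illustration under assumptions, not the paper's implementation: the linear warm-up shape, the function names, and the way the cap is applied (a simple `min` on scalar loss values) are all hypothetical, and a real training loop would compute these quantities on tensors in an autodiff framework.

```python
def warmup_weight(step, warmup_steps):
    """Linear ramp from 0 to 1 over warmup_steps (schedule shape is an assumption)."""
    if warmup_steps <= 0:
        return 1.0
    return min(1.0, step / warmup_steps)

def pmh_term(z_clean, z_noisy):
    """Squared distance between the clean and the noised representation."""
    return sum((a - b) ** 2 for a, b in zip(z_clean, z_noisy))

def combined_loss(task_loss, pmh_loss, step, warmup_steps, cap):
    """Task loss plus a warmed-up PMH term clipped at cap * task_loss.

    Returns (total, share), where share is the PMH fraction of the total.
    """
    weighted = warmup_weight(step, warmup_steps) * pmh_loss
    capped = min(weighted, cap * task_loss)
    total = task_loss + capped
    share = capped / total if total > 0 else 0.0
    return total, share

# Illustrative numbers only.
total, share = combined_loss(task_loss=0.5, pmh_loss=2.0,
                             step=1000, warmup_steps=500, cap=0.25)
print(total, share)
```

When the cap is active, the PMH share equals cap/(1+cap), which is the arithmetic behind the rule mentioned in the workflow. Before warm-up starts, the share is zero and the objective reduces to the task loss.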
Replace or Augment Adversarial Training in Production Classifiers (software, security, vision/NLP)
- Use-case: Improve clean-input geometry and general corruption robustness where PGD-based adversarial training hurts isometry (as shown by higher TDI despite lower Jacobian Frobenius).
- Workflow/tool: Compare PMH vs. PGD using TDI and anisotropy index; favor PMH where clean-input geometry matters and adversarial point-defense reduces isometry.
- Assumptions/dependencies: For threat models requiring worst-case guarantees, PMH complements rather than replaces certified methods; validate against relevant attacks and corruptions.
Foundation-Model Fine-Tuning Hygiene (software, enterprise AI)
- Use-case: Prevent blind-spot amplification when fine-tuning large backbones (e.g., ViT-B/16, BERT variants).
- Workflow/tool: Add PMH during task-specific fine-tuning and track “blind-spot ratio” and TDI across model scales to detect geometry drift; prefer PMH-tuned checkpoints for downstream tasks.
- Assumptions/dependencies: Larger models can encode nuisances more precisely; expecting scale to “fix” robustness is unsafe—must measure and mitigate via PMH/TDI.
Data Audits for Spurious Correlations (industry, policy-facing, data-centric AI)
- Use-case: Quantify nuisance–label correlation (ρ) to forecast robustness–accuracy trade-offs before training interventions.
- Workflow/tool:
- Estimate ρ by probing correlation between candidate nuisance factors and labels,
- Use corollary-based scaling (trade-off ∝ ρ²) to plan interventions (PMH, data rebalancing, targeted augmentation),
- Prioritize data collection that reduces ρ where feasible.
- Assumptions/dependencies: Requires domain knowledge to enumerate plausible nuisances; estimates are approximate and task-specific.
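
A data audit of this kind could start from a simple correlation probe. The sketch below assumes that ρ can be approximated by the Pearson correlation between a scalar candidate nuisance and a binary label. That choice is an assumption made for illustration; the paper's precise definition of ρ is not reproduced in this summary. The toy data (a "background brightness" value per example) is invented.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant feature or label carries no correlation
    return cov / math.sqrt(vx * vy)

# Hypothetical audit data: one nuisance value and one label per example.
background_brightness = [0.9, 0.8, 0.7, 0.2, 0.3, 0.1, 0.6, 0.4]
labels = [1, 1, 1, 0, 0, 0, 0, 1]

rho = pearson(background_brightness, labels)
relative_cost = rho ** 2  # trade-off scales with rho^2; constants are unknown
print(f"rho = {rho:.3f}, rho^2 = {relative_cost:.3f}")
```

Because the corollary gives only a scaling in ρ², the squared value is useful for ranking candidate nuisances against each other rather than for predicting an absolute accuracy cost.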
Sector-specific deployments
- Healthcare imaging: Reduce scanner/site artifacts in diagnostic encoders via PMH; monitor TDI to ensure representational smoothness and better corruption robustness without major accuracy loss.
- Dependencies: Regulatory validation; confirm that applied noise targets nuisance dimensions (e.g., pixel-space augmentation may be acceptable; confirm with clinical stakeholders).
- Autonomous driving and robotics: Improve stability to lighting/texture/weather noise by PMH fine-tuning on camera/sensor inputs; use TDI to regression-test geometry across firmware updates.
- Dependencies: Map Gaussian noise to nuisance-relevant input spaces (e.g., pixel intensity, sensor features); simulation-in-the-loop to validate.
- Finance/risk modeling: Reduce reliance on spurious historical proxies (e.g., zip code) by auditing ρ and tracking TDI during supervised training; PMH to suppress sensitivity to broad nuisance directions.
- Dependencies: Must align perturbations to non-causal features; fairness constraints and legal compliance required.
- Manufacturing/quality control: Stabilize sensor-based defect detectors against benign variations (temperature, vibration) using PMH; gate model rollout on TDI improvements.
- Dependencies: Identify nuisance factors; ensure noise injection mirrors realistic perturbations.
- Education/EdTech/NLP: Fine-tune text classifiers with PMH to reduce sensitivity to formatting/length artifacts; track TDI to choose robust checkpoints for deployment at scale.
- Dependencies: Define perturbation space (token embeddings or subword-level).
Academic Practice and Benchmarking (academia, open-source)
- Use-case: Add TDI curves and anisotropy metrics to benchmark leaderboards; run “geometry audits” in model ablations; publish TDI alongside accuracy and robustness.
- Workflow/tool: Lightweight library to compute TDI@0, TDI-vs-σ, anisotropy index; templates for reporting in papers and model cards.
- Assumptions/dependencies: Community adoption; modest compute for TDI probes; standard differentiable models.
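As a rough illustration of the kind of probe such a library might expose, the sketch below estimates embedding drift (the expected squared change in a representation under isotropic Gaussian input noise) at several noise levels. It is not the paper's TDI definition, which this section does not spell out. The names `embedding_drift`, `drift_curve`, and `toy_encoder` are hypothetical, and the toy encoder merely stands in for a real model.

```python
import math
import random


def embedding_drift(encoder, x, sigma, num_samples=2000, seed=0):
    """Monte Carlo estimate of E||encoder(x + sigma*eps) - encoder(x)||^2, eps ~ N(0, I)."""
    rng = random.Random(seed)
    base = encoder(x)
    total = 0.0
    for _ in range(num_samples):
        noisy = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
        out = encoder(noisy)
        total += sum((a - b) ** 2 for a, b in zip(out, base))
    return total / num_samples


def drift_curve(encoder, x, sigmas, **kwargs):
    """Return (sigma, drift) pairs; the same seed is reused so points are comparable."""
    return [(s, embedding_drift(encoder, x, s, **kwargs)) for s in sigmas]


def toy_encoder(x):
    # Hypothetical 2-D encoder (assumption): tanh of a fixed linear map.
    return [math.tanh(1.0 * x[0] + 2.0 * x[1]),
            math.tanh(3.0 * x[0] - 1.0 * x[1])]


curve = drift_curve(toy_encoder, [0.1, -0.2], sigmas=[0.0, 0.01, 0.1])
for sigma, drift in curve:
    print(f"sigma={sigma:<5} drift={drift:.6f}")
```

Reporting the whole curve rather than a single number makes it visible whether a model's sensitivity grows smoothly with noise level or jumps at a particular scale.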
Developer Tools
- TDI + Anisotropy Toolkit: A Python package (e.g., PyTorch/TensorFlow) to compute TDI curves, Jacobian Frobenius, and anisotropy indices; simple API hooks for training/eval.
- PMH Trainer Plug-in: Drop-in module implementing Gaussian perturbation, schedule w(t), and cap/(1+cap) control; presets for vision, NLP, and graph tasks; “Auto-σ” tuner that sets σ_train to the largest value that preserves clean accuracy.
- CI/CD Integration: GitHub Actions templates to run geometry audits and fail builds when TDI regresses.
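A build gate of this kind can be very small. The sketch below assumes that lower TDI is better (consistent with the reported 0.904 for PMH versus 1.336 for PGD) and that the baseline and current values have already been computed elsewhere. The function names and the 2% relative tolerance are illustrative assumptions, not part of any existing template.

```python
def tdi_regressed(baseline_tdi, current_tdi, rel_tolerance=0.02):
    """True when the current TDI exceeds the baseline by more than rel_tolerance."""
    return current_tdi > baseline_tdi * (1.0 + rel_tolerance)


def gate(baseline_tdi, current_tdi, rel_tolerance=0.02):
    """Return a process exit code: 0 lets the build pass, 1 fails it."""
    if tdi_regressed(baseline_tdi, current_tdi, rel_tolerance):
        print(f"FAIL: TDI regressed from {baseline_tdi:.3f} to {current_tdi:.3f}")
        return 1
    print(f"OK: TDI {current_tdi:.3f} is within tolerance of baseline {baseline_tdi:.3f}")
    return 0


# Example with the two TDI values quoted in the abstract.
exit_code = gate(baseline_tdi=0.904, current_tdi=1.336)
```

In a CI job the returned code would typically be passed to `sys.exit` so that a regression fails the workflow step.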

Long-Term Applications

The following rely on further research, scaling, validation, or standardization before broad deployment.

Standards and Regulation for Robustness Reporting (policy, safety-critical sectors)
- Use-case: Require reporting of TDI curves and geometry metrics in regulated ML (healthcare, automotive, finance) alongside traditional accuracy and calibration metrics.
- Potential outcome: “Geometric robustness” sections in audit reports; procurement checklists that specify TDI thresholds for acceptance.
- Dependencies: Consensus on metric definitions and thresholds; sector-specific validation that TDI correlates with field robustness outcomes.
New Training Objectives and Theory Beyond ERM (academia, foundational research)
- Use-case: Design and analyze objectives that explicitly counter the geometric bound (e.g., causality-aware objectives, structured invariances, or isotropy-promoting penalties that go beyond Gaussian PMH).
- Potential outcomes: Alternatives or complements to ERM for supervision; theory that extends to non-differentiable pipelines, sequence-to-sequence, or generative settings.
- Dependencies: Theoretical advances and large-scale empirical validation; careful trade-off studies vs. accuracy.
Automated Geometry-Aware AutoML/NAS (software, platform vendors)
- Use-case: Automated selection of σ_train, cap, and schedules; architecture and augmentation search guided by TDI and anisotropy scores.
- Potential outcomes: “Geometry-aware” AutoML that balances accuracy and isometry; controller policies that forecast robustness–accuracy trade-offs given data ρ estimates.
- Dependencies: Efficient TDI proxies (random projections, low-variance estimators) for fast search; robust cross-task generalization.
Hardware/Systems Support for Geometry Probing (semiconductors, cloud providers)
- Use-case: Accelerate Jacobian/TDI estimation in training/evaluation via specialized kernels or on-device probes (e.g., random directional derivatives).
- Potential outcomes: Real-time geometry monitors in edge devices; low-overhead TDI gating in large-scale training.
- Dependencies: Kernel engineering, support in ML frameworks, demonstrated performance/benefit at scale.
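The random directional derivative idea rests on a standard identity: for v drawn from N(0, I), the expected value of ||Jv||² equals the squared Frobenius norm of J. The sketch below applies it with central finite differences on a toy linear encoder. The names `jacobian_frobenius_sq` and `linear_encoder` are hypothetical, and a real implementation would more likely use Jacobian-vector products from an autodiff framework than finite differences.

```python
import random


def jacobian_frobenius_sq(encoder, x, num_probes=2000, step=1e-4, seed=0):
    """Estimate ||J(x)||_F^2 as the average of ||J v||^2 over Gaussian directions v."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_probes):
        v = [rng.gauss(0.0, 1.0) for _ in x]
        plus = encoder([xi + step * vi for xi, vi in zip(x, v)])
        minus = encoder([xi - step * vi for xi, vi in zip(x, v)])
        # Central difference approximates the directional derivative J v.
        jv = [(p - m) / (2.0 * step) for p, m in zip(plus, minus)]
        total += sum(c * c for c in jv)
    return total / num_probes


# Toy linear encoder (assumption): its Jacobian is W everywhere,
# so the exact answer is 1 + 4 + 9 + 16 = 30.
W = [[1.0, 2.0], [3.0, 4.0]]


def linear_encoder(x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


estimate = jacobian_frobenius_sq(linear_encoder, [0.5, -1.0])
print(f"estimated ||J||_F^2 = {estimate:.2f} (exact value is 30)")
```

Each probe costs two forward passes regardless of input dimension, which is what makes this family of estimators attractive when the full Jacobian is too large to materialise.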
Data-Centric Strategies to Reduce ρ (industry, data collection/design)
- Use-case: Measurement and active reduction of nuisance–label correlation in training data via sampling, relabeling, or targeted data acquisition.
- Potential outcomes: Curated datasets with lower ρ that inherently decrease the blind-spot lower bound; improved robustness without heavy regularization.
- Dependencies: Reliable estimation of nuisances; acquisition budgets; risk of unintended shifts in label distribution.
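A first-pass measurement of nuisance–label correlation can be sketched as follows, assuming ρ can be proxied by the Pearson correlation between a scalar nuisance measurement and a binary label. The paper's exact definition of ρ is not given in this section, so this is a stand-in; the data and the name `pearson_correlation` are hypothetical.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0  # a constant column cannot correlate with anything
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: scanner site (0/1) as the nuisance, diagnosis (0/1) as the label.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
site_confounded = [1, 1, 1, 0, 0, 0, 1, 0]  # site mostly tracks the label
site_balanced = [1, 0, 1, 0, 1, 0, 1, 0]    # site spread evenly across labels

rho_confounded = pearson_correlation(site_confounded, labels)  # 0.5
rho_balanced = pearson_correlation(site_balanced, labels)      # 0.0
print(rho_confounded, rho_balanced)
```

Comparing the estimate before and after a sampling or acquisition change gives a quick check on whether the intervention actually lowered the correlation.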
Fairness and Compliance Auditing Using Geometry (policy, compliance teams)
- Use-case: Use the blind-spot framework to audit reliance on proxy features correlated with protected attributes; complement fairness tests with geometry metrics and nuisance-aligned perturbations.
- Potential outcomes: Early-warning diagnostics for proxy dependence; mitigation workflows that pair data balancing with PMH-like isotropic penalties.
- Dependencies: TDI is not a fairness metric; requires identifying and perturbing proxies aligned with protected attributes; legal and ethical oversight.
Certified and Multimodal Robustness (safety, foundation models)
- Use-case: Combine isotropic penalties (PMH) with certified robustness methods; extend TDI to multimodal inputs (vision–language–audio), each with modality-specific nuisance spaces.
- Potential outcomes: “Geometry certificates” for large multimodal models; robustness portfolios that generalize across unforeseen corruptions.
- Dependencies: New multimodal TDI definitions; certification techniques that incorporate isotropy; extensive validation.
Continuous Geometry Monitoring in Production (DevOps, AIOps)
- Use-case: Long-horizon tracking of TDI and anisotropy to detect drift, especially after fine-tuning or model updates; canary deployments gated by geometry thresholds.
- Potential outcomes: Reduced post-deployment fragility; quicker detection of regressions that accuracy metrics miss.
- Dependencies: Instrumentation, logging budgets, and alerting policies; stakeholder training.
Education and Practitioner Training (academia, industry skilling)
- Use-case: Incorporate “geometric blind spot” concepts into ML curricula and internal trainings; teach practitioners to measure TDI, estimate ρ, and use PMH.
- Potential outcomes: Widespread adoption of geometry-aware training and evaluation; improved robustness culture.
- Dependencies: Community materials, open-source exemplars, and reproducible labs at scale.

Cross-Cutting Assumptions and Dependencies

The theorem applies to supervised ERM with strictly proper losses and differentiable encoders; robustness bounds and TDI are most meaningful under small isotropic perturbations.
PMH assumes Gaussian perturbations in an input space that captures nuisance; domain knowledge may be required to choose the correct space (e.g., node features vs. coordinates).
TDI adds computation for evaluation; lightweight approximations (random projections, subsampling layers) may be necessary in large-scale settings.
In safety-critical contexts, combine PMH/TDI with domain-specific validation and, where needed, certified defenses; PMH is not a silver bullet for worst-case guarantees.
Larger models and task-specific fine-tuning can amplify the blind spot; plan for geometry monitoring and mitigation at these stages.
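The restriction to small isotropic perturbations can be motivated by a first-order expansion. In the sketch below, $J_\phi(x)$ is notation introduced here for the Jacobian of the encoder $\phi$ at $x$; the derivation is a standard Taylor argument rather than a statement taken from the paper.

```latex
\begin{align}
\phi(x + \sigma\varepsilon) &\approx \phi(x) + \sigma J_\phi(x)\,\varepsilon,
  \qquad \varepsilon \sim \mathcal{N}(0, I), \\
\mathbb{E}_\varepsilon \big\| \phi(x + \sigma\varepsilon) - \phi(x) \big\|^2
  &\approx \sigma^2\, \mathbb{E}_\varepsilon\big[ \varepsilon^\top J_\phi(x)^\top J_\phi(x)\, \varepsilon \big]
  = \sigma^2 \operatorname{tr}\!\big( J_\phi(x)^\top J_\phi(x) \big)
  = \sigma^2 \, \| J_\phi(x) \|_F^2 .
\end{align}
```

The neglected terms are of higher order in $\sigma$, so the link between perturbation-based drift and the Jacobian Frobenius norm weakens as the noise scale grows.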


Glossary

Adversarial training (PGD): A robustness method that trains models on worst-case input perturbations within a norm ball, often using Projected Gradient Descent to generate adversarial examples. "Adversarial training (PGD)."
Anisotropy (Jacobian anisotropy): Directional imbalance in sensitivity of a model’s representation to input changes, often concentrating sensitivity in few directions. "increases the anisotropy index $\mathcal{A}(\phi)$"
Barlow Twins: A self-supervised, non-contrastive method that encourages invariance across augmented views while reducing redundancy between representation dimensions. "SimCLR, BYOL, VICReg, and Barlow Twins"
Bayes: Refers to the Bayes-optimal predictor or minimal risk achievable with full knowledge of the data distribution. "an $n$-independent predictor pays excess loss $\geq\rho^2$ above Bayes."
Bregman divergence: A measure of discrepancy between probability distributions induced by a convex function; used here to quantify deviation from an $n$-blind conditional. "(Bregman divergence of true conditional from $n$-blind conditional; see Lemma~\ref{lem:bregman})."
BYOL: A self-supervised learning method (Bootstrap Your Own Latent) that learns representations by predicting target network outputs without negative pairs. "SimCLR, BYOL, VICReg, and Barlow Twins"
CAE (Contractive autoencoder): An autoencoder variant that penalizes the Jacobian of the encoder to encourage locally invariant representations. "Contractive autoencoders (CAE,~\citep{rifai2011contractive})"
Centered Kernel Alignment (CKA): A similarity measure between two sets of representations, computed as the normalized Hilbert-Schmidt Independence Criterion (HSIC) of their centered kernel (Gram) matrices. "Centered Kernel Alignment (CKA,~\citep{kornblith2019similarity})"
Contrastive learning: A self-supervised paradigm that brings augmented views of the same data closer while pushing different data apart. "Self-supervised and contrastive learning."
Correlated-Nuisance Distribution: A distributional condition where a nuisance variable correlates with labels but is redundant given the signal. "[Correlated-Nuisance Distribution]"
Decoder (L-Lipschitz decoder): The classifier/regressor mapping from representation to output, whose output changes by at most $L$ times the change in its input. "with $L$-Lipschitz decoder $h_\theta$"
Denoising autoencoder (DAE): An autoencoder trained to reconstruct clean inputs from their noisy versions, often implicitly penalizing the Jacobian. "denoising autoencoders (DAE,~\citep{vincent2008extracting})"
Empirical Risk Minimisation (ERM): The standard supervised learning objective of minimizing average loss on labeled data. "empirical risk minimisation (ERM)"
Embedding Drift: Expected squared change in representations under small isotropic input perturbations; approximated to first order by the squared Jacobian Frobenius norm scaled by the perturbation variance. "[Embedding Drift]"
FGSM: Fast Gradient Sign Method, a single-step adversarial attack used to evaluate model robustness. "FGSM robustness"
Frobenius norm (Jacobian Frobenius norm): The square root of the sum of squared entries of the Jacobian matrix; measures overall sensitivity magnitude. "Jacobian Frobenius norm."
Gaussian noise: Isotropic normal perturbations used to regularize and uniformly penalize the encoder Jacobian. "Gaussian noise is the unique perturbation family that suppresses the Jacobian uniformly"
Geometric blind spot: A necessary geometric flaw induced by supervised objectives where nuisance-correlated directions retain sensitivity. "We call this the geometric blind spot of supervised learning"
Intrinsic dimensionality: The effective dimensionality of representations, capturing manifold complexity. "intrinsic dimensionality~\citep{ansuini2019intrinsic,pope2021intrinsic}"
Isometry (non-isometry): Property of distance preservation; non-isometry here means representation distances are distorted, especially along nuisance directions. "This structural non-isometry has a concrete geometric consequence"
Isotropic: Uniform in all directions; used to describe perturbations or penalties that do not favor specific directions. "isotropic path-length distortion"
Jacobian: The matrix of partial derivatives of the encoder’s representation with respect to inputs, measuring local sensitivity. "Jacobian regularisation"
Lipschitz (L-Lipschitz): A smoothness condition bounding how much a function can change relative to input changes. "encoders with Lipschitz Jacobian"
Mechanistic interpretability: The study of what specific computations and features a trained model has learned internally. "Mechanistic interpretability asks what a trained model did learn."
Mutual information: A measure of shared information between variables; used to formalize label–nuisance correlation. "$I(n(x);y)>0$"
Nuisance direction: Input directions correlated with labels in training but irrelevant or unstable at test time. "label-correlated nuisance directions"
Path-length distortion: The increase in representation-space path length under small input changes; a measure of non-isometry. "path-length distortion $D(\phi^*,\sigma)$"
Proper scoring rule (strictly proper): A loss function whose expected value is minimized uniquely by the true conditional distribution, so that honest, calibrated probability estimates are rewarded. "strictly proper scoring rule."
Representation manifold: The set of encoded representations of inputs; its geometry reflects learned invariances and sensitivities. "the representation manifold $\phi(\mathcal{X})$"
Rank-1 Jacobian: A Jacobian matrix with a single non-zero singular value, indicating sensitivity concentrated in one direction. "rank-1 Jacobian"
Right singular vector: The input-space direction corresponding to a singular value of the Jacobian, indicating a principal sensitivity direction. "with $w$ as right singular vector."
Self-supervised and contrastive learning: Training without labels by enforcing consistency or discrimination between augmented views. "Self-supervised and contrastive learning."
SimCLR: A contrastive self-supervised method using InfoNCE on augmented image pairs. "SimCLR, BYOL, VICReg, and Barlow Twins"
Sub-block inequality: A linear algebra bound relating norms of matrix blocks to the whole, used in bounding Jacobian contributions. "By the sub-block inequality"
Trajectory Deviation Index (TDI): A diagnostic measuring expected squared path-length distortion under isotropic perturbations across network layers. "Trajectory Deviation Index (TDI)"
VAT (Virtual Adversarial Training): A robustness method encouraging stable predictions under adversarially chosen small perturbations without labels. "VAT"
ViT (Vision Transformer): A transformer-based vision architecture that operates on image patches. "a small ViT~\citep{dosovitskiy2021vit}"
VICReg: A self-supervised method (Variance-Invariance-Covariance Regularization) promoting informative and invariant representations without collapse. "SimCLR, BYOL, VICReg, and Barlow Twins"
