Supervised Learning Has a Necessary Geometric Blind Spot: Theory, Consequences, and Minimal Repair
Abstract: We prove that empirical risk minimisation (ERM) imposes a necessary geometric constraint on learned representations: any encoder that minimises supervised loss must retain non-zero Jacobian sensitivity in directions that are label-correlated in training data but nuisance at test time. This is not a contingent failure of current methods; it is a mathematical consequence of the supervised objective itself. We call this the geometric blind spot of supervised learning (Theorem 1), and show it holds across proper scoring rules, architectures, and dataset sizes. This single theorem unifies four lines of prior empirical work that were previously treated separately: non-robust predictive features, texture bias, corruption fragility, and the robustness-accuracy tradeoff. In this framing, adversarial vulnerability is one consequence of a broader structural fact about supervised learning geometry. We introduce Trajectory Deviation Index (TDI), a diagnostic that measures the theorem's bounded quantity directly, and show why common alternatives miss the key failure mode. PGD adversarial training reaches Jacobian Frobenius 2.91 yet has the worst clean-input geometry (TDI 1.336), while PMH achieves TDI 0.904. TDI is the only metric that detects this dissociation because it measures isotropic path-length distortion -- the exact quantity Theorem 1 bounds. Across seven vision tasks, BERT/SST-2, and ImageNet ViT-B/16 backbones used by CLIP, DINO, and SAM, the blind spot is measurable and repairable. It is present at foundation-model scale, worsens monotonically across language-model sizes (blind-spot ratio 0.860 to 0.765 to 0.742 from 66M to 340M), and is amplified by task-specific ERM fine-tuning (+54%), while PMH repairs it by 11x with one additional training term whose Gaussian form Proposition 5 proves is the unique perturbation law that uniformly penalises the encoder Jacobian.
Explain it Like I'm 14
What this paper is about
This paper says something surprising about almost every AI model trained in the usual way (supervised learning, where models learn from lots of labeled examples). It proves that this training style forces a built‑in “blind spot” in how models see the world. In short: if the training data contains any pattern that helps predict the label—even if that pattern is actually a distraction or “nuisance” at test time—the model’s internal representation must stay sensitive to that pattern. This isn’t a bug of a specific model or dataset; it’s a math fact about the training objective itself.
The authors also propose a simple fix called PMH and a new measurement tool called TDI to diagnose and repair this problem.
The main questions, in simple terms
- Do supervised models have a built‑in weakness that makes them pay attention to the wrong things if those things helped during training?
- Can we measure this weakness directly and reliably?
- Can we reduce the weakness without breaking accuracy?
How they approached it (with everyday analogies)
Think of a model as a map-maker:
- The model turns inputs (like images or sentences) into an internal “map” (its representation).
- When you nudge the input a tiny bit (like slightly changing pixels), a good map shouldn’t twist or stretch wildly, especially in directions that don’t matter.
Two core ideas:
- The theorem (the “blind spot”)
- The authors prove that supervised training (empirical risk minimization, ERM) cannot make the internal map perfectly “smooth” in certain directions. If a pattern in the data is correlated with the label during training—like background texture in photos or sentence length in reviews—the model must keep some sensitivity to it. Even if that pattern is a nuisance later, the model can’t fully ignore it without losing training accuracy.
- In geometry terms, the representation can’t be perfectly isometric (equally smooth) in those directions; there will be unavoidable “bumps.”
- Measuring the bumps: TDI
- TDI stands for Trajectory Deviation Index. Imagine walking along the model’s representation while you gently shake the input in all directions equally (tiny random wiggles). TDI measures how much the internal “path” gets bent and stretched by those wiggles. Lower TDI means a smoother, more stable internal map; higher TDI means the map is bumpy and twisty.
- Why this matters: TDI measures exactly what the theorem says must be nonzero—how much the map gets distorted under small, equal-in-every-direction changes.
- Fixing the geometry: PMH
- PMH is a small extra training term that asks the model’s internal representation to stay similar when the input is slightly noised with Gaussian noise (the “shake it equally in all directions” kind of noise).
- Important twist: The authors prove that Gaussian noise is uniquely suited for this job. It is the only perturbation that pressures the model to reduce sensitivity equally across all directions (not just some).
- Contrast with adversarial training (PGD): Adversarial training focuses on the single worst direction. It is like squeezing a balloon in one spot: the pressure just bulges out somewhere else. That can make the map smoother in one direction but rougher overall.
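The PMH idea described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the toy `encoder`, the weights `W`, the batch `x`, and the helper names `pmh_term` and `total_loss` are all hypothetical. The optional `cap` mirrors the "L_PMH ≤ cap × L_task" rule listed under Immediate Applications; how the paper actually applies it is not specified on this page.

```python
import numpy as np

def encoder(x, W):
    """Toy encoder phi(x) = tanh(x @ W.T); stands in for any differentiable encoder."""
    return np.tanh(x @ W.T)

def pmh_term(x, W, sigma, rng):
    """Mean squared representation change under Gaussian noise delta ~ N(0, sigma^2 I)."""
    delta = rng.normal(0.0, sigma, size=x.shape)
    diff = encoder(x, W) - encoder(x + delta, W)
    return float(np.mean(np.sum(diff ** 2, axis=1)))

def total_loss(task_loss, pmh, weight=1.0, cap=None):
    """Task loss plus weighted PMH term; `cap` (assumed form) limits PMH to cap * task_loss."""
    penalty = weight * pmh
    if cap is not None:
        penalty = min(penalty, cap * task_loss)
    return task_loss + penalty

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))    # hypothetical weights: 8-dim input -> 4-dim representation
x = rng.normal(size=(32, 8))   # hypothetical batch of 32 inputs

small = pmh_term(x, W, sigma=0.01, rng=rng)   # tiny shake -> tiny representation change
large = pmh_term(x, W, sigma=0.1, rng=rng)    # bigger shake -> bigger change
```

When the cap binds, the PMH penalty equals cap × L_task, so its share of the total is cap/(1+cap). For example, `total_loss(1.0, 5.0, cap=0.5)` adds a penalty of 0.5 to a task loss of 1.0, a share of one third.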
Terms in plain language:
- Jacobian: Think of it as “how sensitive is the internal representation to tiny input changes?” (like the local slope in every direction).
- Isotropic: The shake is fair in every direction (no preferred direction).
- Proper scoring rule: A standard way to measure prediction errors (e.g., cross-entropy). The theorem works across many such losses.
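The link between the Jacobian and isotropic "shaking" can be written out as a short worked equation. This is a sketch under stated assumptions rather than the paper's own derivation: φ is the encoder and is assumed twice differentiable near x, J is its Jacobian at x, and δ is zero-mean noise with covariance Σ_δ = σ²I and vanishing third moments (true for Gaussian noise).

```latex
\begin{align*}
\varphi(x+\delta) - \varphi(x) &= J\delta + O(\|\delta\|^2), \\
\mathbb{E}\,\|J\delta\|^2
  &= \mathbb{E}\,\operatorname{tr}\!\left(J\,\delta\delta^{\top}J^{\top}\right)
   = \operatorname{tr}\!\left(J\,\Sigma_\delta\,J^{\top}\right)
   = \sigma^2\,\|J\|_F^2, \\
\mathbb{E}\,\|\varphi(x+\delta) - \varphi(x)\|^2
  &= \sigma^2\,\|J\|_F^2 + O(\sigma^4).
\end{align*}
```

Because Σ_δ is a multiple of the identity, every input direction contributes to the penalty with the same weight σ², which is what "isotropic" means here.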
What they found and why it matters
Here are the key results, explained simply:
- The blind spot is inevitable under supervised learning.
- If a training pattern (like texture) helps predict the label, the model must remain sensitive to it. No amount of data or model capacity guarantees its removal. This is a structural limitation of the training goal, not a training failure.
- This one theorem explains four well-known issues:
- Non-robust predictive features: Models latch onto fragile patterns (like tiny pixel changes) that happen to predict labels.
- Texture bias: Vision models rely too much on texture over shape when textures correlate with labels.
- Corruption fragility: Small corruptions (like noise or blur) push inputs in directions the model is sensitive to, causing errors.
- Robustness–accuracy tradeoff: If you try to remove sensitivity to those nuisance patterns, you may lose some in-distribution accuracy, because the model was using them to get answers right on training-like data.
- Bigger models don’t fix it; they can make it worse.
- More capacity lets models encode all predictive patterns more precisely, including nuisances. The authors report the blind spot getting stronger as language models grow (from 66M to 340M in their tests), and it also grows when models are fine-tuned on a specific task (+54%).
- TDI reveals problems other metrics miss.
- Adversarial training (PGD) can make the overall sensitivity smaller (the Jacobian size drops), but the sensitivity gets funneled into fewer directions. TDI catches that the geometry on clean inputs actually gets worse:
- Example numbers (lower is better): ERM TDI ≈ 1.09, PGD TDI ≈ 1.34 (worse), PMH TDI ≈ 0.90 (better).
- Bottom line: Measuring only how sensitive the model is overall (size) is not enough; you must also measure how that sensitivity is spread across directions and how it distorts paths, which TDI does.
- PMH repairs the geometry with a minimal change.
- Add one extra term during training that asks internal representations to agree between an input and its slightly noised version (Gaussian noise).
- No need to change the model’s architecture.
- It lowers TDI (makes the internal map smoother) across many tasks and models, often with tiny or no accuracy cost.
- The authors also give a short proof that Gaussian noise is the unique choice that penalizes sensitivity equally in all directions.
- It works broadly.
- They test across several vision tasks, BERT-based sentiment analysis, and large vision backbones (like those used in CLIP, DINO, SAM).
- The blind spot shows up across the board and can be reduced with PMH.
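The gap between "how much" sensitivity and "in how many directions" can be seen on two hand-made Jacobians. This is only an illustrative sketch: the matrices are hypothetical, and `anisotropy_index` is an assumed stand-in (largest squared singular value over the mean squared singular value), not necessarily the index the paper defines.

```python
import numpy as np

def frobenius_norm(J):
    """Overall sensitivity magnitude: square root of the sum of squared entries."""
    return float(np.linalg.norm(J, ord="fro"))

def anisotropy_index(J):
    """Assumed index: largest squared singular value / mean squared singular value.
    Equals 1 when sensitivity is spread evenly; grows as it concentrates."""
    s2 = np.linalg.svd(J, compute_uv=False) ** 2
    return float(s2.max() / s2.mean())

# Hypothetical 3x3 Jacobians, chosen by hand for illustration only.
J_even = np.diag([1.0, 1.0, 1.0])      # equal sensitivity in every direction
J_funnel = np.diag([1.5, 0.1, 0.1])    # smaller overall, but funneled into one direction

fro_even, fro_funnel = frobenius_norm(J_even), frobenius_norm(J_funnel)
ani_even, ani_funnel = anisotropy_index(J_even), anisotropy_index(J_funnel)
```

Here `J_funnel` has the smaller Frobenius norm yet the larger anisotropy, which is the kind of dissociation a size-only metric cannot show.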
Why this matters (implications)
- For practitioners: Supervised learning has a built-in geometric blind spot. If your training data has spurious but label-linked patterns, your model will likely rely on them—and be fragile in those directions. Simply scaling up models or data won’t reliably remove this.
- For evaluation: Use TDI or similar geometry-aware metrics, not just accuracy or gradient size, to see if a model’s internal map is stable in all directions.
- For training: Consider adding PMH (or a similar isotropic geometry-regularizing term) to make representations smoother and more robust, with minimal changes and cost.
- For research: This reframes adversarial robustness and related issues as consequences of a single structural fact about supervised objectives. It suggests focusing on representation geometry, not only on outputs.
- For safety and reliability: Models whose internal maps are smoother in all directions tend to be more reliable under everyday noise and unexpected shifts.
Note: The paper is a preprint (not yet peer-reviewed), so the community will still need to validate, test limits, and refine these ideas. However, the core message is clear and practical: supervised training installs a geometric blind spot, TDI can measure it, and PMH offers a minimal, principled way to repair it.
Knowledge Gaps
Knowledge gaps, limitations, and open questions
Below is a concise, actionable list of what remains uncertain or unexplored in the paper and where future research can extend, test, or refine the claims.
- Realism of the correlated-nuisance assumption: The definition requires I(n;y|s)=0 (nuisance predictive but redundant given signal), which is strong. How do the theorem and bounds change when n and s interact (e.g., synergistic or interaction terms), or when I(n;y|s)>0?
- Identifiability of s(x) and n(x): The theory presumes these factors exist; the experiments often require domain knowledge to “name” n (e.g., QM9). How can one discover or estimate nuisance subspaces automatically in high-dimensional data without supervision?
- Estimation of distributional constants: The bounds depend on ρ, C(P), or Δ(P,ℒ) and on the decoder Lipschitz constant L. How can these be estimated robustly from finite samples? Are there practical estimators with error bars and guidance on sample complexity?
- Tightness of the lower bound: How tight is D(φ*,σ) ≥ σ²ρ²C(P)/L² in realistic settings? Can one derive matching upper bounds or show regimes where the bound is vacuous or conservative?
- Dependence on decoder Lipschitz constant L: The theory uses L explicitly, but deep decoders’ L is unknown and typically large. Can architectures or training procedures that directly control L (e.g., spectral normalization, Lipschitz networks) reduce the blind-spot bound in practice?
- Beyond the linearized regime: Results rely on σ→0 linearization with O(σ⁴) remainder and often assume a Lipschitz Jacobian. How do the guarantees behave at finite σ used in training (e.g., 0.1–0.2), especially for ReLU networks with piecewise-constant, non-Lipschitz Jacobians?
- Proposition 5 “Gaussian uniqueness”: The proof appeals only to the covariance Σ_δ, implying any zero-mean spherical distribution with Σ_δ = σ²I yields the same first-order penalty. Is Gaussian truly unique beyond the second-moment criterion? Empirically compare Gaussian vs. Rademacher, uniform-on-sphere, or Student-t noise under identical covariance.
- Finite-sample and optimization effects: Theorem 1 concerns population ERM minimizers, but practical training finds approximate minima on finite data. How do optimization error and sampling noise alter the bound and the observed TDI?
- Scope of adversarial baselines: The paper mainly evaluates VAT/PGD. How do other robust objectives (TRADES, MART, ALP, MaxUp, GAT, consistency-regularized adversarial training) affect TDI, anisotropy, and the theorem’s predicted “balloon-squeezing” effect?
- Standard augmentation and OOD methods: The claim that standard augmentation cannot close the blind spot is asserted rather than proven generally. Provide either a formal impossibility result or broad empirical tests including strong augmentation pipelines (e.g., AugMix, DeepAugment) and OOD methods (IRM, GroupDRO, Fishr, REx).
- Breadth of domains: Experiments span several tasks, but open questions remain for detection/segmentation, speech/audio, time series, RL, generative modeling, and large-scale NLP beyond SST-2. Do the blind spot and PMH behavior generalize to these settings?
- Foundation-scale verification: Claims of presence at “foundation-model scale” need systematic, detailed evaluations (e.g., CLIP, DINO, SAM backbones) with TDI, Jacobian statistics, and robustness metrics across diverse datasets and perturbations.
- TDI validity and invariances: Analyze TDI’s sensitivity to representation scaling, normalization layers, pooling, residual paths, and layer dimensionality. Is TDI comparable across architectures and depths? Would per-layer normalization choices or rescaling game the metric?
- Directional diagnostics: TDI aggregates over directions and layers. Develop diagnostic tools that localize which nuisance-aligned directions dominate (e.g., Jacobian singular spectrum, directional TDI, anisotropy maps) to guide targeted repairs and interpretability.
- Statistical robustness: Many results are reported with limited seeds. Provide confidence intervals, significance tests, and effect sizes, plus cross-seed variability for TDI, Jacobian Frobenius, and robustness metrics.
- Predictivity of TDI for robustness: Quantify the correlation between TDI and downstream robustness across datasets and perturbation families with rigorous statistical analysis (correlation coefficients, confidence bounds), probing failure cases.
- Tradeoff characterization beyond O(ρ²): Corollary 3 gives an O(ρ²) cost for nuisance suppression. What are the constants, and how does the tradeoff behave when ρ is large, multi-nuisance factors exist, or nuisances overlap with salient fine-grained cues?
- Hyperparameter selection and automation: PMH needs σtrain, cap, λ, and schedule w(t). Provide automatic selection procedures (e.g., TDI-guided tuning or bilevel optimization) and study robustness to mis-specification in non-Gaussian deployment conditions.
- Discovery of nuisance structure: In tasks where nuisance factors are unknown (QM9 case study), develop methods to infer nuisance-aligned subspaces (e.g., causal discovery, gradient-based attribution clustering, unsupervised latent disentanglement) and to adapt PMH accordingly.
- Interaction with self-supervised pretraining: Contrastive/self-distillation pretraining often stabilizes representations. How does pretraining alter the blind-spot bound, TDI, and PMH gains during fine-tuning? Are there synergies or redundancies?
- Architectural levers: Investigate if architectural constraints (e.g., spatial pooling strategies, equivariant layers, low-pass filters, spectral norms) reduce Jacobian anisotropy and TDI without hurting accuracy, and whether they complement PMH.
- Calibration and uncertainty: Study how PMH and adversarial training affect calibration, entropy, and selective prediction under noise/shift, especially given changes in Jacobian geometry.
- Robustness beyond isotropic noise: Many real shifts are structured (texture, occlusion, color casts). How does PMH trained with spherical perturbations transfer to structured, non-isotropic shifts, and can multi-perturbation or learned perturbation families better target nuisance subspaces?
- Measuring and controlling anisotropy: Proposition 6 defines an anisotropy index, but the index is not widely reported in the experiments. Provide practical estimators for Jacobian singular value spectra and test whether minimizing anisotropy aligns with improved TDI and robustness.
- Avoiding over-smoothing: PMH suppresses Jacobian uniformly; in tasks that rely on fine-grained details, how do we prevent useful high-frequency features from being erased? Develop constraints or adaptive penalties that preserve signal-sensitive directions.
- Causality perspective: The nuisance/signal split is implicitly causal. Can causal invariance objectives (IRM, Invariant Causal Prediction) be combined with PMH to target stable causal features while controlling geometric distortion?
- Practical compute and deployment costs: Quantify PMH’s training/inference overhead, TDI evaluation cost, and memory footprint across scales, and compare to adversarial/consistency-training baselines.
- Extension to non-proper scoring rules and structured losses: The general corollary covers strictly proper scoring rules. What happens with hinge/margin losses, detection/segmentation losses, or multi-task objectives with mixed losses?
- Handling discrete or non-Euclidean inputs: For graphs, text tokens, or categorical features, Gaussian perturbations may be unnatural. What are principled perturbation families on manifolds or discrete spaces that preserve the “uniform Jacobian penalty” property?
- Scale laws and extrapolation: The paper notes blind-spot worsening with model scale in limited ranges. Establish scale laws for TDI and anisotropy vs. parameter count and data size, and clarify when PMH reverses the trend.
- Formal limits of augmentation repair: Provide a theoretical statement clarifying when (and why) any finite augmentation set cannot remove nuisance-correlated Jacobian sensitivity implied by Theorem 1.
- Safety and fairness: Since the blind spot is tied to label–nuisance correlation, study demographic subgroups and fairness metrics. Does PMH reduce spurious-correlation harms across groups without masking minority signals?
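The Proposition 5 question above (whether anything beyond the covariance matters) can be probed at first order with a small numerical check. This sketch assumes a linear toy encoder `A`, so the Jacobian is `A` everywhere and there are no higher-order terms. It therefore says nothing about whether Gaussian and non-Gaussian noise differ beyond the linear regime. All names and sizes are hypothetical.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 6))   # hypothetical linear encoder; its Jacobian is A everywhere
sigma = 0.1
predicted = sigma ** 2 * np.sum(A ** 2)   # sigma^2 * ||A||_F^2

# Rademacher noise: each coordinate is +sigma or -sigma, so the covariance is sigma^2 I.
# With 6 inputs there are only 2^6 sign patterns, so the expectation is computed exactly.
signs = np.array(list(itertools.product([-1.0, 1.0], repeat=A.shape[1])))
drift_rademacher = float(np.mean(np.sum(((sigma * signs) @ A.T) ** 2, axis=1)))

# Gaussian noise with the same covariance, estimated by Monte Carlo.
delta = rng.normal(0.0, sigma, size=(200_000, A.shape[1]))
drift_gaussian = float(np.mean(np.sum((delta @ A.T) ** 2, axis=1)))
```

Both drifts match σ²‖A‖_F² (the Rademacher one exactly, the Gaussian one up to sampling error), consistent with the observation that the first-order penalty depends only on the covariance.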
Practical Applications
Immediate Applications
The following applications can be deployed now using the paper’s findings, metrics, and PMH training recipe. When applicable, we note sectors, concrete tools/workflows that could emerge, and assumptions/dependencies affecting feasibility.
- MLOps and Model Governance (software, cross-industry)
- Use-case: Add Trajectory Deviation Index (TDI) as a gating and monitoring metric alongside accuracy, loss, and Jacobian Frobenius for model selection and CI/CD.
- Workflow/tool: “Geometry audit” step in training pipelines that computes TDI@0 and TDI-vs-σ curves; fail builds if TDI worsens vs baseline; publish “Geometry section” in model cards with TDI curves and anisotropy index.
- Assumptions/dependencies: Requires implementing TDI probes with small Gaussian evaluation noise (σ→0); adds compute overhead for inference-only measurements; assumes differentiable encoders and supervised ERM training.
- Robust Supervised Fine-Tuning via PMH (software, vision, NLP, graphs)
- Use-case: Improve out-of-distribution and corruption robustness with minimal accuracy cost by adding a single PMH term to supervised fine-tuning.
- Workflow/tool: Plug-in training module that:
- Adds representation matching loss L_PMH = ||φ(x) − φ(x+δ)||² with δ ∼ N(0, σ²I),
- Uses warm-up scheduling w(t) and a cap so that L_PMH ≤ cap × L_task (when the cap binds, the PMH term makes up exactly cap/(1+cap) of the total loss),
- Tunes σ_train to the largest value that does not reduce clean accuracy (or to expected deployment noise—“T-alignment”).
- Sectors:
- Vision (e.g., classifiers, re-id, pose, medical imaging),
- NLP (e.g., BERT fine-tuning to reduce sensitivity to spurious artifacts like sentence length or punctuation),
- Graphs (e.g., node/graph classification robustness).
- Assumptions/dependencies: Gaussian noise should be applied in an input space aligned with nuisance (e.g., in QM9-style tasks, apply noise to node features rather than 3D coordinates if geometry is the signal). Requires differentiable encoders and proper scoring rules (standard cross-entropy and MSE both qualify).
- Replace or Augment Adversarial Training in Production Classifiers (software, security, vision/NLP)
- Use-case: Improve clean-input geometry and general corruption robustness where PGD-based adversarial training hurts isometry (as shown by higher TDI despite lower Jacobian Frobenius).
- Workflow/tool: Compare PMH vs. PGD using TDI and anisotropy index; favor PMH where clean-input geometry matters and adversarial point-defense reduces isometry.
- Assumptions/dependencies: For threat models requiring worst-case guarantees, PMH complements rather than replaces certified methods; validate against relevant attacks and corruptions.
- Foundation-Model Fine-Tuning Hygiene (software, enterprise AI)
- Use-case: Prevent blind-spot amplification when fine-tuning large backbones (e.g., ViT-B/16, BERT variants).
- Workflow/tool: Add PMH during task-specific fine-tuning and track “blind-spot ratio” and TDI across model scales to detect geometry drift; prefer PMH-tuned checkpoints for downstream tasks.
- Assumptions/dependencies: Larger models can encode nuisances more precisely; expecting scale to “fix” robustness is unsafe—must measure and mitigate via PMH/TDI.
- Data Audits for Spurious Correlations (industry, policy-facing, data-centric AI)
- Use-case: Quantify nuisance–label correlation (ρ) to forecast robustness–accuracy trade-offs before training interventions.
- Workflow/tool:
- Estimate ρ by probing correlation between candidate nuisance factors and labels,
- Use corollary-based scaling (trade-off ∝ ρ²) to plan interventions (PMH, data rebalancing, targeted augmentation),
- Prioritize data collection that reduces ρ where feasible.
- Assumptions/dependencies: Requires domain knowledge to enumerate plausible nuisances; estimates are approximate and task-specific.
- Sector-specific deployments
- Healthcare imaging: Reduce scanner/site artifacts in diagnostic encoders via PMH; monitor TDI to ensure representational smoothness and better corruption robustness without major accuracy loss.
- Dependencies: Regulatory validation; confirm that applied noise targets nuisance dimensions (e.g., pixel-space augmentation may be acceptable; confirm with clinical stakeholders).
- Autonomous driving and robotics: Improve stability to lighting/texture/weather noise by PMH fine-tuning on camera/sensor inputs; use TDI to regression-test geometry across firmware updates.
- Dependencies: Map Gaussian noise to nuisance-relevant input spaces (e.g., pixel intensity, sensor features); simulation-in-the-loop to validate.
- Finance/risk modeling: Reduce reliance on spurious historical proxies (e.g., zip code) by auditing ρ and tracking TDI during supervised training; PMH to suppress sensitivity to broad nuisance directions.
- Dependencies: Must align perturbations to non-causal features; fairness constraints and legal compliance required.
- Manufacturing/quality control: Stabilize sensor-based defect detectors against benign variations (temperature, vibration) using PMH; gate model rollout on TDI improvements.
- Dependencies: Identify nuisance factors; ensure noise injection mirrors realistic perturbations.
- Education/EdTech/NLP: Fine-tune text classifiers with PMH to reduce sensitivity to formatting/length artifacts; track TDI to choose robust checkpoints for deployment at scale.
- Dependencies: Define perturbation space (token embeddings or subword-level).
- Academic Practice and Benchmarking (academia, open-source)
- Use-case: Add TDI curves and anisotropy metrics to benchmark leaderboards; run “geometry audits” in model ablations; publish TDI alongside accuracy and robustness.
- Workflow/tool: Lightweight library to compute TDI@0, TDI-vs-σ, anisotropy index; templates for reporting in papers and model cards.
- Assumptions/dependencies: Community adoption; modest compute for TDI probes; standard differentiable models.
- Developer Tools
- TDI + Anisotropy Toolkit: A Python package (e.g., PyTorch/TensorFlow) to compute TDI curves, Jacobian Frobenius, and anisotropy indices; simple API hooks for training/eval.
- PMH Trainer Plug-in: Drop-in module implementing Gaussian perturbation, schedule w(t), and cap/(1+cap) control; presets for vision, NLP, and graph tasks; “Auto-σ” tuner that selects σ_train to the largest value that preserves clean accuracy.
- CI/CD Integration: GitHub Actions templates to run geometry audits and fail builds when TDI regresses.
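A build gate of the kind described in the CI/CD item could be as small as the following sketch. The function name `geometry_gate` and the `rel_tolerance` parameter are hypothetical. The TDI values are the ones quoted earlier on this page (ERM ≈ 1.09, PGD ≈ 1.34, PMH ≈ 0.90), with ERM used as the baseline purely for illustration.

```python
def geometry_gate(candidate_tdi, baseline_tdi, rel_tolerance=0.0):
    """Return True if the candidate's TDI does not exceed the baseline by more
    than `rel_tolerance` (lower TDI is better)."""
    return candidate_tdi <= baseline_tdi * (1.0 + rel_tolerance)

# TDI values quoted on this page; ERM is treated as the baseline checkpoint.
baseline = 1.09
candidates = {"PGD": 1.34, "PMH": 0.90}
results = {name: geometry_gate(tdi, baseline) for name, tdi in candidates.items()}
```

With these numbers the PGD checkpoint fails the gate and the PMH checkpoint passes. A non-zero `rel_tolerance` would allow small regressions caused by seed-to-seed noise.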
Long-Term Applications
The following rely on further research, scaling, validation, or standardization before broad deployment.
- Standards and Regulation for Robustness Reporting (policy, safety-critical sectors)
- Use-case: Require reporting of TDI curves and geometry metrics in regulated ML (healthcare, automotive, finance) alongside traditional accuracy and calibration metrics.
- Potential outcome: “Geometric robustness” sections in audit reports; procurement checklists that specify TDI thresholds for acceptance.
- Dependencies: Consensus on metric definitions and thresholds; sector-specific validation that TDI correlates with field robustness outcomes.
- New Training Objectives and Theory Beyond ERM (academia, foundational research)
- Use-case: Design and analyze objectives that explicitly counter the geometric bound (e.g., causality-aware objectives, structured invariances, or isotropy-promoting penalties that go beyond Gaussian PMH).
- Potential outcomes: Alternatives or complements to ERM for supervision; theory that extends to non-differentiable pipelines, sequence-to-sequence, or generative settings.
- Dependencies: Theoretical advances and large-scale empirical validation; careful trade-off studies vs. accuracy.
- Automated Geometry-Aware AutoML/NAS (software, platform vendors)
- Use-case: Automated selection of σ_train, cap, and schedules; architecture and augmentation search guided by TDI and anisotropy scores.
- Potential outcomes: “Geometry-aware” AutoML that balances accuracy and isometry; controller policies that forecast robustness–accuracy trade-offs given data ρ estimates.
- Dependencies: Efficient TDI proxies (random projections, low-variance estimators) for fast search; robust cross-task generalization.
- Hardware/Systems Support for Geometry Probing (semiconductors, cloud providers)
- Use-case: Accelerate Jacobian/TDI estimation in training/evaluation via specialized kernels or on-device probes (e.g., random directional derivatives).
- Potential outcomes: Real-time geometry monitors in edge devices; low-overhead TDI gating in large-scale training.
- Dependencies: Kernel engineering, support in ML frameworks, demonstrated performance/benefit at scale.
- Data-Centric Strategies to Reduce ρ (industry, data collection/design)
- Use-case: Measurement and active reduction of nuisance–label correlation in training data via sampling, relabeling, or targeted data acquisition.
- Potential outcomes: Curated datasets with lower ρ that inherently decrease the blind-spot lower bound; improved robustness without heavy regularization.
- Dependencies: Reliable estimation of nuisances; acquisition budgets; risk of unintended shifts in label distribution.
- Fairness and Compliance Auditing Using Geometry (policy, compliance teams)
- Use-case: Use the blind-spot framework to audit reliance on proxy features correlated with protected attributes; complement fairness tests with geometry metrics and nuisance-aligned perturbations.
- Potential outcomes: Early-warning diagnostics for proxy dependence; mitigation workflows that pair data balancing with PMH-like isotropic penalties.
- Dependencies: TDI is not a fairness metric; requires identifying and perturbing proxies aligned with protected attributes; legal and ethical oversight.
- Certified and Multimodal Robustness (safety, foundation models)
- Use-case: Combine isotropic penalties (PMH) with certified robustness methods; extend TDI to multimodal inputs (vision–language–audio), each with modality-specific nuisance spaces.
- Potential outcomes: “Geometry certificates” for large multimodal models; robustness portfolios that generalize across unforeseen corruptions.
- Dependencies: New multimodal TDI definitions; certification techniques that incorporate isotropy; extensive validation.
- Continuous Geometry Monitoring in Production (DevOps, AIOps)
- Use-case: Long-horizon tracking of TDI and anisotropy to detect drift, especially after fine-tuning or model updates; canary deployments gated by geometry thresholds.
- Potential outcomes: Reduced post-deployment fragility; quicker detection of regressions that accuracy metrics miss.
- Dependencies: Instrumentation, logging budgets, and alerting policies; stakeholder training.
- Education and Practitioner Training (academia, industry skilling)
- Use-case: Incorporate “geometric blind spot” concepts into ML curricula and internal trainings; teach practitioners to measure TDI, estimate ρ, and use PMH.
- Potential outcomes: Widespread adoption of geometry-aware training and evaluation; improved robustness culture.
- Dependencies: Community materials, open-source exemplars, and reproducible labs at scale.
Cross-Cutting Assumptions and Dependencies
- The theorem applies to supervised ERM with strictly proper losses and differentiable encoders; robustness bounds and TDI are most meaningful under small isotropic perturbations.
- PMH assumes Gaussian perturbations in an input space that captures nuisance; domain knowledge may be required to choose the correct space (e.g., node features vs. coordinates).
- TDI adds computation for evaluation; lightweight approximations (random projections, subsampling layers) may be necessary in large-scale settings.
- In safety-critical contexts, combine PMH/TDI with domain-specific validation and, where needed, certified defenses—PMH is not a silver bullet for worst-case guarantees.
- Larger models and task-specific fine-tuning can amplify the blind spot; plan for geometry monitoring and mitigation at these stages.
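One lightweight approximation of the "random projections" kind is to estimate the squared Jacobian Frobenius norm from random directional derivatives, without ever forming the Jacobian. The sketch below uses the standard identity E‖Jv‖² = ‖J‖_F² for v ~ N(0, I) and a finite-difference approximation of Jv. It estimates only the Frobenius quantity, not TDI itself, and the toy encoder, weights, and function names are hypothetical.

```python
import numpy as np

def encoder(x, W):
    """Toy encoder phi(x) = tanh(W x); a stand-in for a real network."""
    return np.tanh(W @ x)

def exact_jacobian_fro_sq(x, W):
    """Exact ||J||_F^2 for the toy encoder, where J = diag(1 - tanh(Wx)^2) W."""
    J = (1.0 - np.tanh(W @ x) ** 2)[:, None] * W
    return float(np.sum(J ** 2))

def estimated_jacobian_fro_sq(x, W, n_probes, eps, rng):
    """Average ||J v||^2 over random probes v ~ N(0, I), with J v approximated
    by a central finite difference of the encoder."""
    total = 0.0
    for _ in range(n_probes):
        v = rng.normal(size=x.shape)
        jv = (encoder(x + eps * v, W) - encoder(x - eps * v, W)) / (2.0 * eps)
        total += float(np.sum(jv ** 2))
    return total / n_probes

rng = np.random.default_rng(0)
W = 0.3 * rng.normal(size=(4, 6))   # hypothetical weights, scaled to avoid tanh saturation
x = rng.normal(size=6)              # hypothetical input point

exact = exact_jacobian_fro_sq(x, W)
estimate = estimated_jacobian_fro_sq(x, W, n_probes=10_000, eps=1e-4, rng=rng)
```

Each probe costs two forward passes, so the number of probes trades accuracy against compute. In a real pipeline the probes would typically be batched rather than looped.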
Glossary
- Adversarial training (PGD): A robustness method that trains models on worst-case input perturbations within a norm ball, often using Projected Gradient Descent to generate adversarial examples. "Adversarial training (PGD)."
- Anisotropy (Jacobian anisotropy): Directional imbalance in sensitivity of a model’s representation to input changes, often concentrating sensitivity in few directions. "increases the anisotropy index "
- Barlow Twins: A self-supervised method that encourages invariance to augmentations while reducing redundancy between representation dimensions, by pushing the cross-correlation matrix of two augmented views toward the identity. "SimCLR, BYOL, VICReg, and Barlow Twins"
- Bayes: Refers to the Bayes-optimal predictor or minimal risk achievable with full knowledge of the data distribution. "an n-independent predictor pays excess loss above Bayes."
- Bregman divergence: A measure of discrepancy between probability distributions induced by a convex function; used here to quantify deviation from an n-blind conditional. "(Bregman divergence of true conditional from n-blind conditional; see Lemma~\ref{lem:bregman})."
- BYOL: A self-supervised learning method (Bootstrap Your Own Latent) that learns representations by predicting target network outputs without negative pairs. "SimCLR, BYOL, VICReg, and Barlow Twins"
- CAE (Contractive autoencoder): An autoencoder variant that penalizes the Jacobian of the encoder to encourage locally invariant representations. "Contractive autoencoders (CAE,~\citep{rifai2011contractive})"
- Centered Kernel Alignment (CKA): A similarity measure between two representation spaces, computed by comparing their centered kernel (Gram) matrices over the same set of inputs. "Centered Kernel Alignment (CKA,~\citep{kornblith2019similarity})"
- Contrastive learning: A self-supervised paradigm that brings augmented views of the same data closer while pushing different data apart. "Self-supervised and contrastive learning."
- Correlated-Nuisance Distribution: A distributional condition where a nuisance variable correlates with labels but is redundant given the signal. "[Correlated-Nuisance Distribution]"
- Decoder (L-Lipschitz decoder): The classifier/regressor mapping from representation to output, constrained so that its output changes by at most L times the change in its input (the representation). "with L-Lipschitz decoder"
- Denoising autoencoder (DAE): An autoencoder trained to reconstruct clean inputs from their noisy versions, often implicitly penalizing the Jacobian. "denoising autoencoders (DAE,~\citep{vincent2008extracting})"
- Empirical Risk Minimisation (ERM): The standard supervised learning objective of minimizing average loss on labeled data. "empirical risk minimisation (ERM)"
- Embedding Drift: Expected squared change in representations under small isotropic input perturbations; approximated to first order by the squared Jacobian Frobenius norm, scaled by the perturbation variance. "[Embedding Drift]"
- FGSM: Fast Gradient Sign Method, a single-step adversarial attack used to evaluate model robustness. "FGSM robustness"
- Frobenius norm (Jacobian Frobenius norm): The square root of the sum of squared entries of the Jacobian matrix; measures overall sensitivity magnitude. "Jacobian Frobenius norm."
- Gaussian noise: Isotropic normal perturbations used to regularize and uniformly penalize the encoder Jacobian. "Gaussian noise is the unique perturbation family that suppresses the Jacobian uniformly"
- Geometric blind spot: A necessary geometric flaw induced by supervised objectives where nuisance-correlated directions retain sensitivity. "We call this the geometric blind spot of supervised learning"
- Intrinsic dimensionality: The effective dimensionality of representations, capturing manifold complexity. "intrinsic dimensionality~\citep{ansuini2019intrinsic,pope2021intrinsic}"
- Isometry (non-isometry): Property of distance preservation; non-isometry here means representation distances are distorted, especially along nuisance directions. "This structural non-isometry has a concrete geometric consequence"
- Isotropic: Uniform in all directions; used to describe perturbations or penalties that do not favor specific directions. "isotropic path-length distortion"
- Jacobian: The matrix of partial derivatives of the encoder’s representation with respect to inputs, measuring local sensitivity. "Jacobian regularisation"
- Lipschitz (L-Lipschitz): A smoothness condition bounding how much a function can change relative to input changes. "encoders with Lipschitz Jacobian"
- Mechanistic interpretability: The study of what specific computations and features a trained model has learned internally. "Mechanistic interpretability asks what a trained model did learn."
- Mutual information: A measure of shared information between variables; used to formalize label–nuisance correlation.
- Nuisance direction: Input directions correlated with labels in training but irrelevant or unstable at test time. "label-correlated nuisance directions"
- Path-length distortion: The increase in representation-space path length under small input changes; a measure of non-isometry. "path-length distortion "
- Proper scoring rule (strictly proper): A loss function whose expected value is minimized uniquely by the true conditional distribution, so that minimizing it rewards calibrated predictions. "strictly proper scoring rule."
- Representation manifold: The set of encoded representations of inputs; its geometry reflects learned invariances and sensitivities. "the representation manifold "
- Rank-1 Jacobian: A Jacobian matrix with a single non-zero singular value, indicating sensitivity concentrated in one direction. "rank-1 Jacobian"
- Right singular vector: The input-space direction corresponding to a singular value of the Jacobian, indicating a principal sensitivity direction. "with as right singular vector."
- Self-supervised and contrastive learning: Training without labels by enforcing consistency or discrimination between augmented views. "Self-supervised and contrastive learning."
- SimCLR: A contrastive self-supervised method using InfoNCE on augmented image pairs. "SimCLR, BYOL, VICReg, and Barlow Twins"
- Sub-block inequality: A linear algebra bound relating norms of matrix blocks to the whole, used in bounding Jacobian contributions. "By the sub-block inequality"
- Trajectory Deviation Index (TDI): A diagnostic measuring expected squared path-length distortion under isotropic perturbations across network layers. "Trajectory Deviation Index (TDI)"
- VAT (Virtual Adversarial Training): A robustness method encouraging stable predictions under adversarially chosen small perturbations without labels. "VAT"
- ViT (Vision Transformer): A transformer-based vision architecture that operates on image patches. "a small ViT~\citep{dosovitskiy2021vit}"
- VICReg: A self-supervised method (Variance-Invariance-Covariance Regularization) promoting informative and invariant representations without collapse. "SimCLR, BYOL, VICReg, and Barlow Twins"
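
The relation between the Embedding Drift and Jacobian Frobenius norm entries above can be checked numerically. The following is a minimal sketch, not the paper's implementation: the toy encoder `tanh(W x)`, the weights `W`, the input `x`, the noise scale `sigma`, and the helper names are all illustrative assumptions. It compares a Monte Carlo estimate of the expected squared change in representation under isotropic Gaussian noise with the first-order prediction, sigma squared times the squared Frobenius norm of the Jacobian.

```python
import math
import random

def encoder(x, W):
    """Hypothetical toy encoder f(x) = tanh(W x); W is a list of rows."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]

def jacobian(x, W):
    """Analytic Jacobian of tanh(W x): J[i][j] = (1 - tanh(z_i)^2) * W[i][j]."""
    out = encoder(x, W)
    return [[(1.0 - o * o) * w for w in row] for o, row in zip(out, W)]

def frobenius_norm(J):
    """Square root of the sum of squared matrix entries."""
    return math.sqrt(sum(v * v for row in J for v in row))

def embedding_drift(x, W, sigma, n_samples, rng):
    """Monte Carlo estimate of E ||f(x + d) - f(x)||^2 with d ~ N(0, sigma^2 I)."""
    base = encoder(x, W)
    total = 0.0
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        out = encoder(noisy, W)
        total += sum((a - b) ** 2 for a, b in zip(out, base))
    return total / n_samples

rng = random.Random(0)                     # fixed seed for repeatability
W = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]  # assumed weights
x = [0.1, -0.2, 0.3, 0.05]                 # assumed input point
sigma = 1e-3                               # small, so the linearisation is accurate

drift = embedding_drift(x, W, sigma, 20000, rng)
predicted = sigma ** 2 * frobenius_norm(jacobian(x, W)) ** 2
ratio = drift / predicted                  # close to 1 when the approximation holds
print(f"measured / predicted drift = {ratio:.3f}")
```

For small `sigma` the ratio should sit near 1, up to Monte Carlo error. A single number like this says nothing about which directions carry the sensitivity, which is why the glossary treats anisotropy and singular vectors as separate notions.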