Neural Network Feature Learning
- Feature learning in neural networks is the adaptive process where models develop internal, task-specific representations through gradient-based optimization.
- It leverages hierarchical abstraction to build robust, invariant features that effectively capture complex data structures, outperforming fixed-kernel methods.
- Recent theoretical and empirical analyses reveal dynamic phase transitions and layerwise specialization that are critical for understanding and improving model generalization.
Feature learning in neural networks refers to the process by which internal representations (features at hidden layers) become progressively more informative about the task during training. Unlike kernel methods, where features are fixed and linear readouts suffice, neural networks adapt their features in a data-dependent manner, yielding highly discriminative, robust, and invariant descriptions of the input. This adaptation allows neural networks to surpass the limitations of purely kernel-based learning, particularly in tasks demanding hierarchical abstraction, nonlinearity, and invariance to complex transformations.
1. Mechanisms of Feature Learning Across Architectures
Feature learning mechanisms differ by architecture but commonly revolve around gradient-based adaptation of weights to maximize task-specific objectives:
- Deep Fully Connected Networks: The Deep Neural Feature Ansatz (Radhakrishnan et al., 2022) states that each layer's weight Gram matrix aligns with the average gradient outer product (AGOP) evaluated at that layer's input:

$$W_{\ell}^{\top} W_{\ell} \propto \frac{1}{n} \sum_{i=1}^{n} \nabla_{h_{\ell-1}} f(x_i)\, \nabla_{h_{\ell-1}} f(x_i)^{\top},$$

where $h_{\ell-1}$ denotes the input to layer $\ell$. This alignment prioritizes directions in input space along which the output is maximally sensitive.
- Convolutional Neural Networks: The Convolutional Neural Feature Ansatz (CNFA) generalizes this principle: convolutional filter covariances align with the patch-based AGOP (Beaglehole et al., 2023). This mechanism explains empirically observed emergence of edge detectors and local feature hierarchies in trained CNNs.
- Mixture-of-Experts/Path-based Models: In Deep Linearly Gated Networks (DLGN), feature learning corresponds to adjusting the location and orientation of half-space gates in the input space; each feature is identified with an intersection of half-spaces, forming convex polyhedral regions tailored to regions where the target is smooth (Yadav et al., 2024).
- Teacher–Student and Bayes-optimal Perspectives: In Bayesian or mean-field frameworks, feature learning is quantified as symmetry breaking in the posterior measure—a transition from random (uninformative) features to ones aligned with target-induced directions (Göring et al., 16 Oct 2025, Corti et al., 28 Aug 2025). This can be mechanistically attributed to dynamics captured by forward–backward equations aligning networks' effective kernels to the target function (Fischer et al., 2024).
- Layerwise Phenomenology: The “spring-block” theory provides a macroscopic, mechanical analogy: feature separation builds up progressively layer by layer, driven by the interplay of nonlinearity (activation-induced friction), optimization noise, and architectural depth (Shi et al., 2024).
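The AGOP mechanism above can be illustrated numerically. The following is a minimal sketch (not the authors' implementation): a two-layer ReLU network is trained by full-batch gradient descent on a single-index target, and the AGOP of the resulting predictor is compared to the rank-one matrix spanned by the target direction. All names (`agop`, `align`, `w_star`) and hyperparameters are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h, n = 10, 64, 500

# Toy single-index target: the output depends on a single direction w_star.
w_star = np.zeros(d); w_star[0] = 1.0
X = rng.standard_normal((n, d))
y = np.maximum(X @ w_star, 0.0)

def agop(W, a, X):
    """Average gradient outer product of f(x) = a . relu(W x) over the data."""
    grads = ((X @ W.T > 0) * a) @ W          # row i = grad_x f(x_i)
    return grads.T @ grads / len(X)

def align(A, B):
    """Cosine similarity between two matrices viewed as flat vectors."""
    return np.sum(A * B) / (np.linalg.norm(A) * np.linalg.norm(B))

W = rng.standard_normal((h, d)) / np.sqrt(d)
a = rng.standard_normal(h) / np.sqrt(h)
target_mat = np.outer(w_star, w_star)
align_init = align(agop(W, a, X), target_mat)

lr = 0.05
for _ in range(3000):                        # plain full-batch gradient descent
    pre = X @ W.T
    act = np.maximum(pre, 0.0)
    err = act @ a - y
    a -= lr * (act.T @ err) / n
    W -= lr * ((err[:, None] * (pre > 0) * a).T @ X) / n

align_trained = align(agop(W, a, X), target_mat)
print(f"AGOP/target alignment: init {align_init:.2f} -> trained {align_trained:.2f}")
```

After training, the AGOP concentrates on the target direction, whereas at initialization it is close to isotropic, consistent with the ansatz's prediction that weight Gram matrices track task-relevant sensitivity.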
2. Feature Learning versus Kernel Methods
Feature learning in neural networks is fundamentally distinguished from fixed-feature (kernel) approaches by the adaptivity of internal representations:
- Fixed Kernels (NTK, NNGP): These regimes correspond to "lazy training," where the tangent feature space at initialization is preserved throughout training. The associated kernel (e.g., the Neural Tangent Kernel) is fixed, and functional improvement is limited to projections onto features already present at initialization. No feature selection or alignment to the data distribution occurs (Shi et al., 2022, Rubin et al., 5 Feb 2025).
- Adaptive Features: Under mean-field or proportional scaling, finite-width corrections enable features to adapt directionally, not just via scalar kernel rescaling. This directional adaptation can create functional capacity not present at initialization, as evidenced by the emergence of O(1) off-diagonal weight correlations and manifold separation in feature space (Corti et al., 28 Aug 2025, Fischer et al., 2024).
- Empirical Separations: Tasks such as multi-index regression or parity learning are provably intractable for kernel methods but efficiently solvable by gradient-trained neural networks that learn data-dependent features (Shi et al., 2022, Shi et al., 2023).
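The lazy/rich distinction can be probed directly with output scaling. The sketch below (an assumption-laden toy, not from the cited works) uses the standard lazy-training construction: the network output is scaled by a factor alpha, the learning rate by 1/alpha^2, and initialization is symmetrized so the initial output is exactly zero. As alpha grows, the function is still fit, but the weights barely move from initialization.

```python
import numpy as np

rng = np.random.default_rng(1)
d, h, n = 5, 40, 200
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0])                          # simple nonlinear target

def train(alpha, steps=2000, lr=0.1):
    # Symmetrized init: paired neurons with opposite signs give zero output at t=0.
    W0 = rng.standard_normal((h // 2, d)) / np.sqrt(d)
    W = np.vstack([W0, W0.copy()])
    a = np.concatenate([np.ones(h // 2), -np.ones(h // 2)]) / np.sqrt(h)
    W_init = W.copy()
    for _ in range(steps):
        pre = X @ W.T
        act = np.maximum(pre, 0.0)
        err = alpha * (act @ a) - y
        # lr / alpha**2 is the standard lazy-training learning-rate scaling
        a -= (lr / alpha**2) * alpha * (act.T @ err) / n
        W -= (lr / alpha**2) * alpha * ((err[:, None] * (pre > 0) * a).T @ X) / n
    return np.linalg.norm(W - W_init) / np.linalg.norm(W_init)

move_rich = train(alpha=1.0)
move_lazy = train(alpha=100.0)
print(f"relative weight movement: alpha=1 -> {move_rich:.3f}, alpha=100 -> {move_lazy:.5f}")
```

In the large-alpha limit the tangent features at initialization carry all of the fitting, so no directional feature adaptation occurs; at alpha = 1 the same architecture moves its weights appreciably.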
3. Mathematical Formulations and Invariance
A central hallmark of learned features is their increased invariance to input perturbations not relevant for the task. In deep networks:
- Input Perturbation Attenuation: Consider layerwise propagation of a small input disturbance $\delta x$:

$$\delta h_{\ell} \approx D_{\ell} W_{\ell}\, \delta h_{\ell-1}, \qquad \delta h_0 = \delta x,$$

where $W_{\ell}$ is the layer weight matrix and $D_{\ell}$ the diagonal matrix of activation derivatives. With increasing depth, the average spectral norm of $D_{\ell} W_{\ell}$ can drop below 1, so higher-layer features progressively suppress small, task-irrelevant input variations (Yu et al., 2013).
- Internal Representations: Empirical studies show marked decreases in Euclidean and KL distances between representations of inputs differing in nuisance factors (speaker, bandwidth, noise), quantifying built-in invariance even without explicit normalization or adaptation (Yu et al., 2013).
- Kernel Alignment: Adaptive kernel and tangent-feature analyses formalize feature learning as shifts in the eigenstructure of the tangent kernel—high-rank adjustments, particularly aligned to challenging label kernels, accelerate learning and capture residual variance not addressed by final-layer adaptation (LeJeune et al., 2023).
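The attenuation mechanics can be checked in a few lines. The sketch below propagates a small perturbation through a random (untrained) ReLU stack with variance-1/fan-in Gaussian weights, a scaling for which the average per-layer gain sits below one (He-style 2/fan-in initialization would instead preserve norms). This illustrates only the mechanics; in trained networks the attenuation is task-selective rather than uniform.

```python
import numpy as np

rng = np.random.default_rng(2)
width, depth = 256, 8

x = rng.standard_normal(width)
delta = 1e-4 * rng.standard_normal(width)    # small task-irrelevant disturbance

h, h_pert = x, x + delta
gains = []
for _ in range(depth):
    W = rng.standard_normal((width, width)) / np.sqrt(width)  # var 1/fan_in
    prev = np.linalg.norm(h_pert - h)
    h, h_pert = np.maximum(W @ h, 0.0), np.maximum(W @ h_pert, 0.0)
    gains.append(np.linalg.norm(h_pert - h) / prev)

print("per-layer gain:", np.round(gains, 2))
```

Each ReLU layer zeroes roughly half the coordinates of the propagated difference, so the gain per layer concentrates near 1/sqrt(2) and the perturbation shrinks geometrically with depth.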
4. Dynamics and Phase Transitions in Feature Learning
Feature learning often exhibits dynamically staged behavior and phase transitions:
- Alternating Gradient Flows (AGF): Feature modes are acquired in a staircase fashion, with sharp transitions ("jumps") at times when previously dormant neurons/heads align to new, highly ranked residual components. Precise predictions for the order, timing, and magnitude of feature acquisition steps can be made, with jump times set by the ordering of feature "utility" (Kunin et al., 6 Jun 2025).
- Critical and Proportional Regimes: Statistical mechanics and mean-field analyses reveal phase boundaries (e.g., in sample-to-dimension ratio) across which neural nets transition from unguided to target-aligned features (Montanari et al., 1 Feb 2026, Göring et al., 16 Oct 2025). The critical point for feature learning is marked by the emergence of negative curvature (outlier eigenvalues) in the Hessian, signifying entry into previously "hard" directions in feature space.
- Self-reinforcing Input Feature Selection (IFS): In the post-transition regime, neurons and input coordinates most strongly aligned with the target accumulate larger updates—captured in models via Automatic Relevance Determination (ARD)—and exhibit specialization, leading to sharp improvements in generalization (Göring et al., 16 Oct 2025).
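The staircase picture can be reproduced qualitatively in the classic deep linear toy model (Saxe-style dynamics; the AGF analysis covers richer architectures). With small initialization, gradient descent on a two-layer linear network learns the target's modes sequentially, strongest first, with sharp transitions between plateaus. Everything below (mode strengths, step counts, the half-learning criterion) is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 3
A = np.diag([4.0, 2.0, 1.0])        # target modes of decreasing strength
scale = 1e-3                        # small init -> sequential (staircase) learning
W1 = scale * rng.standard_normal((d, d))
W2 = scale * rng.standard_normal((d, d))

lr, steps = 0.01, 6000
cross_time = {}                     # first step at which each mode is half-learned
for t in range(steps):
    E = W2 @ W1 - A                 # residual of the end-to-end linear map
    W2 -= lr * E @ W1.T
    W1 -= lr * W2.T @ E
    m = W2 @ W1
    for k in range(d):
        if k not in cross_time and m[k, k] > 0.5 * A[k, k]:
            cross_time[k] = t

print("half-learning times per mode:", cross_time)
```

The recorded half-learning times are ordered by mode strength: jump times scale roughly like log(1/init)/strength, so the strongest residual component is acquired first, matching the staged dynamics described above.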
5. Practical Implications and Limitations
Feature learning is not universally beneficial—its impact is data- and architecture-dependent:
- Hierarchical Features: In speech and vision tasks, deep networks outperform shallow and GMM models due to their ability to construct hierarchical invariant features, obviating the need for explicit model adaptation or normalization (Yu et al., 2013).
- Overfitting via Sparse Representations: However, in settings where the target function is smooth along many directions, feature learning can induce overly sparse, high-curvature representations, increasing susceptibility to overfitting and reducing generalization compared to lazy/kernel regimes (Petrini et al., 2022).
- Quality versus Strength: Not all feature learning is equal. Empirical observables that quantify the strength of representation drift (e.g., NTK alignment metrics) can decouple from actual improvements in generalization (quality), especially in high-capacity settings (Göring et al., 25 Jul 2025). Only the feature learning gap (generalization difference relative to the best kernel method) reliably quantifies practical benefit.
- Data Structure Dependence: Feature learning efficacy relies crucially on the structure in the input data. When special combinatorial or hierarchical structure exists, neural nets exploit it efficiently, whereas fixed-feature methods become statistically intractable (Shi et al., 2022). If structure is absent, feature adaptation may offer no advantage or even degrade performance.
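One common "strength"-type observable is centered kernel-target alignment. The sketch below (an illustrative toy, not from the cited works) computes it for a raw input kernel and for a kernel built from the single label-relevant coordinate; the caveat above still applies, since a higher alignment score does not by itself guarantee better generalization.

```python
import numpy as np

def centered_kernel_alignment(K, y):
    """Centered kernel-target alignment, a standard 'strength' observable."""
    n = len(y)
    H = np.eye(n) - np.ones((n, n)) / n       # centering projector
    Kc, Yc = H @ K @ H, H @ np.outer(y, y) @ H
    return np.sum(Kc * Yc) / (np.linalg.norm(Kc) * np.linalg.norm(Yc))

rng = np.random.default_rng(4)
n, d = 100, 20
X = rng.standard_normal((n, d))
y = np.sign(X[:, 0])                          # label depends on one coordinate

align_raw = centered_kernel_alignment(X @ X.T, y)                 # fixed input kernel
align_feat = centered_kernel_alignment(np.outer(X[:, 0], X[:, 0]), y)
print(f"alignment: raw kernel {align_raw:.2f}, feature-selected {align_feat:.2f}")
```

The feature-selected kernel scores far higher because it discards the 19 label-irrelevant coordinates that dilute the raw inner-product kernel.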
6. Unified and Provable Frameworks
Recent frameworks provide rigorous, unified accounts of feature learning phenomena:
- Gradient Feature Learning Framework: Generalization guarantees for SGD-trained two-layer ReLU nets are derived based on the emergence, alignment, and convex optimization over gradient-induced features. The same framework underpins explanations of the lottery ticket hypothesis and establishes strict separation over all kernel-based learners in explicit families (Shi et al., 2023).
- Recursive Feature Machines (RFM) and ConvRFM: The average gradient outer product mechanistically explains empirical feature selection in both fully connected and convolutional nets. The RFM approach applies this principle to kernel methods, enabling deep, data-adaptive feature learning in non-deep-net settings, with competitive or superior performance on tabular and vision tasks (Radhakrishnan et al., 2022, Beaglehole et al., 2023).
- Phenomenological Theories: The spring-block model (Shi et al., 2024) and forward–backward kernel alignment equations (Fischer et al., 2024, Rubin et al., 5 Feb 2025, Corti et al., 28 Aug 2025) furnish a macro- to micro-level continuum of explanations, from global geometric data separation to detailed equilibrium and out-of-equilibrium feature statistics.
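The RFM loop described above can be sketched compactly. The version below is a simplified illustration, not the published implementation: it alternates kernel ridge regression under a Mahalanobis metric M with an AGOP update of M, but substitutes a Gaussian kernel for RFM's Laplace kernel (the Gaussian keeps the gradient formula simple) and uses an ad hoc median bandwidth and trace normalization.

```python
import numpy as np

def dist2(A, B, M):
    """Squared Mahalanobis distances (a - b)^T M (a - b) between rows."""
    AM = A @ M
    sqA = np.sum(AM * A, axis=1)
    sqB = np.sum((B @ M) * B, axis=1)
    return np.maximum(sqA[:, None] + sqB[None, :] - 2 * AM @ B.T, 0.0)

def rfm(X, y, iters=5, reg=1e-3):
    """Minimal RFM-style loop: alternate kernel ridge regression with a
    Mahalanobis Gaussian kernel and an AGOP update of the metric M."""
    n, d = X.shape
    M = np.eye(d)
    for _ in range(iters):
        D = dist2(X, X, M)
        bw = np.median(D)                     # median heuristic for bandwidth
        K = np.exp(-D / bw)
        alpha = np.linalg.solve(K + reg * np.eye(n), y)
        # AGOP of the fitted predictor f(x) = sum_i alpha_i K(x, x_i)
        G = np.empty((n, d))
        for j in range(n):
            diff = X[j] - X                   # rows: x_j - x_i
            G[j] = -(2.0 / bw) * ((alpha * K[j]) @ (diff @ M))
        M = G.T @ G / n
        M *= d / np.trace(M)                  # keep the metric's scale fixed
    return M, alpha

rng = np.random.default_rng(5)
n, d = 300, 10
X = rng.standard_normal((n, d))
y = X[:, 0] ** 2                              # target depends only on coordinate 0
M, alpha = rfm(X, y)
print("diag(M):", np.round(np.diag(M), 2))    # mass should concentrate on entry 0
```

On this single-index task the learned metric concentrates on the relevant coordinate, mirroring how the AGOP mechanism drives feature selection in trained networks.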
In summary, feature learning in neural networks is a mathematically and empirically rich set of phenomena encompassing adaptive kernel geometry, invariance, hierarchical abstraction, staged dynamic transitions, and global-to-local specialization. Theoretical frameworks and empirical investigations demonstrate both its power—especially in structured-data, high-capacity, and deep regimes—and its subtle limitations, especially where sparsification and ill-posed generalization emerge. Recent advances provide multi-scale, unified descriptions and practical algorithms, linking statistical physics, optimization, and machine learning in a comprehensive theory of neural feature learning (Yu et al., 2013, Radhakrishnan et al., 2022, Beaglehole et al., 2023, Yadav et al., 2024, Rubin et al., 5 Feb 2025, Göring et al., 16 Oct 2025, Montanari et al., 1 Feb 2026).