Self-Paced Learning & Redundant Regularization
- Self-Paced Learning and Redundant Regularization is a framework that marries adaptive sample weighting with penalization of feature or task redundancy to enhance learning outcomes.
- It employs alternating optimization for updating sample weights, reconstruction, and projection, ensuring convergence and robustness in feature selection and task sampling.
- SPLR methods demonstrate superior clustering accuracy in unsupervised feature selection (e.g., on USPS) and improved sample efficiency in reinforcement learning curricula (e.g., on BipedalWalker).
Self-paced learning with redundant regularization (SPLR) refers to a class of algorithms that integrate curriculum-based sample selection (self-paced learning, SPL) with explicit control over feature or task redundancy via regularization schemes. SPLR methods have been developed for unsupervised feature selection in high-dimensional settings and for curriculum learning in reinforcement learning, sharing the central principle of adaptively weighting training data or tasks to promote robustness and efficiency, while suppressing redundancy in either feature or task domains (Li et al., 2021, Niehues et al., 2023).
1. Fundamental Principles
SPLR combines two core ideas: (1) self-paced learning, which incrementally incorporates samples or tasks from easy to hard by assigning adaptive weights, and (2) redundant regularization, which introduces penalty terms on feature or task redundancy. In unsupervised feature selection, SPL restricts training to easy samples via per-sample weights $v_i \in [0, 1]$, while a low-redundancy regularizer penalizes the selection of mutually similar features. In curriculum learning, SPLR (e.g., Self-Paced Absolute Learning Progress, SPALP) regularizes task sampling by squashing learning-progress measures on tasks where the agent still performs poorly, avoiding repeated exploration of unlearned regions that yield little true progress.
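As a concrete illustration, the hard-weighting variant of SPL (with regularizer $f(v; \lambda) = -\lambda \sum_i v_i$) yields a closed-form weight update: a sample is admitted only when its current loss falls below the age parameter. A minimal sketch (function and variable names are illustrative):

```python
import numpy as np

def spl_weights(losses, lam):
    """Closed-form hard SPL weights: v_i = 1 if loss_i < lam else 0.
    Growing lam (the model "age") admits progressively harder samples."""
    return (losses < lam).astype(float)

losses = np.array([0.2, 1.5, 0.7, 3.0])
print(spl_weights(losses, lam=1.0))  # early age: only the easy samples
print(spl_weights(losses, lam=5.0))  # late age: every sample admitted
```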
2. Mathematical Formulations
Unsupervised Feature Selection SPLR
For data $X \in \mathbb{R}^{n \times d}$ with rows $x_i$, SPLR seeks a row-sparse nonnegative projection $W \in \mathbb{R}^{d \times k}$ and a nonnegative reconstruction $H \in \mathbb{R}^{k \times d}$, together with self-paced sample weights $v \in [0, 1]^n$. In generic form, its objective is

$$\min_{W, H, v} \; \sum_{i=1}^{n} v_i \, \| x_i - x_i W H \|_2^2 + f(v; \lambda) + \alpha \, \Omega(W) + \beta \sum_{i \neq j} A_{ij} \, s_i s_j + \gamma \, \operatorname{tr}(W^\top X^\top L X W)$$

subject to $W \geq 0$, $H \geq 0$, $W^\top W = I$, where:
- $f(v; \lambda)$ is the SPL regularizer with age parameter $\lambda$.
- $\Omega(W)$ is a (nonconvex) row-sparsity penalty that promotes selecting few features.
- $A$ is the feature-feature similarity matrix and $s_i = \sum_k W_{ik}$ the total projection weight of feature $i$; this term enforces low redundancy.
- The graph Laplacian $L$ preserves the data manifold structure.
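Each term above can be evaluated directly. The sketch below computes them for random data; the Gaussian similarity graph, the absolute-correlation similarity matrix, and the convex $\ell_{2,1}$ stand-in for the sparsity penalty are illustrative choices, not the exact instantiations of Li et al. (2021):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 8, 5, 3
X = rng.random((n, d))
W = rng.random((d, k))                    # nonnegative projection
H = rng.random((k, d))                    # nonnegative reconstruction
v = (rng.random(n) > 0.5).astype(float)   # self-paced sample weights

# Weighted reconstruction loss: sum_i v_i ||x_i - x_i W H||^2
R = X - X @ W @ H
recon = float(v @ (R * R).sum(axis=1))

# Row-sparsity penalty (l2,1 norm shown as a convex stand-in)
sparsity = float(np.linalg.norm(W, axis=1).sum())

# Low-redundancy term: sum_{i != j} A_ij s_i s_j with s the per-feature weight
A = np.abs(np.corrcoef(X, rowvar=False))  # feature-feature similarity
np.fill_diagonal(A, 0.0)
s = W.sum(axis=1)
redundancy = float(s @ A @ s)

# Manifold term tr(W^T X^T L X W) with L the Laplacian of a similarity graph
S = np.exp(-np.square(X[:, None, :] - X[None, :, :]).sum(-1))
L = np.diag(S.sum(axis=1)) - S
manifold = float(np.trace(W.T @ X.T @ L @ X @ W))

print(recon, sparsity, redundancy, manifold)  # all nonnegative
```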
Curriculum Learning SPLR
SPALP (Niehues et al., 2023) adds a self-paced regularizer to the teacher's task-sampling policy. The teacher maintains a distribution over tasks for the student policy, favoring tasks with high absolute learning progress (ALP). ALP values are regularized by a squashing function of the student's mean episodic reward: the self-paced ALP down-weights measured progress on tasks where absolute performance is still low. The squashing parameter is adjusted dynamically against a mean-reward bound, and the regularization is switched off once mean rewards exceed that bound.
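A minimal sketch of this gating idea follows; the sigmoid squash and all names below are illustrative assumptions, not the exact function of Niehues et al. (2023):

```python
import numpy as np

def self_paced_alp(alp, mean_reward, zeta, r_bound):
    """Down-weight ALP on tasks where absolute performance is still low.
    Once mean_reward clears r_bound, the regularization is switched off."""
    if mean_reward >= r_bound:
        return float(alp)                       # regularizer disabled
    gate = 1.0 / (1.0 + np.exp(-zeta * (mean_reward - r_bound)))
    return float(gate * alp)                    # squashed learning progress

print(self_paced_alp(1.0, mean_reward=10.0, zeta=1.0, r_bound=0.0))  # full ALP
print(self_paced_alp(1.0, mean_reward=-5.0, zeta=1.0, r_bound=0.0))  # squashed
```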
3. Optimization Procedures
SPLR in feature selection is optimized via block coordinate descent, alternating between $v$, $H$, and $W$:
- Sample weight update ($v$): closed form, determined by the current reconstruction losses and the age parameter $\lambda$; hard samples are introduced gradually as $\lambda$ grows.
- Reconstruction update ($H$): solved via multiplicative updates that preserve nonnegativity.
- Projection update ($W$): solved via a multiplicative rule under nonnegativity, orthogonality, and sparsity constraints, accounting for the redundancy and manifold penalties.
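The alternating structure can be sketched as follows. For brevity this omits the orthogonality, sparsity, redundancy, and manifold penalties and uses standard weighted-NMF multiplicative rules, so it illustrates the update loop rather than the full algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 20, 6, 3
X = rng.random((n, d))
W = rng.random((d, k)) + 0.1     # nonnegative projection
H = rng.random((k, d)) + 0.1     # nonnegative reconstruction
eps = 1e-12

def losses():
    R = X - X @ W @ H
    return (R * R).sum(axis=1)   # per-sample reconstruction loss

l0 = losses()
f0 = float(l0.sum())
lam = float(np.median(l0))       # age parameter: start with the easier half

history = []
for it in range(60):
    # (v): closed-form hard SPL weights under the current age parameter
    v = (losses() < lam).astype(float)
    D = np.diag(v)
    # (H): multiplicative update preserves nonnegativity
    P = X @ W
    H *= (P.T @ D @ X) / (P.T @ D @ P @ H + eps)
    # (W): multiplicative update preserves nonnegativity
    W *= (X.T @ D @ X @ H.T) / (X.T @ D @ X @ W @ H @ H.T + eps)
    lam *= 1.3                   # anneal: admit harder samples over time
    history.append(float(losses().sum()))

print(f0, history[-1])           # total loss drops as the curriculum completes
```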
This alternating scheme is proven to guarantee descent of the overall objective, which is bounded below, ensuring convergence in practice (Li et al., 2021).
For SPALP, the teacher and student are alternately updated: the student policy via standard reinforcement learning, and the teacher's task distribution via a Gaussian mixture model (GMM) fitted over self-paced ALP scores.
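A toy sketch of the teacher side: the 1-D task space, fixed mixture components, and per-component self-paced ALP values below are all hypothetical, standing in for the GMM that would be refit from recent episodes. Components with higher self-paced ALP are sampled more often:

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical GMM teacher over a 1-D task space (e.g., terrain difficulty).
means   = np.array([0.2, 0.5, 0.8])          # component centers
stds    = np.array([0.05, 0.05, 0.05])       # component spreads
sp_alp  = np.array([0.1, 0.6, 0.3])          # mean self-paced ALP per component
weights = sp_alp / sp_alp.sum()              # sampling probabilities

def sample_task():
    c = rng.choice(len(means), p=weights)    # pick a mixture component
    return float(np.clip(rng.normal(means[c], stds[c]), 0.0, 1.0))

tasks = np.array([sample_task() for _ in range(1000)])
# Mass concentrates near the high-progress region around 0.5.
print(abs(tasks.mean() - weights @ means) < 0.1)  # prints True
```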
4. Redundant Regularization Mechanisms
In unsupervised feature selection, redundancy is quantified using the feature-feature similarity matrix $A$; strong penalties discourage the joint selection of highly correlated features. The regularizer

$$\mathcal{R}(W) = \sum_{i \neq j} A_{ij} \, s_i s_j, \qquad s_i = \sum_k W_{ik},$$

where $s_i$ is the sum of projection weights for feature $i$, ensures that simultaneously selecting redundant (highly similar) features incurs a large penalty. In curriculum learning, the squashing of low-reward ALP values avoids oversampling tasks where the policy's measured learning progress is misleadingly large but its absolute performance is low; redundancy is thus regularized in task space.
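To see the effect, compare the penalty for selecting two near-duplicate features against selecting two independent ones (the correlation-based similarity matrix is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
base = rng.random(n)
# Features 0 and 1 are near-duplicates (redundant); 2 and 3 are independent.
X = np.column_stack([base, base + 0.01 * rng.random(n),
                     rng.random(n), rng.random(n)])

A = np.abs(np.corrcoef(X, rowvar=False))     # feature-feature similarity
np.fill_diagonal(A, 0.0)                     # only cross-feature redundancy

def redundancy_penalty(W):
    s = W.sum(axis=1)                        # total projection weight per feature
    return float(s @ A @ s)

W_redundant = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]])
W_diverse   = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
print(redundancy_penalty(W_redundant) > redundancy_penalty(W_diverse))  # True
```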
5. Empirical Performance and Benchmarking
Unsupervised Feature Selection
SPLR was benchmarked on nine public datasets: image (USPS, COIL20, ORL, UMIST, warpPIE10P), artificial (Madelon), speech (Isolet), and biology (Colon, GLIOMA). Compared with the no-selection baseline, Laplacian Score, MCFS, UDFS, DISR, RNE, and SGFS, SPLR achieved the highest clustering accuracy (ACC) and normalized mutual information (NMI) on 7/9 datasets with K-means and 6/9 with PAM. Wilcoxon signed-rank tests confirmed statistical significance over most baselines, and the adopted nonconvex row-sparsity penalty consistently outperformed convex alternatives. Convergence is rapid, stabilizing within 200-600 iterations on all datasets. SPLR is relatively insensitive to the sparsity and orthogonality hyperparameters, but benefits from moderate tuning of the redundancy and manifold terms (Li et al., 2021).
Curriculum Learning
In three domains (toy 2D/3D grids, Box2D BipedalWalker, sparse-reward ball-catching), SPALP matches or exceeds the final mastery of standard ALP-GMM while requiring fewer interactions. In BipedalWalker, SPALP reaches 85% mastery in 2,000 episodes versus 64% for ALP-GMM, with final mastery of 93% and 92% at 6,000 episodes, respectively. In the sparse-reward setting, SPALP reaches equivalent performance with approximately half the gradient updates required by ALP-GMM (Niehues et al., 2023).
| Setting | Baseline / ALP-GMM | SPLR / SPALP | Metric |
|---|---|---|---|
| USPS, K-means | Lower ACC/NMI | Highest ACC/NMI (7/9 datasets) | Clustering quality |
| BipedalWalker | 64% mastery at 2,000 episodes | 85% mastery at 2,000 episodes | Sample efficiency |
| Ball-catching | ~70% success | ~70% success with ~½ the gradient updates | Sample efficiency |
6. Theoretical Guarantees and Interpretations
Block coordinate descent in SPLR is theoretically guaranteed to converge: each update step (for $v$, $H$, $W$) lowers an objective that is bounded below. The mixture SPL regularizer enforces a rigorous curriculum, in which the age parameter $\lambda$ is monotonically annealed to raise the model's tolerance for harder samples.
SPALP inherits the variational framework of self-paced deep learning: the regularizer keeps the teacher distribution from deviating too far from a goal distribution, and annealing ensures adaptive curriculum progression. Once the mean-reward bound is met and the squashing is disabled, SPALP reduces to standard ALP-GMM, so it is a strict generalization of that method (Niehues et al., 2023).
A notable implication is that both SPLR variants realize data- or task-dependent curricula while preventing wasted computation on redundant, uninformative features or tasks. The redundancy-aware regularization yields superior generalization and performance, particularly in high-dimensional or sparse-reward settings.
7. Practical Significance and Applications
SPLR methods provide robust unsupervised feature selection, especially suited to large-scale, high-dimensional unlabeled datasets with noisy or redundant features. Empirically, SPLR demonstrates superior clustering performance and rapid convergence on a variety of benchmarks without requiring labels. In reinforcement learning, self-paced redundancy regularization in curriculum design enables more efficient policy learning, particularly when task difficulty can be structured and redundancy in exploration would otherwise slow mastery.
A plausible implication is that further development of redundancy-aware self-paced curricula could extend SPLR approaches to supervised settings, federated learning, and complex hierarchical or compositional learning scenarios, wherever redundancy and difficulty interact to impede efficient learning.
References:
- (Li et al., 2021) Unsupervised feature selection via self-paced learning and low-redundant regularization
- (Niehues et al., 2023) Self-Paced Absolute Learning Progress as a Regularized Approach to Curriculum Learning