Principled Tuning of the ε1 Threshold in NESS

Determine a principled procedure for selecting the threshold ε1 used in NESS (Null-space Estimated from Small Singular values) to define the small-singular-value subspace at each layer via the criterion σ_{t,i} ≤ ε1 · ||I_t||_F. The procedure must balance stability (low interference with previously learned tasks) against plasticity (sufficient capacity to learn new tasks): when ε1 is too large, updates become effectively unconstrained, and when it is too small, updates become overly restrictive. It should also account for per-layer differences in dimensionality and singular-value spectra.

Background

NESS constructs an update subspace at each layer from the singular vectors associated with the small singular values of I_t, the concatenation of inputs from previous tasks, controlled by a threshold ε1. With singular values sorted in descending order, the cutoff index j is the smallest i such that σ_{t,i} ≤ ε1 · ||I_t||_F, and updates are parameterized as ΔW_t = U_t V_t, where U_t collects the selected singular vectors and is kept fixed while V_t is trainable.
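The selection rule can be sketched in NumPy as follows. The helper name `small_sv_subspace`, the matrix shapes, the synthetic data, and the value ε1 = 0.05 are illustrative assumptions, not values from the paper:

```python
import numpy as np

def small_sv_subspace(I_t, eps1):
    """Return the left singular vectors of I_t whose singular values
    satisfy sigma_{t,i} <= eps1 * ||I_t||_F (hypothetical helper; the
    paper's actual implementation may differ)."""
    U, s, _ = np.linalg.svd(I_t, full_matrices=True)
    thresh = eps1 * np.linalg.norm(I_t, "fro")
    # Singular values are returned in descending order, so the cutoff j
    # is the first index at or below the threshold; all later indices
    # also qualify.
    below = np.flatnonzero(s <= thresh)
    j = below[0] if below.size else len(s)
    return U[:, j:]  # fixed basis U_t spanning the small-SV subspace

# Stand-in for concatenated previous-task inputs at one layer: an
# effectively rank-8 signal plus small noise (synthetic, for illustration).
rng = np.random.default_rng(0)
I_t = (rng.normal(size=(64, 8)) @ rng.normal(size=(8, 256))
       + 0.01 * rng.normal(size=(64, 256)))
U_t = small_sv_subspace(I_t, eps1=0.05)
# Update parameterization Delta W_t = U_t V_t: U_t frozen, V_t trainable
# (output dimension 128 is an assumed shape).
V_t = np.zeros((U_t.shape[1], 128))  # trainable part, zero-initialized
delta_W = U_t @ V_t                  # initially the zero update
```

Because the dominant directions of I_t are excluded from U_t, any update ΔW_t = U_t V_t is confined to directions that carry little energy from previous-task inputs.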

Empirically, the paper observes that large ε1 values make the model behave nearly unconstrained and increase forgetting, whereas very small ε1 values overly restrict adaptation and can harm learning on new tasks. Moreover, because layers differ in shape and singular-value spectrum, some layers may end up with nearly full-rank trainable matrices under certain ε1 choices, which complicates selecting a single threshold uniformly across layers.
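To see why a uniform threshold behaves unevenly across layers, the sketch below applies the same ε1 to two synthetic layer-input matrices with very different spectra. The helper `selected_rank`, the layer shapes, and ε1 = 0.05 are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def selected_rank(I, eps1):
    """Count singular values of I at or below eps1 * ||I||_F, i.e. the
    dimension of the small-singular-value update subspace (illustrative
    helper, not the paper's code)."""
    s = np.linalg.svd(I, compute_uv=False)
    return int(np.sum(s <= eps1 * np.linalg.norm(I, "fro")))

rng = np.random.default_rng(1)
# Layer A: inputs with a sharply decaying spectrum (effectively rank 16),
# so most directions carry almost no input energy.
A = rng.normal(size=(128, 16)) @ rng.normal(size=(16, 512))
# Layer B: well-conditioned Gaussian inputs with a flat spectrum.
B = rng.normal(size=(128, 512))

rank_A = selected_rank(A, eps1=0.05)
rank_B = selected_rank(B, eps1=0.05)
print(f"layer A: {rank_A}/128 trainable directions")
print(f"layer B: {rank_B}/128 trainable directions")
```

Under the same ε1, the low-rank layer yields a nearly full-rank trainable subspace while the flat-spectrum layer yields a much smaller one, mirroring the non-uniformity described above.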

References

Therefore, tuning ε1 remains an open problem.

Learning in the Null Space: Small Singular Values for Continual Learning (Pham et al., arXiv:2602.21919, 25 Feb 2026), Appendix, Subsection "Limitations and Future Work".