Conjectured α^{-1/2} generalization rate for hinge-loss kernel methods on RAF data

Prove that, for support vector machines trained with the hinge loss on the Rules-and-Facts (RAF) data model with any fraction of facts ε>0, any regularization parameter λ>0, and any dot-product kernel characterized by coefficients μ1 and μ⋆, the generalization error decays as α^{-1/2} in the large-sample-complexity limit α→∞.
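One possible way to make "decays as α^{-1/2}" precise is sketched below. The symbol \varepsilon_{\mathrm{gen}} for the generalization error of the trained SVM is notation assumed here, not taken from the paper, and the log-ratio form is only one reading of the statement; a stronger reading would also require a finite, nonzero prefactor.

```latex
% Assumed notation: \varepsilon_{\mathrm{gen}}(\alpha) is the generalization error
% of the hinge-loss SVM at sample complexity \alpha, for fixed
% \varepsilon > 0, \lambda > 0 and kernel coefficients \mu_1, \mu_\star.
\[
  \lim_{\alpha \to \infty}
  \frac{\log \varepsilon_{\mathrm{gen}}(\alpha;\, \varepsilon, \lambda, \mu_1, \mu_\star)}{\log \alpha}
  = -\frac{1}{2}
\]
% Stronger variant (also an assumption about the intended meaning):
% \varepsilon_{\mathrm{gen}}(\alpha) \sim C\, \alpha^{-1/2}
% for some constant C = C(\varepsilon, \lambda, \mu_1, \mu_\star) \in (0, \infty).
```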

Background

The RAF model mixes learnable labels generated by a teacher perceptron with a fraction ε of random factual labels to be memorized. For kernel ridge regression (square loss), the paper derives an α^{-1/2} generalization decay, while the Bayes-optimal benchmark achieves α^{-1}. Numerical evidence suggests that hinge-loss SVMs also decay close to α^{-1/2} across kernels and regularizations, but a proof is lacking.
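The setup described above can be illustrated with a minimal sketch, under assumptions that are not confirmed by the source: Gaussian inputs, a Gaussian teacher vector, and facts drawn independently with probability ε and given uniformly random ±1 labels. The function names `sample_raf` and `fit_decay_exponent` are hypothetical helpers introduced for illustration only. The second helper estimates a decay exponent from measured (α, error) pairs by a least-squares fit in log-log scale, which is one common way to read off a rate from numerical experiments.

```python
import numpy as np

def sample_raf(n, d, eps, rng):
    """Draw n RAF-style samples in dimension d (illustrative assumptions only)."""
    w_star = rng.standard_normal(d)            # hypothetical teacher perceptron weights
    X = rng.standard_normal((n, d))            # assumed i.i.d. Gaussian inputs
    y_rule = np.sign(X @ w_star)               # "rule" labels from the teacher
    y_rule[y_rule == 0] = 1.0                  # break (measure-zero) ties
    is_fact = rng.random(n) < eps              # each sample is a fact with prob. eps
    y_fact = rng.choice([-1.0, 1.0], size=n)   # random labels that must be memorized
    y = np.where(is_fact, y_fact, y_rule)
    return X, y, is_fact, w_star

def fit_decay_exponent(alphas, errors):
    """Least-squares slope of log(error) against log(alpha)."""
    slope, _ = np.polyfit(np.log(alphas), np.log(errors), 1)
    return slope

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, alpha, eps = 50, 4.0, 0.2
    X, y, is_fact, _ = sample_raf(int(alpha * d), d, eps, rng)
    print(X.shape, is_fact.mean())

    # Synthetic errors following an exact alpha^{-1/2} law, for illustration.
    alphas = np.array([10.0, 20.0, 40.0, 80.0])
    print(fit_decay_exponent(alphas, 0.7 * alphas ** -0.5))
```

On measured errors, a fitted slope near -1/2 at large α would be consistent with the conjectured rate, while a slope near -1 would instead match the Bayes-optimal benchmark; a finite-α fit is only suggestive and does not replace a proof.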

Formalizing and establishing this decay would clarify whether hinge-loss kernel methods fundamentally cannot reach the Bayes-optimal α^{-1} rate on RAF data while retaining factual memorization, thereby delineating the limits of convex kernel approaches in this setting.

References

Based on the above evidence, we thus conjecture that also for the hinge loss for any ε>0 and any regularization λ and kernel given by μ1, μ⋆, the generalization decay rate is α^{-1/2}.

The Rules-and-Facts Model for Simultaneous Generalization and Memorization in Neural Networks (2603.25579 - Farné et al., 26 Mar 2026) in Section 3.4 (The large-α generalization rate)