- The paper presents a novel single-parameter (β) mechanism to modulate the curvature of decision boundaries without gradient-based fine-tuning.
- It designs a custom activation function that blends reparameterized Swish and SoftPlus, allowing a smooth transition between linear and nonlinear regimes for enhanced interpretability.
- Empirical evaluations demonstrate consistent improvements in accuracy and robustness across models like ResNets and transformers on various challenging datasets.
The paper presents a theoretically grounded method for training-free model steering that leverages the intrinsic connection between deep neural networks and max-affine spline operators. The authors introduce a single-parameter mechanism, denoted β, that modulates the curvature of a network's decision boundary by systematically adjusting its activation functions. This curvature tuning (CT) framework alters the nonlinearity of standard activations (e.g., ReLU) without any gradient-based fine-tuning, thereby preserving efficiency while offering enhanced interpretability.
Theoretical Foundations and Methodology
- The work builds on the observation that many layers in deep networks can be exactly represented as max-affine spline operators. By interpreting the network's activation nonlinearities as piecewise-affine functions, the CT approach modulates the transitions between affine regions.
- Two complementary smoothing strategies are proposed. One approach smooths the region assignment of the max operator in the max-affine spline formulation, while the other directly smooths the maximum operator via techniques akin to the log-sum-exp approximation.
- To counteract the shift in the mean output induced by each individual smoothing mechanism (which would otherwise cause an undesirable drift in the decision boundary), the method averages the two parametrizations. This combination provides provable guarantees on the modulation of decision-boundary curvature: β → 0 recovers fully linear behavior, while β → 1 recovers the original ReLU-based nonlinearity.
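The two smoothings and their average can be sketched in a few lines. The function names and the temperature `tau` (playing the role of 1 − β) are illustrative assumptions, not the paper's exact parameterization:

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def swish_smooth(x, tau):
    # Smooth the *region assignment* of max(0, x): the hard gate
    # 1[x > 0] becomes a sigmoid gate, yielding a Swish-like unit.
    return x * sigmoid(x / tau)

def softplus_smooth(x, tau):
    # Smooth the *max operator* itself via a log-sum-exp relaxation:
    # tau * log(exp(0/tau) + exp(x/tau)) = tau * log(1 + exp(x/tau)).
    z = x / tau
    # Stable softplus: log(1 + e^z) = max(z, 0) + log1p(e^(-|z|)).
    return tau * (max(z, 0.0) + math.log1p(math.exp(-abs(z))))

def averaged_smooth(x, tau):
    # The Swish form sits below ReLU everywhere, the softplus form
    # above it, so averaging cancels much of the opposite-signed drift.
    return 0.5 * (swish_smooth(x, tau) + softplus_smooth(x, tau))

relu = lambda x: max(x, 0.0)
```

The bracketing `swish_smooth(x) ≤ relu(x) ≤ softplus_smooth(x)` holds pointwise, which is why each smoothing alone shifts the mean output in one direction and the average largely removes that shift.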
Activation Function Design
- The authors design a novel activation function as a convex combination of a reparameterized Swish function and a reparameterized SoftPlus function.
- Mathematically, the custom activation function can be written as the convex combination φ_β(x) = c · Swish_β(x) + (1 − c) · SoftPlus_β(x), where Swish_β and SoftPlus_β denote the reparameterized components, and
- x denotes the input,
- β ∈ [0, 1) controls the curvature,
- σ(⋅) is the sigmoid function (entering through the gating of the Swish component),
- and c is a coefficient that balances the contributions of the two components.
- This formulation has the appealing property that as β is tuned, the map transitions continuously between the piecewise affine and globally affine regimes, offering a direct handle on the curvature of the decision boundary.
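A minimal sketch of such a combined activation, assuming `tau = 1 - beta` as the temperature and `c = 0.5` as the default balance (both hypothetical choices; the paper's exact reparameterization may differ):

```python
import math

def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def _softplus(z):
    # Stable log(1 + e^z) = max(z, 0) + log1p(e^(-|z|)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def ct_activation(x, beta, c=0.5):
    # Convex combination of a reparameterized Swish and a
    # reparameterized SoftPlus; beta in [0, 1) sets the curvature.
    # tau = 1 - beta is an assumed temperature: as beta -> 1 both
    # components sharpen toward ReLU.
    assert 0.0 <= beta < 1.0
    tau = 1.0 - beta
    swish = x * _sigmoid(x / tau)      # smoothed region assignment
    softp = tau * _softplus(x / tau)   # smoothed max operator
    return c * swish + (1.0 - c) * softp
```

Unlike ReLU, the unit is smooth at the origin for any β < 1, and sending β toward 1 sharpens both components back to the piecewise-affine ReLU.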
Empirical Evaluation and Key Findings
- Extensive experiments are reported across a wide range of models (including ResNets and ReLU-based transformers) and datasets (natural images, medical imaging tasks, and fine-grained classification benchmarks).
- In standard transfer settings, applying CT to pretrained models like ResNet-18, ResNet-50, and ResNet-152 results in consistent improvements in test accuracy. For example, average relative accuracy gains of approximately 1.68%–3.53% are observed when transferring across datasets ranging from MNIST and CIFAR to ImageNet derivatives.
- Robustness experiments under adversarial and corruption-based attacks demonstrate significant improvements. For instance, robust accuracy improvements on benchmarks such as RobustBench reach relative gains of 11.76% for smaller models and dramatically higher gains (up to nearly 500%) for larger ones, indicating that CT’s effect becomes more pronounced with increased model capacity.
- The paper also includes ablation studies that validate the combined use of reparameterized Swish and SoftPlus. Individually, these components improve performance modestly (e.g., around 0.23% to 2.96% improvements), but their combination leads to the most substantial gains (e.g., approximately 3.46% in generalization and 11.76% in robustness).
- In the case of transformers, the authors modify architectures (such as Swin-T and Swin-S) by replacing GELU with ReLU so that the CT framework is applicable. Despite only attaining partial theoretical guarantees for these models, CT still yields relative improvements on downstream datasets, demonstrating its versatility.
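The steering recipe itself (freeze the pretrained weights, swap the activation, and pick β by a sweep on held-out data) can be illustrated on a toy scalar network; all weights, data, and helper names below are hypothetical:

```python
import math

def _sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def ct(x, beta, c=0.5):
    # Curvature-tuned unit (sketch): convex mix of reparameterized
    # Swish and SoftPlus with assumed temperature tau = 1 - beta.
    tau = 1.0 - beta
    z = x / tau
    softplus = max(z, 0.0) + math.log1p(math.exp(-abs(z)))
    return c * x * _sigmoid(z) + (1.0 - c) * tau * softplus

def forward(x, weights, act):
    # One-hidden-layer scalar MLP; `weights` stand in for frozen
    # pretrained parameters (no gradient updates anywhere).
    (w1, b1), (w2, b2) = weights
    h = [act(w * x + b) for w, b in zip(w1, b1)]
    return sum(w * hi for w, hi in zip(w2, h)) + b2

def select_beta(weights, val_pairs, betas):
    # Training-free steering: sweep the single hyperparameter beta
    # on held-out data and keep the best value.
    def loss(beta):
        act = lambda z: ct(z, beta)
        return sum((forward(x, weights, act) - y) ** 2
                   for x, y in val_pairs)
    return min(betas, key=loss)
```

This mirrors the summary's point about transformers as well: once GELU is replaced by a ReLU-family unit, the same single-parameter sweep applies with no backpropagation.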
Concluding Remarks
- The proposed CT method offers a provable training-free alternative to conventional fine-tuning techniques. By controlling the curvature of the decision boundary with a single hyperparameter, the approach circumvents the need for extensive backpropagation-based updates, while also bolstering both generalization and robustness.
- This work not only provides solid theoretical insights into the role of activation curvature in deep networks but also empirically validates that a principled, activation-based strategy can effectively fine-tune pretrained models across diverse tasks.
- The method’s reliance on replacing standard activations makes it particularly appealing from an interpretability standpoint, as the impact on the decision boundary can be directly quantified and understood.
Overall, the paper contributes a theoretically sound, computationally efficient, and empirically validated strategy for training-free model adaptation, spotlighting a novel avenue for parameter-efficient fine-tuning in deep learning.