- The paper introduces an 'autotune' approach that alternates between updating the regression coefficients and a bias-corrected noise variance estimate to tune the Lasso efficiently.
- The paper demonstrates significant computational speed-ups (15x to 200x) and improved variable selection accuracy compared to traditional tuning methods.
- The paper shows that autotune yields robust noise variance estimates and better generalization in high-dimensional regression and time series applications.
Fast, Accurate, and Automatic Tuning Parameter Selection for Lasso: Analysis of the Autotune Algorithm
Introduction
Accurate and efficient selection of the regularization parameter is a central problem in Lasso-based high-dimensional regression, especially for applications such as Vector Autoregression (VAR) in time series analysis. Existing tuning methods, notably cross-validation (CV) and information criteria (AIC, BIC), exhibit significant shortcomings in both computational efficiency and statistical performance as the problem dimension increases. Numerous alternatives, including Scaled and Square Root Lasso variants, have been proposed, but they typically involve modified loss functions that carry theoretical and practical drawbacks, particularly in the context of time series data.
This work introduces "autotune," a computationally efficient strategy for automatic and adaptive tuning of the Lasso penalty parameter. The approach alternately minimizes the penalized Gaussian log-likelihood with respect to the regression coefficients and the noise variance, supported by a novel bias correction based on partial residuals. The method demonstrates computational efficiency and generalization performance superior to established alternatives, especially in low signal-to-noise regimes, across regression and VAR models.
Methodology
The fundamental insight underpinning autotune is an alternating scheme:
- For a fixed value of the noise variance estimate σ^2, solve a Lasso problem using coordinate descent with penalty parameter λ = λ0·σ^2.
- For the resulting regression coefficients, update the noise variance estimator via a procedure that corrects for the inherent bias introduced by Lasso shrinkage.
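The alternation can be sketched in a few lines of numpy. This is an illustrative reconstruction, not the authors' R/C++ implementation: the function names and the default `lam0` are invented, and the variance update below is a plain residual variance rather than the paper's PR-based bias-corrected estimator.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used in Lasso coordinate descent."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, beta=None, n_iter=100):
    """Cyclic coordinate descent for the Lasso at a fixed penalty lam,
    minimizing (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p) if beta is None else beta.copy()
    col_norms = (X ** 2).sum(axis=0)
    r = y - X @ beta                          # current residual
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * beta[j]            # remove j's contribution
            beta[j] = soft_threshold(X[:, j] @ r, n * lam) / col_norms[j]
            r -= X[:, j] * beta[j]            # restore it with the new value
    return beta

def autotune_sketch(X, y, lam0=0.1, n_outer=20, tol=1e-6):
    """Alternate a Lasso fit at lam = lam0 * sigma2 with a noise-variance
    update until the sigma2 estimates converge."""
    sigma2 = np.var(y)                        # initial noise-variance guess
    beta = np.zeros(X.shape[1])
    for _ in range(n_outer):
        beta = lasso_cd(X, y, lam0 * sigma2, beta)
        # NOTE: plain residual variance here; the paper instead corrects
        # this estimate for shrinkage bias using partial residuals.
        new_sigma2 = np.mean((y - X @ beta) ** 2)
        if abs(new_sigma2 - sigma2) < tol * sigma2:
            break
        sigma2 = new_sigma2
    return beta, sigma2
```

Because the penalty is tied to the current σ^2 estimate, an overestimated variance yields a heavy penalty whose residuals then drive σ^2 back down, so the two updates pull each other toward a fixed point.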
Bias correction is achieved by utilizing the distribution and magnitude of partial residuals (PR) associated with the coordinate descent updates. Rather than using the raw residual sum of squares, the authors propose ranking predictors by PR standard deviations and sequentially introducing predictors via F-tests, effectively isolating the non-zero support and mitigating the Lasso's shrinkage bias in σ^2 estimation.
The autotune loop alternates between updating the model coefficients with coordinate descent (stopping early when σ^2 estimates converge) and performing a fast, Gram-Schmidt-orthogonalized series of tests to update the estimate of the residual variance. Early termination and adaptive grid search mean autotune quickly converges to a strong solution without exhaustive evaluation over all possible penalty parameters.
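The PR-ranked, F-test-driven variance update might look roughly as follows. This is again a hedged sketch: the fixed critical value `f_crit` stands in for a proper F-quantile, the function name is invented, and the paper's exact ranking and stopping rules may differ.

```python
import numpy as np

def pr_variance_estimate(X, y, beta, f_crit=4.0):
    """Estimate sigma^2 by ranking predictors on partial-residual (PR)
    standard deviation, then admitting them one at a time with
    Gram-Schmidt-orthogonalized F-tests until the F statistic falls
    below f_crit."""
    n, p = X.shape
    r = y - X @ beta
    # partial residual for predictor j: residual with j's fit added back
    pr_sd = np.array([np.std(r + X[:, j] * beta[j]) for j in range(p)])
    order = np.argsort(-pr_sd)                 # largest PR spread first
    resid = y - y.mean()
    rss = resid @ resid
    basis, k = [], 0
    for j in order:
        v = X[:, j].astype(float)
        for q in basis:                        # orthogonalize vs accepted cols
            v -= (q @ v) * q
        nv = np.linalg.norm(v)
        if nv < 1e-10:
            continue                           # collinear with accepted set
        q = v / nv
        drop = (q @ resid) ** 2                # RSS reduction if q is added
        F = drop / ((rss - drop) / (n - k - 2))
        if F < f_crit:
            break                              # remaining predictors look like noise
        basis.append(q)
        resid = resid - (q @ resid) * q
        rss -= drop
        k += 1
    return rss / (n - k - 1)                   # df-adjusted sigma^2 estimate
```

Because the accepted support is refit without shrinkage (via orthogonal projection), the resulting residual variance avoids the upward bias that Lasso shrinkage induces in the raw residual sum of squares.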
Empirical Results
The substantive contribution is an extensive simulation-based comparison on both standard regression and high-dimensional VAR setups, including realistic financial data. The following are strong claims substantiated by numerical results:
- Computational speed: autotune attains between 15x and 200x speed-ups over CV Lasso and Scaled Lasso in high-dimensional regression settings, with even more pronounced gains in high-dimensional VAR and time series contexts.
- Estimation and selection accuracy: In low SNR regimes and across multiple designs and sparsity patterns, autotune consistently outperforms CV Lasso and Scaled Lasso in RMSE and relative test error (RTE). Variable selection performance, as measured by AUROC and Matthews Correlation Coefficient (MCC), is comparable or superior to the state of the art, especially in very high-dimensional or low-sample regimes.
- Noise variance estimation: autotune's estimator for σ^2 is shown to be less biased and more concentrated around the empirical "oracle" value, compared to Scaled Lasso, Organic Lasso, and Natural Lasso variants, enabling superior high-dimensional inference procedures.
- Practical generalization: On real-world financial data (S&P 500 and DJIA time series), autotune achieves lower out-of-sample prediction error (RTE) and improved out-of-sample R^2, indicating more stable and sparser model selection.
- Visual diagnostics: The PR-driven ranking allows derivation of a "scree-plot"-like diagnostic that empirically identifies the elbow point associated with sparse support, serving as the first practical tool for visually checking the sparsity assumption in Lasso regression.
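A scree-style diagnostic of this kind can be sketched by sorting the partial-residual standard deviations. The elbow locator below (the point farthest under the chord from the first to the last value) is a generic heuristic added for illustration, not the paper's rule, and the function name is invented.

```python
import numpy as np

def pr_scree(X, y, beta):
    """Sorted partial-residual standard deviations as a scree-style
    diagnostic, plus a crude elbow locator: the index of the point
    farthest below the chord joining the first and last values.  When
    the sparsity assumption holds, the elbow index equals the apparent
    support size; a flat curve with no elbow casts doubt on sparsity."""
    r = y - X @ beta
    pr_sd = np.sort([np.std(r + X[:, j] * beta[j])
                     for j in range(X.shape[1])])[::-1]
    x = np.arange(len(pr_sd))
    chord = pr_sd[0] + (pr_sd[-1] - pr_sd[0]) * x / (len(pr_sd) - 1)
    elbow = int(np.argmax(chord - pr_sd))     # largest gap below the chord
    return pr_sd, elbow
```

Plotting `pr_sd` against rank gives the scree-plot-like curve: predictors in the true support stand out with inflated PR spread, while noise predictors cluster near the noise standard deviation.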
Theoretical and Practical Implications
The autotune framework preserves the original Lasso's convex optimization structure and does not require modification of the objective function, in contrast to tuning-free alternatives such as Square Root or Scaled Lasso. The coordinate descent algorithm is leveraged efficiently with adaptive early stopping and data-driven penalization, yielding both computational benefits and model selection improvements.
A key theoretical advance is the bias correction via PR-based ranking and sequential F-testing. This provides a mechanism for consistently accurate noise variance estimation without the substantial bias exhibited by classical Lasso strategies. This innovation opens up pathways for improved debiasing and confidence interval procedures in high-dimensional inference, an area critically reliant on sound σ^2 estimation.
In time series and VAR estimation, where model dimension grows quadratically with the number of series, autotune's speed and robustness become even more pronounced, enabling analyses that would be computationally infeasible with K-fold CV or time series CV (TSCV). The ability to adaptively regularize each VAR equation independently allows more accurate characterization of lead-lag relationships, critical for, e.g., Granger causality analysis.
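Equation-by-equation tuning of a sparse VAR can be sketched as below. Everything here is illustrative: a held-out split stands in for autotune's adaptive per-equation penalty selection, and the function names and penalty grid are assumptions.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for the Lasso at a fixed penalty."""
    n, p = X.shape
    b, r = np.zeros(p), y.copy()
    norms = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]
            z = X[:, j] @ r
            b[j] = np.sign(z) * max(abs(z) - n * lam, 0.0) / norms[j]
            r -= X[:, j] * b[j]
    return b

def fit_var_per_equation(Y, lag=1, lams=(0.01, 0.05, 0.1, 0.2)):
    """Fit a sparse VAR(lag) one equation at a time, tuning each
    equation's penalty independently.  A held-out split is used here
    as a stand-in for autotune's adaptive tuning."""
    T, K = Y.shape
    # row t of Z holds [Y[t-1], ..., Y[t-lag]]; targets are Y[lag:]
    Z = np.hstack([Y[lag - l : T - l] for l in range(1, lag + 1)])
    Yt = Y[lag:]
    split = int(0.8 * len(Z))
    B = np.zeros((K * lag, K))
    for k in range(K):
        best_lam, best_err = lams[0], np.inf
        for lam in lams:
            b = lasso_cd(Z[:split], Yt[:split, k], lam)
            err = np.mean((Yt[split:, k] - Z[split:] @ b) ** 2)
            if err < best_err:
                best_lam, best_err = lam, err
        B[:, k] = lasso_cd(Z, Yt[:, k], best_lam)  # refit on all rows
    return B
```

Since each of the K equations is an ordinary regression on the shared lagged design, each can receive its own penalty, which is exactly the setting where per-equation adaptive tuning pays off.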
Limitations and Prospects
The algorithmic improvements shown here are empirically strong but lack an in-depth convergence analysis. The behavior, limit points, and robustness of the alternation between coefficient and variance updates deserve future theoretical scrutiny, particularly under correlated predictors or model misspecification. Additionally, integrating active-set selection strategies could further reduce computational burden, though more work is needed to robustly address performance in the presence of highly correlated features.
The public availability of an R/C++ implementation encourages further adoption in large-scale statistical pipelines, especially in genomics, finance, and network inference where high-dimensional regularization is standard.
Conclusion
Autotune provides a technically sound and empirically validated strategy for fast, automatic, and accurate Lasso tuning. It overcomes longstanding limitations of cross-validation and information-theoretic approaches in high dimensions, simultaneously advancing the state of the art in computational efficiency and statistical inference. The approach holds significant implications for practical deployment of sparse regularization in large-scale and time-dependent data contexts, representing an important step toward robust, scalable, high-dimensional modeling (arXiv:2512.11139).