Papers
Topics
Authors
Recent
Search
2000 character limit reached

A Note on Coding and Standardization of Categorical Variables in (Sparse) Group Lasso Regression

Published 17 May 2018 in stat.CO, stat.ME, and stat.ML | (1805.06915v1)

Abstract: Categorical regressor variables are usually handled by introducing a set of indicator variables, and imposing a linear constraint to ensure identifiability in the presence of an intercept, or equivalently, using one of various coding schemes. As proposed in Yuan and Lin [J. R. Statist. Soc. B, 68 (2006), 49-67], the group lasso is a natural and computationally convenient approach to perform variable selection in settings with categorical covariates. As pointed out by Simon and Tibshirani [Stat. Sin., 22 (2011), 983-1001], "standardization" by means of block-wise orthonormalization of column submatrices each corresponding to one group of variables can substantially boost performance. In this note, we study the aspect of standardization for the special case of categorical predictors in detail. The main result is that orthonormalization is not required; column-wise scaling of the design matrix followed by re-scaling and centering of the coefficients is shown to have exactly the same effect. Similar reductions can be achieved in the case of interactions. The extension to the so-called sparse group lasso, which additionally promotes within-group sparsity, is considered as well. The importance of proper standardization is illustrated via extensive simulations.

Citations (12)

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.