### abstract ###
We consider the least-square linear regression problem with regularization by the  SYMBOL -norm, a problem usually referred to as the Lasso
In this paper, we first present a detailed asymptotic analysis of model consistency of the Lasso in low-dimensional settings
For various decays of the regularization parameter, we compute asymptotic equivalents of the probability of correct model selection
For a specific rate decay, we show that the Lasso selects all the variables that should enter the model with probability tending to one exponentially fast, while it selects all other variables with strictly positive probability
We show that this property implies that if we run the Lasso for several bootstrapped replications of a given sample, then intersecting the supports of the Lasso bootstrap estimates leads to consistent model selection
This novel variable selection procedure, referred to as the Bolasso, is extended to high-dimensional settings by a provably consistent two-step procedure
### introduction ###
Regularization by the  SYMBOL -norm has attracted a lot of interest in recent years in statistics, machine learning and signal processing
In the context of least-square linear regression, the problem is usually referred to as the  Lasso ~ CITATION  or  basis pursuit ~ CITATION
Much of the early effort has been dedicated to algorithms to solve the optimization problem efficiently, either through first-order methods~ CITATION , or through homotopy methods that leads to the entire regularization path (i e , the set of solutions for all values of the regularization parameters) at the cost of a single matrix inversion~ CITATION
A well-known property of the  regularization by the  SYMBOL -norm is the  sparsity  of the solutions, i e ,  it leads to  loading vectors with many zeros, and thus performs model selection on top of regularization
Recent works  CITATION  have looked precisely at the model consistency of the Lasso, i e , if we know that the data were generated from a sparse loading vector, does the Lasso actually recover the sparsity pattern when the number of observations grows
In the case of a fixed number of covariates (i e , low-dimensional settings), the Lasso does recover the sparsity pattern if and only if a certain simple condition on the generating covariance matrices is satisfied~ CITATION
In particular, in low correlation settings, the Lasso is indeed consistent
However, in presence of strong correlations between relevant variables and irrelevant variables, the Lasso cannot be model-consistent, shedding light on potential problems of such procedures for variable selection
Various extensions of the Lasso have been designed to fix its inconsistency, based on thresholding~ CITATION , data-dependent weights~ CITATION  or two-step procedures~ CITATION
The main contribution of this paper is to propose and analyze an alternative approach based on resampling
Note that recent work~ CITATION  has also looked at resampling methods for the Lasso, but focuses on resampling the weights of the  SYMBOL -norm rather than resampling the observations (see \mysec{support} for more details)
In this paper, we first derive a detailed asymptotic analysis of sparsity pattern selection of the Lasso estimation procedure, that extends previous analysis~ CITATION  by focusing on a specific decay of the regularization parameter
Namely, in  low-dimensional  settings where the number of variables  SYMBOL  is much smaller than the number of observations  SYMBOL , we show that when the decay of  SYMBOL  is proportional to  SYMBOL , then the Lasso will select all the variables that should enter the model (the  relevant  variables) with probability tending to one exponentially fast with~ SYMBOL , while it selects all other variables (the  irrelevant  variables) with strictly positive probability
If several datasets generated from the same distribution were available, then the latter property  would suggest to consider the intersection of the supports of the Lasso estimates for each dataset: all relevant variables would always be selected for all datasets, while irrelevant variables would enter the models randomly, and intersecting the supports from sufficiently many different datasets would simply eliminate them
However, in practice, only one dataset is given; but resampling methods such as the  bootstrap  are exactly dedicated to mimic the availability of several datasets by resampling from the same unique dataset~ CITATION
In this paper, we show that when using the bootstrap and intersecting the supports, we actually get a consistent model estimate,  without  the consistency condition required by the regular Lasso
We refer to this new procedure as the  Bolasso  ( bo otstrap-enhanced  l east  a b s olute  s hrinkage  o perator)
Finally, our Bolasso framework could be seen as a voting scheme applied to the supports of the bootstrap Lasso estimates; however, our procedure may rather be considered as a consensus combination scheme, as we keep the (largest) subset of variables on which  all  regressors agree in terms of variable selection, which is in our case provably consistent and also allows to get rid of a potential additional hyperparameter
We consider the two usual ways of using the bootstrap in regression settings, namely bootstrapping pairs and bootstrapping residuals~ CITATION
In \mysec{support}, we show that the two types of bootstrap lead to consistent model selection in low-dimensional settings
Moreover, in \mysec{simulations}, we provide empirical evidence that in high-dimensional settings, bootstrapping pairs does not lead to consistent estimation, while bootstrapping residuals still does
While we are currently unable to prove the consistency of bootstrapping residuals in high-dimensional settings, we prove in \mysec{highdim} the model consistency of a related two-step procedure:  the Lasso is run once on the original data, with a larger regularization parameter, and then bootstrap replications (pairs or residuals) are run within the support of the first Lasso estimation
We show in \mysec{highdim} that this procedure is consistent
In order to do so, we consider new sufficient conditions for the consistency of the Lasso, which do not rely on sparse eigenvalues~ CITATION , low correlations~ CITATION  or finer conditions~ CITATION
In particular, our new assumptions allow to prove that the Lasso will select not only a few variables when the regularization parameter is properly chosen, but always the same variables with high probability
In \mysec{algorithms}, we derive efficient algorithms for the bootstrapped versions of the Lasso
When bootstrapping pairs, we simply run an efficient homotopy algorithm, such as  Lars ~ CITATION , multiple times; however, when bootstrapping residuals, more efficient ways may be designed to obtain a running time complexity which is less than running Lars multiple times
Finally, in \mysec{experiments-low} and \mysec{experiments-high}, we illustrate our results on synthetic examples, in low-dimensional and high-dimensional settings
This work is a follow-up to earlier work~ CITATION : in particular, it refines and extends the analysis to high-dimensional settings and boostrapping of the residuals \paragraph{Notations} For  SYMBOL  and  SYMBOL , we  denote by  SYMBOL  its  SYMBOL -norm, defined as  SYMBOL
We also denote by  SYMBOL  its  SYMBOL -norm
For rectangular matrices  SYMBOL , we denote by  SYMBOL  its largest singular value,  SYMBOL  the largest magnitude of all its elements, and  SYMBOL  its Frobenius norm
We let denote  SYMBOL  and  SYMBOL  the largest and smallest eigenvalue of a symmetric matrix  SYMBOL
For    SYMBOL ,  SYMBOL  denotes the sign of  SYMBOL , defined as  SYMBOL  if  SYMBOL ,  SYMBOL  if  SYMBOL , and  SYMBOL  if  SYMBOL
For a vector  SYMBOL ,  SYMBOL  denotes the   vector of signs of elements of  SYMBOL
Given a set  SYMBOL ,  SYMBOL  is the indicator function of the set  SYMBOL
We also denote, for  SYMBOL , by  SYMBOL , the smallest (in magnitude) of non-zero elements of  SYMBOL
Moreover, given a vector  SYMBOL  and a subset  SYMBOL  of  SYMBOL ,  SYMBOL  denotes the vector in  SYMBOL  of elements of  SYMBOL  indexed by  SYMBOL
Similarly, for a matrix  SYMBOL ,  SYMBOL   denotes the submatrix of   SYMBOL  composed of elements of  SYMBOL  whose rows are in  SYMBOL  and columns are in  SYMBOL
Moreover,  SYMBOL  denotes the cardinal of the set  SYMBOL
For a positive definite matrix  SYMBOL  of size  SYMBOL , and two disjoint subsets of indices  SYMBOL  and  SYMBOL  included in  SYMBOL , we denote  SYMBOL  the matrix  SYMBOL , which is the conditional covariance of variables indexed by  SYMBOL  given variables indexed by  SYMBOL , for a Gaussian vector with covariance matrix  SYMBOL
Finally, we let denote  SYMBOL  and  SYMBOL  general probability measures and expectations \paragraph{Least-square regression with  SYMBOL -norm penalization} Throughout this paper,  we consider  SYMBOL  pairs of observations    SYMBOL ,  SYMBOL
The data are given in the form of a vector  SYMBOL  and a design matrix  SYMBOL
We consider the normalized square loss function   SYMBOL  SYMBOL \ell^1 SYMBOL 0 SYMBOL SYMBOL \hat{J} = \{ j  \{1,\dots,p\}, \  \hat{w}_j  0\} SYMBOL SYMBOL p/n$
When this ratio is much smaller than one, as in \mysec{lowdim}, we refer to this setting as low-dimensional estimation, while in other cases, where this ratio is potentially much larger than one, we refer to this setting as a high-dimensional problem (see \mysec{highdim})
