diff --git a/doc/modules/feature_selection.rst b/doc/modules/feature_selection.rst
index 9d585c16e482681c55f3977157fe35ff34b7b5ce..0f0adecdd3cf30799e3b7503edf42a334bf1a6c9 100644
--- a/doc/modules/feature_selection.rst
+++ b/doc/modules/feature_selection.rst
@@ -227,67 +227,6 @@ alpha parameter, the fewer features selected.
       Processing Magazine [120] July 2007
       http://dsp.rice.edu/sites/dsp.rice.edu/files/cs/baraniukCSlecture07.pdf
 
-.. _randomized_l1:
-
-Randomized sparse models
--------------------------
-
-.. currentmodule:: sklearn.linear_model
-
-In terms of feature selection, there are some well-known limitations of
-L1-penalized models for regression and classification. For example, it is
-known that the Lasso will tend to select an individual variable out of a group
-of highly correlated features. Furthermore, even when the correlation between
-features is not too high, the conditions under which L1-penalized methods
-consistently select "good" features can be restrictive in general.
-
-To mitigate this problem, it is possible to use randomization techniques such
-as those presented in [B2009]_ and [M2010]_. The latter technique, known as
-stability selection, is implemented in the module :mod:`sklearn.linear_model`.
-In the stability selection method, a subsample of the data is fit to a
-L1-penalized model where the penalty of a random subset of coefficients has
-been scaled. Specifically, given a subsample of the data
-:math:`(x_i, y_i), i \in I`, where :math:`I \subset \{1, 2, \ldots, n\}` is a
-random subset of the data of size :math:`n_I`, the following modified Lasso
-fit is obtained:
-
-.. math:: \hat{w_I} = \mathrm{arg}\min_{w} \frac{1}{2n_I} \sum_{i \in I} (y_i - x_i^T w)^2 + \alpha \sum_{j=1}^p \frac{ \vert w_j \vert}{s_j},
-
-where :math:`s_j \in \{s, 1\}` are independent trials of a fair Bernoulli
-random variable, and :math:`0<s<1` is the scaling factor. By repeating this
-procedure across different random subsamples and Bernoulli trials, one can
-count the fraction of times the randomized procedure selected each feature,
-and used these fractions as scores for feature selection.
-
-:class:`RandomizedLasso` implements this strategy for regression
-settings, using the Lasso, while :class:`RandomizedLogisticRegression` uses the
-logistic regression and is suitable for classification tasks. To get a full
-path of stability scores you can use :func:`lasso_stability_path`.
-
-.. figure:: ../auto_examples/linear_model/images/sphx_glr_plot_sparse_recovery_003.png
-   :target: ../auto_examples/linear_model/plot_sparse_recovery.html
-   :align: center
-   :scale: 60
-
-Note that for randomized sparse models to be more powerful than standard
-F statistics at detecting non-zero features, the ground truth model
-should be sparse, in other words, there should be only a small fraction
-of features non zero.
-
-.. topic:: Examples:
-
-   * :ref:`sphx_glr_auto_examples_linear_model_plot_sparse_recovery.py`: An example
-     comparing different feature selection approaches and discussing in
-     which situation each approach is to be favored.
-
-.. topic:: References:
-
-   .. [B2009] F. Bach, "Model-Consistent Sparse Estimation through the
-      Bootstrap." https://hal.inria.fr/hal-00354771/
-
-   .. [M2010] N. Meinshausen, P. Buhlmann, "Stability selection",
-      Journal of the Royal Statistical Society, 72 (2010)
-      http://arxiv.org/pdf/0809.2932.pdf
 
 Tree-based feature selection
 ----------------------------
diff --git a/doc/modules/linear_model.rst b/doc/modules/linear_model.rst
index 0696b4f9f5697d531cab5a473dc07bc72b6c2da9..e6d0ea882f6d35473241b92f4cab8b64cfc95bea 100644
--- a/doc/modules/linear_model.rst
+++ b/doc/modules/linear_model.rst
@@ -205,11 +205,6 @@ computes the coefficients along the full path of possible values.
   thus be used to perform feature selection, as detailed in
   :ref:`l1_feature_selection`.
 
-.. note:: **Randomized sparsity**
-
-   For feature selection or sparse recovery, it may be interesting to
-   use :ref:`randomized_l1`.
-
 
 Setting regularization parameter
 --------------------------------
diff --git a/doc/whats_new.rst b/doc/whats_new.rst
index ecfc65de356f8e10d62b90dba9df62af5d8453d2..a9601419c9edd2edda768fae4c732e6e88f09919 100644
--- a/doc/whats_new.rst
+++ b/doc/whats_new.rst
@@ -575,6 +575,7 @@ API changes summary
      - ``utils.sparsetools.connected_components``
      - ``utils.stats.rankdata``
      - ``neighbors.approximate.LSHForest``
+     - ``linear_model.randomized_l1``
 
 - Deprecate the ``y`` parameter in `transform` and `inverse_transform`.
   The method should not accept ``y`` parameter, as it's used at the prediction time.
@@ -1306,6 +1307,9 @@ Model evaluation and meta-estimators
      the parameter ``n_labels`` is renamed to ``n_groups``.
      :issue:`6660` by `Raghav RV`_.
 
+   - The :mod:`sklearn.linear_model.randomized_l1` module is deprecated.
+     :issue:`8995` by :user:`Ramana.S <sentient07>`.
+
 Code Contributors
 -----------------
 Aditya Joshi, Alejandro, Alexander Fabisch, Alexander Loginov, Alexander
diff --git a/examples/linear_model/plot_sparse_recovery.py b/examples/linear_model/plot_sparse_recovery.py
deleted file mode 100644
index 3039b46ce6bd80969e83240bd7a187a6d5d7a65a..0000000000000000000000000000000000000000
--- a/examples/linear_model/plot_sparse_recovery.py
+++ /dev/null
@@ -1,173 +0,0 @@
-"""
-============================================================
-Sparse recovery: feature selection for sparse linear models
-============================================================
-
-Given a small number of observations, we want to recover which features
-of X are relevant to explain y. For this :ref:`sparse linear models
-<l1_feature_selection>` can outperform standard statistical tests if the
-true model is sparse, i.e. if a small fraction of the features are
-relevant.
-
-As detailed in :ref:`the compressive sensing notes
-<compressive_sensing>`, the ability of L1-based approach to identify the
-relevant variables depends on the sparsity of the ground truth, the
-number of samples, the number of features, the conditioning of the
-design matrix on the signal subspace, the amount of noise, and the
-absolute value of the smallest non-zero coefficient [Wainwright2006]
-(http://statistics.berkeley.edu/sites/default/files/tech-reports/709.pdf).
-
-Here we keep all parameters constant and vary the conditioning of the
-design matrix. For a well-conditioned design matrix (small mutual
-incoherence) we are exactly in compressive sensing conditions (i.i.d
-Gaussian sensing matrix), and L1-recovery with the Lasso performs very
-well. For an ill-conditioned matrix (high mutual incoherence),
-regressors are very correlated, and the Lasso randomly selects one.
-However, randomized-Lasso can recover the ground truth well.
-
-In each situation, we first vary the alpha parameter setting the sparsity
-of the estimated model and look at the stability scores of the randomized
-Lasso. This analysis, knowing the ground truth, shows an optimal regime
-in which relevant features stand out from the irrelevant ones. If alpha
-is chosen too small, non-relevant variables enter the model. On the
-opposite, if alpha is selected too large, the Lasso is equivalent to
-stepwise regression, and thus brings no advantage over a univariate
-F-test.
-
-In a second time, we set alpha and compare the performance of different
-feature selection methods, using the area under curve (AUC) of the
-precision-recall.
-"""
-print(__doc__)
-
-# Author: Alexandre Gramfort and Gael Varoquaux
-# License: BSD 3 clause
-
-import warnings
-
-import matplotlib.pyplot as plt
-import numpy as np
-from scipy import linalg
-
-from sklearn.linear_model import (RandomizedLasso, lasso_stability_path,
-                                  LassoLarsCV)
-from sklearn.feature_selection import f_regression
-from sklearn.preprocessing import StandardScaler
-from sklearn.metrics import auc, precision_recall_curve
-from sklearn.ensemble import ExtraTreesRegressor
-from sklearn.exceptions import ConvergenceWarning
-
-
-def mutual_incoherence(X_relevant, X_irelevant):
-    """Mutual incoherence, as defined by formula (26a) of [Wainwright2006].
-    """
-    projector = np.dot(np.dot(X_irelevant.T, X_relevant),
-                       linalg.pinvh(np.dot(X_relevant.T, X_relevant)))
-    return np.max(np.abs(projector).sum(axis=1))
-
-
-for conditioning in (1, 1e-4):
-    ###########################################################################
-    # Simulate regression data with a correlated design
-    n_features = 501
-    n_relevant_features = 3
-    noise_level = .2
-    coef_min = .2
-    # The Donoho-Tanner phase transition is around n_samples=25: below we
-    # will completely fail to recover in the well-conditioned case
-    n_samples = 25
-    block_size = n_relevant_features
-
-    rng = np.random.RandomState(42)
-
-    # The coefficients of our model
-    coef = np.zeros(n_features)
-    coef[:n_relevant_features] = coef_min + rng.rand(n_relevant_features)
-
-    # The correlation of our design: variables correlated by blocs of 3
-    corr = np.zeros((n_features, n_features))
-    for i in range(0, n_features, block_size):
-        corr[i:i + block_size, i:i + block_size] = 1 - conditioning
-    corr.flat[::n_features + 1] = 1
-    corr = linalg.cholesky(corr)
-
-    # Our design
-    X = rng.normal(size=(n_samples, n_features))
-    X = np.dot(X, corr)
-    # Keep [Wainwright2006] (26c) constant
-    X[:n_relevant_features] /= np.abs(
-        linalg.svdvals(X[:n_relevant_features])).max()
-    X = StandardScaler().fit_transform(X.copy())
-
-    # The output variable
-    y = np.dot(X, coef)
-    y /= np.std(y)
-    # We scale the added noise as a function of the average correlation
-    # between the design and the output variable
-    y += noise_level * rng.normal(size=n_samples)
-    mi = mutual_incoherence(X[:, :n_relevant_features],
-                            X[:, n_relevant_features:])
-
-    ###########################################################################
-    # Plot stability selection path, using a high eps for early stopping
-    # of the path, to save computation time
-    alpha_grid, scores_path = lasso_stability_path(X, y, random_state=42,
-                                                   eps=0.05)
-
-    plt.figure()
-    # We plot the path as a function of alpha/alpha_max to the power 1/3: the
-    # power 1/3 scales the path less brutally than the log, and enables to
-    # see the progression along the path
-    hg = plt.plot(alpha_grid[1:] ** .333, scores_path[coef != 0].T[1:], 'r')
-    hb = plt.plot(alpha_grid[1:] ** .333, scores_path[coef == 0].T[1:], 'k')
-    ymin, ymax = plt.ylim()
-    plt.xlabel(r'$(\alpha / \alpha_{max})^{1/3}$')
-    plt.ylabel('Stability score: proportion of times selected')
-    plt.title('Stability Scores Path - Mutual incoherence: %.1f' % mi)
-    plt.axis('tight')
-    plt.legend((hg[0], hb[0]), ('relevant features', 'irrelevant features'),
-               loc='best')
-
-    ###########################################################################
-    # Plot the estimated stability scores for a given alpha
-
-    # Use 6-fold cross-validation rather than the default 3-fold: it leads to
-    # a better choice of alpha:
-    # Stop the user warnings outputs- they are not necessary for the example
-    # as it is specifically set up to be challenging.
-    with warnings.catch_warnings():
-        warnings.simplefilter('ignore', UserWarning)
-        warnings.simplefilter('ignore', ConvergenceWarning)
-        lars_cv = LassoLarsCV(cv=6).fit(X, y)
-
-    # Run the RandomizedLasso: we use a paths going down to .1*alpha_max
-    # to avoid exploring the regime in which very noisy variables enter
-    # the model
-    alphas = np.linspace(lars_cv.alphas_[0], .1 * lars_cv.alphas_[0], 6)
-    clf = RandomizedLasso(alpha=alphas, random_state=42).fit(X, y)
-    trees = ExtraTreesRegressor(100).fit(X, y)
-    # Compare with F-score
-    F, _ = f_regression(X, y)
-
-    plt.figure()
-    for name, score in [('F-test', F),
-                        ('Stability selection', clf.scores_),
-                        ('Lasso coefs', np.abs(lars_cv.coef_)),
-                        ('Trees', trees.feature_importances_),
-                        ]:
-        precision, recall, thresholds = precision_recall_curve(coef != 0,
-                                                               score)
-        plt.semilogy(np.maximum(score / np.max(score), 1e-4),
-                     label="%s. AUC: %.3f" % (name, auc(recall, precision)))
-
-    plt.plot(np.where(coef != 0)[0], [2e-4] * n_relevant_features, 'mo',
-             label="Ground truth")
-    plt.xlabel("Features")
-    plt.ylabel("Score")
-    # Plot only the 100 first coefficients
-    plt.xlim(0, 100)
-    plt.legend(loc='best')
-    plt.title('Feature selection scores - Mutual incoherence: %.1f'
-              % mi)
-
-plt.show()
diff --git a/sklearn/linear_model/__init__.py b/sklearn/linear_model/__init__.py
index 86aa17dea56b24dc6fb2ab1eafa352ae70703f1a..cd1c616f15bc4a97bcc205f7b03f561a255dccd2 100644
--- a/sklearn/linear_model/__init__.py
+++ b/sklearn/linear_model/__init__.py
@@ -30,8 +30,10 @@ from .omp import (orthogonal_mp, orthogonal_mp_gram, OrthogonalMatchingPursuit,
 from .passive_aggressive import PassiveAggressiveClassifier
 from .passive_aggressive import PassiveAggressiveRegressor
 from .perceptron import Perceptron
+
 from .randomized_l1 import (RandomizedLasso, RandomizedLogisticRegression,
                             lasso_stability_path)
+
 from .ransac import RANSACRegressor
 from .theil_sen import TheilSenRegressor
diff --git a/sklearn/linear_model/randomized_l1.py b/sklearn/linear_model/randomized_l1.py
index 27ec90aa49e6aa30ba397792dafd95dfe91ffd2c..28a861f024bcd8cea7b89302069b7197977f000e 100644
--- a/sklearn/linear_model/randomized_l1.py
+++ b/sklearn/linear_model/randomized_l1.py
@@ -6,9 +6,10 @@ sparse Logistic Regression
 # Author: Gael Varoquaux, Alexandre Gramfort
 #
 # License: BSD 3 clause
+
+import warnings
 import itertools
 from abc import ABCMeta, abstractmethod
-import warnings
 
 import numpy as np
 from scipy.sparse import issparse
@@ -20,7 +21,8 @@ from ..base import BaseEstimator
 from ..externals import six
 from ..externals.joblib import Memory, Parallel, delayed
 from ..feature_selection.base import SelectorMixin
-from ..utils import (as_float_array, check_random_state, check_X_y, safe_mask)
+from ..utils import (as_float_array, check_random_state, check_X_y, safe_mask,
+                     deprecated)
 from ..utils.validation import check_is_fitted
 from .least_angle import lars_path, LassoLarsIC
 from .logistic import LogisticRegression
@@ -58,6 +60,8 @@ def _resample_model(estimator_func, X, y, scaling=.5, n_resampling=200,
     return scores_
 
 
+@deprecated("The class BaseRandomizedLinearModel is deprecated in 0.19"
+            " and will be removed in 0.21.")
 class BaseRandomizedLinearModel(six.with_metaclass(ABCMeta, BaseEstimator,
                                                    SelectorMixin)):
     """Base class to implement randomized linear models for feature selection
@@ -178,6 +182,8 @@ def _randomized_lasso(X, y, weights, mask, alpha=1., verbose=False,
     return scores
 
 
+@deprecated("The class RandomizedLasso is deprecated in 0.19"
+            " and will be removed in 0.21.")
 class RandomizedLasso(BaseRandomizedLinearModel):
     """Randomized Lasso.
 
@@ -388,6 +394,8 @@ def _randomized_logistic(X, y, weights, mask, C=1., verbose=False,
     return scores
 
 
+@deprecated("The class RandomizedLogisticRegression is deprecated in 0.19"
+            " and will be removed in 0.21.")
 class RandomizedLogisticRegression(BaseRandomizedLinearModel):
     """Randomized Logistic Regression
 
@@ -573,6 +581,8 @@ def _lasso_stability_path(X, y, mask, weights, eps):
     return alphas, coefs
 
 
+@deprecated("The function lasso_stability_path is deprecated in 0.19"
+            " and will be removed in 0.21.")
 def lasso_stability_path(X, y, scaling=0.5, random_state=None,
                          n_resampling=200, n_grid=100,
                          sample_fraction=0.75,
diff --git a/sklearn/linear_model/tests/test_randomized_l1.py b/sklearn/linear_model/tests/test_randomized_l1.py
index 37eb66faab3393408295469122402f30ca889b9a..c783bfc7d4933fb7e8375ab3b304e64264c81bb5 100644
--- a/sklearn/linear_model/tests/test_randomized_l1.py
+++ b/sklearn/linear_model/tests/test_randomized_l1.py
@@ -11,10 +11,13 @@ from sklearn.utils.testing import assert_array_equal
 from sklearn.utils.testing import assert_raises
 from sklearn.utils.testing import assert_raises_regex
 from sklearn.utils.testing import assert_allclose
+from sklearn.utils.testing import ignore_warnings
+from sklearn.utils.testing import assert_warns_message
 
 from sklearn.linear_model.randomized_l1 import (lasso_stability_path,
                                                 RandomizedLasso,
                                                 RandomizedLogisticRegression)
+
 from sklearn.datasets import load_diabetes, load_iris
 from sklearn.feature_selection import f_regression, f_classif
 from sklearn.preprocessing import StandardScaler
@@ -30,6 +33,7 @@ X = X[:, [2, 3, 6, 7, 8]]
 F, _ = f_regression(X, y)
 
 
+@ignore_warnings(category=DeprecationWarning)
 def test_lasso_stability_path():
     # Check lasso stability path
     # Load diabetes data and add noisy features
@@ -42,6 +46,7 @@ def test_lasso_stability_path():
         np.argsort(np.sum(scores_path, axis=1))[-3:])
 
 
+@ignore_warnings(category=DeprecationWarning)
 def test_randomized_lasso_error_memory():
     scaling = 0.3
     selection_threshold = 0.5
@@ -55,6 +60,7 @@ def test_randomized_lasso_error_memory():
         clf.fit, X, y)
 
 
+@ignore_warnings(category=DeprecationWarning)
 def test_randomized_lasso():
     # Check randomized lasso
     scaling = 0.3
@@ -124,6 +130,7 @@ def test_randomized_lasso_precompute():
     assert_array_equal(feature_scores_1, feature_scores_2)
 
 
+@ignore_warnings(category=DeprecationWarning)
 def test_randomized_logistic():
     # Check randomized sparse logistic regression
     iris = load_iris()
@@ -153,6 +160,7 @@ def test_randomized_logistic():
     assert_raises(ValueError, clf.fit, X, y)
 
 
+@ignore_warnings(category=DeprecationWarning)
 def test_randomized_logistic_sparse():
     # Check randomized sparse logistic regression on sparse data
     iris = load_iris()
@@ -179,3 +187,31 @@ def test_randomized_logistic_sparse():
                       tol=1e-3)
     feature_scores_sp = clf.fit(X_sp, y).scores_
     assert_array_equal(feature_scores, feature_scores_sp)
+
+
+def test_warning_raised():
+
+    scaling = 0.3
+    selection_threshold = 0.5
+    tempdir = 5
+    assert_warns_message(DeprecationWarning, "The function"
+                         " lasso_stability_path is deprecated in 0.19"
+                         " and will be removed in 0.21.",
+                         lasso_stability_path, X, y, scaling=scaling,
+                         random_state=42, n_resampling=30)
+
+    assert_warns_message(DeprecationWarning, "Class RandomizedLasso is"
+                         " deprecated; The class RandomizedLasso is"
+                         " deprecated in 0.19 and will be removed in 0.21.",
+                         RandomizedLasso, verbose=False, alpha=[1, 0.8],
+                         random_state=42, scaling=scaling,
+                         selection_threshold=selection_threshold,
+                         memory=tempdir)
+
+    assert_warns_message(DeprecationWarning, "The class"
+                         " RandomizedLogisticRegression is deprecated in 0.19"
+                         " and will be removed in 0.21.",
+                         RandomizedLogisticRegression,
+                         verbose=False, C=1., random_state=42,
+                         scaling=scaling, n_resampling=50,
+                         tol=1e-3)
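
Migration note: the deprecated ``RandomizedLasso``, ``RandomizedLogisticRegression``
and ``lasso_stability_path`` implement stability selection, that is, fitting an
L1-penalized model on random subsamples of the rows while down-weighting a random
subset of the columns, then counting how often each feature receives a non-zero
coefficient. A rough sketch of the same idea using only the non-deprecated API,
assuming plain ``sklearn.linear_model.Lasso`` (the helper ``stability_scores`` and
its parameter names are hypothetical, not an existing scikit-learn function)::

    import numpy as np
    from sklearn.linear_model import Lasso


    def stability_scores(X, y, alpha=1.0, scaling=0.5, sample_fraction=0.75,
                         n_resampling=200, random_state=0):
        """Per-feature fraction of resamplings yielding a non-zero coefficient."""
        rng = np.random.RandomState(random_state)
        n_samples, n_features = X.shape
        n_subsample = int(sample_fraction * n_samples)
        selected = np.zeros(n_features)
        for _ in range(n_resampling):
            # Draw a random subsample of the rows.
            rows = rng.choice(n_samples, n_subsample, replace=False)
            # Down-weight a random subset of the columns: multiplying a column
            # by s < 1 is equivalent to increasing the L1 penalty on its
            # coefficient, as in the modified Lasso of the removed docs.
            s_j = np.where(rng.randint(0, 2, n_features), 1.0, scaling)
            coef = Lasso(alpha=alpha).fit(X[rows] * s_j, y[rows]).coef_
            selected += coef != 0
        return selected / n_resampling

Thresholding the returned fractions (for example, keeping features selected in
more than 75% of resamplings) gives a boolean mask comparable to the
``get_support()`` output of the deprecated estimators.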