diff --git a/doc/modules/feature_selection.rst b/doc/modules/feature_selection.rst
index 9d585c16e482681c55f3977157fe35ff34b7b5ce..0f0adecdd3cf30799e3b7503edf42a334bf1a6c9 100644
--- a/doc/modules/feature_selection.rst
+++ b/doc/modules/feature_selection.rst
@@ -227,67 +227,6 @@ alpha parameter, the fewer features selected.
       Processing Magazine [120] July 2007
       http://dsp.rice.edu/sites/dsp.rice.edu/files/cs/baraniukCSlecture07.pdf
 
-.. _randomized_l1:
-
-Randomized sparse models
--------------------------
-
-.. currentmodule:: sklearn.linear_model
-
-In terms of feature selection, there are some well-known limitations of
-L1-penalized models for regression and classification. For example, it is
-known that the Lasso will tend to select an individual variable out of a group
-of highly correlated features. Furthermore, even when the correlation between
-features is not too high, the conditions under which L1-penalized methods
-consistently select "good" features can be restrictive in general.
-
-To mitigate this problem, it is possible to use randomization techniques such
-as those presented in [B2009]_ and [M2010]_. The latter technique, known as
-stability selection, is implemented in the module :mod:`sklearn.linear_model`.
-In the stability selection method, a subsample of the data is fit to a
-L1-penalized model where the penalty of a random subset of coefficients has
-been scaled. Specifically, given a subsample of the data
-:math:`(x_i, y_i), i \in I`, where :math:`I \subset \{1, 2, \ldots, n\}` is a
-random subset of the data of size :math:`n_I`, the following modified Lasso
-fit is obtained:
-
-.. math:: \hat{w_I} = \mathrm{arg}\min_{w} \frac{1}{2n_I} \sum_{i \in I} (y_i - x_i^T w)^2 + \alpha \sum_{j=1}^p \frac{ \vert w_j \vert}{s_j},
-
-where :math:`s_j \in \{s, 1\}` are independent trials of a fair Bernoulli
-random variable, and :math:`0<s<1` is the scaling factor. By repeating this
-procedure across different random subsamples and Bernoulli trials, one can
-count the fraction of times the randomized procedure selected each feature,
-and used these fractions as scores for feature selection.
-
-:class:`RandomizedLasso` implements this strategy for regression
-settings, using the Lasso, while :class:`RandomizedLogisticRegression` uses the
-logistic regression and is suitable for classification tasks. To get a full
-path of stability scores you can use :func:`lasso_stability_path`.
-
-.. figure:: ../auto_examples/linear_model/images/sphx_glr_plot_sparse_recovery_003.png
-   :target: ../auto_examples/linear_model/plot_sparse_recovery.html
-   :align: center
-   :scale: 60
-
-Note that for randomized sparse models to be more powerful than standard
-F statistics at detecting non-zero features, the ground truth model
-should be sparse, in other words, there should be only a small fraction
-of features non zero.
-
-.. topic:: Examples:
-
-   * :ref:`sphx_glr_auto_examples_linear_model_plot_sparse_recovery.py`: An example
-     comparing different feature selection approaches and discussing in
-     which situation each approach is to be favored.
-
-.. topic:: References:
-
-   .. [B2009] F. Bach, "Model-Consistent Sparse Estimation through the
-      Bootstrap." https://hal.inria.fr/hal-00354771/
-
-   .. [M2010] N. Meinshausen, P. Buhlmann, "Stability selection",
-      Journal of the Royal Statistical Society, 72 (2010)
-      http://arxiv.org/pdf/0809.2932.pdf
 
 Tree-based feature selection
 ----------------------------
diff --git a/doc/modules/linear_model.rst b/doc/modules/linear_model.rst
index 0696b4f9f5697d531cab5a473dc07bc72b6c2da9..e6d0ea882f6d35473241b92f4cab8b64cfc95bea 100644
--- a/doc/modules/linear_model.rst
+++ b/doc/modules/linear_model.rst
@@ -205,11 +205,6 @@ computes the coefficients along the full path of possible values.
   thus be used to perform feature selection, as detailed in
   :ref:`l1_feature_selection`.
 
-.. note:: **Randomized sparsity**
-
-   For feature selection or sparse recovery, it may be interesting to
-   use :ref:`randomized_l1`.
-
 
 Setting regularization parameter
 --------------------------------
diff --git a/doc/whats_new.rst b/doc/whats_new.rst
index ecfc65de356f8e10d62b90dba9df62af5d8453d2..a9601419c9edd2edda768fae4c732e6e88f09919 100644
--- a/doc/whats_new.rst
+++ b/doc/whats_new.rst
@@ -575,6 +575,7 @@ API changes summary
      - ``utils.sparsetools.connected_components``
      - ``utils.stats.rankdata``
      - ``neighbors.approximate.LSHForest``
+     - ``linear_model.randomized_l1``
 
 - Deprecate the ``y`` parameter in `transform` and `inverse_transform`.
   The method should not accept ``y`` parameter, as it's used at the prediction time.
@@ -1306,6 +1307,9 @@ Model evaluation and meta-estimators
      the parameter ``n_labels`` is renamed to ``n_groups``.
      :issue:`6660` by `Raghav RV`_.
 
+   - The :mod:`sklearn.linear_model.randomized_l1` module is deprecated.
+     :issue:`8995` by :user:`Ramana.S <sentient07>`.
+
 Code Contributors
 -----------------
 Aditya Joshi, Alejandro, Alexander Fabisch, Alexander Loginov, Alexander
diff --git a/examples/linear_model/plot_sparse_recovery.py b/examples/linear_model/plot_sparse_recovery.py
deleted file mode 100644
index 3039b46ce6bd80969e83240bd7a187a6d5d7a65a..0000000000000000000000000000000000000000
--- a/examples/linear_model/plot_sparse_recovery.py
+++ /dev/null
@@ -1,173 +0,0 @@
-"""
-============================================================
-Sparse recovery: feature selection for sparse linear models
-============================================================
-
-Given a small number of observations, we want to recover which features
-of X are relevant to explain y. For this :ref:`sparse linear models
-<l1_feature_selection>` can outperform standard statistical tests if the
-true model is sparse, i.e. if a small fraction of the features are
-relevant.
-
-As detailed in :ref:`the compressive sensing notes
-<compressive_sensing>`, the ability of L1-based approach to identify the
-relevant variables depends on the sparsity of the ground truth, the
-number of samples, the number of features, the conditioning of the
-design matrix on the signal subspace, the amount of noise, and the
-absolute value of the smallest non-zero coefficient [Wainwright2006]
-(http://statistics.berkeley.edu/sites/default/files/tech-reports/709.pdf).
-
-Here we keep all parameters constant and vary the conditioning of the
-design matrix. For a well-conditioned design matrix (small mutual
-incoherence) we are exactly in compressive sensing conditions (i.i.d
-Gaussian sensing matrix), and L1-recovery with the Lasso performs very
-well. For an ill-conditioned matrix (high mutual incoherence),
-regressors are very correlated, and the Lasso randomly selects one.
-However, randomized-Lasso can recover the ground truth well.
-
-In each situation, we first vary the alpha parameter setting the sparsity
-of the estimated model and look at the stability scores of the randomized
-Lasso. This analysis, knowing the ground truth, shows an optimal regime
-in which relevant features stand out from the irrelevant ones. If alpha
-is chosen too small, non-relevant variables enter the model. On the
-opposite, if alpha is selected too large, the Lasso is equivalent to
-stepwise regression, and thus brings no advantage over a univariate
-F-test.
-
-In a second time, we set alpha and compare the performance of different
-feature selection methods, using the area under curve (AUC) of the
-precision-recall.
-"""
-print(__doc__)
-
-# Author: Alexandre Gramfort and Gael Varoquaux
-# License: BSD 3 clause
-
-import warnings
-
-import matplotlib.pyplot as plt
-import numpy as np
-from scipy import linalg
-
-from sklearn.linear_model import (RandomizedLasso, lasso_stability_path,
-                                  LassoLarsCV)
-from sklearn.feature_selection import f_regression
-from sklearn.preprocessing import StandardScaler
-from sklearn.metrics import auc, precision_recall_curve
-from sklearn.ensemble import ExtraTreesRegressor
-from sklearn.exceptions import ConvergenceWarning
-
-
-def mutual_incoherence(X_relevant, X_irelevant):
-    """Mutual incoherence, as defined by formula (26a) of [Wainwright2006].
-    """
-    projector = np.dot(np.dot(X_irelevant.T, X_relevant),
-                       linalg.pinvh(np.dot(X_relevant.T, X_relevant)))
-    return np.max(np.abs(projector).sum(axis=1))
-
-
-for conditioning in (1, 1e-4):
-    ###########################################################################
-    # Simulate regression data with a correlated design
-    n_features = 501
-    n_relevant_features = 3
-    noise_level = .2
-    coef_min = .2
-    # The Donoho-Tanner phase transition is around n_samples=25: below we
-    # will completely fail to recover in the well-conditioned case
-    n_samples = 25
-    block_size = n_relevant_features
-
-    rng = np.random.RandomState(42)
-
-    # The coefficients of our model
-    coef = np.zeros(n_features)
-    coef[:n_relevant_features] = coef_min + rng.rand(n_relevant_features)
-
-    # The correlation of our design: variables correlated by blocs of 3
-    corr = np.zeros((n_features, n_features))
-    for i in range(0, n_features, block_size):
-        corr[i:i + block_size, i:i + block_size] = 1 - conditioning
-    corr.flat[::n_features + 1] = 1
-    corr = linalg.cholesky(corr)
-
-    # Our design
-    X = rng.normal(size=(n_samples, n_features))
-    X = np.dot(X, corr)
-    # Keep [Wainwright2006] (26c) constant
-    X[:n_relevant_features] /= np.abs(
-        linalg.svdvals(X[:n_relevant_features])).max()
-    X = StandardScaler().fit_transform(X.copy())
-
-    # The output variable
-    y = np.dot(X, coef)
-    y /= np.std(y)
-    # We scale the added noise as a function of the average correlation
-    # between the design and the output variable
-    y += noise_level * rng.normal(size=n_samples)
-    mi = mutual_incoherence(X[:, :n_relevant_features],
-                            X[:, n_relevant_features:])
-
-    ###########################################################################
-    # Plot stability selection path, using a high eps for early stopping
-    # of the path, to save computation time
-    alpha_grid, scores_path = lasso_stability_path(X, y, random_state=42,
-                                                   eps=0.05)
-
-    plt.figure()
-    # We plot the path as a function of alpha/alpha_max to the power 1/3: the
-    # power 1/3 scales the path less brutally than the log, and enables to
-    # see the progression along the path
-    hg = plt.plot(alpha_grid[1:] ** .333, scores_path[coef != 0].T[1:], 'r')
-    hb = plt.plot(alpha_grid[1:] ** .333, scores_path[coef == 0].T[1:], 'k')
-    ymin, ymax = plt.ylim()
-    plt.xlabel(r'$(\alpha / \alpha_{max})^{1/3}$')
-    plt.ylabel('Stability score: proportion of times selected')
-    plt.title('Stability Scores Path - Mutual incoherence: %.1f' % mi)
-    plt.axis('tight')
-    plt.legend((hg[0], hb[0]), ('relevant features', 'irrelevant features'),
-               loc='best')
-
-    ###########################################################################
-    # Plot the estimated stability scores for a given alpha
-
-    # Use 6-fold cross-validation rather than the default 3-fold: it leads to
-    # a better choice of alpha:
-    # Stop the user warnings outputs- they are not necessary for the example
-    # as it is specifically set up to be challenging.
-    with warnings.catch_warnings():
-        warnings.simplefilter('ignore', UserWarning)
-        warnings.simplefilter('ignore', ConvergenceWarning)
-        lars_cv = LassoLarsCV(cv=6).fit(X, y)
-
-    # Run the RandomizedLasso: we use a paths going down to .1*alpha_max
-    # to avoid exploring the regime in which very noisy variables enter
-    # the model
-    alphas = np.linspace(lars_cv.alphas_[0], .1 * lars_cv.alphas_[0], 6)
-    clf = RandomizedLasso(alpha=alphas, random_state=42).fit(X, y)
-    trees = ExtraTreesRegressor(100).fit(X, y)
-    # Compare with F-score
-    F, _ = f_regression(X, y)
-
-    plt.figure()
-    for name, score in [('F-test', F),
-                        ('Stability selection', clf.scores_),
-                        ('Lasso coefs', np.abs(lars_cv.coef_)),
-                        ('Trees', trees.feature_importances_),
-                        ]:
-        precision, recall, thresholds = precision_recall_curve(coef != 0,
-                                                               score)
-        plt.semilogy(np.maximum(score / np.max(score), 1e-4),
-                     label="%s. AUC: %.3f" % (name, auc(recall, precision)))
-
-    plt.plot(np.where(coef != 0)[0], [2e-4] * n_relevant_features, 'mo',
-             label="Ground truth")
-    plt.xlabel("Features")
-    plt.ylabel("Score")
-    # Plot only the 100 first coefficients
-    plt.xlim(0, 100)
-    plt.legend(loc='best')
-    plt.title('Feature selection scores - Mutual incoherence: %.1f'
-              % mi)
-
-plt.show()
diff --git a/sklearn/linear_model/__init__.py b/sklearn/linear_model/__init__.py
index 86aa17dea56b24dc6fb2ab1eafa352ae70703f1a..cd1c616f15bc4a97bcc205f7b03f561a255dccd2 100644
--- a/sklearn/linear_model/__init__.py
+++ b/sklearn/linear_model/__init__.py
@@ -30,8 +30,10 @@ from .omp import (orthogonal_mp, orthogonal_mp_gram, OrthogonalMatchingPursuit,
 from .passive_aggressive import PassiveAggressiveClassifier
 from .passive_aggressive import PassiveAggressiveRegressor
 from .perceptron import Perceptron
+
 from .randomized_l1 import (RandomizedLasso, RandomizedLogisticRegression,
                             lasso_stability_path)
+
 from .ransac import RANSACRegressor
 from .theil_sen import TheilSenRegressor
diff --git a/sklearn/linear_model/randomized_l1.py b/sklearn/linear_model/randomized_l1.py
index 27ec90aa49e6aa30ba397792dafd95dfe91ffd2c..28a861f024bcd8cea7b89302069b7197977f000e 100644
--- a/sklearn/linear_model/randomized_l1.py
+++ b/sklearn/linear_model/randomized_l1.py
@@ -6,9 +6,10 @@ sparse Logistic Regression
 # Author: Gael Varoquaux, Alexandre Gramfort
 #
 # License: BSD 3 clause
+
+import warnings
 import itertools
 from abc import ABCMeta, abstractmethod
-import warnings
 
 import numpy as np
 from scipy.sparse import issparse
@@ -20,7 +21,8 @@ from ..base import BaseEstimator
 from ..externals import six
 from ..externals.joblib import Memory, Parallel, delayed
 from ..feature_selection.base import SelectorMixin
-from ..utils import (as_float_array, check_random_state, check_X_y, safe_mask)
+from ..utils import (as_float_array, check_random_state, check_X_y, safe_mask,
+                     deprecated)
 from ..utils.validation import check_is_fitted
 from .least_angle import lars_path, LassoLarsIC
 from .logistic import LogisticRegression
@@ -58,6 +60,8 @@ def _resample_model(estimator_func, X, y, scaling=.5, n_resampling=200,
     return scores_
 
 
+@deprecated("The class BaseRandomizedLinearModel is deprecated in 0.19"
+            " and will be removed in 0.21.")
 class BaseRandomizedLinearModel(six.with_metaclass(ABCMeta, BaseEstimator,
                                                    SelectorMixin)):
     """Base class to implement randomized linear models for feature selection
@@ -178,6 +182,8 @@ def _randomized_lasso(X, y, weights, mask, alpha=1., verbose=False,
     return scores
 
 
+@deprecated("The class RandomizedLasso is deprecated in 0.19"
+            " and will be removed in 0.21.")
 class RandomizedLasso(BaseRandomizedLinearModel):
     """Randomized Lasso.
 
@@ -388,6 +394,8 @@ def _randomized_logistic(X, y, weights, mask, C=1., verbose=False,
     return scores
 
 
+@deprecated("The class RandomizedLogisticRegression is deprecated in 0.19"
+            " and will be removed in 0.21.")
 class RandomizedLogisticRegression(BaseRandomizedLinearModel):
     """Randomized Logistic Regression
 
@@ -573,6 +581,8 @@ def _lasso_stability_path(X, y, mask, weights, eps):
     return alphas, coefs
 
 
+@deprecated("The function lasso_stability_path is deprecated in 0.19"
+            " and will be removed in 0.21.")
 def lasso_stability_path(X, y, scaling=0.5, random_state=None,
                          n_resampling=200, n_grid=100,
                          sample_fraction=0.75,
diff --git a/sklearn/linear_model/tests/test_randomized_l1.py b/sklearn/linear_model/tests/test_randomized_l1.py
index 37eb66faab3393408295469122402f30ca889b9a..c783bfc7d4933fb7e8375ab3b304e64264c81bb5 100644
--- a/sklearn/linear_model/tests/test_randomized_l1.py
+++ b/sklearn/linear_model/tests/test_randomized_l1.py
@@ -11,10 +11,13 @@ from sklearn.utils.testing import assert_array_equal
 from sklearn.utils.testing import assert_raises
 from sklearn.utils.testing import assert_raises_regex
 from sklearn.utils.testing import assert_allclose
+from sklearn.utils.testing import ignore_warnings
+from sklearn.utils.testing import assert_warns_message
 
 from sklearn.linear_model.randomized_l1 import (lasso_stability_path,
                                                 RandomizedLasso,
                                                 RandomizedLogisticRegression)
+
 from sklearn.datasets import load_diabetes, load_iris
 from sklearn.feature_selection import f_regression, f_classif
 from sklearn.preprocessing import StandardScaler
@@ -30,6 +33,7 @@ X = X[:, [2, 3, 6, 7, 8]]
 F, _ = f_regression(X, y)
 
 
+@ignore_warnings(category=DeprecationWarning)
 def test_lasso_stability_path():
     # Check lasso stability path
     # Load diabetes data and add noisy features
@@ -42,6 +46,7 @@ def test_lasso_stability_path():
         np.argsort(np.sum(scores_path, axis=1))[-3:])
 
 
+@ignore_warnings(category=DeprecationWarning)
 def test_randomized_lasso_error_memory():
     scaling = 0.3
     selection_threshold = 0.5
@@ -55,6 +60,7 @@ def test_randomized_lasso_error_memory():
         clf.fit, X, y)
 
 
+@ignore_warnings(category=DeprecationWarning)
 def test_randomized_lasso():
     # Check randomized lasso
     scaling = 0.3
@@ -124,6 +130,7 @@ def test_randomized_lasso_precompute():
     assert_array_equal(feature_scores_1, feature_scores_2)
 
 
+@ignore_warnings(category=DeprecationWarning)
 def test_randomized_logistic():
     # Check randomized sparse logistic regression
     iris = load_iris()
@@ -153,6 +160,7 @@ def test_randomized_logistic():
     assert_raises(ValueError, clf.fit, X, y)
 
 
+@ignore_warnings(category=DeprecationWarning)
 def test_randomized_logistic_sparse():
     # Check randomized sparse logistic regression on sparse data
     iris = load_iris()
@@ -179,3 +187,31 @@ def test_randomized_logistic_sparse():
                       tol=1e-3)
     feature_scores_sp = clf.fit(X_sp, y).scores_
     assert_array_equal(feature_scores, feature_scores_sp)
+
+
+def test_warning_raised():
+
+    scaling = 0.3
+    selection_threshold = 0.5
+    tempdir = 5
+    assert_warns_message(DeprecationWarning, "The function"
+                         " lasso_stability_path is deprecated in 0.19"
+                         " and will be removed in 0.21.",
+                         lasso_stability_path, X, y, scaling=scaling,
+                         random_state=42, n_resampling=30)
+
+    assert_warns_message(DeprecationWarning, "Class RandomizedLasso is"
+                         " deprecated; The class RandomizedLasso is"
+                         " deprecated in 0.19 and will be removed in 0.21.",
+                         RandomizedLasso, verbose=False, alpha=[1, 0.8],
+                         random_state=42, scaling=scaling,
+                         selection_threshold=selection_threshold,
+                         memory=tempdir)
+
+    assert_warns_message(DeprecationWarning, "The class"
+                         " RandomizedLogisticRegression is deprecated in 0.19"
+                         " and will be removed in 0.21.",
+                         RandomizedLogisticRegression,
+                         verbose=False, C=1., random_state=42,
+                         scaling=scaling, n_resampling=50,
+                         tol=1e-3)
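
Migration note: the deprecated ``RandomizedLasso``, ``RandomizedLogisticRegression``
and ``lasso_stability_path`` implement stability selection, that is, fitting an
L1-penalized model on random subsamples of the rows while down-weighting a random
subset of the columns, then counting how often each feature receives a non-zero
coefficient. A rough sketch of the same idea using only the non-deprecated API,
assuming plain ``sklearn.linear_model.Lasso`` (the helper ``stability_scores`` and
its parameter names are hypothetical, not an existing scikit-learn function)::

    import numpy as np
    from sklearn.linear_model import Lasso


    def stability_scores(X, y, alpha=1.0, scaling=0.5, sample_fraction=0.75,
                         n_resampling=200, random_state=0):
        """Per-feature fraction of resamplings yielding a non-zero coefficient."""
        rng = np.random.RandomState(random_state)
        n_samples, n_features = X.shape
        n_subsample = int(sample_fraction * n_samples)
        selected = np.zeros(n_features)
        for _ in range(n_resampling):
            # Draw a random subsample of the rows.
            rows = rng.choice(n_samples, n_subsample, replace=False)
            # Down-weight a random subset of the columns: multiplying a column
            # by s < 1 is equivalent to increasing the L1 penalty on its
            # coefficient, as in the modified Lasso of the removed docs.
            s_j = np.where(rng.randint(0, 2, n_features), 1.0, scaling)
            coef = Lasso(alpha=alpha).fit(X[rows] * s_j, y[rows]).coef_
            selected += coef != 0
        return selected / n_resampling

Thresholding the returned fractions (for example, keeping features selected in
more than 75% of resamplings) gives a boolean mask comparable to the
``get_support()`` output of the deprecated estimators.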