diff --git a/doc/datasets/index.rst b/doc/datasets/index.rst
index b128e097feb0e57c5e0206e6fe198f7c3be71f57..7588fb1d1a34cc91d99e33c8559d1d5ce159674d 100644
--- a/doc/datasets/index.rst
+++ b/doc/datasets/index.rst
@@ -125,6 +125,7 @@ can be used to build artifical datasets of controled size and complexity.
    make_friedman1
    make_friedman2
    make_friedman3
+   make_hastie_10_2
    make_low_rank_matrix
    make_sparse_coded_signal
    make_sparse_uncorrelated
@@ -171,7 +172,7 @@ features::
  _`Faster API-compatible implementation`: https://github.com/mblondel/svmlight-loader
 
 
-.. include:: olivetti_faces.inc 
+.. include:: olivetti_faces.inc
 
 .. include:: twenty_newsgroups.inc
 
diff --git a/doc/modules/classes.rst b/doc/modules/classes.rst
index 8c67b580eebb74eb6bed57c2258f892f47950aea..b50fac9a4d39c31e69b255881cd3f1981ee1c763 100644
--- a/doc/modules/classes.rst
+++ b/doc/modules/classes.rst
@@ -27,7 +27,7 @@ uses.
 .. autosummary::
    :toctree: generated/
    :template: class.rst
-    
+
    cluster.AffinityPropagation
    cluster.DBSCAN
    cluster.KMeans
@@ -239,6 +239,8 @@ Samples generator
    ensemble.RandomForestRegressor
    ensemble.ExtraTreesClassifier
    ensemble.ExtraTreesRegressor
+   ensemble.GradientBoostingClassifier
+   ensemble.GradientBoostingRegressor
 
 .. autosummary::
    :toctree: generated/
diff --git a/doc/modules/ensemble.rst b/doc/modules/ensemble.rst
index 92a49dcc4b3710d72f29f9a693f7c51e1b9f7de9..f950cd37dac52c1da76cb10a0631b2034f2ea1f5 100644
--- a/doc/modules/ensemble.rst
+++ b/doc/modules/ensemble.rst
@@ -219,7 +219,7 @@ learners::
 The number of weak learners (i.e. regression trees) is controlled by the
 parameter ``n_estimators``; The maximum depth of each tree is controlled via
 ``max_depth``. ``Learn_rate`` is a hyper-parameter in the range (0.0, 1.0]
-that controls overfitting via shrinkage.
+that controls overfitting via :ref:`shrinkage <gradient_boosting_shrinkage>`.
 
 Regression
 ==========
@@ -243,6 +243,20 @@ outliers. See [F2001]_ for detailed information.
     >>> mean_squared_error(y_test, clf.predict(X_test))    # doctest: +ELLIPSIS
     6.90...
 
+The figure below shows the results of applying :class:`GradientBoostingRegressor`
+with least squares loss and 500 base learners to the Boston house-price dataset
+(see :func:`sklearn.datasets.load_boston`).
+The plot on the left shows the training and test error at each iteration.
+Plots like these are often used to determine the number of iterations for
+early stopping. The plot on the right shows the feature importances, which
+can be obtained via the ``feature_importance`` property.
+
+.. figure:: ../auto_examples/ensemble/images/plot_gradient_boosting_regression_1.png
+   :target: ../auto_examples/ensemble/plot_gradient_boosting_regression.html
+   :align: center
+   :scale: 75
+
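+A minimal sketch of fitting such a model might look as follows (parameter
+values are illustrative only, and least squares is assumed to be the default
+loss of :class:`GradientBoostingRegressor`)::
+
+    from sklearn.datasets import load_boston
+    from sklearn.ensemble import GradientBoostingRegressor
+
+    boston = load_boston()
+    est = GradientBoostingRegressor(n_estimators=500, max_depth=4,
+                                    learn_rate=0.1)
+    est.fit(boston.data, boston.target)
+
+    # per-feature importances, as described above
+    importances = est.feature_importance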
+
 Mathematical formulation
 ========================
 
@@ -327,10 +341,19 @@ the parameter ``loss``:
       log-likelihood loss function for binary classification (provides
       probability estimates).  The initial model is given by the
       probability of the positive class.
+    * Multinomial deviance (``'deviance'``): The negative multinomial
+      log-likelihood loss function for ``K``-class classification (provides
+      probability estimates).  The initial model is given by the
+      prior probability of each class. At each iteration ``K`` regression
+      trees have to be constructed (see the sketch below).
+
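+For instance, a three-class fit on the iris dataset might look as follows
+(a sketch only; parameter values are illustrative, the default ``'deviance'``
+loss is assumed, and probability estimates are assumed to be exposed via
+``predict_proba``)::
+
+    from sklearn.datasets import load_iris
+    from sklearn.ensemble import GradientBoostingClassifier
+
+    iris = load_iris()
+    # three classes, hence K = 3 regression trees are built per iteration
+    clf = GradientBoostingClassifier(n_estimators=100, learn_rate=0.1,
+                                     max_depth=1)
+    clf.fit(iris.data, iris.target)
+    proba = clf.predict_proba(iris.data)   # class probability estimates
+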
+Regularization
+==============
 
+.. _gradient_boosting_shrinkage:
 
-Regularization via Shrinkage
-============================
+Shrinkage
+---------
 
 [F2001]_ proposed a simple regularization strategy that scales
 the contribution of each weak learner by a factor :math:`\nu`:
@@ -353,10 +376,32 @@ recommend to set the learning rate to a small constant
 stopping. For a more detailed discussion of the interaction between
 ``learn_rate`` and ``n_estimators`` see [R2007]_.
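+
+In practice this often means combining a small ``learn_rate`` with a large
+number of estimators, for example (a sketch; values are illustrative only)::
+
+    from sklearn.ensemble import GradientBoostingRegressor
+
+    # a small learning rate typically requires more boosting iterations
+    est = GradientBoostingRegressor(n_estimators=3000, learn_rate=0.01)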
 
+Subsampling
+-----------
+
+[F1999]_ proposed stochastic gradient boosting, which combines gradient
+boosting with bootstrap averaging (bagging). At each iteration the base
+learner is trained on a fraction ``subsample`` of the available training
+data. The subsample is drawn without replacement; a typical value of
+``subsample`` is 0.5.
+
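+A sketch of stochastic gradient boosting combined with shrinkage could look
+like the following (parameter values are illustrative only)::
+
+    from sklearn.ensemble import GradientBoostingRegressor
+
+    # each tree is fit on a random half of the training data,
+    # drawn without replacement
+    est = GradientBoostingRegressor(n_estimators=1000, learn_rate=0.1,
+                                    subsample=0.5)
+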
+The figure below illustrates the effect of shrinkage and subsampling
+on the goodness-of-fit of the model. We can clearly see that shrinkage
+outperforms no shrinkage. Subsampling combined with shrinkage can further
+increase the accuracy of the model, whereas subsampling without shrinkage
+does poorly.
+
+.. figure:: ../auto_examples/ensemble/images/plot_gradient_boosting_regularization_1.png
+   :target: ../auto_examples/ensemble/plot_gradient_boosting_regularization.html
+   :align: center
+   :scale: 75
+
 
 .. topic:: Examples:
 
  * :ref:`example_ensemble_plot_gradient_boosting_regression.py`
+ * :ref:`example_ensemble_plot_gradient_boosting_regularization.py`
 
 .. topic:: References