diff --git a/examples/applications/plot_stock_market.py b/examples/applications/plot_stock_market.py
index 0622300dcb2e59e9227cabc2e7a670291deb1bc4..01e79a38f1e66061acfc93ed7f96e7b834b12937 100644
--- a/examples/applications/plot_stock_market.py
+++ b/examples/applications/plot_stock_market.py
@@ -1,7 +1,4 @@
 """
-
-.. _stock_market:
-
 =======================================
 Visualizing the stock market structure
 =======================================
@@ -12,6 +9,7 @@ the stock market structure from variations in historical quotes.
 The quantity that we use is the daily variation in quote price: quotes
 that are linked tend to cofluctuate during a day.
 
+.. _stock_market:
 
 Learning a graph structure
 --------------------------
diff --git a/examples/cluster/plot_segmentation_toy.py b/examples/cluster/plot_segmentation_toy.py
index e0d9e91d27e1e6db7004fbc530657e0ea51db0a6..87e552ac88fc16147fe56eb3fb8774b2f6c4792f 100644
--- a/examples/cluster/plot_segmentation_toy.py
+++ b/examples/cluster/plot_segmentation_toy.py
@@ -4,9 +4,9 @@ Spectral clustering for image segmentation
 ===========================================
 
 In this example, an image with connected circles is generated and
-:ref:`spectral_clustering` is used to separate the circles.
+spectral clustering is used to separate the circles.
 
-In these settings, the spectral clustering approach solves the problem
+In these settings, the :ref:`spectral_clustering` approach solves the problem
 know as 'normalized graph cuts': the image is seen as a graph of
 connected voxels, and the spectral clustering algorithm amounts to
 choosing graph cuts defining regions while minimizing the ratio of the
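For context, a minimal sketch of the normalized-cuts idea on a toy two-circle image (not the example's actual code; image size, noise level and ``n_clusters`` are illustrative, assuming a recent scikit-learn)::

    import numpy as np
    from sklearn.feature_extraction import image
    from sklearn.cluster import spectral_clustering

    # toy image: two filled circles plus a little noise
    x, y = np.indices((80, 80))
    circle1 = (x - 25) ** 2 + (y - 25) ** 2 < 15 ** 2
    circle2 = (x - 55) ** 2 + (y - 55) ** 2 < 15 ** 2
    img = (circle1 | circle2).astype(float)
    img += 0.2 * np.random.RandomState(0).randn(*img.shape)

    # graph of pixel-to-pixel gradients, turned into affinities
    graph = image.img_to_graph(img)
    graph.data = np.exp(-graph.data / graph.data.std())

    # cut the graph into two regions; one label comes back per pixel
    labels = spectral_clustering(graph, n_clusters=2, eigen_solver='arpack')
    label_im = labels.reshape(img.shape)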
diff --git a/examples/cluster/plot_ward_structured_vs_unstructured.py b/examples/cluster/plot_ward_structured_vs_unstructured.py
index 7b6f719a60540791cc70eff5482c2f9f3a884706..2f47ec03af0e6dc03b67d01faf4a15844b9da3ae 100644
--- a/examples/cluster/plot_ward_structured_vs_unstructured.py
+++ b/examples/cluster/plot_ward_structured_vs_unstructured.py
@@ -4,9 +4,9 @@ Hierarchical clustering: structured vs unstructured ward
 ===========================================================
 
 Example builds a swiss roll dataset and runs
-:ref:`hierarchical_clustering` on their position.
+hierarchical clustering on the point positions.
 
-In a first step, the hierarchical clustering without connectivity
+In a first step, :ref:`hierarchical_clustering` is performed without connectivity
 constraints on structure, solely based on distance, whereas in a second
 step clustering restricted to the k-Nearest Neighbors graph: it's a
 hierarchical clustering with structure prior.
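A minimal sketch of the structured vs. unstructured comparison; note it uses the current ``AgglomerativeClustering`` API rather than the older ``Ward`` estimator this example was written against, and all sizes are illustrative::

    from sklearn.datasets import make_swiss_roll
    from sklearn.neighbors import kneighbors_graph
    from sklearn.cluster import AgglomerativeClustering

    X, _ = make_swiss_roll(n_samples=1500, noise=0.05)

    # unstructured: merges are driven by distance alone
    ward = AgglomerativeClustering(n_clusters=6, linkage='ward').fit(X)

    # structured: only k-nearest-neighbour points may be merged
    connectivity = kneighbors_graph(X, n_neighbors=10)
    ward_c = AgglomerativeClustering(n_clusters=6, linkage='ward',
                                     connectivity=connectivity).fit(X)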
diff --git a/examples/covariance/plot_covariance_estimation.py b/examples/covariance/plot_covariance_estimation.py
index 26e495da5ea64fc84b905306efcdef12fa3dee46..5d8c5112b39641aba521d84a3122e326a0532af5 100644
--- a/examples/covariance/plot_covariance_estimation.py
+++ b/examples/covariance/plot_covariance_estimation.py
@@ -3,7 +3,8 @@
 Shrinkage covariance estimation: LedoitWolf vs OAS and max-likelihood
 =======================================================================
 
-The usual estimator for covariance is the maximum likelihood estimator,
+When working with covariance estimation, the usual approach is to use
+a maximum likelihood estimator, such as the
 :class:`sklearn.covariance.EmpiricalCovariance`. It is unbiased, i.e. it
 converges to the true (population) covariance when given many
 observations. However, it can also be beneficial to regularize it, in
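A minimal sketch of comparing the three estimators on held-out likelihood (data shapes are illustrative, not the example's actual setup)::

    import numpy as np
    from sklearn.covariance import EmpiricalCovariance, LedoitWolf, OAS

    rng = np.random.RandomState(0)
    X_train = rng.randn(30, 20)      # few samples relative to features
    X_test = rng.randn(100, 20)

    for est in (EmpiricalCovariance(), LedoitWolf(), OAS()):
        est.fit(X_train)
        # score() is the log-likelihood of held-out data under the fit
        print(est.__class__.__name__, est.score(X_test))
    print('LW shrinkage:', LedoitWolf().fit(X_train).shrinkage_)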
diff --git a/examples/covariance/plot_mahalanobis_distances.py b/examples/covariance/plot_mahalanobis_distances.py
index 25b43ede418042f077289109c2fe9c7aed787132..99730e2d4f66e83f0343b4b1c7a2a4f7af19cc05 100644
--- a/examples/covariance/plot_mahalanobis_distances.py
+++ b/examples/covariance/plot_mahalanobis_distances.py
@@ -3,6 +3,9 @@
 Robust covariance estimation and Mahalanobis distances relevance
 ================================================================
 
+An example showing covariance estimation with Mahalanobis
+distances on Gaussian distributed data.
+
 For Gaussian distributed data, the distance of an observation
 :math:`x_i` to the mode of the distribution can be computed using its
 Mahalanobis distance: :math:`d_{(\mu,\Sigma)}(x_i)^2 = (x_i -
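A minimal sketch contrasting robust (MCD) and maximum-likelihood Mahalanobis distances on contaminated data (sample sizes and the contamination scheme are illustrative)::

    import numpy as np
    from sklearn.covariance import EmpiricalCovariance, MinCovDet

    rng = np.random.RandomState(42)
    X = rng.randn(125, 2)
    X[-25:] = rng.randn(25, 2) * 5 + 10   # contaminate with outliers

    robust = MinCovDet().fit(X)           # robust (MCD) estimate
    mle = EmpiricalCovariance().fit(X)    # maximum-likelihood estimate

    # squared Mahalanobis distances of every point under each fit
    d_robust = robust.mahalanobis(X)
    d_mle = mle.mahalanobis(X)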
diff --git a/examples/covariance/plot_outlier_detection.py b/examples/covariance/plot_outlier_detection.py
index d152d9c393e13444b0c291d9e1062e3fd5e63a01..9af56c93c7ba88e1fc18bcc901ee468b3de89549 100644
--- a/examples/covariance/plot_outlier_detection.py
+++ b/examples/covariance/plot_outlier_detection.py
@@ -3,8 +3,8 @@
 Outlier detection with several methods.
 ==========================================
 
-This example illustrates two ways of performing :ref:`outlier_detection`
-when the amount of contamination is known:
+When the amount of contamination is known, this example illustrates two
+different ways of performing :ref:`outlier_detection`:
 
 - based on a robust estimator of covariance, which is assuming that the
   data are Gaussian distributed and performs better than the One-Class SVM
diff --git a/examples/datasets/plot_digits_last_image.py b/examples/datasets/plot_digits_last_image.py
index ce4e41bff7a79dcfed60265871b2848dd02111b3..9b31c12b1e4022f84388a1581be1dcf57793a46d 100644
--- a/examples/datasets/plot_digits_last_image.py
+++ b/examples/datasets/plot_digits_last_image.py
@@ -5,6 +5,7 @@
 =========================================================
 The Digit Dataset
 =========================================================
+
 This dataset is made up of 1797 8x8 images. Each image,
 like the one shown below, is of a hand-written digit.
 In order to ultilise an 8x8 figure like this, we'd have to
diff --git a/examples/decomposition/plot_ica_blind_source_separation.py b/examples/decomposition/plot_ica_blind_source_separation.py
index dda2dd2d0ea60e3c3373d376a1a371b7308a42ef..0a8ae1569a62607b4b09e490240519e6ab1fb026 100644
--- a/examples/decomposition/plot_ica_blind_source_separation.py
+++ b/examples/decomposition/plot_ica_blind_source_separation.py
@@ -3,6 +3,8 @@
 Blind source separation using FastICA
 =====================================
 
+An example of estimating sources from noisy data.
+
 :ref:`ICA` is used to estimate sources given noisy measurements.
 Imagine 2 instruments playing simultaneously and 2 microphones
 recording the mixed signals. ICA is used to recover the sources
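A minimal sketch of the two-instruments / two-microphones setup (the signals and the mixing matrix are illustrative)::

    import numpy as np
    from sklearn.decomposition import FastICA

    rng = np.random.RandomState(0)
    t = np.linspace(0, 8, 2000)
    s1 = np.sin(2 * t)                      # instrument 1
    s2 = np.sign(np.sin(3 * t))             # instrument 2
    S = np.c_[s1, s2] + 0.1 * rng.randn(2000, 2)   # noisy sources

    A = np.array([[1.0, 1.0], [0.5, 2.0]])  # mixing matrix
    X = np.dot(S, A.T)                       # microphone recordings

    ica = FastICA(n_components=2, random_state=0)
    S_est = ica.fit_transform(X)             # recovered sources
    A_est = ica.mixing_                      # estimated mixing matrix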
diff --git a/examples/decomposition/plot_ica_vs_pca.py b/examples/decomposition/plot_ica_vs_pca.py
index 78f4270493fc4ae0114432bb1d738d380db82bd5..86117adf1f36a5a9fc7bd1fab75a460c3e0d35d2 100644
--- a/examples/decomposition/plot_ica_vs_pca.py
+++ b/examples/decomposition/plot_ica_vs_pca.py
@@ -3,8 +3,10 @@
 FastICA on 2D point clouds
 ==========================
 
-Illustrate visually the results of :ref:`ICA` vs :ref:`PCA` in the
-feature space.
+This example illustrates visually, in the feature space, a comparison of
+the results of two different component analysis techniques:
+
+:ref:`ICA` vs :ref:`PCA`.
 
 Representing ICA in the feature space gives the view of 'geometric ICA':
 ICA is an algorithm that finds directions in the feature space
diff --git a/examples/decomposition/plot_image_denoising.py b/examples/decomposition/plot_image_denoising.py
index 5e4caea7d3d819637b605360ece20f9a8d82bc1c..38a2a408b90a6c4dd2c814865117321548b53f3b 100644
--- a/examples/decomposition/plot_image_denoising.py
+++ b/examples/decomposition/plot_image_denoising.py
@@ -4,7 +4,8 @@ Image denoising using dictionary learning
 =========================================
 
 An example comparing the effect of reconstructing noisy fragments
-of Lena using online :ref:`DictionaryLearning` and various transform methods.
+of the Lena image using online :ref:`DictionaryLearning` and
+various transform methods.
 
 The dictionary is fitted on the distorted left half of the image, and
 subsequently used to reconstruct the right half. Note that even better
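A minimal sketch of the patch-extraction and dictionary-fitting mechanics only; it uses a random array as a stand-in for the Lena image, so the learned atoms are not meaningful::

    import numpy as np
    from sklearn.feature_extraction.image import extract_patches_2d
    from sklearn.decomposition import MiniBatchDictionaryLearning

    rng = np.random.RandomState(0)
    img = rng.rand(64, 64)                 # stand-in for the distorted half image

    patches = extract_patches_2d(img, (7, 7), max_patches=500, random_state=0)
    data = patches.reshape(len(patches), -1)
    data -= data.mean(axis=0)              # centre the patches

    dico = MiniBatchDictionaryLearning(n_components=50, alpha=1.0,
                                       random_state=0)
    V = dico.fit(data).components_         # the learned dictionary atoms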
diff --git a/examples/exercises/plot_cv_diabetes.py b/examples/exercises/plot_cv_diabetes.py
index 527d0443253d62e203a071f35d4c84ffc8d4b267..8fe4b1d9a7659b6d28399cfaa55848a0d5358b9f 100644
--- a/examples/exercises/plot_cv_diabetes.py
+++ b/examples/exercises/plot_cv_diabetes.py
@@ -3,6 +3,8 @@
 Cross-validation on diabetes Dataset Exercise
 ===============================================
 
+A tutorial exercise which uses cross-validation with linear models.
+
 This exercise is used in the :ref:`cv_estimators_tut` part of the
 :ref:`model_selection_tut` section of the :ref:`stat_learn_tut_index`.
 """
diff --git a/examples/exercises/plot_cv_digits.py b/examples/exercises/plot_cv_digits.py
index dde717902d6b1a302d6aba018f1fae98657a6022..6861a3354a2b614d6482293ddda55f073c9c6747 100644
--- a/examples/exercises/plot_cv_digits.py
+++ b/examples/exercises/plot_cv_digits.py
@@ -3,6 +3,8 @@
 Cross-validation on Digits Dataset Exercise
 =============================================
 
+A tutorial exercise using cross-validation with an SVM on the Digits dataset.
+
 This exercise is used in the :ref:`cv_generators_tut` part of the
 :ref:`model_selection_tut` section of the :ref:`stat_learn_tut_index`.
 """
diff --git a/examples/exercises/plot_digits_classification_exercise.py b/examples/exercises/plot_digits_classification_exercise.py
index 17e11e55b9d8a0fb11eee6741636e158c4a3618e..a1f0b84fd1fd21fa0100ceefe917911b806b4efd 100644
--- a/examples/exercises/plot_digits_classification_exercise.py
+++ b/examples/exercises/plot_digits_classification_exercise.py
@@ -3,6 +3,9 @@
 Digits Classification Exercise
 ================================
 
+A tutorial exercise regarding the use of classification techniques on
+the Digits dataset.
+
 This exercise is used in the :ref:`clf_tut` part of the
 :ref:`supervised_learning_tut` section of the
 :ref:`stat_learn_tut_index`.
diff --git a/examples/exercises/plot_iris_exercise.py b/examples/exercises/plot_iris_exercise.py
index 2427180e9dcd2d61f6e26641e2f5b2b3efcfb693..4226fc2f3337abd734fbe3d7c9631ad7b587e63d 100644
--- a/examples/exercises/plot_iris_exercise.py
+++ b/examples/exercises/plot_iris_exercise.py
@@ -3,6 +3,8 @@
 SVM Exercise
 ================================
 
+A tutorial exercise for using different SVM kernels.
+
 This exercise is used in the :ref:`using_kernels_tut` part of the
 :ref:`supervised_learning_tut` section of the :ref:`stat_learn_tut_index`.
 """
diff --git a/examples/grid_search_digits.py b/examples/grid_search_digits.py
index a4914609f9225dcab415f558bf8e6bf6a77e4533..fc7d441a5020b9a1d1aa9861c4ec5191ea24fd40 100644
--- a/examples/grid_search_digits.py
+++ b/examples/grid_search_digits.py
@@ -3,7 +3,8 @@
 Parameter estimation using grid search with a nested cross-validation
 =====================================================================
 
-The classifier is optimized by "nested" cross-validation using the
+This example shows how a classifier is optimized by "nested"
+cross-validation, which is done using the
 :class:`sklearn.grid_search.GridSearchCV` object on a development set
 that comprises only half of the available labeled data.
 
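A minimal sketch of the nested scheme (the parameter grid is illustrative; recent scikit-learn keeps GridSearchCV under ``sklearn.model_selection``, while the release this diff targets had it under ``sklearn.grid_search``)::

    from sklearn.datasets import load_digits
    from sklearn.svm import SVC
    from sklearn.model_selection import GridSearchCV, train_test_split

    digits = load_digits()
    X_train, X_test, y_train, y_test = train_test_split(
        digits.data, digits.target, test_size=0.5, random_state=0)

    param_grid = {'C': [1, 10, 100], 'gamma': [1e-3, 1e-4]}
    clf = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
    clf.fit(X_train, y_train)              # inner CV picks C and gamma
    print(clf.best_params_)
    print(clf.score(X_test, y_test))       # evaluation on the held-out half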
diff --git a/examples/linear_model/plot_ard.py b/examples/linear_model/plot_ard.py
index dd62b70e7f74164eada81e118fe1b1e22caf6d7f..53487556c7e29045d99c2991d28642dddde1aa78 100644
--- a/examples/linear_model/plot_ard.py
+++ b/examples/linear_model/plot_ard.py
@@ -3,7 +3,9 @@
 Automatic Relevance Determination Regression (ARD)
 ==================================================
 
-Fit regression model with :ref:`bayesian_ridge_regression`.
+Fit a regression model with Bayesian Ridge Regression.
+
+See :ref:`bayesian_ridge_regression` for more information on the regressor.
 
 Compared to the OLS (ordinary least squares) estimator, the coefficient
 weights are slightly shifted toward zeros, which stabilises them.
diff --git a/examples/linear_model/plot_bayesian_ridge.py b/examples/linear_model/plot_bayesian_ridge.py
index 59a0269654fbbaa076691840f5d74917a471bc89..eedd5a7b24a9e4ae6639efe570e65bb25f658e8c 100644
--- a/examples/linear_model/plot_bayesian_ridge.py
+++ b/examples/linear_model/plot_bayesian_ridge.py
@@ -3,7 +3,9 @@
 Bayesian Ridge Regression
 =========================
 
-Computes a :ref:`bayesian_ridge_regression` on a synthetic dataset.
+Computes a Bayesian Ridge Regression on a synthetic dataset.
+
+See :ref:`bayesian_ridge_regression` for more information on the regressor.
 
 Compared to the OLS (ordinary least squares) estimator, the coefficient
 weights are slightly shifted toward zeros, which stabilises them.
diff --git a/examples/linear_model/plot_ridge_path.py b/examples/linear_model/plot_ridge_path.py
index 28840c6961813d7bb0fe3095807a39cf1d7bcccd..0ad04fd01f053586fc64681228af1b44e1f656b3 100644
--- a/examples/linear_model/plot_ridge_path.py
+++ b/examples/linear_model/plot_ridge_path.py
@@ -3,10 +3,12 @@
 Plot Ridge coefficients as a function of the regularization
 ===========================================================
 
+Shows the effect of collinearity in the coefficients of an estimator.
+
 .. currentmodule:: sklearn.linear_model
 
-Shows the effect of collinearity in the coefficients or the
-:class:`Ridge`. Each color represents a different feature of the
+:class:`Ridge` Regression is the estimator used in this example.
+Each color represents a different feature of the
 coefficient vector, and this is displayed as a function of the
 regularization parameter.
 
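A condensed sketch of the path computation on a collinear (Hilbert-matrix) design, close to what the example plots::

    import numpy as np
    from sklearn.linear_model import Ridge

    # 10x10 Hilbert matrix: a strongly collinear design
    X = 1.0 / (np.arange(1, 11) + np.arange(0, 10)[:, np.newaxis])
    y = np.ones(10)

    alphas = np.logspace(-10, -2, 200)
    coefs = []
    for a in alphas:
        clf = Ridge(alpha=a, fit_intercept=False)
        clf.fit(X, y)
        coefs.append(clf.coef_)
    # each column of np.array(coefs) traces one coefficient as alpha grows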
diff --git a/examples/linear_model/plot_sgd_loss_functions.py b/examples/linear_model/plot_sgd_loss_functions.py
index 8980311d6cfbccaae7aa0186dd68ccb409fa2343..14adec0144d9afb80f07182d3583fd9bce3bcee5 100644
--- a/examples/linear_model/plot_sgd_loss_functions.py
+++ b/examples/linear_model/plot_sgd_loss_functions.py
@@ -3,8 +3,11 @@
 SGD: Convex Loss Functions
 ==========================
 
-Plot the convex loss functions supported by
-`sklearn.linear_model.stochastic_gradient`.
+An example that compares various convex loss functions.
+
+All of these loss functions are supported by
+:mod:`sklearn.linear_model.stochastic_gradient`.
+
 """
 print(__doc__)
 
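A minimal sketch plotting a few of the convex losses as functions of the decision value; the exact set of losses and their scalings in the shipped example may differ::

    import numpy as np
    import matplotlib.pyplot as plt

    # losses as a function of the decision value z = y * f(x)
    z = np.linspace(-4, 4, 500)
    losses = {
        'zero-one': (z <= 0).astype(float),
        'hinge': np.maximum(0, 1 - z),
        'log': np.log2(1 + np.exp(-z)),
        'squared hinge': np.maximum(0, 1 - z) ** 2,
    }
    for name, loss in losses.items():
        plt.plot(z, loss, label=name)
    plt.legend()
    plt.show()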
diff --git a/examples/linear_model/plot_sgd_penalties.py b/examples/linear_model/plot_sgd_penalties.py
index 7a626e63f329b8dc6d6f6c14344bb4b414dbf2c7..c804d9b4b3c818c8e72f1b1757a5400d423f552d 100644
--- a/examples/linear_model/plot_sgd_penalties.py
+++ b/examples/linear_model/plot_sgd_penalties.py
@@ -3,8 +3,10 @@
 SGD: Penalties
 ==============
 
-Plot the contours of the three penalties supported by
-`sklearn.linear_model.stochastic_gradient`.
+Plot the contours of the three penalties (L1, L2 and elastic-net).
+
+All of these penalties are supported by
+:mod:`sklearn.linear_model.stochastic_gradient`.
 
 """
 from __future__ import division
diff --git a/examples/mixture/plot_gmm_classifier.py b/examples/mixture/plot_gmm_classifier.py
index e273aa703251dfbf6e79f8f73b1ef3d88402e0fa..682f0671a157757ef9417f1b06d45886dd3614cd 100644
--- a/examples/mixture/plot_gmm_classifier.py
+++ b/examples/mixture/plot_gmm_classifier.py
@@ -3,7 +3,9 @@
 GMM classification
 ==================
 
-Demonstration of :ref:`gmm` for classification.
+Demonstration of Gaussian mixture models for classification.
+
+See :ref:`gmm` for more information on the estimator.
 
 Plots predicted labels on both training and held out test data using a
 variety of GMM classifiers on the iris dataset.
diff --git a/examples/plot_kernel_approximation.py b/examples/plot_kernel_approximation.py
index c7119d701256e9758e1d34d9c3e2cfa0c8d74748..6ffe46aa119e825ea23f900134b92e7e04abcb32 100644
--- a/examples/plot_kernel_approximation.py
+++ b/examples/plot_kernel_approximation.py
@@ -3,9 +3,12 @@
 Explicit feature map approximation for RBF kernels
 ==================================================
 
+An example illustrating the approximation of the feature map
+of an RBF kernel.
+
 .. currentmodule:: sklearn.kernel_approximation
 
-An example shows how to use :class:`RBFSampler` and :class:`Nystrom` to
+This example shows how to use :class:`RBFSampler` and :class:`Nystroem` to
 appoximate the feature map of an RBF kernel for classification with an SVM on
 the digits dataset. Results using a linear SVM in the original space, a linear
 SVM using the approximate mappings and using a kernelized SVM are compared.
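A minimal sketch of the comparison (note the class is spelled ``Nystroem``; ``gamma`` and ``n_components`` are illustrative)::

    from sklearn.datasets import load_digits
    from sklearn.svm import LinearSVC
    from sklearn.pipeline import make_pipeline
    from sklearn.kernel_approximation import RBFSampler, Nystroem

    digits = load_digits()
    X, y = digits.data / 16.0, digits.target       # scale pixels to [0, 1]
    n = len(X) // 2
    X_train, y_train, X_test, y_test = X[:n], y[:n], X[n:], y[n:]

    for approx in (RBFSampler(gamma=0.2, n_components=300, random_state=0),
                   Nystroem(gamma=0.2, n_components=300, random_state=0)):
        clf = make_pipeline(approx, LinearSVC())
        clf.fit(X_train, y_train)
        print(approx.__class__.__name__, clf.score(X_test, y_test))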
diff --git a/examples/plot_pls.py b/examples/plot_pls.py
index 7187ab6fcc47477567701a4aada40113efa3ab68..5f3c4ba198f61e178795eaa70c2c1c78126f63cd 100644
--- a/examples/plot_pls.py
+++ b/examples/plot_pls.py
@@ -3,11 +3,12 @@
 PLS Partial Least Squares
 =========================
 
-Simple usage of various PLS flavor:
-- PLSCanonical
-- PLSRegression, with multivariate response, a.k.a. PLS2
-- PLSRegression, with univariate response, a.k.a. PLS1
-- CCA
+Simple usage of various PLS flavors:
+
+* PLSCanonical
+* PLSRegression, with multivariate response, a.k.a. PLS2
+* PLSRegression, with univariate response, a.k.a. PLS1
+* CCA
 
 Given 2 multivariate covarying two-dimensional datasets, X, and Y,
 PLS extracts the 'directions of covariance', i.e. the components of each
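A minimal sketch of PLSCanonical and PLSRegression on synthetic covarying data (recent scikit-learn keeps these under ``sklearn.cross_decomposition``; the latent-variable construction is illustrative)::

    import numpy as np
    from sklearn.cross_decomposition import PLSCanonical, PLSRegression

    rng = np.random.RandomState(0)
    n = 500
    l1, l2 = rng.normal(size=n), rng.normal(size=n)     # shared latent variables
    X = np.c_[l1, l1, l2, rng.normal(size=n)] + rng.normal(size=(n, 4))
    Y = np.c_[l1, l2, l2, rng.normal(size=n)] + rng.normal(size=(n, 4))

    plsca = PLSCanonical(n_components=2).fit(X, Y)
    X_scores, Y_scores = plsca.transform(X, Y)   # projections on shared directions

    pls2 = PLSRegression(n_components=2).fit(X, Y)   # PLS2: multivariate response
    Y_pred = pls2.predict(X)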
diff --git a/examples/svm/plot_oneclass.py b/examples/svm/plot_oneclass.py
index d7f56ba97fe2c753ffd470b3103c06bd0b989f43..7351ee26f037b87809a8c35a60e156dade206eee 100644
--- a/examples/svm/plot_oneclass.py
+++ b/examples/svm/plot_oneclass.py
@@ -3,6 +3,8 @@
 One-class SVM with non-linear kernel (RBF)
 ==========================================
 
+An example using a one-class SVM for novelty detection.
+
 :ref:`One-class SVM <svm_outlier_detection>` is an unsupervised
 algorithm that learns a decision function for novelty detection:
 classifying new data as similar or different to the training set.
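A minimal sketch of novelty detection with a one-class SVM (``nu`` and ``gamma`` are illustrative)::

    import numpy as np
    from sklearn.svm import OneClassSVM

    rng = np.random.RandomState(0)
    X_train = 0.3 * rng.randn(100, 2) + 2            # 'regular' observations
    X_new = 0.3 * rng.randn(20, 2) + 2               # new, similar observations
    X_outliers = rng.uniform(low=-4, high=4, size=(20, 2))

    clf = OneClassSVM(nu=0.1, kernel='rbf', gamma=0.1)
    clf.fit(X_train)
    # +1 -> classified as similar to the training set, -1 -> novel
    print(clf.predict(X_new))
    print(clf.predict(X_outliers))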
diff --git a/examples/tree/plot_iris.py b/examples/tree/plot_iris.py
index c30618e6fbefb362dbdcd035b7b5a17d9d772dae..c4b046d87f0e7507e03e0131e150acb10480eec4 100644
--- a/examples/tree/plot_iris.py
+++ b/examples/tree/plot_iris.py
@@ -3,9 +3,11 @@
 Plot the decision surface of a decision tree on the iris dataset
 ================================================================
 
-Plot the decision surface of a :ref:`decision tree <tree>` trained on pairs
+Plot the decision surface of a decision tree trained on pairs
 of features of the iris dataset.
 
+See :ref:`decision tree <tree>` for more information on the estimator.
+
 For each pair of iris features, the decision tree learns decision
 boundaries made of combinations of simple thresholding rules inferred from
 the training samples.
diff --git a/examples/tree/plot_tree_regression.py b/examples/tree/plot_tree_regression.py
index 349db2553523cba7632ac95f59cdd6764506f3a8..8f693975367d7b5248392717cfece821b4ab4691 100644
--- a/examples/tree/plot_tree_regression.py
+++ b/examples/tree/plot_tree_regression.py
@@ -3,7 +3,9 @@
 Decision Tree Regression
 ===================================================================
 
-1D regression with :ref:`decision trees <tree>`: the decision tree is
+A 1D regression with a decision tree.
+
+The :ref:`decision tree <tree>` is
 used to fit a sine curve with addition noisy observation. As a result, it
 learns local linear regressions approximating the sine curve.
 
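A minimal sketch of the sine-fitting setup with two tree depths (close to the example, but the exact constants are illustrative)::

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.RandomState(1)
    X = np.sort(5 * rng.rand(80, 1), axis=0)
    y = np.sin(X).ravel()
    y[::5] += 3 * (0.5 - rng.rand(16))               # noisy observations

    # a shallow and a deeper tree; the deeper one also fits the noise
    reg2 = DecisionTreeRegressor(max_depth=2).fit(X, y)
    reg5 = DecisionTreeRegressor(max_depth=5).fit(X, y)

    X_test = np.arange(0.0, 5.0, 0.01)[:, np.newaxis]
    y2, y5 = reg2.predict(X_test), reg5.predict(X_test)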
diff --git a/examples/tree/plot_tree_regression_multioutput.py b/examples/tree/plot_tree_regression_multioutput.py
index 0fc63d3f2b15184effb776984381b46bad6c0ddd..d4d332f5a6349abb02d3200dca7b4d546f05490e 100644
--- a/examples/tree/plot_tree_regression_multioutput.py
+++ b/examples/tree/plot_tree_regression_multioutput.py
@@ -3,7 +3,9 @@
 Multi-output Decision Tree Regression
 ===================================================================
 
-Multi-output regression with :ref:`decision trees <tree>`: the decision tree
+An example illustrating multi-output regression with a decision tree.
+
+The :ref:`decision tree <tree>`
 is used to predict simultaneously the noisy x and y observations of a circle
 given a single underlying feature. As a result, it learns local linear
 regressions approximating the circle.
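A minimal sketch of the single-feature, two-output setup (constants are illustrative)::

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.RandomState(1)
    X = np.sort(200 * rng.rand(100, 1) - 100, axis=0)    # single input feature
    y = np.array([np.pi * np.sin(X).ravel(),
                  np.pi * np.cos(X).ravel()]).T           # (x, y) points on a circle
    y[::5, :] += 0.5 - rng.rand(20, 2)                    # add noise

    reg = DecisionTreeRegressor(max_depth=5).fit(X, y)    # one tree, two outputs
    X_test = np.arange(-100.0, 100.0, 0.01)[:, np.newaxis]
    y_pred = reg.predict(X_test)                          # shape (n_test, 2)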