diff --git a/doc/modules/hmm.rst b/doc/modules/hmm.rst
index 23c08ee2f58c121fc53ff1d4c08f3a09113353f7..609f567b0b3a82cef9349e79675ef07cd995edb8 100644
--- a/doc/modules/hmm.rst
+++ b/doc/modules/hmm.rst
@@ -19,23 +19,23 @@ Hidden Markov Models
 `sklearn.hmm` implements Hidden Markov Models (HMMs).
 The HMM is a generative probabilistic model, in which a sequence of observable
 :math:`\mathbf{X}` variables is generated by a sequence of internal hidden
-state :math:`\mathbf{Z}`. The hidden states can not be observed directly. 
+states :math:`\mathbf{Z}`. The hidden states cannot be observed directly.
 The transitions between hidden states are assumed to have the form of a
 (first-order) Markov chain. The chain is specified by the start probability
-vector :math:`\boldsymbol{\Pi}` and a transition probability matrix 
+vector :math:`\boldsymbol{\Pi}` and a transition probability matrix
 :math:`\mathbf{A}`.
 The emission probability of an observable can be any distribution with
 parameters :math:`\boldsymbol{{\Theta}_i}`
 conditioned on the current hidden state (e.g. multinomial, Gaussian).
-The HMM is completely determined by 
+The HMM is completely determined by
 :math:`\boldsymbol{\Pi, \mathbf{A}}` and :math:`\boldsymbol{{\Theta}_i}`.
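 
 Written out for a sequence of length :math:`T` (writing :math:`z_t` for the
 hidden state and :math:`x_t` for the observation at step :math:`t`), the joint
 probability of the observations and the hidden states factorizes as
 
 .. math::
 
    P(\mathbf{X}, \mathbf{Z}) = P(z_1 \mid \boldsymbol{\Pi})
       \prod_{t=2}^{T} P(z_t \mid z_{t-1}, \mathbf{A})
       \prod_{t=1}^{T} P(x_t \mid z_t, \boldsymbol{\Theta}_{z_t}).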
 
 There are three fundamental problems for HMMs:
 
-* Given the model parameters and observed data, estimate the optimal 
+* Given the model parameters and observed data, estimate the optimal
   sequence of hidden states.
 
-* Given the model parameters and observed data, calculate the likelihood 
+* Given the model parameters and observed data, calculate the likelihood
   of the data.
 
 * Given just the observed data, estimate the model parameters.
@@ -58,7 +58,7 @@ See the ref listed below for further detailed information.
 Using HMM
 =========
 
-Classes in this module include :class:`MultinomialHMM`, :class:`GaussianHMM`, 
+Classes in this module include :class:`MultinomialHMM`, :class:`GaussianHMM`,
 and :class:`GMMHMM`. They implement HMMs with emission probabilities
 determined by multinomial distributions, Gaussian distributions,
 and mixtures of Gaussian distributions.
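 
 For example, an instance of each class can be created by passing the number of
 hidden states to the constructor (a minimal sketch: `n_components` is the
 number of hidden states, and `n_mix` is assumed here to be the number of
 Gaussian mixture components per state in :class:`GMMHMM`)::
 
     >>> from sklearn import hmm
     >>> multinomial_model = hmm.MultinomialHMM(n_components=2)
     >>> gaussian_model = hmm.GaussianHMM(n_components=3)
     >>> mixture_model = hmm.GMMHMM(n_components=3, n_mix=2)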
@@ -72,7 +72,7 @@ constructor. Then, you can generate samples from the HMM by calling `sample`.::
 
     >>> import numpy as np
     >>> from sklearn import hmm
-    
+
     >>> startprob = np.array([0.6, 0.3, 0.1])
     >>> transmat = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2], [0.3, 0.3, 0.4]])
     >>> means = np.array([[0.0, 0.0], [3.0, -3.0], [5.0, 10.0]])
@@ -83,23 +83,14 @@ constructor. Then, you can generate samples from the HMM by calling `sample`.::
     >>> X, Z = model.sample(100)
 
 
-.. figure:: ../auto_examples/images/plot_hmm_sampling_1.png
-   :target: ../auto_examples/plot_hmm_sampling.html
-   :align: center
-   :scale: 75%
-
-.. topic:: Examples:
-
- * :ref:`example_plot_hmm_sampling.py`
-
 Training HMM parameters and inferring the hidden states
 -------------------------------------------------------
 
-You can train an HMM by calling the `fit` method. The input is "the list" of 
+You can train an HMM by calling the `fit` method. The input is a list of
 observation sequences. Note that, since the EM algorithm only converges to a
 local optimum, the result depends on the initialization. You should try
 to run `fit` with various initializations and select the highest-scoring model.
-The score of the model can be calculated by the `score` method. 
+The score of the model can be calculated by the `score` method.
 The inferred optimal hidden states can be obtained by calling the `predict`
 method.
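 
 For example, continuing from the sampling example above (reusing the generated
 observations `X`; the variable names and number of states below are purely
 illustrative), a new model can be fitted to the data and used to infer the
 hidden states::
 
     >>> remodel = hmm.GaussianHMM(n_components=3)
     >>> remodel = remodel.fit([X])          # `fit` expects a list of sequences
     >>> logprob = remodel.score(X)          # log-likelihood of one sequence
     >>> hidden_states = remodel.predict(X)  # most likely hidden state sequence
 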
 A decoder algorithm can be specified when calling the `predict` method.
 Currently the Viterbi algorithm (`viterbi`) and maximum a posteriori