diff --git a/doc/modules/mixture.rst b/doc/modules/mixture.rst
index 8bffc0be95808442b05d785cf69478e90061324f..e13d17407a9600b9707134f37648d13f56c03b4b 100644
--- a/doc/modules/mixture.rst
+++ b/doc/modules/mixture.rst
@@ -121,7 +121,9 @@ The main difficulty in learning Gaussian mixture models from unlabeled
 data is that it is one usually doesn't know which points came from
 which latent component (if one has access to this information it gets
 very easy to fit a separate Gaussian distribution to each set of
-points). Expectation-maximization is a well-fundamented statistical
+points). `Expectation-maximization
+<http://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm>`_
+is a well-founded statistical
 algorithm to get around this problem by an iterative process. First
 one assumes random components (randomly centered on data points,
 learned from k-means, or even just normally distributed around the
@@ -129,8 +131,7 @@ origin) and computes for each point a probability of being generated by
 each component of the model. Then, one tweaks the
 parameters to maximize the likelihood of the data given those
 assignments. Repeating this process is guaranteed to always converge
-to a local optimum. In the `scikit-learn` this algorithm in
-implemented in the :class:`GMM` class.
+to a local optimum.
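+
+As a rough illustration, the sketch below runs this EM fit on made-up toy
+data (the data, the choice of two components and all other values are
+purely illustrative; parameter and method names are those of the current
+:class:`GMM` API)::
+
+    import numpy as np
+    from sklearn import mixture
+
+    # Toy data: two well-separated blobs of 1D points.
+    np.random.seed(0)
+    X = np.concatenate([np.random.randn(100, 1) - 5,
+                        np.random.randn(100, 1) + 5])
+
+    # Run EM with two components and read off the cluster of each point.
+    clf = mixture.GMM(n_components=2)
+    clf.fit(X)
+    labels = clf.predict(X)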
 
 
 VBGMM classifier: variational Gaussian mixtures
@@ -209,25 +210,34 @@ components, and at the expense of extra computational time the user
 only needs to specify a loose upper bound on this number and a
 concentration parameter.
 
-.. figure:: ../auto_examples/mixture/images/plot_gmm_1.png
+.. |plot_gmm| image:: ../auto_examples/mixture/images/plot_gmm_1.png
    :target: ../auto_examples/mixture/plot_gmm.html
-   :align: center
-   :scale: 70%
+   :scale: 48%
+
+.. |plot_gmm_sin| image:: ../auto_examples/mixture/images/plot_gmm_sin_1.png
+   :target: ../auto_examples/mixture/plot_gmm_sin.html
+   :scale: 48%
+
+.. centered:: |plot_gmm| |plot_gmm_sin|
+
 
-The example above compares a Gaussian mixture model fitted with 5
-components on a dataset, to a DPGMM model. We can see that the DPGMM is
-able to limit itself to only 2 components. With very little observations,
-the DPGMM can take a conservative stand, and fit only one component.
+The examples above compare Gaussian mixture models with a fixed number of
+components to DPGMM models. **On the left** the GMM is fitted with 5
+components on a dataset composed of 2 clusters. We can see that the DPGMM is
+able to limit itself to only 2 components, whereas the GMM fits the data with
+too many components. Note that with very few observations, the DPGMM can take
+a conservative stand and fit only one component. **On the right** we fit a
+dataset not well depicted by a mixture of Gaussians. Adjusting the `alpha`
+parameter of the DPGMM controls the number of components used to fit this
+data.
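+
+A minimal sketch of the effect of `alpha` (the sine-like toy data, the
+upper bound of 10 components and both `alpha` values below are arbitrary
+illustrative choices; parameter names are those of the current
+:class:`DPGMM` API)::
+
+    import numpy as np
+    from sklearn import mixture
+
+    # Noisy sine-shaped data, poorly described by a few Gaussians.
+    np.random.seed(0)
+    t = np.linspace(0, 4 * np.pi, 200)
+    X = np.column_stack([t, np.sin(t)]) + 0.1 * np.random.randn(200, 2)
+
+    # Same loose upper bound on the number of components in both fits;
+    # a small alpha favours few components, a large alpha allows more.
+    for alpha in (0.01, 100.):
+        clf = mixture.DPGMM(n_components=10, alpha=alpha)
+        clf.fit(X)
+        # number of components actually used for this value of alpha
+        print(len(np.unique(clf.predict(X))))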
 
 .. topic:: Examples:
 
     * See :ref:`example_mixture_plot_gmm.py` for an example on plotting the
       confidence ellipsoids for both :class:`GMM` and :class:`DPGMM`.
 
-.. topic:: Derivation:
-
-   * See `here <dp-derivation.html>`_ the full derivation of this
-     algorithm.
+    * :ref:`example_mixture_plot_gmm_sin.py` shows using :class:`GMM` and
+      :class:`DPGMM` to fit a sine wave.
 
 Pros and cons of class :class:`DPGMM`: Diriclet process mixture model
 ----------------------------------------------------------------------
@@ -266,25 +276,21 @@ The Dirichlet Process
 ---------------------
 
 Here we describe variational inference algorithms on Dirichlet process
-mixtures.
-
-One of the main advantages of variational techniques is that they can
-incorporate prior information to the model in many different ways. The
-Dirichlet process is a prior probability distribution on *clusterings
-with an infinite, unbounded, number of partitions*. Variational
-techniques let us incorporate this prior structure on Gaussian mixture
-models at almost no penalty in inference time, comparing with a finite
-Gaussian mixture model.
+mixtures. The Dirichlet process is a prior probability distribution on
+*clusterings with an infinite, unbounded, number of partitions*.
+Variational techniques let us incorporate this prior structure on
+Gaussian mixture models at almost no penalty in inference time, compared
+with a finite Gaussian mixture model.
 
 An important question is how can the Dirichlet process use an
 infinite, unbounded number of clusters and still be consistent. While
 a full explanation doesn't fit this manual, one can think of its
-`Chinese restaurant process
-<http://en.wikipedia.org/wiki/Chinese_restaurant_process>`_ analogy
-to help understanding it. The Chinese restaurant process is a
-generative construction for the Dirichlet process (see for a detailed
-introduction).  Imagine a Chinese restaurant with an infinite number
-of tables, at first all empty. When the first customer of the day
+`Chinese restaurant process
+<http://en.wikipedia.org/wiki/Chinese_restaurant_process>`_
+analogy to help understand it. The
+Chinese restaurant process is a generative story for the Dirichlet
+process. Imagine a Chinese restaurant with an infinite number of
+tables, at first all empty. When the first customer of the day
 arrives, he sits at the first table. Every following customer will
 then either sit on an occupied table with probability proportional to
 the number of customers in that table or sit in an entirely new table
@@ -307,6 +313,11 @@ on the number of mixture components (this upper bound, assuming it is
 higher than the "true" number of components, affects only algorithmic
 complexity, not the actual number of components used).
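+
+The seating process described above is easy to simulate. A minimal sketch
+(the concentration value `alpha` and the number of customers are arbitrary,
+purely illustrative choices) shows that the number of occupied tables grows
+much more slowly than the number of customers::
+
+    import numpy as np
+
+    np.random.seed(0)
+    alpha = 1.             # concentration parameter, illustrative value
+    tables = []            # tables[i] = number of customers at table i
+
+    for customer in range(500):
+        # Each occupied table is chosen with probability proportional to
+        # its occupancy; a new table is opened with probability
+        # proportional to alpha.
+        probs = np.array(tables + [alpha], dtype=float)
+        probs /= probs.sum()
+        seat = np.random.multinomial(1, probs).argmax()
+        if seat == len(tables):
+            tables.append(1)       # open a new table
+        else:
+            tables[seat] += 1      # join an existing table
+
+    print(len(tables))             # far fewer tables than customers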
 
+.. topic:: Derivation:
+
+   * See `here <dp-derivation.html>`_ for the full derivation of this
+     algorithm.
+
 .. toctree::
     :hidden:
 
diff --git a/examples/mixture/plot_gmm_sin.py b/examples/mixture/plot_gmm_sin.py
index 27462eda0788a4b65f18f96f2d6f2459f97b68d8..251f224ae7b727375d2516c13071ae81e0e3ad86 100644
--- a/examples/mixture/plot_gmm_sin.py
+++ b/examples/mixture/plot_gmm_sin.py
@@ -74,5 +74,7 @@ for i, (clf, title) in enumerate([
     pl.xlim(-6, 4 * np.pi - 6)
     pl.ylim(-5, 5)
     pl.title(title)
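+    # hide the axis ticks to keep the side-by-side plots uncluttered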
+    pl.xticks(())
+    pl.yticks(())
 
 pl.show()