diff --git a/doc/datasets/index.rst b/doc/datasets/index.rst
index 992b8eb9bfd61651b4697f04c3ce07373962ba50..f44ce2de9c991759daa7b590511d189bdff3ae18 100644
--- a/doc/datasets/index.rst
+++ b/doc/datasets/index.rst
@@ -26,6 +26,28 @@ This package also features helpers to fetch larger datasets commonly
 used by the machine learning community to benchmark algorithm on data
 that comes from the 'real world'.
 
+General dataset API
+===================
+There are three distinct kinds of dataset interfaces used at the moment.
+The simplest one is the interface for sample images, which is described
+below in the :ref:`sample_images` section.
+
+The dataset generation functions and the svmlight loader share a simple
+interface, returning a tuple ``(X, y)`` consisting of an ``n_samples`` x ``n_features``
+numpy array ``X`` and an array of length ``n_samples`` containing the targets ``y``.
+
+The toy datasets as well as the 'real world' datasets and the datasets
+fetched from mldata.org have a more sophisticated structure.
+These functions return a ``bunch`` (a dictionary whose keys can also be
+accessed as attributes, using the ``bunch.key`` syntax).
+All datasets have at least two keys: ``data``, containing an array of shape
+``n_samples x n_features``, and ``target``, a numpy array of length ``n_samples``
+containing the targets.
+The datasets also contain a description in ``DESCR``, and some contain
+``feature_names`` and ``target_names``.
+See the dataset descriptions below for details.
+
+
 Toy datasets
 ============