diff --git a/doc/datasets/index.rst b/doc/datasets/index.rst
index b3f329e943cdb79657065d751e1baadbc8dca337..c624fdb55f2e5e8321c41369a64c6dfae31c6014 100644
--- a/doc/datasets/index.rst
+++ b/doc/datasets/index.rst
@@ -254,6 +254,58 @@ features::
 
  _`Faster API-compatible implementation`: https://github.com/mblondel/svmlight-loader
 
+.. _external_datasets:
+
+Loading from external datasets
+==============================
+
+scikit-learn works on any numeric data stored as numpy arrays or scipy sparse
+matrices. Other types that are convertible to numeric arrays, such as pandas
+DataFrames, are also acceptable.
+
+Here are some recommended ways to load standard columnar data into a
+format usable by scikit-learn:
+
+* `pandas.io <http://pandas.pydata.org/pandas-docs/stable/io.html>`_ 
+  provides tools to read data from common formats including CSV, Excel, JSON
+  and SQL. DataFrames may also be constructed from lists of tuples or dicts.
+  Pandas handles heterogeneous data smoothly and provides tools for
+  manipulation and conversion into a numeric array suitable for scikit-learn.
+* `scipy.io <http://docs.scipy.org/doc/scipy/reference/io.html>`_
+  specializes in binary formats often used in scientific computing
+  contexts, such as .mat and .arff.
+* `numpy/routines.io <http://docs.scipy.org/doc/numpy/reference/routines.io.html>`_
+  provides standard loading of columnar data into numpy arrays.
+* scikit-learn's :func:`datasets.load_svmlight_file` loads data in the svmlight
+  or libSVM sparse format.
+* scikit-learn's :func:`datasets.load_files` loads directories of text files
+  where each directory is named after a category and each file inside a
+  directory corresponds to one sample from that category.
+
+For some miscellaneous data such as images, videos, and audio, you may wish to
+refer to:
+
+* `skimage.io <http://scikit-image.org/docs/dev/api/skimage.io.html>`_ or
+  `Imageio <http://imageio.readthedocs.io/en/latest/userapi.html>`_ 
+  for loading images and videos to numpy arrays
+* `scipy.misc.imread <http://docs.scipy.org/doc/scipy/reference/generated/scipy.
+  misc.imread.html#scipy.misc.imread>`_ (requires the `Pillow
+  <https://pypi.python.org/pypi/Pillow>`_ package) to load pixel intensities
+  data from various image file formats
+* `scipy.io.wavfile.read 
+  <http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.io.wavfile.read.html>`_ 
+  for reading WAV files into a numpy array
+
+Categorical (or nominal) features stored as strings (common in pandas DataFrames)
+need to be converted to integers, and integer categorical variables are often best
+exploited when encoded as one-hot variables
+(:class:`sklearn.preprocessing.OneHotEncoder`) or similar.
+See :ref:`preprocessing`.
+
+Note: if you manage your own numerical data, it is recommended to use an
+optimized file format such as HDF5 to reduce data load times. Various libraries
+such as H5Py, PyTables and pandas provide a Python interface for reading and
+writing data in that format.
 
 .. make sure everything is in a toc tree
 
diff --git a/doc/faq.rst b/doc/faq.rst
index 16101bc5c9ba7ed767350c7b4c9de56c11677828..7a4a2f2a8fd4a8be0dd1f6f6e039bde1d6b6e12a 100644
--- a/doc/faq.rst
+++ b/doc/faq.rst
@@ -75,31 +75,15 @@ input variables and a 1D array ``y`` for the target variables. The array ``X``
 holds the features as columns and samples as rows . The array ``y`` contains
 integer values to encode the class membership of each sample in ``X``.
 
-To load data as numpy arrays you can use different libraries depending on the
-original data format:
-
-* `numpy.loadtxt
-  <http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html>`_ to
-  load text files (such as CSV) assuming that all the columns have an
-  homogeneous data type (e.g. all numeric values).
-
-* `scipy.io <http://docs.scipy.org/doc/scipy/reference/io.html>`_ for common
-  binary formats often used in scientific computing context.
-
-* `scipy.misc.imread <http://docs.scipy.org/doc/scipy/reference/generated/scipy.
-  misc.imread.html#scipy.misc.imread>`_ (requires the `Pillow
-  <https://pypi.python.org/pypi/Pillow>`_ package) to load pixel intensities
-  data from various image file formats.
-
-* `pandas.io <http://pandas.pydata.org/pandas-docs/stable/io.html>`_ to load
-  heterogeneously typed data from various file formats and database protocols
-  that can slice and dice before conversion to numerical features in a numpy
-  array.
-
-Note: if you manage your own numerical data it is recommended to use an
-optimized file format such as HDF5 to reduce data load times. Various libraries
-such as H5Py, PyTables and pandas provides a Python interface for reading and
-writing data in that format.
+How can I load my own datasets into a format usable by scikit-learn?
+--------------------------------------------------------------------
+
+Generally, scikit-learn works on any numeric data stored as numpy arrays
+or scipy sparse matrices. Other types that are convertible to numeric
+arrays, such as pandas DataFrames, are also acceptable.
+
+For more information on loading your data files into these usable data 
+structures, please refer to :ref:`loading external datasets <external_datasets>`.
 
 What are the inclusion criteria for new algorithms ?
 ----------------------------------------------------
diff --git a/doc/tutorial/basic/tutorial.rst b/doc/tutorial/basic/tutorial.rst
index 799f05c1148406fbcabca042589337a12515ee03..439343d30c4df8951d0f9bdc993cf803204aad1e 100644
--- a/doc/tutorial/basic/tutorial.rst
+++ b/doc/tutorial/basic/tutorial.rst
@@ -136,7 +136,10 @@ learn::
     <sphx_glr_auto_examples_classification_plot_digits_classification.py>` illustrates how starting
     from the original problem one can shape the data for consumption in
     scikit-learn.
+
+.. topic:: Loading from external datasets
 
+    To load from an external dataset, please refer to :ref:`loading external datasets <external_datasets>`.
 
 Learning and predicting
 ------------------------