DOC: Some work on the beginning of the doc.

git-svn-id: https://scikit-learn.svn.sourceforge.net/svnroot/scikit-learn/trunk@686 22fbfee3-77ab-4535-9bad-27d1bd3bc7d8

Commit 2d43c4d2 authored 15 years ago by Gael Varoquaux
--- a/doc/install.rst
+++ b/doc/install.rst
@@ -4,8 +4,9 @@ Installing the `scikit.learn`
 Binary Packages
 ---------------

-There is a prebuild package for windows. See section downloads in the
-project's web page.
+A prebuilt package is available for Windows. See the `downloads
+<https://sourceforge.net/projects/scikit-learn/files/>`_
+section of the project's web page.


 From Source

--- a/doc/tutorial.rst
+++ b/doc/tutorial.rst
-Tutorial
-========
+Getting started: an introduction to learning with the scikit
+=============================================================
+
+Machine learning: the problem setting
+---------------------------------------
+
+In general, a learning problem considers a set of n *samples* of data and
+tries to predict properties of unknown data. If each sample is more than a
+single number, for instance a multi-dimensional entry (aka
+*multivariate* data), it is said to have several attributes, or
+*features*.
+
+We can separate learning problems into a few large categories:
+
+ * **supervised learning**, in which the data comes with additional
+   attributes that we want to predict. This problem can be either:
+   
+    * **classification**: samples belong to two or more classes and we
+      want to learn from already labeled data how to predict the class
+      of unlabeled data.
+
+    * **regression**: each sample is associated with a numerical
+      attribute, often called the target or response variable. The goal
+      is to learn the relationship between the data and the target
+      variable to be able to predict its value on new data.
+
+ * **unsupervised learning**, in which we are trying to learn a
+   synthetic representation of the data.

 Loading a sample dataset
 --------------------------

-The `scikit.learn` comes with a few standard datasets:
+The `scikit.learn` comes with a few standard datasets, for instance the
+`iris dataset <http://en.wikipedia.org/wiki/Iris_flower_data_set>`_, or
+the `digits dataset
+<http://archive.ics.uci.edu/ml/datasets/Pen-Based+Recognition+of+Handwritten+Digits>`_::

    >>> from scikits.learn import datasets
->>> iris = datasets.load('iris')
+    >>> iris = datasets.load_iris()
+    >>> digits = datasets.load_digits()

 A dataset is a dictionary-like object that holds all the samples and
 some metadata about the samples. You can access the underlying data
 with members `.data` and `.target`.

 For instance, in the case of the iris dataset, `iris.data` gives access
-to the features that can be used to classify the iris samples:
+to the features that can be used to classify the iris samples::

    >>> iris.data
    array([[ 5.1,  3.5,  1.4,  0.2],
@@ -35,6 +65,7 @@ array([ 0.,  0.,  0.,  0., ... 2.,  2.,  2.,  2.])

 Prediction
 ----------
+
 Suppose some given data points each belong to one of two classes, and
 the goal is to decide which class a new data point will be in. In
 ``scikits.learn`` this is done with an *estimator*. An *estimator* is