Commit 2d43c4d2 authored by Gael Varoquaux

DOC: Some work on the beginning of the doc.


git-svn-id: https://scikit-learn.svn.sourceforge.net/svnroot/scikit-learn/trunk@686 22fbfee3-77ab-4535-9bad-27d1bd3bc7d8
parent ecb86729
......@@ -4,8 +4,9 @@ Installing the `scikit.learn`
Binary Packages
---------------
There is a prebuilt package for Windows. See the downloads section on the
project's web page.
There is a prebuilt package for Windows. See the `downloads
<https://sourceforge.net/projects/scikit-learn/files/>`_
section on the project's web page.
From Source
......
Tutorial
========
Getting started: an introduction to learning with the scikit
=============================================================
Machine learning: the problem setting
---------------------------------------
In general, a learning problem considers a set of n *samples* of data and
tries to predict properties of unknown data. If each sample is more than a
single number, for instance a multi-dimensional entry (aka
*multivariate* data), it is said to have several attributes, or
*features*.
We can separate learning problems into a few large categories:
* **supervised learning**, in which the data comes with additional
  attributes that we want to predict. This problem can be either:
  * **classification**: samples belong to two or more classes and we
    want to learn from already labeled data how to predict the class
    of unlabeled data.
  * **regression**: each sample is associated with a numerical
    attribute, often called the response (or target) variable. The goal
    is to learn the relationship between the data and the response
    variable to be able to predict its value on new data.
* **unsupervised learning**, in which we are trying to learn a
  synthetic representation of the data.
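The difference between the two supervised settings above can be sketched in plain Python. This is only an illustration and not part of the scikit: the toy data and the helper names ``predict_class`` and ``fit_line`` are hypothetical, and the techniques (1-nearest-neighbour and a least-squares line) are chosen here for brevity.

```python
# Toy illustration: each sample is a list of features (multivariate data).
samples = [[1.0, 1.0], [1.2, 0.8], [4.0, 4.2], [3.8, 4.0]]
labels = [0, 0, 1, 1]  # classification target: one class per sample


def predict_class(x, samples, labels):
    """1-nearest-neighbour: return the label of the closest labeled sample."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    best = min(range(len(samples)), key=lambda i: sq_dist(x, samples[i]))
    return labels[best]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


# Classification: predict a discrete class for an unlabeled sample.
new_class = predict_class([3.9, 4.1], samples, labels)

# Regression: predict a numerical value for new data.
slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
new_value = slope * 4.0 + intercept
```

In both cases the model is learned from samples whose target is known, then applied to data whose target is not. Only the type of the target differs: a class label in the first case, a number in the second.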
Loading a sample dataset
--------------------------
The `scikit.learn` comes with a few standard datasets:
The `scikit.learn` comes with a few standard datasets, for instance the
`iris dataset <http://en.wikipedia.org/wiki/Iris_flower_data_set>`_, or
the `digits dataset
<http://archive.ics.uci.edu/ml/datasets/Pen-Based+Recognition+of+Handwritten+Digits>`_::
>>> from scikits.learn import datasets
>>> iris = datasets.load('iris')
>>> iris = datasets.load_iris()
>>> digits = datasets.load_digits()
A dataset is a dictionary-like object that holds all the samples and
some metadata about the samples. You can access the underlying data
with members `.data` and `.target`.
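What "dictionary-like with members" can mean is sketched below in plain Python. The ``ToyDataset`` class is a hypothetical stand-in written for this illustration, not the scikit's own implementation, and the values are toy data shaped like the iris output.

```python
class ToyDataset(dict):
    """Hypothetical stand-in: a dict whose keys are also readable as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so dict methods
        # such as .keys() keep working as usual.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)


# One row per sample, one column per feature; one target value per sample.
toy = ToyDataset(
    data=[[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2]],
    target=[0.0, 0.0],
)

first_sample = toy.data[0]     # attribute-style access
same_data = toy["data"]        # dictionary-style access to the same object
n_samples = len(toy.data)
n_features = len(toy.data[0])
```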
For instance, in the case of the iris dataset, `iris.data` gives access
to the features that can be used to classify the iris samples:
to the features that can be used to classify the iris samples::
>>> iris.data
array([[ 5.1, 3.5, 1.4, 0.2],
......@@ -35,6 +65,7 @@ array([ 0., 0., 0., 0., ... 2., 2., 2., 2.])
Prediction
----------
Suppose some given data points each belong to one of two classes, and
the goal is to decide which class a new data point will be in. In
``scikits.learn`` this is done with an *estimator*. An *estimator* is
......