    Statistical learning: the setting and the estimator object in scikit-learn

    Datasets

    scikit-learn learns information from one or more datasets that are represented as 2D arrays. Each array can be understood as a list of multi-dimensional observations. The first axis of these arrays is the samples axis, and the second is the features axis.
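
To make the two axes concrete, here is a minimal sketch on a small made-up array. The values and the names X, first_sample and first_feature are illustrative assumptions, not data shipped with the scikit.

```python
import numpy as np

# Hypothetical toy dataset: 3 observations (samples), each with 2 features.
X = np.array([[5.1, 3.5],
              [4.9, 3.0],
              [6.2, 2.9]])

# Axis 0 is the samples axis, axis 1 is the features axis.
n_samples, n_features = X.shape

first_sample = X[0]        # one observation: every feature of sample 0
first_feature = X[:, 0]    # one feature: its value for every sample
```

Indexing along the first axis selects an observation, while slicing along the second axis selects one feature across all observations.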

    A simple example shipped with the scikit: the iris dataset

    >>> from sklearn import datasets
    >>> iris = datasets.load_iris()
    >>> data = iris.data
    >>> data.shape
    (150, 4)

    It is made of 150 observations of irises, each described by 4 features: their sepal and petal length and width, as detailed in `iris.DESCR`.

    When the data is not initially in the (n_samples, n_features) shape, it needs to be preprocessed before it can be used by the scikit.

    An example of reshaping data: the digits dataset

    ../../auto_examples/tutorial/images/plot_digits_first_image_1.png

    The digits dataset is made of 1797 8x8 images of hand-written digits:

    >>> digits = datasets.load_digits()
    >>> digits.images.shape
    (1797, 8, 8)
    >>> import matplotlib.pyplot as plt
    >>> plt.imshow(digits.images[0], cmap=plt.cm.gray_r) #doctest: +ELLIPSIS
    <matplotlib.image.AxesImage object at ...>

    To use this dataset with the scikit, we transform each 8x8 image into a feature vector of length 64:

    >>> data = digits.images.reshape((digits.images.shape[0], -1))
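
The effect of this reshape can be checked on a small stand-in array. The sketch below assumes a hypothetical stack of 5 images filled with consecutive numbers instead of the real digits data, so that it runs without loading the dataset.

```python
import numpy as np

# Hypothetical stand-in for digits.images: 5 "images" of 8x8 pixels.
images = np.arange(5 * 8 * 8, dtype=float).reshape((5, 8, 8))

# Keep the first axis (samples) and let NumPy infer the remaining size with -1.
data = images.reshape((images.shape[0], -1))

# Each row of `data` is the corresponding image read row by row.
same_as_ravel = np.array_equal(data[0], images[0].ravel())
```

Passing -1 for the second dimension avoids hard-coding 64, so the same line works for images of any size.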

    Estimator objects

    Fitting data: The core object of scikit-learn is the estimator object. All estimator objects expose a fit method that takes a dataset (a 2D array):

    >>> estimator.fit(data)

    Estimator parameters: All the parameters of an estimator can be set when it is instantiated, or by modifying the corresponding attribute:

    >>> estimator = Estimator(param1=1, param2=2)
    >>> estimator.param1
    1

    Estimated parameters: When an estimator is fitted to data, its parameters are estimated from the data at hand. All the estimated parameters are attributes of the estimator object whose names end with an underscore:

    >>> estimator.estimated_param_ #doctest: +SKIP
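
The three conventions above (a fit method, parameters set at instantiation or through attributes, and estimated parameters ending with an underscore) can be illustrated with a toy class. MeanEstimator, its scale parameter and its mean_ attribute are hypothetical names made up for this sketch; they are not part of scikit-learn.

```python
class MeanEstimator:
    """Hypothetical toy estimator, not a scikit-learn class."""

    def __init__(self, scale=1.0):
        # Parameter chosen by the user when the estimator is instantiated.
        self.scale = scale

    def fit(self, data):
        # data: 2D sequence of shape (n_samples, n_features).
        n_samples = len(data)
        # Estimated parameter: the trailing underscore marks it as learned from data.
        self.mean_ = [self.scale * sum(column) / n_samples
                      for column in zip(*data)]
        return self


estimator = MeanEstimator(scale=2.0)
estimator.scale = 1.0  # a parameter can also be changed through its attribute
estimator.fit([[1.0, 10.0], [3.0, 30.0]])
```

After fitting, estimator.mean_ holds the per-feature mean of the two samples, while estimator.scale still holds the value chosen by the user.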