Skip to content
Snippets Groups Projects
Select Git revision
1 result Searching

scroll_indicator.cpp

Blame
  • settings.rst NaN GiB
    
    ===============================================================================
    Statistical learning: the setting and the estimator object in the scikit-learn
    ===============================================================================
    
    Datasets
    =========
    
    The `scikit-learn` deals with learning information from one or more
    datasets that are represented as 2D arrays. They can be understood as a
    list of multi-dimensional observations. We say that the first axis of
    these arrays is the **samples** axis, while the second is the
    **features** axis.
    
    .. topic:: A simple example shipped with the scikit: iris dataset
    
        ::
    
            >>> from scikits.learn import datasets
            >>> iris = datasets.load_iris()
            >>> data = iris.data
            >>> data.shape
            (150, 4)
    
        It is made of 150 observations of irises, each described by 4
        features: their sepal and petal length and width, as detailed in
        `iris.DESCR <https://raw.github.com/GaelVaroquaux/scikit-learn/
        stat_tutorial/sklearn/datasets/descr/iris.rst>`_.
    
    When the data is not intially in the `(n_samples, n_features)` shape, it
    needs to be preprocessed to be used by the scikit.
    
    .. topic:: An example of reshaping data: the digits dataset 
    
        .. image:: ../../auto_examples/tutorial/images/plot_digits_first_image_1.png
            :align: right
            :scale: 50
    
        The digits dataset is made of 1797 8x8 images of hand-written
        digits ::
    
            >>> digits = datasets.load_digits()
            >>> digits.images.shape
            (1797, 8, 8)
            >>> import pylab as pl
            >>> pl.imshow(digits.images[0], cmap=pl.cm.gray_r) #doctest: +ELLIPSIS
            <matplotlib.image.AxesImage object at ...>
    
        To use this dataset with the scikit, we transform each 8x8 image in a
        feature vector of length 64 ::
    
            >>> data = digits.images.reshape((digits.images.shape[0], -1))
    
    
    Estimators objects
    ===================
    
    .. Some code to make the doctests run
    
       >>> from scikits.learn.base import BaseEstimator
       >>> class Estimator(BaseEstimator):
       ...      def __init__(self, param1=0, param2=0):
       ...          self.param1 = param1
       ...          self.param2 = param2
       ...      def fit(self, data):
       ...          pass
       >>> estimator = Estimator()
    
    **Fitting data**: The core object of the `scikit-learn` is the
    `estimator` object. All estimator objects expose a `fit` method, that
    takes a dataset (2D array)::
    
        >>> estimator.fit(data)
    
    **Estimator parameters**: All the parameters of an estimator can be set
    when it is instanciated, or by modifying the corresponding attribute::
    
        >>> estimator = Estimator(param1=1, param2=2)
        >>> estimator.param1
        1
    
    **Estimated parameters**: When data is fitted with an estimator,
    parameters are estimated from the data at hand. All the estimated
    parameters are attributes of the estimator object ending by an
    underscore::
    
        >>> estimator.estimated_param_ #doctest: +SKIP