Kathy Chen authored
* addressed comments in the PR about parameters in check_array * update the test case for the evaluation of estimators with pandas series * bug fix, need to check for *not* None explicitly * updated with isinstance check if the documentation says there is acceptance of floats * ran pep8 linter on modified files * moving the test case to estimators_check * add a predict function into the testing pandas.Series class * avoid running anything beyond the newly added meta checks * check if pandas is installed before running the specific test * changed the order of the try-catch to check for sample_weight param beforehand * pass on import error rather than printing something to std out * improve test case naming and pd.Series check in the bad estimator class * address a pep8 linter error with unused import * pep8 warning disabled for potential unused import * throw a warning when SkipTest is raised * add a SkipTestWarning * updated the whats_new.rst with this issue * rebase and fix a spacing issue

===============================================================================
Statistical learning: the setting and the estimator object in scikit-learn
===============================================================================

Datasets
=========

The `scikit-learn` library deals with learning information from one or
more datasets that are represented as 2D arrays. They can be understood
as a list of multi-dimensional observations. We say that the first axis
of these arrays is the **samples** axis, while the second is the
**features** axis.
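
To make the axis convention concrete, here is a small sketch on a made-up NumPy array (the values are illustrative, not taken from any shipped dataset). Rows are samples and columns are features, so indexing the first axis selects one observation, and reducing over `axis=0` yields one value per feature.

```python
import numpy as np

# A toy dataset: 3 samples (rows), each described by 2 features (columns).
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

n_samples, n_features = X.shape  # first axis: samples, second axis: features

first_sample = X[0]         # one observation, with all of its features
second_feature = X[:, 1]    # one feature, measured across every sample

# Reducing over the samples axis gives one statistic per feature.
feature_means = X.mean(axis=0)

print(n_samples, n_features)  # 3 2
print(first_sample)           # [ 1. 10.]
print(second_feature)         # [10. 20. 30.]
print(feature_means)          # [ 2. 20.]
```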

.. topic:: A simple example shipped with the scikit: iris dataset

    ::

        >>> from sklearn import datasets
        >>> iris = datasets.load_iris()
        >>> data = iris.data
        >>> data.shape
        (150, 4)

    It is made of 150 observations of irises, each described by 4
    features: their sepal and petal length and width, as detailed in
    `iris.DESCR <https://raw.github.com/GaelVaroquaux/scikit-learn/stat_tutorial/sklearn/datasets/descr/iris.rst>`_.

When the data is not initially in the `(n_samples, n_features)` shape, it
needs to be preprocessed before it can be used by the scikit.

.. topic:: An example of reshaping data: the digits dataset

    .. image:: ../../auto_examples/tutorial/images/plot_digits_first_image_1.png
        :align: right
        :scale: 50

    The digits dataset is made of 1797 8x8 images of hand-written
    digits ::

        >>> digits = datasets.load_digits()
        >>> digits.images.shape
        (1797, 8, 8)
        >>> import matplotlib.pyplot as plt
        >>> plt.imshow(digits.images[0], cmap=plt.cm.gray_r) #doctest: +ELLIPSIS
        <matplotlib.image.AxesImage object at ...>

    To use this dataset with the scikit, we transform each 8x8 image into a
    feature vector of length 64 ::

        >>> data = digits.images.reshape((digits.images.shape[0], -1))
        >>> data.shape
        (1797, 64)
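
The reshape above flattens each image row by row (NumPy's default C order), so pixel `(i, j)` of an 8x8 image lands in column `8 * i + j` of the feature matrix. The sketch below checks this on a random stand-in array with the same per-image shape as `digits.images`; it does not load the real dataset, and the array `images` is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for digits.images: 5 hypothetical 8x8 images with values in 0..16.
images = rng.integers(0, 17, size=(5, 8, 8))

# Same transformation as in the text: keep the samples axis, flatten the rest.
data = images.reshape((images.shape[0], -1))
assert data.shape == (5, 64)

# Row-major (C-order) flattening: pixel (i, j) becomes feature 8 * i + j.
for i in range(8):
    for j in range(8):
        assert np.array_equal(data[:, 8 * i + j], images[:, i, j])

# The flattening loses nothing: reshaping back recovers the original images.
restored = data.reshape((-1, 8, 8))
assert np.array_equal(restored, images)
```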

Estimator objects
==================

.. Some code to make the doctests run

   >>> from sklearn.base import BaseEstimator
   >>> class Estimator(BaseEstimator):
   ...     def __init__(self, param1=0, param2=0):
   ...         self.param1 = param1
   ...         self.param2 = param2
   ...     def fit(self, data):
   ...         pass
   >>> estimator = Estimator()

**Fitting data**: The core object of `scikit-learn` is the
`estimator` object. All estimator objects expose a `fit` method, which
takes a dataset (2D array)::

    >>> estimator.fit(data)
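
As an illustration of what `fit` expects, here is a minimal sketch of a hypothetical `ToyEstimator` (not part of scikit-learn, and written without depending on it) whose `fit` accepts only 2D `(n_samples, n_features)` arrays. It rejects an un-reshaped stack of 8x8 images and accepts the flattened version, which is why the digits images are reshaped before use. Returning `self` from `fit` is a choice made in this sketch.

```python
import numpy as np


class ToyEstimator:
    """Hypothetical estimator illustrating the fit(2D array) convention."""

    def __init__(self, param1=0, param2=0):
        # The constructor only stores parameters; no learning happens here.
        self.param1 = param1
        self.param2 = param2

    def fit(self, data):
        data = np.asarray(data)
        # Expect a (n_samples, n_features) array and refuse anything else.
        if data.ndim != 2:
            raise ValueError(
                "expected a 2D array (n_samples, n_features), "
                f"got an array of shape {data.shape}"
            )
        return self


images = np.zeros((10, 8, 8))                 # 10 stand-in images: 3D, not usable as-is
flat = images.reshape((images.shape[0], -1))  # flattened: shape (10, 64)

rejected = False
try:
    ToyEstimator().fit(images)
except ValueError as exc:
    rejected = True
    print("rejected:", exc)

fitted = ToyEstimator().fit(flat)  # accepted: the input is 2D
```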

**Estimator parameters**: All the parameters of an estimator can be set
when it is instantiated, or by modifying the corresponding attribute::

    >>> estimator = Estimator(param1=1, param2=2)
    >>> estimator.param1
    1
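
The doctest above only sets parameters at instantiation. The sketch below, again using a hypothetical `ToyEstimator` rather than a scikit-learn class, shows the other route: assigning to the attribute after the object has been created.

```python
class ToyEstimator:
    """Hypothetical estimator: its parameters are plain public attributes."""

    def __init__(self, param1=0, param2=0):
        self.param1 = param1
        self.param2 = param2


# Set the parameters when instantiating the estimator...
estimator = ToyEstimator(param1=1, param2=2)
print(estimator.param1)  # 1

# ...or change one afterwards by assigning to the corresponding attribute.
estimator.param1 = 5
print(estimator.param1)  # 5
print(estimator.param2)  # 2 (untouched)
```

In current scikit-learn releases, estimators deriving from `BaseEstimator` also offer `get_params()` and `set_params(**params)` for the same purpose.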

**Estimated parameters**: When data is fitted with an estimator,
parameters are estimated from the data at hand. All the estimated
parameters are attributes of the estimator object ending with an
underscore::

    >>> estimator.estimated_param_ #doctest: +SKIP
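
To illustrate the naming convention, here is a minimal sketch of a hypothetical `MeanEstimator` (not a scikit-learn class). Its user-set parameter `scale` has no trailing underscore, while `mean_`, computed in `fit` as the per-feature mean along the samples axis multiplied by `scale`, does. The estimated attribute does not exist until `fit` has been called.

```python
import numpy as np


class MeanEstimator:
    """Hypothetical estimator that learns a scaled per-feature mean."""

    def __init__(self, scale=1.0):
        # Parameter chosen by the user: no trailing underscore.
        self.scale = scale

    def fit(self, data):
        data = np.asarray(data, dtype=float)
        # Estimated from the data, so the name ends with an underscore.
        self.mean_ = self.scale * data.mean(axis=0)
        return self


data = np.array([[1.0, 2.0],
                 [3.0, 4.0]])

estimator = MeanEstimator(scale=2.0)
print(hasattr(estimator, "mean_"))  # False: nothing is estimated before fit
estimator.fit(data)
print(estimator.mean_)              # [4. 6.]
```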