Commit a7ada5b2 authored by Fabian Pedregosa

Flat is better than nested.

Denest the neighbors module. For practical purposes the API will
remain the same.

From: Fabian Pedregosa <fabian.pedregosa@inria.fr>

git-svn-id: https://scikit-learn.svn.sourceforge.net/svnroot/scikit-learn/trunk@529 22fbfee3-77ab-4535-9bad-27d1bd3bc7d8
parent 3ee44033
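In practical terms the flattening only shortens the import path. A minimal sketch of what this means for callers (the old nested path is an assumption inferred from this diff; the new path is what the updated example and tests below use):

# Before this commit the class lived one level deeper (assumed old path):
#     from scikits.learn.neighbors.neighbors import Neighbors
# After denesting, the flat import works:
from scikits.learn.neighbors import Neighbors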
#! /usr/bin/env python
# -*- coding: utf-8 -*-
# The code and descriptive text is copyrighted and offered under the terms of
# the BSD License from the authors; see below. However, the actual dataset may
# have a different origin and intellectual property status. See the SOURCE and
# COPYRIGHT variables for this information.
# Copyright (c) 2007 David Cournapeau <cournape@gmail.com>
# 2010 Fabian Pedregosa <fabian.pedregosa@inria.fr>
#
"""
Iris Plants Database
Creator: R.A. Fisher
Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
Date: July, 1988
This is a copy of the UCI ML Iris dataset.
References:
   - Fisher, R.A. 'The use of multiple measurements in taxonomic problems'
     Annual Eugenics, 7, Part II, 179-188 (1936); also in 'Contributions to
     Mathematical Statistics' (John Wiley, NY, 1950).
   - Duda, R.O., & Hart, P.E. (1973) Pattern Classification and Scene Analysis.
     (Q327.D83) John Wiley & Sons. ISBN 0-471-22361-1. See page 218.
   - Dasarathy, B.V. (1980) 'Nosing Around the Neighborhood: A New System
     Structure and Classification Rule for Recognition in Partially Exposed
     Environments'. IEEE Transactions on Pattern Analysis and Machine
     Intelligence, Vol. PAMI-2, No. 1, 67-71.
   - Gates, G.W. (1972) 'The Reduced Nearest Neighbor Rule'. IEEE Transactions
     on Information Theory, May 1972, 431-433.
   - See also: 1988 MLC Proceedings, 54-64. Cheeseman et al's AUTOCLASS II
     conceptual clustering system finds 3 classes in the data.
   - Many, many more
"""
DESCR = """
The famous Iris database, first used by Sir R.A. Fisher.
This is perhaps the best known database to be found in the
pattern recognition literature. Fisher's paper is a classic in the field and
is referenced frequently to this day. (See Duda & Hart, for example.) The
data set contains 3 classes of 50 instances each, where each class refers to a
type of iris plant. One class is linearly separable from the other 2; the
latter are NOT linearly separable from each other.
Number of Instances: 150 (50 in each of three classes)
Number of Attributes: 4 numeric, predictive attributes and the class
Attribute Information:
    - sepal length in cm
    - sepal width in cm
    - petal length in cm
    - petal width in cm
    - class:
        - Iris-Setosa
        - Iris-Versicolour
        - Iris-Virginica
Summary Statistics:
                 Min  Max   Mean    SD   Class Correlation
  sepal length:  4.3  7.9   5.84  0.83     0.7826
  sepal width:   2.0  4.4   3.05  0.43    -0.4194
  petal length:  1.0  6.9   3.76  1.76     0.9490  (high!)
  petal width:   0.1  2.5   1.20  0.76     0.9565  (high!)

Missing Attribute Values: None
Class Distribution: 33.3% for each of 3 classes.
"""
import numpy as np
from .base import Bunch
def load():
    """Load the iris dataset and return it.

    Returns
    -------
    iris : Bunch
        See the docstring of Bunch for a complete description of its
        members.

    Example
    -------
    Let's say you are interested in samples 10, 25, and 50, and want
    to know their class names.

    >>> data = load()
    >>> print data.label  # doctest: +ELLIPSIS
    [ 0.  0. ...]
    """
import numpy as np
import matplotlib.pyplot as plt
from scikits.learn.neighbors import Neighbors
n = 100  # number of points per class
data1 = np.random.randn(n, 2) + 3.0
data2 = np.random.randn(n, 2) + 5.0
data = np.concatenate((data1, data2))
labels = [0] * n + [1] * n

# we create the mesh
h = .1  # step size
x = np.arange(-2, 12, h)
y = np.arange(-2, 12, h)
X, Y = np.meshgrid(x, y)

neigh = Neighbors(data, labels=labels, k=3)

# classify every point of the mesh; the points are generated x-major, so
# the reshaped Z is indexed (x, y) and must be transposed for pcolormesh
points = [(x_i, y_j) for x_i in x for y_j in y]
Z = neigh.predict(points)
Z = Z.reshape((len(x), len(y)))

plt.subplot(111)
plt.pcolormesh(X, Y, Z.T)

# plot the population points on top of the decision regions
plt.scatter(data1[:, 0], data1[:, 1], c='blue')
plt.scatter(data2[:, 0], data2[:, 1], c='red')
plt.show()
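The query grid can also be built directly from the meshgrid arrays, which removes the transpose bookkeeping. A sketch assuming predict also accepts a 2-D array of points (the example above only demonstrates the list-of-tuples form):

# Ravelling X and Y yields points in the meshgrid's own row-major order,
# so the predictions come back already aligned with X and Y.
points = np.c_[X.ravel(), Y.ravel()]
Z = neigh.predict(points).reshape(X.shape)
plt.pcolormesh(X, Y, Z)  # no transpose needed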
import numpy
from os.path import join


def configuration(parent_package='', top_path=None):
    from numpy.distutils.misc_util import Configuration
    config = Configuration('neighbors', parent_package, top_path)
    config.add_extension('BallTree',
                         sources=[join('src', 'BallTree.cpp')],
                         include_dirs=[numpy.get_include()]
                         )
    config.add_data_dir('tests')
    config.add_data_dir('benchmarks')
    return config


if __name__ == '__main__':
    from numpy.distutils.core import setup
    setup(**configuration(top_path='').todict())
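A numpy.distutils configuration like this is normally pulled in by the parent package rather than run standalone. A minimal sketch of the corresponding hook (the parent file and package name are assumptions, not part of this diff):

# Hypothetical excerpt from the parent scikits/learn/setup.py: after the
# denesting, neighbors is registered as an ordinary subpackage, and
# numpy.distutils recurses into the neighbors/setup.py shown above.
from numpy.distutils.misc_util import Configuration

def configuration(parent_package='', top_path=None):
    config = Configuration('learn', parent_package, top_path)
    config.add_subpackage('neighbors')
    return config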
@@ -17,6 +17,13 @@ def configuration(parent_package='', top_path=None):
                          depends=[join('src', 'svm.h'),
                                   join('src', 'libsvm_helper.c'),
                                   ])
+    config.add_extension('BallTree',
+                         sources=[join('src', 'BallTree.cpp')],
+                         include_dirs=[numpy.get_include()]
+                         )
     config.add_subpackage('utils')
     return config
import numpy as np
from scikits.learn import neighbors
from numpy.testing import assert_array_equal
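The concrete tests are collapsed in this view. As a minimal sketch in the same style, reusing the Neighbors API shown in the example above (the data and expected labels here are hypothetical):

def test_neighbors_1d():
    # Three 1-D points in two classes; with k=1 each query point should
    # take the label of its single nearest neighbor.
    X = np.array([[0.0], [1.0], [2.0]])
    labels = [0, 0, 1]
    neigh = neighbors.Neighbors(X, labels=labels, k=1)
    assert_array_equal(neigh.predict([[0.1], [1.9]]), [0, 1])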