From e824c3f29c50681cb0738e6c228347b89d0835d5 Mon Sep 17 00:00:00 2001
From: Fabian Pedregosa <fabian.pedregosa@inria.fr>
Date: Tue, 5 Jan 2010 13:38:10 +0000
Subject: [PATCH] Add a basic readme to learn.datasets package

From: cdavid <cdavid@cb17146a-f446-4be1-a4f7-bd7c5bb65646>

git-svn-id: https://scikit-learn.svn.sourceforge.net/svnroot/scikit-learn/trunk@12 22fbfee3-77ab-4535-9bad-27d1bd3bc7d8
---
 scikits/learn/datasets/README.txt | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)
 create mode 100644 scikits/learn/datasets/README.txt

diff --git a/scikits/learn/datasets/README.txt b/scikits/learn/datasets/README.txt
new file mode 100644
index 0000000000..4e8bdbafff
--- /dev/null
+++ b/scikits/learn/datasets/README.txt
@@ -0,0 +1,28 @@
+Last Change: Tue Jul 17 04:00 PM 2007 J
+
+This datasets package defines a set of packages which contain datasets useful
+for demos, examples, etc. This can be seen as an equivalent of the R datasets
+package, but for Python.
+
+Each subdir is a Python package, and should define the function load, returning
+the corresponding data. For example, to access the dataset data1, you should be able to do:
+
+>>> from datasets.data1 import load
+>>> d = load() # -> d contains the data of the dataset data1
+
+load can do whatever it wants: fetch data from a file (Python script, CSV
+file, etc.), from the internet, etc. Some special variables must be defined
+in each package, each containing a Python string:
+ - COPYRIGHT: copyright information
+ - SOURCE: where the data come from
+ - DESCHOSRT: short description
+ - DESCLONG: long description
+ - NOTE: some notes on the dataset.
+
+For the datasets to be useful in scikits.learn, the project that initiated this datasets package, the data returned by load has to be a dict with the following conventions:
+ - 'data': this value should be a record array containing the actual data.
+ - 'label': this value should be a rank-1 array of integers containing the
+   label index for each sample, that is, label[i] should be the label index
+   of data[i].
+ - 'class': a record array such that class[i] is the name of the class with
+   label index i. In other words, it maps each label index to its label name.

-- 
GitLab
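The README conventions above can be sketched as a single module standing in for a hypothetical `datasets/data1` package. This is a minimal sketch, not code from the patch. The toy data, the record field names `x`, `y` and `name`, and the metadata text are all assumptions; the README does not say which field name the 'class' record array should use. The metadata variable names are copied exactly as the README spells them.

```python
import numpy as np

# Module-level metadata strings required by the README (names copied as written there).
# The values below are hypothetical placeholders.
COPYRIGHT = "Public domain (hypothetical example)."
SOURCE = "Synthetic toy data defined in this module."
DESCHOSRT = "Toy two-class dataset."
DESCLONG = "Four samples with two float features, labelled 'small' or 'large'."
NOTE = "Illustrative only; not a real dataset."


def load():
    """Return the dataset as a dict following the README conventions."""
    # 'data': a record array with one record per sample (field names are assumptions).
    data = np.rec.fromrecords(
        [(0.1, 0.2), (0.3, 0.1), (2.5, 3.0), (3.1, 2.8)],
        names="x,y",
    )
    # 'label': rank-1 integer array; label[i] is the label index of data[i].
    label = np.array([0, 0, 1, 1], dtype=int)
    # 'class': record array whose entry i holds the name of label index i.
    # Named 'klass' locally because 'class' is a Python keyword.
    klass = np.rec.fromrecords([("small",), ("large",)], names="name")
    return {"data": data, "label": label, "class": klass}


if __name__ == "__main__":
    d = load()
    # Map every sample's label index back to its class name.
    print([str(d["class"]["name"][i]) for i in d["label"]])
```

Keeping the class names in a separate array, rather than storing strings in 'label', lets 'label' stay a plain integer array that numerical code can consume directly.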