From e824c3f29c50681cb0738e6c228347b89d0835d5 Mon Sep 17 00:00:00 2001
From: Fabian Pedregosa <fabian.pedregosa@inria.fr>
Date: Tue, 5 Jan 2010 13:38:10 +0000
Subject: [PATCH] Add a basic readme to learn.datasets package

From: cdavid <cdavid@cb17146a-f446-4be1-a4f7-bd7c5bb65646>

git-svn-id: https://scikit-learn.svn.sourceforge.net/svnroot/scikit-learn/trunk@12 22fbfee3-77ab-4535-9bad-27d1bd3bc7d8
---
 scikits/learn/datasets/README.txt | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)
 create mode 100644 scikits/learn/datasets/README.txt

diff --git a/scikits/learn/datasets/README.txt b/scikits/learn/datasets/README.txt
new file mode 100644
index 0000000000..4e8bdbafff
--- /dev/null
+++ b/scikits/learn/datasets/README.txt
@@ -0,0 +1,28 @@
+Last Change: Tue Jul 17 04:00 PM 2007 J
+
+The datasets package defines a set of subpackages containing datasets useful
+for demos, examples, etc. It can be seen as an equivalent of the R datasets
+package, but for Python.
+
+Each subdirectory is a Python package and should define a function load that
+returns the corresponding data. For example, to access the dataset data1, you should be able to do:
+
+>>> from datasets.data1 import load
+>>> d = load()  # d now contains the data of the dataset data1
+
+load can obtain the data however it wants: from a file (Python script, CSV
+file, etc.), from the internet, and so on. Each package must also define the
+following special variables, each containing a Python string:
+    - COPYRIGHT: copyright information
+    - SOURCE: where the data come from
+    - DESCSHORT: short description
+    - DESCLONG: long description
+    - NOTE: some notes on the datasets.
+
+For the datasets to be usable in the learn scikit, the project that initiated this datasets package, the data returned by load must be a dict following these conventions:
+    - 'data': this value should be a record array containing the actual data.
+    - 'label': this value should be a rank 1 array of integers containing the
+      label index for each sample, that is, label[i] should be the label index
+      of data[i].
+    - 'class': a record array such that class[i] is the class name for label
+      index i. In other words, this maps each label index to its label name.
-- 
GitLab