  • Last Change: Tue Jul 17 04:00 PM 2007 J
    
    The datasets package defines a set of packages that contain datasets useful
    for demos, examples, etc. It can be seen as an equivalent of the R datasets
    package, but for Python.
    
    Each subdirectory is a Python package and should define a function load that
    returns the corresponding data. For example, to access the dataset data1, you
    should be able to do:
    
    >>> from datasets.data1 import load
    >>> d = load() # -> d contains the data of the dataset data1
    
    load can do whatever it wants: fetch data from a file (Python script, CSV
    file, etc.), from the internet, etc. Each package must also define some
    special variables, each containing a Python string:
        - COPYRIGHT: copyright information
        - SOURCE: where the data come from
        - DESCSHORT: short description
        - DESCLONG: long description
        - NOTE: some notes on the dataset.
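As a rough illustration of the layout described above, here is a minimal sketch of what a dataset package's __init__.py might look like. The package contents, the inline CSV, the field names and all string values are hypothetical placeholders, and the short-description variable is spelled DESCSHORT here by analogy with DESCLONG. Only the standard library is used, and load returns a plain list for simplicity.

```python
# Hypothetical contents of datasets/data1/__init__.py (illustration only).
import csv
import io

# Metadata variables: each one is a plain Python string.
COPYRIGHT = "Placeholder copyright notice"
SOURCE = "Made-up values, for illustration only"
DESCSHORT = "Toy measurements"
DESCLONG = "Three made-up samples with two numeric fields each."
NOTE = "Not real data."

# An inline CSV stands in for a data file shipped with the package.
_RAW = """height,weight
1.5,50.0
1.7,65.0
1.8,80.0
"""


def load():
    """Return the dataset as a list of (height, weight) float tuples."""
    reader = csv.DictReader(io.StringIO(_RAW))
    return [(float(row["height"]), float(row["weight"])) for row in reader]
```

A caller would then only need `from datasets.data1 import load` followed by `d = load()`, without knowing where the values come from.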
    
    For the datasets to be useful in the learn scikit, the project that initiated
    this datasets package, the data returned by load has to be a dict with the
    following conventions:
        - 'data': a record array containing the actual data.
        - 'label': a rank 1 array of integers containing the label index for
          each sample, that is, label[i] should be the label index of data[i].
        - 'class': a record array such that class[i] is the class name for label
          index i. In other words, this gives the correspondence
          label index <-> label name.
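The dict conventions above can be sketched with NumPy record arrays as follows. This is only one possible reading of the conventions: the field names (height, weight, name), the values and the single-field layout of the 'class' array are assumptions made for the example, not something the package prescribes.

```python
import numpy as np


def load():
    """Return a toy dataset following the 'data' / 'label' / 'class' layout."""
    # 'data': record array, one record per sample (hypothetical fields).
    data = np.array(
        [(1.5, 50.0), (1.7, 65.0), (1.8, 80.0)],
        dtype=[("height", "f8"), ("weight", "f8")],
    ).view(np.recarray)
    # 'label': rank 1 integer array, label[i] is the label index of data[i].
    label = np.array([0, 1, 1], dtype=int)
    # 'class': record array mapping a label index to a class name
    # (assumed here to hold a single string field called 'name').
    classes = np.array(
        [("light",), ("heavy",)], dtype=[("name", "U16")]
    ).view(np.recarray)
    return {"data": data, "label": label, "class": classes}


d = load()
i = 2
# Look up the class name of sample i through its label index.
name = d["class"][d["label"][i]].name
```

Keeping the names in a separate 'class' array means 'label' stays a compact integer array, and the names are looked up only when needed.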