diff --git a/doc/contribute.rst b/doc/contribute.rst
index beafde7fe1c3869926f1f491e2d61b7f5bbbb205..ed038a5072c5529f905b589fecdf2779ebbaa380 100644
--- a/doc/contribute.rst
+++ b/doc/contribute.rst
@@ -1,12 +1,12 @@
-===========
-Development
-===========
+===============
+Contributing
+===============
 
 This project is a community effort, and everyone is welcomed to
 contribute.
 
-Bug Tracker
-===========
+Submitting a bug report
+=========================
 
 In case you experience difficulties using the package, do not hesitate
 to submit a ticket to the
@@ -15,11 +15,11 @@ to submit a ticket to the
 You are also welcomed to post there feature requests and patches.
 
 Code
-====
+======
 
 
 Git repo
---------
+----------
 
 You can check the latest sources with the command::
 
@@ -31,7 +31,7 @@ or if you have write privileges::
 
 If you have contributed some code and would like to have write
 privileges in subversion repository, please contact me (Fabian
-Pedregosa <fabian.pedregosa@inria.fr>) and I'll give you write
+Pedregosa <fabian.pedregosa@inria.fr>) and We'll give you write
 privileges.
 
 If you run the development version, it is cumbersome to re-install the
@@ -47,6 +47,9 @@ Patches
 Patches are the prefered way to contribute to a project if you do not
 have write privileges.
 
+Before submitting a patch, be sure to read the coding style guidelines
+(below).
+
 Let's suppose that you have the latest sources for subversion and that
 you just made some modifications that you'd like to share with the
 world. You might proceed as:
@@ -80,7 +83,7 @@ experienced developers concentrate on other issues.
 Roadmap
 -------
 
-`Here <http://sourceforge.net/apps/trac/scikit-learn/roadmap`_ you
+`Here <http://sourceforge.net/apps/trac/scikit-learn/roadmap>`_ you
 will find a detailed roadmap, with a description on what's planned to
 be implemented in the following releases.
 
@@ -93,15 +96,15 @@ You can also help making binary distributions for windows, OsX or
 packages for s distribution.
 
 Developers web site
-===================
+=====================
 
 More information can be found at the developer's web site:
 http://sourceforge.net/apps/trac/scikit-learn/wiki , which contains a
 wiki, an issue tracker, and a Roadmap
 
 Documentation
-=============
+===============
 
-I am glad to accept any sort of documentation: function docstrings,
+We are glad to accept any sort of documentation: function docstrings,
 rst docs (like this one), tutorials, etc. Rst docs live in the source
 code repository, under directory doc/.
@@ -111,8 +114,8 @@ directory _build/html/ with html files that are viewable in a web
 browser.
 
 
-API guidelines
-==============
+Coding guidelines
+===================
 
 The following are some guidelines on how new code should be
 written. Of course, there are special cases and there will be
@@ -120,18 +123,92 @@ exceptions to these rules. However, following these rules when
 submitting new code makes the review easier so new code can be
 integrated in less time.
 
+Coding guidelines
+-------------------
+
+Coding style
+^^^^^^^^^^^^^
+
+Uniformly formated code makes it easier to share code ownership.
+
+The scikit learn tries to follow closely the officiel Python guidelines
+detailed in `PEP8 <http://www.python.org/dev/peps/pep-0008/>`_ that
+details how code should be formatted, and indented. Please read it and
+follow it.
+
+In addition, we add the following guidelines:
+
+    * Use underscores to separate words in non class names: `n_samples`
+      rather than `nsamples`.
+
+    * Avoid multiple statements on one line. Prefer a line return after
+      a control flow statement (`if`/`for`).
+
+    * **Please don't use `import *` in any case**. It is considered harmful
+      by the `official Python recommandations
+      <http://docs.python.org/howto/doanddont.html#from-module-import>`_.
+      It makes the code harder to read as the origine of symbols is no
+      longer explicitely referenced, but most important, it prevents
+      using a static analysis tool like `pyflakes
+      <http://www.divmod.org/trac/wiki/DivmodPyflakes>`_ to automatically
+      find bugs in the scikit.
+
+A good example of code that we like can be found `here
+<https://svn.enthought.com/enthought/browser/sandbox/docs/coding_standard.py>`_.
+
+APIs of scikit learn objects
+-----------------------------
+
+To have a uniform API, we try to have a common basic API for all the
+objects. In addition, to avoid the proliferation of framework code, we
+try to adopt simple conventions and limit to a minimum the number of
+methods an object has to implement.
+
+Different objects
+^^^^^^^^^^^^^^^^^^
+
+The main objects of the scikit learn are (one class can implement
+multiple interfaces):
+
+:Estimator:
+
+    The base object, implements::
+
+        obj.fit(data)
+
+:Predictor:
+
+    For suppervised learning, implements::
+
+        target = obj.predict(data)
+
+:Transformer:
+
+    For filtering or modifying the data, in a supervised or unsupervised
+    way, implements::
+
+        new_data = obj.transform(data)
+
+:Model:
+
+    A model that can give a goodness of fit or a likelihood of unseen
+    data, implements (higher is better)::
+
+        score = obj.score(data)
 
 Estimators
-----------
+^^^^^^^^^^^
 
 The API has one predominant object: the estimator. A estimator is an
 object that fits a model based on some training data and is capable of
 inferring some properties on new data. It can be for instance a
-classifier or a regressor.
+classifier or a regressor. All estimators implement the fit method::
+
+    estimator.fit(X, y)
 
 
 Instantiation
-^^^^^^^^^^^^^
+................
 
 This concerns the object creation. The object's __init__ method might
 accept as arguments constants that determine the estimator behavior
@@ -145,8 +222,25 @@ this is leaved to the ``fit()`` method::
 
     clf3 = SVM([[1, 2], [2, 3]], [-1, 1]) # WRONG!
 
+The arguments that go in the `__init__` should all be keyword arguments
+with a defaut value. In other words, a user should be able to instanciate
+an estimator without passing to it any arguments.
+
+The arguments in given at instanciation of an estimator should all
+correspond to hyper parameters describing the model or the optimisation
+problem that estimator tries to solve. They should however not be
+parameters of the estimation routine: these are passed directly to the
+`fit` method.
+
+In addition, **every keyword argument given to the `__init__` should
+correspond to an attribute on the instance**. The scikit relies on this
+to find what are the relevent attributes to set on an estimator when
+doing model selection.
+
+All estimators should inherit from `scikit.learn.base.BaseEstimator`
+
 Fitting
-^^^^^^^
+........
 
 The next thing you'll probably want to do is to estimate some
 parameters in the model. This is implemented in the .fit() method.
@@ -175,7 +269,7 @@ The method should return the object (self).
 
 
 Python tuples
-^^^^^^^^^^^^^
+...............
 
 In addition to numpy arrays, all methods should be able to accept
 python tuples as arguments. In practice, this means you should call
@@ -184,14 +278,14 @@ arrays.
 
 
 Optional Arguments
-^^^^^^^^^^^^^^^^^^
+.....................
 
 In iterative algorithms, number of iterations should be specified by
 an int called ``n_iter``.
 
 
 TODO
-----
+^^^^^
 
 Some things are must still be decided:
  * what should happen when predict is called before than fit() ?
@@ -200,7 +294,7 @@ Some things are must still be decided:
 
 
 Specific models
----------------
+^^^^^^^^^^^^^^^^
 
 In linear models, coefficients are stored in an array called ``coef_``,
 and independent term is stored in ``intercept_``.
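
The conventions this patch documents can be read together in one small example. The sketch below is only an illustration under stated assumptions, not code from the scikit: the class name ``GradientDescentRegressor`` and the ``learning_rate`` argument are hypothetical, the class does not inherit from ``scikit.learn.base.BaseEstimator``, and plain Python lists stand in for numpy arrays so that the block runs on its own. It tries to show keyword-only hyper parameters with defaults that are stored as attributes of the same name, data passed to ``fit`` rather than ``__init__``, tuples accepted as input, ``fit`` returning ``self``, an ``n_iter`` argument for the iteration count, and results stored in ``coef_`` and ``intercept_``.

```python
class GradientDescentRegressor(object):
    """Hypothetical linear regressor, written only to illustrate the conventions."""

    def __init__(self, n_iter=100, learning_rate=0.1):
        # Every argument is a keyword with a default value, and each one is
        # stored on the instance under the same name.
        self.n_iter = n_iter
        self.learning_rate = learning_rate

    def fit(self, X, y):
        # Data goes to fit(), never to __init__. Converting up front lets
        # callers pass tuples as well as lists (the scikit itself would
        # convert to numpy arrays here instead).
        X = [list(row) for row in X]
        y = list(y)
        n_samples = len(X)
        n_features = len(X[0])

        coef = [0.0] * n_features
        intercept = 0.0
        for _ in range(self.n_iter):
            # Residuals of the current linear model on the training data.
            residuals = [
                intercept + sum(c * v for c, v in zip(coef, row)) - target
                for row, target in zip(X, y)
            ]
            # One gradient step on the mean squared error.
            for j in range(n_features):
                grad = sum(r * row[j] for r, row in zip(residuals, X)) / n_samples
                coef[j] -= self.learning_rate * grad
            intercept -= self.learning_rate * sum(residuals) / n_samples

        # Fitted quantities carry a trailing underscore.
        self.coef_ = coef
        self.intercept_ = intercept
        return self  # fit() returns the object itself

    def predict(self, X):
        return [
            self.intercept_ + sum(c * v for c, v in zip(self.coef_, row))
            for row in X
        ]


# Usage: instantiate without data, then fit on tuples.
model = GradientDescentRegressor(n_iter=2000)
model.fit(((0,), (1,), (2,), (3,)), (1, 3, 5, 7))
predictions = model.predict([(4,)])
```

Because ``fit`` returns ``self``, the last three lines could also be chained as ``GradientDescentRegressor(n_iter=2000).fit(X, y).predict(X_new)``.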