diff --git a/doc/developers/index.rst b/doc/developers/index.rst index 80e46ae41bcd0bd71711471f152f9280f4209511..78387281ba2285051b455d8c8696277c285c97de 100644 --- a/doc/developers/index.rst +++ b/doc/developers/index.rst @@ -18,6 +18,7 @@ You are also welcome to post there feature requests or links to pull-requests. .. _git_repo: + Retrieving the latest code ========================== @@ -51,18 +52,17 @@ Contributing code https://lists.sourceforge.net/lists/listinfo/scikit-learn-general - How to contribute ----------------- -The prefered way to contribute to `scikit-learn` is to fork the main +The prefered way to contribute to Scikit-Learn is to fork the main repository on `github <http://github.com/scikit-learn/scikit-learn/>`__: 1. `Create an account <https://github.com/signup/free>`_ on github if you don't have one already. - 2. Fork the `scikit-learn repo + 2. Fork the `project repository <http://github.com/scikit-learn/scikit-learn>`__: click on the 'Fork' button, at the top, center of the page. This creates a copy of the code on the github server where you can work. @@ -148,7 +148,6 @@ You can also check for common programming errors with the following tools: $ pip install pep8 $ pep8 path/to/module.py - Bonus points for contributions that include a performance analysis with a benchmark script and profiling output (please report on the mailing list or on the github wiki). @@ -163,7 +162,6 @@ details on profiling and cython optimizations. on all new contributions will get the overall code base quality in the right direction. - EasyFix Issues -------------- @@ -275,19 +273,20 @@ In addition, we add the following guidelines: A good example of code that we like can be found `here <https://svn.enthought.com/enthought/browser/sandbox/docs/coding_standard.py>`_. - Input validation ---------------- -The module ``sklearn.utils`` contains various functions for doing input +.. 
currentmodule:: sklearn.utils + +The module :mod:`sklearn.utils` contains various functions for doing input validation/conversion. Sometimes, ``np.asarray`` suffices for validation; do `not` use ``np.asanyarray`` or ``np.atleast_2d``, since those let NumPy's ``np.matrix`` through, which has a different API (e.g., ``*`` means dot product on ``np.matrix``, but Hadamard product on ``np.ndarray``). -In other cases, be sure to call ``safe_asarray``, ``atleast2d_or_csr``, -``as_float_array`` or ``array2d`` on any array-like argument passed to a +In other cases, be sure to call :func:`safe_asarray`, :func:`atleast2d_or_csr`, +:func:`as_float_array` or :func:`array2d` on any array-like argument passed to a scikit-learn API function. The exact function to use depends mainly on whether ``scipy.sparse`` matrices must be accepted. @@ -296,12 +295,12 @@ For more information, refer to the :ref:`developers-utils` page. Random Numbers -------------- -If your code depends on a random number generator, do not use -``numpy.random.random()`` or similar routines. To ensure +If your code depends on a random number generator, do not use +``numpy.random.random()`` or similar routines. To ensure repeatability in error checking, the routine should accept a keyword ``random_state`` and use this to construct a ``numpy.random.RandomState`` object. -See ``sklearn.utils.check_random_state`` in :ref:`developers-utils`. +See :func:`sklearn.utils.check_random_state` in :ref:`developers-utils`. Here's a simple example of code using some of the above guidelines: @@ -340,7 +339,6 @@ objects. In addition, to avoid the proliferation of framework code, we try to adopt simple conventions and limit to a minimum the number of methods an object has to implement. 
- Different objects ----------------- @@ -378,7 +376,6 @@ multiple interfaces): score = obj.score(data) - Estimators ---------- @@ -442,10 +439,9 @@ following is wrong:: # the argument in the constructor self.param3 = param2 -The scikit-learn relies on this mechanism to introspect object to set +Scikit-Learn relies on this mechanism to introspect object to set their parameters by cross-validation. - Fitting ^^^^^^^ @@ -501,14 +497,12 @@ Any attribute that ends with ``_`` is expected to be overridden when you call ``fit`` a second time without taking any previous value into account: **fit should be idempotent**. - Optional Arguments ^^^^^^^^^^^^^^^^^^ In iterative algorithms, number of iterations should be specified by an int called ``n_iter``. - Unresolved API issues ---------------------- @@ -518,7 +512,6 @@ Some things are must still be decided: * which exception should be raised when arrays' shape do not match in fit() ? - Working notes --------------- @@ -526,7 +519,6 @@ For unresolved issues, TODOs, remarks on ongoing work, developers are adviced to maintain notes on the github wiki: https://github.com/scikit-learn/scikit-learn/wiki - Specific models ----------------- diff --git a/doc/developers/utilities.rst b/doc/developers/utilities.rst index e1232657dcce7511f5207a0e79bf36a32ed34ef2..c85e73366e291bc61850241147fb2606722c971e 100644 --- a/doc/developers/utilities.rst +++ b/doc/developers/utilities.rst @@ -3,19 +3,22 @@ ======================== Utilities for Developers ======================== -Scikit-learn contains a number of utilities to help with development. These -are located in ``sklearn.utils``, and include tools in a number of categories. -All the following functions and classes are in the module ``sklearn.utils``. -Please note that these utilities are meant to be used internally within +Scikit-learn contains a number of utilities to help with development. These are +located in :mod:`sklearn.utils`, and include tools in a number of categories. 
+All the following functions and classes are in the module :mod:`sklearn.utils`. + +Please note that these utilities are meant to be used internally within scikit-learn. They are not guaranteed to be stable between versions of scikit-learn. Backports, in particular, will be removed as the scikit-learn dependencies evolve. + .. currentmodule:: sklearn.utils Validation Tools ----------------- +================ + These are tools used to check and validate input. When you write a function which accepts arrays, matrices, or sparse matrices as arguments, the following should be used when applicable. @@ -30,7 +33,7 @@ should be used when applicable. - :func:`array2d`: equivalent to ``np.atleast_2d``, but the ``order`` and ``dtype`` of the input are maintained. - + - :func:`atleast2d_or_csr`: equivalent to ``array2d``, but if a sparse matrix is passed, will convert to csr format. Also calls ``assert_all_finite``. @@ -50,7 +53,7 @@ number generator object. - :func:`check_random_state`: create a ``np.random.RandomState`` object from a parameter ``random_state``. - + - If ``random_state`` is ``None`` or ``np.random``, then a randomly-initialized ``RandomState`` object is returned. - If ``random_state`` is an integer, then it is used to seed a new @@ -67,7 +70,7 @@ For example: Efficient Linear Algebra & Array Operations -------------------------------------------- +=========================================== - :func:`extmath.randomized_range_finder`: construct an orthonormal matrix whose range approximates the range of the input. This is used in @@ -113,12 +116,13 @@ Efficient Linear Algebra & Array Operations - :func:`shuffle`: Shuffle arrays or sparse matrices in a consistent way. Used in ``sklearn.cluster.k_means``. 
+ Graph Routines --------------- +============== - :func:`graph.single_source_shortest_path_length`: (not currently used in scikit-learn) - Return the shortest path from a single source + Return the shortest path from a single source to all connected nodes on a graph. Code is adapted from networkx. If this is ever needed again, it would be far faster to use a single iteration of Dijkstra's algorithm from ``graph_shortest_path``. @@ -127,7 +131,7 @@ Graph Routines (used in :func:`sklearn.cluster.spectral.spectral_embedding`) Return the Laplacian of a given graph. There is specialized code for both dense and sparse connectivity matrices. - + - :func:`graph_shortest_path.graph_shortest_path`: (used in :class:``sklearn.manifold.Isomap``) Return the shortest path between all pairs of connected points on a directed @@ -135,15 +139,16 @@ Graph Routines algorithm are available. The algorithm is most efficient when the connectivity matrix is a ``scipy.sparse.csr_matrix``. + Backports ---------- +========= - :class:`fixes.Counter` (partial backport of ``collections.Counter`` from Python 2.7) Used in ``sklearn.feature_extraction.text``. - :func:`fixes.unique`: (backport of ``np.unique`` from numpy 1.4). Find the unique entries in an array. In numpy versions < 1.4, ``np.unique`` is less - flexible. Used in ``sklearn.cross_validation``. + flexible. Used in :mod:`sklearn.cross_validation`. - :func:`fixes.copysign`: (backport of ``np.copysign`` from numpy 1.4). Change the sign of ``x1`` to that of ``x2``, element-wise. @@ -159,20 +164,21 @@ Backports - :func:`fixes.count_nonzero` (backport of ``np.count_nonzero`` from numpy 1.6). Count the nonzero elements of a matrix. Used in - tests of ``sklearn.linear_model``. + tests of :mod:`sklearn.linear_model`. 
- :func:`arrayfuncs.solve_triangular` (Back-ported from scipy v0.9) Used in ``sklearn.linear_model.omp``, - independent back-ports in ``sklearn.mixture.gmm`` and - ``sklearn.gaussian_process`` + independent back-ports in ``sklearn.mixture.gmm`` and + :mod:`sklearn.gaussian_process`. - :func:`sparsetools.cs_graph_components` (backported from ``scipy.sparse.cs_graph_components`` in scipy 0.9). Used in ``sklearn.cluster.hierarchical``, as well as in tests for - ``sklearn.feature_extraction``. + :mod:`sklearn.feature_extraction`. + ARPACK -~~~~~~ +------ - :func:`arpack.eigs` (backported from ``scipy.sparse.linalg.eigs`` in scipy 0.10) @@ -194,22 +200,24 @@ ARPACK Benchmarking -~~~~~~~~~~~~ +------------ - :func:`bench.total_seconds` (back-ported from ``timedelta.total_seconds`` - in Python 2.7). Used in ``benchmarks/bench_glm.py`` + in Python 2.7). Used in ``benchmarks/bench_glm.py``. + Testing Functions ------------------ +================= - :func:`testing.assert_in`: Compare string elements within lists. - Used in ``sklearn.datasets`` tests. + Used in :mod:`sklearn.datasets` tests. - :class:`mock_urllib2`: Object which mocks the urllib2 module to fake - requests of mldata. Used in tests of ``sklearn.datasets``. + requests of mldata. Used in tests of :mod:`sklearn.datasets`. + Helper Functions ----------------- +================ - :class:`gen_even_slices`: generator to create ``n``-packs of slices going up to ``n``. Used in ``sklearn.decomposition.dict_learning`` and @@ -217,12 +225,13 @@ Helper Functions - :class:`arraybuilder.ArrayBuilder`: Helper class to incrementally build a 1-d numpy.ndarray. Currently used in - ``sklearn.datasets._svmlight_format.pyx`` + ``sklearn.datasets._svmlight_format.pyx``. + Warnings and Exceptions ------------------------ +======================= - :class:`deprecated`: Decorator to mark a function or class as deprecated. - :class:`ConvergenceWarning`: Custom warning to catch convergence problems. 
- Used in ``sklearn.covariance.graph_lasso`` + Used in ``sklearn.covariance.graph_lasso``. diff --git a/doc/modules/classes.rst b/doc/modules/classes.rst index eb4fac58bd6f89d39657ca197ae8eba4b5551226..1f69e971388628a1850963154c26f96f81910d57 100644 --- a/doc/modules/classes.rst +++ b/doc/modules/classes.rst @@ -847,6 +847,8 @@ Low-level methods :no-members: :no-inherited-members: +**Developer guide:** See the :ref:`developers-utils` page for further details. + .. currentmodule:: sklearn .. autosummary::
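The ``index.rst`` hunk above touches the sentence saying that Scikit-Learn relies on the constructor convention to introspect objects and set their parameters. A minimal sketch of why that convention matters, assuming a hypothetical helper ``read_params`` and two hypothetical estimator classes (none of these names come from scikit-learn, and this is not the library's actual implementation):

```python
import inspect


class GoodEstimator:
    """Hypothetical estimator: each constructor argument is stored
    unchanged under an attribute of the same name."""

    def __init__(self, param1=1, param2=2):
        self.param1 = param1
        self.param2 = param2


class BadEstimator:
    """Hypothetical estimator that breaks the convention by renaming."""

    def __init__(self, param1=1, param2=2):
        self.param1 = param1
        self.param3 = param2  # wrong: attribute name differs from the argument


def read_params(obj):
    """Illustrative helper (an assumption, not scikit-learn's code):
    recover constructor parameters by looking up an attribute named
    after each constructor argument."""
    signature = inspect.signature(type(obj).__init__)
    names = [name for name in signature.parameters if name != "self"]
    return {name: getattr(obj, name) for name in names}


# Works because attribute names mirror the constructor arguments.
good_params = read_params(GoodEstimator(param2=5))

# Fails because ``param2`` was stored under a different name.
try:
    read_params(BadEstimator())
    bad_is_introspectable = True
except AttributeError:
    bad_is_introspectable = False
```

Under these assumptions, any tool that reads or sets parameters by name can only work when ``__init__`` stores every argument under its own name, which is the point the documented "wrong" example makes.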