diff --git a/doc/modules/decomposition.rst b/doc/modules/decomposition.rst
index b94d3022acc4a6cbe270863c405dd3a26192a618..4eed688551fc7468a5ba008b9e9b22236946a47c 100644
--- a/doc/modules/decomposition.rst
+++ b/doc/modules/decomposition.rst
@@ -235,6 +235,112 @@ factorization, while larger values shrink many coefficients to zero.
     R. Jenatton, G. Obozinski, F. Bach, 2009

+.. _DictionaryLearning:
+
+Dictionary Learning
+===================
+
+Generic dictionary learning
+---------------------------
+
+Dictionary learning (:class:`DictionaryLearning`) is a matrix factorization
+problem that amounts to finding a (usually overcomplete) dictionary that
+performs well at sparsely encoding the fitted data.
+
+Representing data as sparse combinations of atoms from an overcomplete
+dictionary is suggested to be the way the mammalian primary visual cortex
+works. Consequently, dictionary learning applied on image patches has been
+shown to give good results in image processing tasks such as image completion,
+inpainting and denoising, as well as for supervised recognition tasks.
+
+Dictionary learning is an optimization problem solved by alternately updating
+the sparse code, as a solution to multiple Lasso problems, considering the
+dictionary fixed, and then updating the dictionary to best fit the sparse code.
+
+.. math::
+   (U^*, V^*) = \underset{U, V}{\operatorname{arg\,min\,}} & \frac{1}{2}
+                ||X-UV||_2^2+\alpha||U||_1 \\
+                \text{subject to\,} & ||V_k||_2 = 1 \text{ for all }
+                0 \leq k < n_{atoms}
+
+After using such a procedure to fit the dictionary, the fitted object can be
+used to transform new data. The transformation amounts to a sparse coding
+problem: finding a representation of the data as a linear combination of as few
+dictionary atoms as possible. All variations of dictionary learning implement
+the following transform methods, controllable via the `transform_method`
+initialization parameter:
+
+
+* Orthogonal matching pursuit (:ref:`omp`)
+
+* Least-angle regression (:ref:`least_angle_regression`)
+
+* Lasso computed by least-angle regression
+
+* Lasso using coordinate descent (:ref:`lasso`)
+
+* Thresholding
+
+Thresholding is very fast but it does not yield accurate reconstructions.
+The resulting codes have nevertheless been shown in the literature to be
+useful for classification tasks. For image reconstruction tasks, orthogonal
+matching pursuit yields the most accurate, unbiased reconstruction.
+
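+A minimal usage sketch is shown below; it assumes the `n_atoms` and
+`transform_method` parameters described above, and the exact estimator
+signature may differ between versions::
+
+    import numpy as np
+    from sklearn.decomposition import DictionaryLearning
+
+    # illustrative data: 100 samples of dimension 16,
+    # standing in for flattened 4x4 image patches
+    X = np.random.RandomState(0).randn(100, 16)
+
+    # learn 32 atoms, then sparse code with orthogonal matching pursuit
+    dico = DictionaryLearning(n_atoms=32, alpha=1, transform_method='omp')
+    code = dico.fit(X).transform(X)  # sparse codes, shape (100, 32)
+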
+The dictionary learning objects offer, via the `split_code` parameter, the
+possibility to separate the positive and negative values in the results of
+sparse coding. This is useful when dictionary learning is used for extracting
+features that will be used for supervised learning, because it allows the
+learning algorithm to assign different weights to the negative loadings of a
+particular atom than to the corresponding positive loadings.
+
+The split code for a single sample has length `2 * n_atoms`
+and is constructed using the following rule: First, the regular code of length
+`n_atoms` is computed. Then, the first `n_atoms` entries of the split code are
+filled with the positive part of the regular code vector. The second half of
+the split code is filled with the negative part of the code vector, but with
+a positive sign. Therefore, the split code is non-negative.
+
+The following image shows what a dictionary learned from 4x4 pixel image
+patches extracted from part of the image of Lena looks like.
+
+
+.. figure:: ../auto_examples/decomposition/images/plot_img_denoising_1.png
+   :target: ../auto_examples/decomposition/plot_img_denoising.html
+   :align: center
+   :scale: 50%
+
+
+.. topic:: Examples:
+
+  * :ref:`example_decomposition_plot_img_denoising.py`
+
+
+.. topic:: References:
+
+  * `"Online dictionary learning for sparse coding"
+    <http://www.di.ens.fr/sierra/pdfs/icml09.pdf>`_
+    J. Mairal, F. Bach, J. Ponce, G. Sapiro, 2009
+
+.. _MiniBatchDictionaryLearning:
+
+Mini-batch dictionary learning
+------------------------------
+
+:class:`MiniBatchDictionaryLearning` implements a faster, but less accurate
+version of the dictionary learning algorithm that is better suited for large
+datasets.
+
+By default, :class:`MiniBatchDictionaryLearning` divides the data into
+mini-batches and optimizes in an online manner by cycling over the mini-batches
+for the specified number of iterations. However, at the moment it does not
+implement a stopping condition.
+
+The estimator also implements `partial_fit`, which updates the dictionary by
+iterating only once over a mini-batch. This can be used for online learning
+when the data is not readily available from the start, or when the data
+does not fit into memory.
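+
+A sketch of such an online fit over a stream of mini-batches is given below
+(assuming the `n_atoms` parameter described above; exact names may differ
+between versions)::
+
+    import numpy as np
+    from sklearn.decomposition import MiniBatchDictionaryLearning
+
+    # illustrative sketch: parameter names follow the description above
+    dico = MiniBatchDictionaryLearning(n_atoms=32, alpha=1)
+
+    # mini-batches arriving one at a time, e.g. read from disk or a stream
+    for batch in np.array_split(np.random.RandomState(0).randn(1000, 16), 20):
+        dico.partial_fit(batch)  # a single pass over this mini-batch
+
+    V = dico.components_  # the current dictionary, shape (32, 16)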
+
+
 
 .. _ICA:
 
 Independent component analysis (ICA)
 ====================================
@@ -348,103 +454,3 @@ of the data.
     <http://www.cs.rpi.edu/~boutsc/files/nndsvd.pdf>`_
     C. Boutsidis, E. Gallopoulos, 2008
 
-
-
-
-.. _DictionaryLearning:
-
-Dictionary Learning
-===================
-
-Generic dictionary learning
----------------------------
-
-Dictionary learning (:class:`DictionaryLearning`) is a matrix factorization
-problem that amounts to finding a (usually overcomplete) dictionary that will
-perform good at sparsely encoding the fitted data.
-
-Representing data as sparse combinations of atoms from an overcomplete
-dictionary is suggested to be the way the mammal primary visual cortex works.
-Consequently, dictionary learning applied on image patches has been shown to
-give good results in image processing tasks such as image completion,
-inpainting and denoising, as well as for supervised recognition tasks.
-
-Dictionary learning is an optimization problem solved by alternatively updating
-the sparse code, as a solution to multiple Lasso problems, considering the
-dictionary fixed, and then updating the dictionary to best fit the sparse code.
-
-After using such a procedure to fit the dictionary, the fitted object can be
-used to transform new data. The transformation amounts to a sparse coding
-problem: finding a representation of the data as a linear combination of as few
-dictionary atoms as possible. All variations of dictionary learning implement
-the following transform methods, controllable via the `transform_method`
-initialization parameter:
-
-
-* Orthogonal matching pursuit (:ref:`omp`)
-
-* Least-angle regression (:ref:`least_angle_regression`)
-
-* Lasso computed by least-angle regression
-
-* Lasso using coordinate descent (:ref:`lasso`)
-
-* Thresholding
-
-Thresholding is very fast but it does not yield accurate reconstructions.
-They have been shown useful in literature for classification tasks. For image
-reconstruction tasks, orthogonal matching pursuit yields the most accurate,
-unbiased reconstruction.
-
-The dictionary learning objects offer, via the `split_code` parameter, the
-possibility to separate the positive and negative values in the results of
-sparse coding. This is useful when dictionary learning is used for extracting
-features that will be used for supervised learning, because it allows the
-learning algorithm to assign different weights to negative loadings of a
-particular atom, from to the corresponding positive loading.
-
-The split code for a single sample has length `2 * n_atoms`
-and is constructed using the following rule: First, the regular code of length
-`n_atoms` is computed. Then, the first `n_atoms` entries of the split_code are
-filled with the positive part of the regular code vector. The second half of
-the split code is filled with the negative part of the code vector, only with
-a positive sign. Therefore, the split_code is non-negative.
-
-The following image shows how a dictionary learned from 4x4 pixel image patches
-extracted from part of the image of Lena looks like.
-
-
-.. figure:: ../auto_examples/decomposition/images/plot_img_denoising_1.png
-   :target: ../auto_examples/decomposition/plot_img_denoising.html
-   :align: center
-   :scale: 50%
-
-
-.. topic:: Examples:
-
-  * :ref:`example_decomposition_plot_img_denoising.py`
-
-
-.. topic:: References:
-
-  * `"Online dictionary learning for sparse coding"
-    <http://www.di.ens.fr/sierra/pdfs/icml09.pdf>`_
-    J. Mairal, F. Bach, J. Ponce, G. Sapiro, 2009
-
-.. _MiniBatchDictionaryLearning
-
-Mini-batch dictionary learning
---------------------------
-
-:class:`MiniBatchDictionaryLearning` implements a faster, but less accurate
-version of the dictionary learning algorithm that is better suited for large
-datasets.
-
-By default, :class:`MiniBatchDictionaryLearning` divides the data into
-mini-batches and optimizes in an online manner by cycling over the mini-batches
-for the specified number of iterations. However, at the moment it does not
-implement a stopping condition.
-
-The estimator also implements `partial_fit`, which updates the dictionary by
-iterating only once over a mini-batch. This can be used for online learning
-when the data is not readily available from the start, or for when the data
-does not fit into the memory.