Commit 28bcb43d authored by Loic Esteve, committed by GitHub

[MRG + 1] DOC replace RandomizedPCA with PCA and svd_solver='randomized' in documentation (#7450)

parents 687615d3 29992d95
@@ -98,8 +98,8 @@ number of samples to be processed in the dataset.
 .. _RandomizedPCA:
-Approximate PCA
----------------
+PCA using randomized SVD
+------------------------
 It is often interesting to project data to a lower-dimensional
 space that preserves most of the variance, by dropping the singular vector
@@ -116,10 +116,11 @@ dimension (say around 200 for instance). The PCA algorithm can be used
 to linearly transform the data while both reducing the dimensionality
 and preserve most of the explained variance at the same time.
-The class :class:`RandomizedPCA` is very useful in that case: since we
-are going to drop most of the singular vectors it is much more efficient
-to limit the computation to an approximated estimate of the singular
-vectors we will keep to actually perform the transform.
+The class :class:`PCA` used with the optional parameter
+``svd_solver='randomized'`` is very useful in that case: since we are going
+to drop most of the singular vectors it is much more efficient to limit the
+computation to an approximated estimate of the singular vectors we will keep
+to actually perform the transform.
 For instance, the following shows 16 sample portraits (centered around
 0.0) from the Olivetti dataset. On the right hand side are the first 16
@@ -138,23 +139,23 @@ less than 1s:
 .. centered:: |orig_img| |pca_img|
-:class:`RandomizedPCA` can hence be used as a drop in replacement for
-:class:`PCA` with the exception that we need to give it the size of
-the lower-dimensional space ``n_components`` as a mandatory input parameter.
+Note: with the optional parameter ``svd_solver='randomized'``, we also
+need to give :class:`PCA` the size of the lower-dimensional space
+``n_components`` as a mandatory input parameter.
 If we note :math:`n_{max} = max(n_{samples}, n_{features})` and
 :math:`n_{min} = min(n_{samples}, n_{features})`, the time complexity
-of :class:`RandomizedPCA` is :math:`O(n_{max}^2 \cdot n_{components})`
+of the randomized :class:`PCA` is :math:`O(n_{max}^2 \cdot n_{components})`
 instead of :math:`O(n_{max}^2 \cdot n_{min})` for the exact method
 implemented in :class:`PCA`.
-The memory footprint of :class:`RandomizedPCA` is also proportional to
+The memory footprint of randomized :class:`PCA` is also proportional to
 :math:`2 \cdot n_{max} \cdot n_{components}` instead of :math:`n_{max}
 \cdot n_{min}` for the exact method.
-Note: the implementation of ``inverse_transform`` in :class:`RandomizedPCA`
-is not the exact inverse transform of ``transform`` even when
-``whiten=False`` (default).
+Note: the implementation of ``inverse_transform`` in :class:`PCA` with
+``svd_solver='randomized'`` is not the exact inverse transform of
+``transform`` even when ``whiten=False`` (default).
 .. topic:: Examples:
......
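The complexity and memory expressions quoted in the documentation hunk above can be made concrete with a little arithmetic. The sketch below is illustrative only: the helper name `pca_cost_estimates` is hypothetical, the shape matches the Olivetti faces (400 images of 64x64 = 4096 pixels), and `n_components=16` is an assumed value. Constant factors are ignored, so the results are ratios rather than timings.

```python
def pca_cost_estimates(n_samples, n_features, n_components):
    """Evaluate the big-O expressions from the docs, ignoring constants."""
    n_max = max(n_samples, n_features)
    n_min = min(n_samples, n_features)
    return {
        # O(n_max^2 * n_components) for the randomized solver
        "time_randomized": n_max ** 2 * n_components,
        # O(n_max^2 * n_min) for the exact method
        "time_exact": n_max ** 2 * n_min,
        # 2 * n_max * n_components for the randomized solver
        "memory_randomized": 2 * n_max * n_components,
        # n_max * n_min for the exact method
        "memory_exact": n_max * n_min,
    }


# Assumed example: 400 samples, 4096 features, 16 components kept.
est = pca_cost_estimates(n_samples=400, n_features=4096, n_components=16)
time_ratio = est["time_exact"] / est["time_randomized"]
memory_ratio = est["memory_exact"] / est["memory_randomized"]
print("time ratio (exact / randomized):", time_ratio)
print("memory ratio (exact / randomized):", memory_ratio)
```

Under these assumptions the time ratio reduces to `n_min / n_components` and the memory ratio to `n_min / (2 * n_components)`, so the benefit grows as fewer components are kept.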
@@ -38,7 +38,7 @@ from sklearn.model_selection import GridSearchCV
 from sklearn.datasets import fetch_lfw_people
 from sklearn.metrics import classification_report
 from sklearn.metrics import confusion_matrix
-from sklearn.decomposition import RandomizedPCA
+from sklearn.decomposition import PCA
 from sklearn.svm import SVC
@@ -88,7 +88,8 @@ n_components = 150
 print("Extracting the top %d eigenfaces from %d faces"
       % (n_components, X_train.shape[0]))
 t0 = time()
-pca = RandomizedPCA(n_components=n_components, whiten=True).fit(X_train)
+pca = PCA(n_components=n_components, svd_solver='randomized',
+          whiten=True).fit(X_train)
 print("done in %0.3fs" % (time() - t0))
 eigenfaces = pca.components_.reshape((n_components, h, w))
......
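For readers who want to try the replacement call outside the eigenfaces example, here is a minimal sketch on synthetic data. The array shape, the `n_components` value, and the `random_state` seeds are assumptions chosen for illustration; they are not part of the commit.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the face data: 200 samples, 50 features (assumed).
rng = np.random.RandomState(0)
X = rng.randn(200, 50)

n_components = 10  # must be given explicitly with the randomized solver

# Same call pattern as in the updated example, plus a fixed seed so that
# the randomized SVD gives repeatable results.
pca = PCA(n_components=n_components, svd_solver='randomized',
          whiten=True, random_state=0)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)        # one row per sample, one column per component
print(pca.components_.shape)  # one row per component, one column per feature
```

Passing `random_state` is optional; it is included here only because the randomized solver otherwise draws a fresh random projection on each fit.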
@@ -66,8 +66,9 @@ def plot_gallery(title, images, n_col=n_col, n_row=n_row):
 # List of the different estimators, whether to center and transpose the
 # problem, and whether the transformer uses the clustering API.
 estimators = [
-    ('Eigenfaces - RandomizedPCA',
-     decomposition.RandomizedPCA(n_components=n_components, whiten=True),
+    ('Eigenfaces - PCA using randomized SVD',
+     decomposition.PCA(n_components=n_components, svd_solver='randomized',
+                       whiten=True),
      True),
     ('Non-negative components - NMF',
@@ -122,7 +123,14 @@ for name, estimator, center in estimators:
         components_ = estimator.cluster_centers_
     else:
         components_ = estimator.components_
-    if hasattr(estimator, 'noise_variance_'):
+    # Plot an image representing the pixelwise variance provided by the
+    # estimator e.g its noise_variance_ attribute. The Eigenfaces estimator,
+    # via the PCA decomposition, also provides a scalar noise_variance_
+    # (the mean of pixelwise variance) that cannot be displayed as an image
+    # so we skip it.
+    if (hasattr(estimator, 'noise_variance_') and
+            estimator.noise_variance_.ndim > 0):  # Skip the Eigenfaces case
         plot_gallery("Pixelwise variance",
                      estimator.noise_variance_.reshape(1, -1), n_col=1,
                      n_row=1)
......
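The guard added in the last hunk distinguishes a scalar `noise_variance_` from a per-pixel array. The sketch below reproduces that check in isolation. The stand-in objects built with `SimpleNamespace` and the helper name `has_plottable_noise_variance` are hypothetical, and the helper uses `np.ndim` instead of the `.ndim` attribute from the commit so that plain Python floats are handled too.

```python
from types import SimpleNamespace

import numpy as np


def has_plottable_noise_variance(estimator):
    """True only if noise_variance_ exists and has at least one dimension."""
    return (hasattr(estimator, 'noise_variance_') and
            np.ndim(estimator.noise_variance_) > 0)


# Hypothetical stand-ins for fitted estimators (not real scikit-learn objects).
scalar_like = SimpleNamespace(noise_variance_=np.float64(0.5))     # scalar case
pixelwise_like = SimpleNamespace(noise_variance_=np.full(4096, 0.5))  # per pixel
no_attribute = SimpleNamespace()                                   # no attribute

for label, est in [("scalar", scalar_like),
                   ("pixelwise", pixelwise_like),
                   ("missing", no_attribute)]:
    print(label, has_plottable_noise_variance(est))

# Only the per-pixel array can be reshaped into a single image row.
row = pixelwise_like.noise_variance_.reshape(1, -1)
```

A zero-dimensional value still supports `reshape(1, -1)`, but the result holds a single number and cannot be laid out as a 64x64 image, which is why the commit skips that case.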