From f0ce49fb2334b0defb71fa80c23ae8e2a5e706e0 Mon Sep 17 00:00:00 2001
From: Fabian Pedregosa <fabian.pedregosa@inria.fr>
Date: Thu, 18 Mar 2010 13:09:12 +0000
Subject: [PATCH] Fix spelling errors in svm doc. Add a bit of doc.

From: Fabian Pedregosa <fabian.pedregosa@inria.fr>

git-svn-id: https://scikit-learn.svn.sourceforge.net/svnroot/scikit-learn/trunk@564 22fbfee3-77ab-4535-9bad-27d1bd3bc7d8
---
 doc/modules/svm.rst | 85 +++++++++++++--------------------------------
 1 file changed, 24 insertions(+), 61 deletions(-)

diff --git a/doc/modules/svm.rst b/doc/modules/svm.rst
index 35de971b84..b7f26f0557 100644
--- a/doc/modules/svm.rst
+++ b/doc/modules/svm.rst
@@ -4,7 +4,7 @@ Support Vector Machines
 **Support vector machines (SVMs)** are a set of supervised learning
 methods used for classification and regression. In simple words, given
-a set of training examples, witheach sample marked as belonging to one
+a set of training examples, with each sample marked as belonging to one
 of the multiple categories, an SVM training algorithm builds a model
 that predicts whether a new example falls into one category or the
 other.
@@ -13,7 +13,7 @@
 More formally, a support vector machine constructs a hyperplane or set
 of hyperplanes in a high or infinite dimensional space, which can be
 used for classification, regression or other tasks. Intuitively, a
 good separation is achieved by the hyperplane that has the largest
-distance to the nearest training datapoints of any class (so-called
+distance to the nearest training data points of any class (so-called
 functional margin), since in general the larger the margin the lower
 the generalization error of the classifier.
@@ -34,6 +34,13 @@
 non-linear and the transformed space high dimensional; thus though the
 classifier is a hyperplane in the high-dimensional feature space, it
 may be non-linear in the original input space.
 
+The decision function in this case will be:
+
+.. math:: sgn(\sum_{i=1}^l \alpha_i K(x_i, x) + \rho)
+
+where :math:`\alpha, \rho` can be accessed through fields support_ and
+rho_ of the classifier instance, respectively.
+
 If the kernel used is a Gaussian radial basis function, the
 corresponding feature space is a Hilbert space of infinite
 dimension. Maximum margin classifiers are well regularized, so the
@@ -41,10 +48,9 @@ infinite dimension does not spoil the results. Available kernels are,
 
   * linear :math:`<x, x'>`
   * polynomial :math:`(1 + <x, x'>)^d`
-  * radial :math:`exp(-\gamma |x-x'|^2)`
+  * radial basis :math:`exp(-\gamma |x-x'|^2)`
   * sigmoid :math:`tanh(<x, x'> + c)`
 
-
 The exclusive-OR is the simplest problem that cannot be solved using
 a linear kernel. In this problem, point (x, y) has target 1 if
 and only if x > 0 XOR y > 0. In the following example, we create a
@@ -80,7 +86,7 @@ data close to the model prediction.
 
 Distribution estimation
 =======================
-One-class SVM is used for outlayer detection, that is, given a set of
+One-class SVM is used for outlier detection, that is, given a set of
 samples, it will detect the soft boundary of that set.
 
 .. literalinclude:: ../../examples/plot_svm_oneclass.py
@@ -91,71 +97,28 @@ samples, it will detect the soft boundary of that set.
 Scaling
 =======
 
+Support Vector Machine algorithms are not scale-invariant, so it is
+highly recommended to standardize the input vector X to have mean 0 and
+variance 1. Note that the *same* scaling must be applied to the test
+vector to obtain meaningful results.
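+
+As a minimal sketch (the array values below are made up purely for
+illustration), this standardization can be done with NumPy::
+
+  >>> import numpy as np
+  >>> X_train = np.array([[0., 10.], [2., -10.], [4., 0.]])
+  >>> X_test = np.array([[1., 5.]])
+  >>> mean, std = X_train.mean(axis=0), X_train.std(axis=0)
+  >>> X_train_scaled = (X_train - mean) / std  # mean 0, variance 1
+  >>> X_test_scaled = (X_test - mean) / std    # reuse training statistics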
-TODO
-
-.. Mathematical formulation (Model selection)
-.. ========================
-
-
-.. C-support vector classification (C-SVC)
-.. ---------------------------------------
-.. Given training vectors :math:`x_i \in \mathbb{R}^n , i=1, ..., l` in
-.. two classes, and a vector :math:`y \in \mathbb{R}^l` such that
-.. :math:`y_i \in {1, -1}`, C-SVC solves the following primal problem:
-
-.. .. math:: \min_{w, b, \xi} {1 \over 2} w^T w + C \sum_{i=1}^l \xi_i
-.. .. math:: \textrm{subject to}\ y_i (w^T \phi(x_i) + b) \geq 1 - \xi_i
-.. .. math:: \xi_i >= 0, i=1, .., l
-
-.. Here training vectors :math:`x_i` are mapped into a higher (maybe
-.. infinite) dimensional space by the function :math:`phi`. The decision
-.. function is
-
-.. .. math:: sgn(\sum_{i=0}^l y_i \alpha_i K(x_i, x) + b)
-
-.. This is implemented in class SVC
-
+See `The CookBook
+<https://sourceforge.net/apps/trac/scikit-learn/wiki/CookBook>`_ for
+some examples on scaling.
-
-.. Nu-Support Vector Classification
-.. --------------------------------
-.. The nu-Support Vector Classification uses a new parameter :math:`\nu`
-.. which controls the number of support vectors and trainign errors. The
-.. parameter :math:`nu \in (0, 1]` is an upper bound on the fraction of
-.. training errors and a lower bound of the fraction of support vectors.
-.. Given training vectors :math:`x_i \in \mathbb{R}^n , i=1, ..., l` in
-.. two classes, and a vector :math:`y \in \mathbb{R}^l` such that
-.. :math:`y_i \in {1, -1}`, C-SVC solves the following primal problem:
-
-.. .. math:: \min_{w, b, \xi} {1 \over 2} w^T w - \nu \rho + {1 \over 2} \sum_{i=1}^l \xi_i
-
-.. \textrm{subject to}\ y_i (w^T \phi(x_i) + b) \geq \rho - \xi_i
-
-.. \xi_i \geq 0, i=1, .., l, \rho \geq 0
-
-.. The decision function is:
-
-.. .. math:: sgn(\sum_{i=1}^l y_i \alpha_i K(x_i, x) + b
-
-.. This is implemented in SVC(impl='nu-svc')
-
-
-
-Low-level implementation
-========================
-
-Internally, we use libsvm[1] to handle all computations. Libsvm is binded
-through some wrappers written in C and Cython.
+Implementation details
+======================
+
+Internally, we use libsvm[1] to handle all computations. Libsvm is wrapped
+using C and Cython.
 
 .. [1] http://www.csie.ntu.edu.tw/~cjlin/libsvm/
 
 References
 ==========
-
 For a description of the implementation and details of the algorithms
 used, please refer to
-http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf
-http://en.wikipedia.org/wiki/Support_vector_machine
+
+ - http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf
+ - http://en.wikipedia.org/wiki/Support_vector_machine
--
GitLab