diff --git a/doc/tutorial/statistical_inference/supervised_learning.rst b/doc/tutorial/statistical_inference/supervised_learning.rst
index 6fab7e3cbb59e7645ec287e6433682f3f214038b..e5342c5cad64a305245c47269f0ea347b3d161df 100644
--- a/doc/tutorial/statistical_inference/supervised_learning.rst
+++ b/doc/tutorial/statistical_inference/supervised_learning.rst
@@ -109,21 +109,21 @@ The curse of dimensionality
 
 For an estimator to be effective, you need the distance between neighboring
 points to be less than some value :math:`d`, which depends on the problem.
-In one dimension, this requires on average :math:`n ~ 1/d` points.
+In one dimension, this requires on average :math:`n \sim 1/d` points.
 In the context of the above :math:`k`-NN example, if the data is described by
 just one feature with values ranging from 0 to 1 and with :math:`n` training
 observations, then new data will be no further away than :math:`1/n`.
 Therefore, the nearest neighbor decision rule will be efficient as soon as
 :math:`1/n` is small compared to the scale of between-class feature
 variations.
 
-If the number of features is :math:`p`, you now require :math:`n \sim 1/d^p`
+If the number of features is :math:`p`, you now require :math:`n \sim 1/d^p`
 points.  Let's say that we require 10 points in one dimension: now :math:`10^p`
 points are required in :math:`p` dimensions to pave the :math:`[0, 1]` space.
 As :math:`p` becomes large, the number of training points required for a good
 estimator grows exponentially.
 
 For example, if each point is just a single number (8 bytes), then an
-effective :math:`k`-NN estimator in a paltry :math:`p~20` dimensions would
+effective :math:`k`-NN estimator in a paltry :math:`p \sim 20` dimensions would
 require more training data than the current estimated size of the entire
 internet (±1000 Exabytes or so).
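The last hunk quotes a storage estimate for :math:`p \sim 20` dimensions. A minimal Python sketch of that back-of-the-envelope arithmetic, assuming the tutorial's figures of 10 points per dimension and 8 bytes per point (variable names here are illustrative, not part of the patched text):

    # Rough check of the figure quoted in the tutorial: paving [0, 1]^p with
    # ~10 points per dimension needs 10**p points; at 8 bytes per point and
    # p = 20 that is 8e20 bytes, on the order of the quoted ~1000 exabytes.
    points_per_dim = 10
    p = 20
    bytes_per_point = 8

    n_points = points_per_dim ** p            # 10**20 points in 20 dimensions
    total_bytes = n_points * bytes_per_point  # 8e20 bytes
    exabytes = total_bytes / 1e18             # 1 exabyte = 1e18 bytes
    print(f"{exabytes:.0f} exabytes")         # -> 800 exabytes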