Commit 83671099 authored by Jaques Grobler, committed by Andreas Mueller

docstring fixes

parent e50fad65
@@ -27,7 +27,7 @@ increase as we add more samples. The penalization term, however, will not
 increase.
 
 When using, for example, :ref:`cross validation <cross_validation>`, to
-set amount of regularization with `C`, there will be a different
+set the amount of regularization with `C`, there will be a different
 amount of samples between every problem that we are using for model
 selection, as well as for the final problem that we want to use for
 training.
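For illustration, a minimal sketch of the situation this hunk describes: each cross-validation fold trains on fewer samples than the final fit, so an unscaled `C` gives the penalization term a different relative weight in the two problems. The dataset and fold count below are illustrative assumptions, not taken from the commit.

from sklearn.datasets import make_classification
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=500, random_state=0)

for train_idx, test_idx in KFold(n_splits=5).split(X):
    # Each model-selection problem is fit on only ~400 samples ...
    print("CV training-fold size:", len(train_idx))

# ... while the final model is fit on all 500 samples, so an unscaled `C`
# weights the data-fit term differently during selection and final training.
print("Final training-set size:", X.shape[0])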
@@ -38,7 +38,7 @@ The question that arises is `How do we optimally adjust C to
 account for the different training samples?`
 
 The figures below are used to illustrate the effect of scaling our
-`C` to compensate for the change in the amount of samples, in the
+`C` to compensate for the change in the number of samples, in the
 case of using an `L1` penalty, as well as the `L2` penalty.
 
 L1-penalty case
@@ -47,7 +47,7 @@ In the `L1` case, theory says that prediction consistency
 (i.e. that under given hypothesis, the estimator
 learned predicts as well as an model knowing the true distribution)
 is not possible because of the bias of the `L1`. It does say, however,
-that model consistancy, in terms of finding the right set of non-zero
+that model consistency, in terms of finding the right set of non-zero
 parameters as well as their signs, can be achieved by scaling
 `C1`.
 
@@ -64,7 +64,7 @@ corresponding cross-validation scores on the `y-axis`, for several different
 fractions of a generated data-set.
 
 In the `L1` penalty case, the results are best when scaling our `C` with
-the amount of samples, `n`, which can be seen in the first figure.
+the number of samples, `n`, which can be seen in the first figure.
 
 For the `L2` penalty case, the best result comes from the case where `C`
 is not scaled.
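A minimal sketch of the scaling this hunk refers to, assuming a `LinearSVC` with an `l1` penalty; the generated dataset, the fold sizes, and `base_C` are illustrative stand-ins for the example's own set-up.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=600, n_features=100, n_informative=5,
                           random_state=0)

cv = ShuffleSplit(n_splits=5, train_size=0.3, random_state=0)
base_C = 0.01                      # illustrative base value, an assumption
n_train = int(0.3 * X.shape[0])    # number of samples in each training fold

for label, C in [("unscaled C", base_C), ("C scaled by n", base_C * n_train)]:
    # L1-penalised linear SVM; scaling C by the training-fold size keeps the
    # data-fit and penalization terms in the same relative proportion.
    clf = LinearSVC(penalty="l1", loss="squared_hinge", dual=False,
                    C=C, tol=1e-3)
    scores = cross_val_score(clf, X, y, cv=cv)
    print("%s: mean CV accuracy %.3f" % (label, scores.mean()))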