Commit 254b1091 authored by Jaques Grobler, committed by Andreas Mueller

gael's suggestions/tweaks

parent 6b04635d
@@ -21,35 +21,35 @@ where
and our model parameters.
- :math:`\Omega` is a `penalty` function of our model parameters

If we consider the loss function to be the individual error per
sample, then the data-fit term, i.e. the sum of the errors over all
samples, will increase as we add more samples. The penalization term,
however, will not increase.
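
As a rough illustration (a minimal NumPy sketch with made-up per-sample
errors, not part of this example), the data-fit term grows with the number
of samples while the penalty term stays fixed::

    import numpy as np

    rng = np.random.RandomState(0)
    w = rng.randn(10)                # some fixed model parameters
    penalty = np.sum(np.abs(w))      # l1 penalty term: independent of n

    for n_samples in (100, 1000, 10000):
        losses = rng.rand(n_samples)     # illustrative per-sample errors
        data_fit = losses.sum()          # grows roughly linearly with n
        print("n=%d  data-fit=%.1f  penalty=%.2f"
              % (n_samples, data_fit, penalty))
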
When using, for example, :ref:`cross validation <cross_validation>` to
set the amount of regularization with `C`, there will be a different
number of samples in each of the problems that we use for model
selection, as well as in the final problem that we want to use for
training.

Since our loss function is dependent on the number of samples, the latter
will influence the selected value of `C`.

The question that arises is `How do we optimally adjust C to
account for the different amounts of training samples?`
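
For instance (a hypothetical sketch written against the present-day
scikit-learn API rather than the code of this example; module locations
were different when this example was written), the `C` selected by a grid
search can shift as the fraction of data used for model selection changes::

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV, ShuffleSplit
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_samples=1000, n_features=100,
                               n_informative=5, random_state=0)

    for train_size in (0.3, 0.5, 0.7):
        # each cross-validation fit sees a different number of samples
        cv = ShuffleSplit(n_splits=10, train_size=train_size, random_state=0)
        grid = GridSearchCV(LinearSVC(penalty='l1', dual=False),
                            {'C': np.logspace(-3, 1, 9)}, cv=cv)
        grid.fit(X, y)
        print("train fraction %.1f -> best C %.3g"
              % (train_size, grid.best_params_['C']))
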
The figures below are used to illustrate the effect of scaling our
`C` to compensate for the change in the number of samples, in the
case of using an `L1` penalty, as well as the `L2` penalty.
L1-penalty case
-----------------

In the `L1` case, theory says that prediction consistency
(i.e. that under a given hypothesis the estimator
learned predicts as well as a model knowing the true distribution)
is not possible because of the bias of the `L1` penalty. It does say, however,
that model consistency, in terms of finding the right set of non-zero
parameters as well as their signs, can be achieved by scaling
`C`.
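
A sketch of what such a scaling could look like in practice (hypothetical
values; `base_C` stands in for a reference value tuned once at some fixed
training-set size)::

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_samples=500, n_features=200,
                               n_informative=5, random_state=0)

    base_C = 0.01                    # hypothetical reference value
    n_train = X.shape[0]
    # scale C linearly with the number of training samples (l1 case)
    clf = LinearSVC(penalty='l1', dual=False, C=base_C * n_train)
    clf.fit(X, y)
    print("non-zero coefficients: %d" % np.sum(clf.coef_ != 0))
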
L2-penalty case
-----------------
@@ -59,17 +59,21 @@ as the number of samples grow, in order to keep prediction consistency.
Simulations
------------
The two figures below plot the values of `C` on the `x-axis` and the
corresponding cross-validation scores on the `y-axis`, for several different
fractions of a generated data-set.

In the `L1` penalty case, the results are best when scaling our `C` with
the number of samples, `n`, which can be seen in the third plot of the first figure.

For the `L2` penalty case, the best result comes from the case where `C`
is not scaled.
.. topic:: Note:

    Two separate datasets are used for the two different plots. The reason
    behind this is that the `L1` case works better on sparse data, while `L2`
    is better suited to the non-sparse case.
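
The data generation itself falls outside the lines shown in this diff; one
hypothetical way to obtain the two regimes the note describes would be::

    from sklearn.datasets import make_classification

    # 'sparse' setting for the l1 plots: only a few informative features
    X_l1, y_l1 = make_classification(n_samples=500, n_features=200,
                                     n_informative=5, random_state=0)

    # denser setting for the l2 plots: most features carry signal
    X_l2, y_l2 = make_classification(n_samples=500, n_features=20,
                                     n_informative=15, random_state=1)
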
""" """
print __doc__ print __doc__
@@ -116,9 +120,6 @@ colors = ['b', 'g', 'r', 'c']
for fignum, (clf, cs, X, y) in enumerate(clf_sets):
    # set up the plot for each regressor
    pl.figure(fignum, figsize=(9, 10))
-   pl.clf
-   pl.xlabel('C')
-   pl.ylabel('CV Score')

    for k, train_size in enumerate(np.linspace(0.3, 0.7, 3)[::-1]):
        param_grid = dict(C=cs)
@@ -136,16 +137,13 @@ for fignum, (clf, cs, X, y) in enumerate(clf_sets):
        for subplotnum, (scaler, name) in enumerate(scales):
            pl.subplot(2, 1, subplotnum + 1)
+           pl.xlabel('C')
+           pl.ylabel('CV Score')
            grid_cs = cs * float(scaler)  # scale the C's
            pl.semilogx(grid_cs, scores, label="fraction %.2f" %
                        train_size)
            pl.title('scaling=%s, penalty=%s, loss=%s' % (name, clf.penalty, clf.loss))
-           #ymin, ymax = pl.ylim()
-           #pl.axvline(grid_cs[np.argmax(scores)], 0, 1,
-           #           color=colors[k])
-           #pl.ylim(ymin=ymin-0.0025, ymax=ymax+0.008)  # adjust the y-axis

    pl.legend(loc="best")
pl.show()
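
The `clf_sets` and `scales` variables used by the loop above are defined in
parts of the file that this diff does not display. Purely as an illustration
of the shapes the plotting code expects (hypothetical values, not the
example's actual definitions)::

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.svm import LinearSVC

    X_1, y_1 = make_classification(n_samples=500, n_features=200,
                                   n_informative=5, random_state=0)
    X_2, y_2 = make_classification(n_samples=500, n_features=20,
                                   n_informative=15, random_state=1)

    # (classifier, grid of C values, data, targets) tuples for the l1 and
    # l2 configurations compared in the example -- hypothetical stand-ins
    clf_sets = [(LinearSVC(penalty='l1', dual=False, tol=1e-3),
                 np.logspace(-2.5, -1, 10), X_1, y_1),
                (LinearSVC(penalty='l2', tol=1e-4),
                 np.logspace(-4.5, -2, 10), X_2, y_2)]

    # (multiplier applied to the C grid, label) pairs: no scaling versus
    # scaling with the number of training samples
    n_samples = X_1.shape[0]
    scales = [(1., 'No scaling'),
              (float(n_samples), 'n_samples')]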