sklearn/model_selection/tests/test_search.py · 36bc053a69ac5b9ba5a54cb2bd19adb33dcde50e · Ian Johnson / scikit-learn

7 years ago

[MRG + 2] ENH Allow `cross_val_score`, `GridSearchCV` et al. to evaluate on... · a08555a2

Raghav RV authored 7 years ago

[MRG + 2] ENH Allow `cross_val_score`, `GridSearchCV` et al. to evaluate on multiple metrics (#7388)

* ENH cross_val_score now supports multiple metrics

* DOCFIX permutation_test_score

* ENH validate multiple metric scorers

* ENH Move validation of multimetric scoring param out

* ENH GridSearchCV and RandomizedSearchCV now support multiple metrics

* EXA Add an example demonstrating the multiple metric in GridSearchCV

* ENH Let check_multimetric_scoring tell if it's multimetric or not

* FIX For single metric name of scorer should remain 'score'

* ENH validation_curve and learning_curve now support multiple metrics

* MNT move _aggregate_score_dicts helper into _validation.py

* TST More testing/ Fixing scores to the correct values

* EXA Add cross_val_score to multimetric example

* Rename to multiple_metric_evaluation.py

* MNT Remove scaffolding

* FIX doctest imports

* FIX wrap the scorer and unwrap the score when using _score() in rfe

* TST Cleanup the tests. Test for is_multimetric too

* TST Make sure it registers as single metric when scoring is of that type

* PEP8

* Don't use dict comprehension to make it work in python2.6

* ENH/FIX/TST grid_scores_ should not be available for multimetric evaluation

* FIX+TST delegated methods NA when multimetric is enabled...

TST Add general tests to GridSearchCV and RandomizedSearchCV

* ENH add option to disable delegation on multimetric scoring

* Remove old function from __all__

* flake8

* FIX revert disable_on_multimetric

* stash

* Fix incorrect rebase

* [ci skip]

* Make sure refit works as expected and remove irrelevant tests

* Allow passing standard scorers by name in multimetric scorers

* Fix example

* flake8

* Address reviews

* Fix indentation

* Ensure {'acc': 'accuracy'} and ['precision'] are valid inputs

* Test that for single metric, 'score' is a key

* Typos

* Fix incorrect rebase

* Compare multimetric grid search with multiple single metric searches

* Test X, y list and pandas input; Test multimetric for unsupervised grid search

* Fix tests; Unsupervised multimetric gs will not pass until #8117 is merged

* Make a plot of Precision vs ROC AUC for RandomForest varying the n_estimators

* Add example to grid_search.rst

* Use the classic tuning of C param in SVM instead of estimators in RF

* FIX Remove scoring arg in default scorer test

* flake8

* Search for min_samples_split in DTC; Also show f-score

* REVIEW Make check_multimetric_scoring private

* FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed

* REVIEW Plot best score; Shorten legends

* REVIEW/COSMIT multimetric --> multi-metric

* REVIEW Mark the best scores of P/R scores too

* Revert "FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed"

This reverts commit ba766d98353380a186fbc3dade211670ee72726d.

* ENH Use looping for iid testing

* FIX use param grid as scipy's stats dist in 0.12 do not accept seed

* ENH more looping less code; Use small non-noisy dataset

* FIX Use named arg after expanded args

* TST More testing of the refit parameter

* Test that in multimetric search refit to single metric, the delegated methods
  work as expected.
* Test that setting probability=False works with multimetric too
* Test refit=False gives sensible error

* COSMIT multimetric --> multi-metric

* REV Correct example doc

* COSMIT

* REVIEW Make tests stronger; Fix bugs in _check_multimetric_scorer

* REVIEW refit param: Raise for empty strings

* TST Invalid refit params

* REVIEW Use <scorer_name> alone; recall --> Recall

* REV specify when we expect scorers to not be None

* FLAKE8

* REVERT multimetrics in learning_curve and validation_curve

* REVIEW Simpler coding style

* COSMIT

* COSMIT

* REV Compress example a bit. Move comment to top

* FIX fit_grid_point's previous API must be preserved

* Flake8

* TST Use loop; Compare with single-metric

* REVIEW Use dict-comprehension instead of helper

* REVIEW Remove redundant test

* Fix tests incorrect braces

* COSMIT

* REVIEW Use regexp

* REV Simplify aggregation of score dicts

* FIX precision and accuracy test

* FIX doctest and flake8

* TST the best_* attributes multimetric with single metric

* Address @jnothman's review

* Address more comments \o/

* DOCFIXES

* Fix use the validated fit_param from fit's arguments

* Revert alpha to a lower value as before

* Using def instead of lambda

* Address @jnothman's review batch 1: Fix tests / Doc fixes

* Remove superfluous tests

* Remove more superfluous testing

* TST/FIX loop over refit and check found n_clusters

* Cosmetic touches

* Use zip instead of manually listing the keys

* Fix inverse_transform

* FIX bug in fit_grid_point; Allow only single score

TST if fit_grid_point works as intended

* ENH Use only ROC-AUC and F1-score

* Fix typos and flake8; Address Andy's reviews

MNT Add a comment on why we do such a transpose + some fixes

* ENH Better error messages for incorrect multimetric scoring values +...

ENH Avoid exception traceback while using incorrect scoring string

* Dict keys must be of string type only

* 1. Better error message for invalid scoring 2...
Internal functions return single score for single metric scoring

* Fix test failures and shuffle tests

* Avoid wrapping scorer as dict in learning_curve

* Remove doc example as asked for

* Some leftover ones

* Don't wrap scorer in validation_curve either

* Add a doc example and skip it as dict order fails doctest

* Import zip from six for python2.7 compat

* Make cross_val_score return a cv_results-like dict

* Add relevant sections to userguide

* Flake8 fixes

* Add whatsnew and fix broken links

* Use AUC and accuracy instead of f1

* Fix failing doctests cross_validation.rst

* DOC add the wrapper example for metrics that return multiple return values

* Address andy's comments

* Be less weird

* Address more of andy's comments

* Make a separate cross_validate function to return dict and a cross_val_score

* Update the docs to reflect the new cross_validate function

* Add cross_validate to toc-tree

* Add more tests on type of cross_validate return and time limits

* FIX failing doctests

* FIX ensure keys are not plural

* DOC fix

* Address some pending comments

* Remove the comment as it is irrelevant now

* Remove excess blank line

* Fix flake8 inconsistencies

* Allow fit_times to be 0 to conform with windows precision

* DOC specify how refit param is to be set in multiple metric case

* TST ensure cross_validate works for string single metrics + address @jnothman's reviews

* Doc fixes

* Remove the shape and transform parameter of _aggregate_score_dicts

* Address Joel's doc comments

* Fix broken doctest

* Fix the spurious file

* Address Andy's comments

* MNT Remove erroneous entry

* Address Andy's comments

* FIX broken links

* Update whats_new.rst

missing newline
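
For orientation, here is a minimal sketch of the multi-metric behaviour the entries above describe, written against the public scikit-learn API as it exists in current releases (the commit's own code and tests are not reproduced here). The synthetic dataset, the choice of `DecisionTreeClassifier`, the `min_samples_split` grid, and the metric names `"AUC"` and `"Accuracy"` are illustrative assumptions, loosely following the example mentioned in the log.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.tree import DecisionTreeClassifier

# Small synthetic binary problem (an assumption made for this sketch).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Multiple metrics: a dict mapping a user-chosen name to a standard scorer name.
scoring = {"AUC": "roc_auc", "Accuracy": "accuracy"}

# cross_validate returns a dict with one "test_<name>" entry per metric,
# plus timing entries; each entry holds one value per CV fold.
cv_scores = cross_validate(
    DecisionTreeClassifier(random_state=0), X, y, cv=3, scoring=scoring
)
print(sorted(cv_scores))

# With a single metric given as a string, the key stays "test_score".
single = cross_validate(
    DecisionTreeClassifier(random_state=0), X, y, cv=3, scoring="accuracy"
)
print(sorted(single))

# In a multi-metric search, refit names the metric used to pick
# best_params_ and to refit the final estimator.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"min_samples_split": [2, 10, 50]},
    scoring=scoring,
    refit="AUC",
    cv=3,
)
search.fit(X, y)
print(search.best_params_)
print(sorted(k for k in search.cv_results_ if k.startswith("mean_test_")))
```

Passing `refit="AUC"` keeps delegated methods such as `predict` available on the fitted search object; with `refit=False` no final estimator is fitted, which is the case the "Test refit=False gives sensible error" entry refers to.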
