bench_isotonic.py
Andrew Tulloch authored
This supercedes #2940. PAVA runs in linear time, but the existing algorithm was scaling approximately quadratically, due to the array pop in the inner loop being O(N). I implemented the O(N) version of PAVA using the decreasing subsequences trick, and the performance is significantly faster in benchmarking. The benchmarks in this diff (adapted from one of the existing unit tests) show significant performance improvements - on the order of ~7x faster for problems of size ~1000, ~30x faster for problems of size 10,000, and ~250x faster for problems of size 100,000. On correctness - unit tests cover the isotonic regression code fairly well, and all pass before and after the change. It's a fairly well known algorithm with a bunch of implementations, so I think this is correct. In coding up this algorithm I made some mistakes and the unit tests caught the failures, which makes me more confident in the correctness now. Still, the performance improvements are surprisingly high. Added a benchmark script. For an example usage, run: ``` python benchmarks/bench_isotonic.py --iterations 10 --log_min_problem_size 2 --log_max_problem_size 8 --dataset logistic ```
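To illustrate the idea behind the O(N) variant, here is a minimal NumPy sketch of Pool Adjacent Violators using a stack of blocks (the "decreasing subsequences" trick). This is an assumption-laden illustration, not the patched scikit-learn code: each element is pushed and popped at most once, so the total work is linear, avoiding the O(N) array pops in the inner loop that made the old version quadratic.

```python
import numpy as np

def pava(y, w=None):
    """Illustrative linear-time PAVA for a non-decreasing fit.

    Hypothetical sketch; not the actual scikit-learn implementation.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    w = np.ones(n) if w is None else np.asarray(w, dtype=float)
    # Stack of merged blocks: (weighted mean, total weight, point count).
    blocks = []
    for i in range(n):
        mean, weight, count = y[i], w[i], 1
        # Merge with the top block while it violates monotonicity.
        while blocks and blocks[-1][0] > mean:
            m2, w2, c2 = blocks.pop()
            total = weight + w2
            mean = (mean * weight + m2 * w2) / total
            weight, count = total, count + c2
        blocks.append((mean, weight, count))
    # Expand the blocks back into per-point fitted values.
    out = np.empty(n)
    pos = 0
    for mean, _, count in blocks:
        out[pos:pos + count] = mean
        pos += count
    return out
```

For example, `pava([3, 2, 1])` pools all three points into their mean, giving `[2, 2, 2]`, while an already-sorted input is returned unchanged.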