
bench_isotonic.py

3753563c · ENH - Improve performance of isotonic regression
    Andrew Tulloch authored
This supersedes #2940.
    
PAVA can run in linear time, but the existing implementation scaled
approximately quadratically, because the array pop in its inner loop
is O(N).
    
I implemented the O(N) version of PAVA using the decreasing-subsequences
trick, and benchmarking shows it is significantly faster.
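A minimal sketch of the linear-time idea (function name hypothetical; this is not the committed Cython code): keep a stack of already-pooled blocks, and merge an incoming point backwards while it violates monotonicity. Each point is pushed and popped at most once, so the total work is O(N).

```python
import numpy as np

def pava_sketch(y, w=None):
    """Linear-time Pool Adjacent Violators for a non-decreasing fit.

    Maintains a stack of pooled blocks (mean, weight, count); each new
    point is merged backwards while it violates monotonicity.  Every
    point is pushed and popped at most once, so total work is O(N).
    """
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    means, weights, counts = [], [], []
    for yi, wi in zip(y, w):
        m, wt, c = float(yi), float(wi), 1
        # Merge with the previous block while it violates monotonicity.
        while means and means[-1] > m:
            pm, pw, pc = means.pop(), weights.pop(), counts.pop()
            m = (pw * pm + wt * m) / (pw + wt)
            wt += pw
            c += pc
        means.append(m)
        weights.append(wt)
        counts.append(c)
    # Expand the pooled blocks back to one value per input point.
    return np.concatenate([np.full(c, m) for m, c in zip(means, counts)])
```

For example, `pava_sketch([1, 3, 2, 4])` pools the violating pair (3, 2) into their mean, giving `[1.0, 2.5, 2.5, 4.0]`.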
    
The benchmarks in this diff (adapted from one of the existing unit
tests) show significant improvements: roughly 7x faster for problems
of size ~1,000, ~30x faster at size 10,000, and ~250x faster at size
100,000.
    
On correctness: the unit tests cover the isotonic regression code fairly
well, and all of them pass before and after the change. PAVA is a
well-known algorithm with many existing implementations, so I believe
this version is correct. While coding it up I made some mistakes that
the unit tests caught, which gives me more confidence in the result.
Still, the performance improvements are surprisingly large.
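As a quick, self-contained illustration of that kind of cross-checking (helper names hypothetical; this is not the project's test suite): a deliberately quadratic PAVA that pools with `list.pop`, as in the old code path, should agree exactly with a stack-based linear-time version on random inputs.

```python
import numpy as np

def pava_naive(y):
    """Quadratic PAVA: repeatedly pool adjacent violators via list.pop."""
    vals = [float(v) for v in y]
    wts = [1.0] * len(vals)
    i = 0
    while i < len(vals) - 1:
        if vals[i] > vals[i + 1]:
            # Pool the violating pair into its weighted mean, then back up,
            # since the merge may create a new violation to the left.
            pooled = (wts[i] * vals[i] + wts[i + 1] * vals[i + 1]) / (wts[i] + wts[i + 1])
            vals[i] = pooled
            wts[i] += wts[i + 1]
            vals.pop(i + 1)  # O(N) pop -- the source of the quadratic scaling
            wts.pop(i + 1)
            i = max(i - 1, 0)
        else:
            i += 1
    out = []
    for v, c in zip(vals, wts):
        out.extend([v] * int(c))
    return np.array(out)

def pava_fast(y):
    """Linear-time PAVA using a stack of pooled (mean, weight, count) blocks."""
    means, weights, counts = [], [], []
    for yi in map(float, y):
        m, wt, c = yi, 1.0, 1
        while means and means[-1] > m:
            pm, pw, pc = means.pop(), weights.pop(), counts.pop()
            m = (pw * pm + wt * m) / (pw + wt)
            wt += pw
            c += pc
        means.append(m)
        weights.append(wt)
        counts.append(c)
    return np.concatenate([np.full(c, m) for m, c in zip(means, counts)])

# The two implementations should agree exactly on random problems.
rng = np.random.RandomState(0)
for _ in range(100):
    y = rng.randn(rng.randint(1, 50))
    assert np.allclose(pava_naive(y), pava_fast(y))
```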
    
Added a benchmark script. For example usage, run:
    
    ```
    python benchmarks/bench_isotonic.py --iterations 10 --log_min_problem_size 2 --log_max_problem_size 8 --dataset logistic
    ```