Use threadpool for finding labels in chunk #327

dcherian · 2024-01-20T21:34:26Z

Figure out a good heuristic for when to use the threadpool

Great when we have lots of decent-sized chunks, particularly for the NWM county groupby: 600ms -> 400ms.

| Change   | Before [627bf2b6] <main>   | After [7da43646] <threadpool>   |   Ratio | Benchmark (Parameter)                      |
|----------|----------------------------|---------------------------------|---------|--------------------------------------------|
| -        | 15.3±0.2ms                 | 7.93±0.03ms                     |    0.52 | cohorts.NWMMidwest.time_find_group_cohorts |
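A minimal sketch of the idea, assuming a simple per-chunk `np.unique` lookup rather than flox's actual implementation: run the per-chunk label search on a thread pool only when there are enough chunks for the threads to pay off, and fall back to a plain loop otherwise. The names `labels_in_chunk`, `find_labels_per_chunk`, and `THREADPOOL_MIN_CHUNKS`, and the cutoff value itself, are hypothetical and only illustrate the threshold heuristic discussed here.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

# Hypothetical cutoff (an assumption, not the value used in this PR):
# below this many chunks, thread start-up overhead may outweigh any gain.
THREADPOOL_MIN_CHUNKS = 100


def labels_in_chunk(labels: np.ndarray, start: int, stop: int) -> np.ndarray:
    """Return the sorted unique labels present in labels[start:stop]."""
    return np.unique(labels[start:stop])


def find_labels_per_chunk(labels, chunk_sizes, min_chunks=THREADPOOL_MIN_CHUNKS):
    """Find the unique labels in each chunk of a 1D label array.

    Uses a thread pool only when the number of chunks reaches min_chunks.
    """
    stops = np.cumsum(chunk_sizes)
    starts = stops - np.asarray(chunk_sizes)
    bounds = list(zip(starts.tolist(), stops.tolist()))

    if len(bounds) < min_chunks:
        # Few chunks: a plain loop avoids thread-pool overhead.
        return [labels_in_chunk(labels, a, b) for a, b in bounds]

    with ThreadPoolExecutor() as pool:
        # Executor.map yields results in the same order as its inputs.
        return list(pool.map(lambda ab: labels_in_chunk(labels, *ab), bounds))


# 600 elements in 12 groups of 50, split into 6 chunks of 100 elements.
labels = np.repeat(np.arange(12), 50)
chunk_sizes = (100,) * 6

serial = find_labels_per_chunk(labels, chunk_sizes, min_chunks=10**9)
threaded = find_labels_per_chunk(labels, chunk_sizes, min_chunks=1)
```

Both code paths return the same per-chunk label arrays, so the threshold only changes how the work is scheduled, not the result. Whether threads help at all depends on how much of the per-chunk work runs outside the GIL, which is why a measured threshold is needed.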

```
| Before [0cccb90] <optimize-again>   | After [38fe8a6c] <threadpool>   |   Ratio | Benchmark (Parameter)                       |
|-------------------------------------|---------------------------------|---------|---------------------------------------------|
| 3.50±0.2ms                          | 2.93±0.07ms                     |    0.84 | cohorts.PerfectMonthly.time_graph_construct |
| 20.0±1ms                            | 9.66±1ms                        |    0.48 | cohorts.NWMMidwest.time_find_group_cohorts  |
```
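As a reading aid for the benchmark tables above: the Ratio column is consistent with the "after" time divided by the "before" time, so values below 1 mean the threadpool branch is faster. The short check below recomputes it from the mean timings quoted in the tables; the helper name `ratio` and the dictionary layout are only for illustration.

```python
def ratio(before_ms: float, after_ms: float) -> float:
    """Return after / before; values below 1 indicate a speedup."""
    return after_ms / before_ms


# Mean timings (ms) copied from the benchmark tables: (before, after).
timings = {
    "PerfectMonthly.time_graph_construct": (3.50, 2.93),
    "NWMMidwest.time_find_group_cohorts (vs optimize-again)": (20.0, 9.66),
    "NWMMidwest.time_find_group_cohorts (vs main)": (15.3, 7.93),
}

ratios = {name: ratio(before, after) for name, (before, after) in timings.items()}

for name, r in ratios.items():
    # 1 / r is the corresponding speedup factor.
    print(f"{name}: ratio={r:.2f}, speedup={1 / r:.2f}x")
```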

* main: (64 commits)
  import `normalize_axis_index` from `numpy.lib` on `numpy>=2` (#364)
  Optimize `min_count` when `expected_groups` is not provided. (#236)
  Use threadpool for finding labels in chunk (#327)
  Manually fuse reindexing intermediates with blockwise reduction for cohorts. (#300)
  Bump codecov/codecov-action from 4.1.1 to 4.3.1 (#362)
  Add cubed notebook for hourly climatology example using "map-reduce" method (#356)
  Optimize bitmask finding for chunk size 1 and single chunk cases (#360)
  Edits to climatology doc (#361)
  Fix benchmarks (#358)
  Trim CI (#355)
  [pre-commit.ci] pre-commit autoupdate (#350)
  Initial minimal working Cubed example for "map-reduce" (#352)
  Bump codecov/codecov-action from 4.1.0 to 4.1.1 (#349)
  `method` heuristics: Avoid dot product as much as possible (#347)
  Fix nanlen with strings (#344)
  Fix direct quantile reduction (#343)
  Fix upstream-dev CI, silence warnings (#341)
  Bump codecov/codecov-action from 4.0.0 to 4.1.0 (#338)
  Fix direct reductions of Xarray objects (#339)
  Test with py3.12 (#336)
  ...

* main:
  Bump codecov/codecov-action from 4.3.1 to 4.4.1 (#366)
  Cubed blockwise (#357)
  Remove errant print statement
  import `normalize_axis_index` from `numpy.lib` on `numpy>=2` (#364)
  Optimize `min_count` when `expected_groups` is not provided. (#236)
  Use threadpool for finding labels in chunk (#327)
  Manually fuse reindexing intermediates with blockwise reduction for cohorts. (#300)
  Bump codecov/codecov-action from 4.1.1 to 4.3.1 (#362)
  Add cubed notebook for hourly climatology example using "map-reduce" method (#356)
  Optimize bitmask finding for chunk size 1 and single chunk cases (#360)
  Edits to climatology doc (#361)
  Fix benchmarks (#358)

dcherian changed the title ~~Optimize bitmask finding some more.~~ Use threadpool for finding labels in chunk Jan 20, 2024

Base automatically changed from optimize-again to main January 20, 2024 22:01

dcherian force-pushed the threadpool branch from bda76d2 to 247824d Compare January 20, 2024 22:19

dcherian marked this pull request as ready for review January 20, 2024 22:19

dcherian enabled auto-merge (squash) January 20, 2024 22:19

dcherian disabled auto-merge January 20, 2024 22:21

Add threshold

7e8e717

dcherian force-pushed the threadpool branch from e87242c to 7e8e717 Compare January 21, 2024 00:36

Fix + comment

8450813

dcherian marked this pull request as draft January 21, 2024 03:39

dcherian and others added 4 commits April 25, 2024 09:52

Merge branch 'main' into threadpool

af78c26

Fix benchmark.

f455689

Tweak threshold

2823677

Merge branch 'main' into threadpool

7da4364

* main:
  Optimize bitmask finding for chunk size 1 and single chunk cases (#360)
  Edits to climatology doc (#361)

dcherian force-pushed the threadpool branch from b0b1297 to 7da4364 Compare April 27, 2024 05:02

dcherian and others added 6 commits May 1, 2024 23:20

Small cleanup

630e083

Merge branch 'main' into threadpool

8ec11d0

Comment

34c3374

Try single allocation

668f7f8

Revert "Try single allocation"

53479af

This reverts commit c6b93367e2024e60d77af24a69d177670a040dfc.

cleanup

ff0d8c2

dcherian marked this pull request as ready for review May 2, 2024 05:36

dcherian enabled auto-merge (squash) May 2, 2024 05:37

dcherian merged commit c398f4e into main May 2, 2024
15 checks passed

dcherian deleted the threadpool branch May 2, 2024 06:17
