Use threadpool for finding labels in chunk #327

Merged
merged 13 commits into from
May 2, 2024

Conversation

dcherian
Collaborator

@dcherian dcherian commented Jan 20, 2024

  • Figure out a good heuristic for when to use the threadpool

Works well when there are many decent-sized chunks, particularly for the NWM county groupby: 600ms -> 400ms.

| Change   | Before [627bf2b6] <main>   | After [7da43646] <threadpool>   |   Ratio | Benchmark (Parameter)                      |
|----------|----------------------------|---------------------------------|---------|--------------------------------------------|
| -        | 15.3±0.2ms                 | 7.93±0.03ms                     |    0.52 | cohorts.NWMMidwest.time_find_group_cohorts |
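For readers unfamiliar with the idea, here is a minimal sketch of finding the labels present in each chunk with a thread pool. It is not the code from this PR: `labels_in_chunk`, `find_labels_per_chunk`, and the `min_chunks_for_threads` cutoff are hypothetical names, and the cutoff is only a placeholder for the still-open heuristic question above. It uses plain Python lists to stay self-contained, so it shows the structure only; whether threads actually help depends on how much of the per-chunk work releases the GIL.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate


def labels_in_chunk(labels, start, stop):
    """Return the sorted unique labels found in labels[start:stop]."""
    return sorted(set(labels[start:stop]))


def find_labels_per_chunk(labels, chunk_sizes, min_chunks_for_threads=4):
    """Return one list of unique labels per chunk, in chunk order.

    `min_chunks_for_threads` is a hypothetical cutoff, not a tuned value:
    below it the work runs serially to avoid thread pool overhead.
    """
    # Convert chunk sizes such as [4, 4, 4] into (start, stop) bounds.
    stops = list(accumulate(chunk_sizes))
    starts = [0] + stops[:-1]
    bounds = list(zip(starts, stops))

    if len(bounds) < min_chunks_for_threads:
        return [labels_in_chunk(labels, a, b) for a, b in bounds]

    # Submit one task per chunk; collecting results in submission order
    # keeps the output aligned with the chunk order.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(labels_in_chunk, labels, a, b) for a, b in bounds]
        return [f.result() for f in futures]


labels = [0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2]
serial = find_labels_per_chunk(labels, [4, 4, 4])
threaded = find_labels_per_chunk(labels, [4, 4, 4], min_chunks_for_threads=1)
```

Both calls return the same per-chunk label lists; only the execution path differs, which is what makes a size-based cutoff safe to tune separately.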

@dcherian dcherian changed the title Optimize bitmask finding some more. Use threadpool for finding labels in chunk Jan 20, 2024
Base automatically changed from optimize-again to main January 20, 2024 22:01
Great when we have lots of decent size chunks, particularly the NWM
county groupby: 600ms -> 400ms.

```
| Before [0cccb90] <optimize-again>   | After [38fe8a6c] <threadpool>   |   Ratio | Benchmark (Parameter)                       |
|--------------------------------------|---------------------------------|---------|---------------------------------------------|
| 3.50±0.2ms                           | 2.93±0.07ms                     |    0.84 | cohorts.PerfectMonthly.time_graph_construct |
| 20.0±1ms                             | 9.66±1ms                        |    0.48 | cohorts.NWMMidwest.time_find_group_cohorts  |
```
@dcherian dcherian marked this pull request as ready for review January 20, 2024 22:19
@dcherian dcherian enabled auto-merge (squash) January 20, 2024 22:19
@dcherian dcherian disabled auto-merge January 20, 2024 22:21
@dcherian dcherian marked this pull request as draft January 21, 2024 03:39
dcherian and others added 4 commits April 25, 2024 09:52
* main:
  Optimize bitmask finding for chunk size 1 and single chunk cases (#360)
  Edits to climatology doc (#361)
@dcherian dcherian marked this pull request as ready for review May 2, 2024 05:36
@dcherian dcherian enabled auto-merge (squash) May 2, 2024 05:37
@dcherian dcherian merged commit c398f4e into main May 2, 2024
15 checks passed
@dcherian dcherian deleted the threadpool branch May 2, 2024 06:17
dcherian added a commit that referenced this pull request May 2, 2024
* main: (64 commits)
  import `normalize_axis_index` from `numpy.lib` on `numpy>=2` (#364)
  Optimize `min_count` when `expected_groups` is not provided. (#236)
  Use threadpool for finding labels in chunk (#327)
  Manually fuse reindexing intermediates with blockwise reduction for cohorts. (#300)
  Bump codecov/codecov-action from 4.1.1 to 4.3.1 (#362)
  Add cubed notebook for hourly climatology example using "map-reduce" method (#356)
  Optimize bitmask finding for chunk size 1 and single chunk cases (#360)
  Edits to climatology doc (#361)
  Fix benchmarks (#358)
  Trim CI (#355)
  [pre-commit.ci] pre-commit autoupdate (#350)
  Initial minimal working Cubed example for "map-reduce" (#352)
  Bump codecov/codecov-action from 4.1.0 to 4.1.1 (#349)
  `method` heuristics: Avoid dot product as much as possible (#347)
  Fix nanlen with strings (#344)
  Fix direct quantile reduction (#343)
  Fix upstream-dev CI, silence warnings (#341)
  Bump codecov/codecov-action from 4.0.0 to 4.1.0 (#338)
  Fix direct reductions of Xarray objects (#339)
  Test with py3.12 (#336)
  ...
dcherian added a commit that referenced this pull request Jun 30, 2024
* main:
  Bump codecov/codecov-action from 4.3.1 to 4.4.1 (#366)
  Cubed blockwise (#357)
  Remove errant print statement
  import `normalize_axis_index` from `numpy.lib` on `numpy>=2` (#364)
  Optimize `min_count` when `expected_groups` is not provided. (#236)
  Use threadpool for finding labels in chunk (#327)
  Manually fuse reindexing intermediates with blockwise reduction for cohorts. (#300)
  Bump codecov/codecov-action from 4.1.1 to 4.3.1 (#362)
  Add cubed notebook for hourly climatology example using "map-reduce" method (#356)
  Optimize bitmask finding for chunk size 1 and single chunk cases (#360)
  Edits to climatology doc (#361)
  Fix benchmarks (#358)