Significantly faster cohorts detection. #272
Merged
The profile looks like: [profile screenshot]
I strongly suspect we can do better. The …
Down to: [profile screenshot]
Need a way to assign to a nested list. This is why I chose the numpy object array route in the first place :) Punting on this for later, since this is already a massive improvement.
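For context on the nested-list problem: a NumPy object array accepts an N-d tuple index directly (`arr[i, j] = x`), while a nested Python list has to be walked one level at a time. A minimal sketch of that walk, assuming rectangular nesting; `make_nested` and `set_nested` are hypothetical helpers for illustration, not part of flox:

```python
from functools import reduce
from itertools import product
from typing import Any, Sequence

# Hypothetical helpers for illustration; not flox API.


def make_nested(shape: Sequence[int], fill: Any = None) -> list:
    """Build a rectangular nested list of the given (non-empty) shape."""
    if len(shape) == 1:
        return [fill] * shape[0]
    # A comprehension (not list multiplication) so inner lists are distinct objects.
    return [make_nested(shape[1:], fill) for _ in range(shape[0])]


def set_nested(nested: list, index: Sequence[int], value: Any) -> None:
    """Assign at an N-d index: (i, j, k) -> nested[i][j][k] = value."""
    # Walk down to the innermost list that holds the target slot...
    parent = reduce(lambda lst, i: lst[i], index[:-1], nested)
    # ...then assign into it in place.
    parent[index[-1]] = value


# Usage: fill a 2 x 3 grid with the sum of each position's indices.
grid = make_nested((2, 3))
for idx in product(range(2), range(3)):
    set_nested(grid, idx, sum(idx))
print(grid)  # [[0, 1, 2], [1, 2, 3]]
```

With an object array the helper disappears, since `arr[idx] = value` works for any tuple `idx`; that convenience is the trade-off the comment refers to.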
Benchmarks seem to be broken after the numbagg PR. I'll fix them in a new branch.
dcherian added a commit that referenced this pull request on Nov 3, 2023
* main: (24 commits)
  Add `packaging` as dependency
  use engine flox for ordered groups (#266)
  Update pyproject.toml: py3.12
  Bump numpy to >=1.22 (#278)
  Cleanups (#276)
  benchmarks updates (#273)
  repo-review comments (#270)
  Significantly faster cohorts detection. (#272)
  Add engine="numbagg" (#72)
  Support quantile, median, mode with method="blockwise". (#269)
  Add multidimensional binning demo (#203)
  [pre-commit.ci] pre-commit autoupdate (#268)
  Drop python 3.8, test python 3.11 (#209)
  tests: move xfail out of functions (#265)
  Bump actions/checkout from 3 to 4 (#267)
  convert datetime: micro-optimizations (#261)
  compatibility with `numpy>=2.0` (#257)
  replace the deprecated `provision-with-micromamba` with `setup-micromamba` (#258)
  Fix some typing errors in asv_bench and tests (#253)
  [pre-commit.ci] pre-commit autoupdate (#250)
  ...
Closes #271
I was iterating over `array.blocks` to figure out the shape of each chunk. Indexing this object creates a dask array per chunk, which is slow for many reasons.
Replace this with a function that calculates each chunk's shape directly, without building a dask array per chunk. On the ARCO ERA5 dataset, with 93044 time chunks, this goes from effectively infinite time to 840 ms.
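Dask records the chunk lengths along each axis in `array.chunks` (one tuple per axis), so every block's shape can be read off with a Cartesian product instead of materializing `array.blocks[idx]`. A minimal sketch under that assumption, using plain tuples so it runs without dask; `iter_chunk_shapes` is a hypothetical name and this is not necessarily how flox implements it:

```python
from __future__ import annotations

from itertools import product
from typing import Iterator, Sequence

# Same layout as dask's Array.chunks: one tuple of chunk lengths per axis.
Chunks = Sequence[Sequence[int]]


def iter_chunk_shapes(
    chunks: Chunks,
) -> Iterator[tuple[tuple[int, ...], tuple[int, ...]]]:
    """Yield (block_index, block_shape) pairs in C order (last axis fastest).

    Hypothetical helper for illustration: only tuple arithmetic, no per-block arrays.
    """
    # Number of blocks along each axis, like dask's Array.numblocks.
    numblocks = [len(c) for c in chunks]
    # Block indices in C order, the same order as np.ndindex(numblocks).
    block_ids = product(*(range(n) for n in numblocks))
    # The shape of block (i, j, ...) is (chunks[0][i], chunks[1][j], ...);
    # product() walks it in the same C order, so zip keeps the pairs aligned.
    shapes = product(*chunks)
    return zip(block_ids, shapes)


# Usage: a 5 x 6 array split into row chunks (2, 2, 1) and column chunks (3, 3).
for block_id, shape in iter_chunk_shapes(((2, 2, 1), (3, 3))):
    print(block_id, shape)
# (0, 0) (2, 3)
# (0, 1) (2, 3)
# (1, 0) (2, 3)
# (1, 1) (2, 3)
# (2, 0) (1, 3)
# (2, 1) (1, 3)
```

With a real dask array you would pass `array.chunks`. Each block then costs a couple of tuple lookups, rather than the construction of a new dask array that indexing `array.blocks` triggers.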