Merge remote-tracking branch 'upstream/main' into numbagg

* upstream/main:
  Support quantile, median, mode with method="blockwise". (#269)
  Add multidimensional binning demo (#203)
  [pre-commit.ci] pre-commit autoupdate (#268)
xarray-contrib · Oct 5, 2023 · commit 412f31f
2 parents e1eda24 + 68b122e
Showing 14 changed files with 628 additions and 61 deletions.
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -4,7 +4,7 @@ ci:
 repos:
     - repo: https://github.com/astral-sh/ruff-pre-commit
       # Ruff version.
-      rev: 'v0.0.276'
+      rev: 'v0.0.292'
       hooks:
         - id: ruff
           args: ["--fix"]
@@ -18,12 +18,12 @@ repos:
         - id: check-docstring-first
 
     - repo: https://github.com/psf/black
-      rev: 23.3.0
+      rev: 23.9.1
       hooks:
         - id: black
 
     - repo: https://github.com/executablebooks/mdformat
-      rev: 0.7.16
+      rev: 0.7.17
       hooks:
       - id: mdformat
         additional_dependencies:
@@ -44,13 +44,13 @@ repos:
           args: [--extra-keys=metadata.kernelspec metadata.language_info.version]
 
     - repo: https://github.com/codespell-project/codespell
-      rev: v2.2.5
+      rev: v2.2.6
       hooks:
         - id: codespell
           additional_dependencies:
             - tomli
 
     - repo: https://github.com/abravalheri/validate-pyproject
-      rev: v0.13
+      rev: v0.14
       hooks:
         - id: validate-pyproject
diff --git a/ci/environment.yml b/ci/environment.yml
@@ -22,5 +22,6 @@ dependencies:
   - pooch
   - toolz
   - numba
+  - scipy
   - pip:
     - git+https://github.com/numbagg/numbagg
diff --git a/docs/source/aggregations.md b/docs/source/aggregations.md
@@ -11,8 +11,11 @@ the `func` kwarg:
 - `"std"`, `"nanstd"`
 - `"argmin"`
 - `"argmax"`
-- `"first"`
-- `"last"`
+- `"first"`, `"nanfirst"`
+- `"last"`, `"nanlast"`
+- `"median"`, `"nanmedian"`
+- `"mode"`, `"nanmode"`
+- `"quantile"`, `"nanquantile"`
 
 ```{tip}
 We would like to add support for `cumsum`, `cumprod` ([issue](https://github.com/xarray-contrib/flox/issues/91)). Contributions are welcome!

diff --git a/docs/source/implementation.md b/docs/source/implementation.md
@@ -199,7 +199,7 @@ width: 100%
 1. Group labels must be known at graph construction time, so this only works for numpy arrays.
 1. This does require more tasks and a more complicated graph, but the communication overhead can be significantly lower.
 1. The detection of "cohorts" is currently slow but could be improved.
-1. The extra effort of detecting cohorts and mul;tiple copying of intermediate blocks may be worthwhile only if the chunk sizes are small
+1. The extra effort of detecting cohorts and multiple copying of intermediate blocks may be worthwhile only if the chunk sizes are small
    relative to the approximate period of group labels, or small relative to the size of spatially localized groups.
 
 ### Example : sensitivity to chunking

diff --git a/docs/source/user-stories.md b/docs/source/user-stories.md
@@ -8,4 +8,5 @@
    user-stories/climatology.ipynb
    user-stories/climatology-hourly.ipynb
    user-stories/custom-aggregations.ipynb
+   user-stories/nD-bins.ipynb
 ```
diff --git a/docs/source/user-stories/custom-aggregations.ipynb b/docs/source/user-stories/custom-aggregations.ipynb
@@ -15,8 +15,13 @@
     ">\n",
     ">     A = da.groupby(['lon_bins', 'lat_bins']).mode()\n",
     "\n",
-    "This notebook will describe how to accomplish this using a custom `Aggregation`\n",
-    "since `mode` and `median` aren't supported by flox yet.\n"
+    "This notebook will describe how to accomplish this using a custom `Aggregation`.\n",
+    "\n",
+    "\n",
+    "```{tip}\n",
+    "flox now supports `mode`, `nanmode`, `quantile`, `nanquantile`, `median`, `nanmedian` using exactly the same \n",
+    "approach as shown below\n",
+    "```\n"
    ]
   },
   {
@@ -135,7 +140,7 @@
     "    # The next are for dask inputs and describe how to reduce\n",
     "    # the data in parallel\n",
     "    chunk=(\"sum\", \"nanlen\"), # first compute these blockwise : (grouped_sum, grouped_count)\n",
-    "    combine=(\"sum\", \"sum\"), #  reduce intermediate reuslts (sum the sums, sum the counts)\n",
+    "    combine=(\"sum\", \"sum\"), #  reduce intermediate results (sum the sums, sum the counts)\n",
     "    finalize=lambda sum_, count: sum_ / count, # final mean value (divide sum by count)\n",
     "\n",
     "    fill_value=(0, 0),  # fill value for intermediate  sums and counts when groups have no members\n",
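
Note on the `custom-aggregations.ipynb` hunk above: the `chunk` / `combine` / `finalize` comments describe a three-stage grouped mean. The following is a minimal sketch of that idea in plain NumPy, not flox's implementation. The function names `chunk_stats`, `combine_stats` and `finalize_mean` are hypothetical, and the sketch assumes integer group labels in `0..ngroups-1` and skips NaNs in both the sums and the counts.

```python
import numpy as np


def chunk_stats(values, labels, ngroups):
    """Blockwise step: per-group sum and non-NaN count for one chunk."""
    valid = ~np.isnan(values)
    sums = np.bincount(labels[valid], weights=values[valid], minlength=ngroups)
    counts = np.bincount(labels[valid], minlength=ngroups)
    return sums, counts


def combine_stats(intermediates):
    """Combine step: sum the sums and sum the counts across chunks."""
    sums = np.sum([s for s, _ in intermediates], axis=0)
    counts = np.sum([c for _, c in intermediates], axis=0)
    return sums, counts


def finalize_mean(sums, counts, fill_value=np.nan):
    """Finalize step: divide sum by count; empty groups get fill_value."""
    out = np.full(sums.shape, fill_value, dtype=float)
    np.divide(sums, counts, out=out, where=counts > 0)
    return out


# Hypothetical data split into two "chunks"; group 3 has no members.
values = np.array([1.0, 2.0, np.nan, 4.0, 5.0, 6.0])
labels = np.array([0, 1, 0, 1, 0, 2])
ngroups = 4

intermediates = [
    chunk_stats(values[:3], labels[:3], ngroups),
    chunk_stats(values[3:], labels[3:], ngroups),
]
result = finalize_mean(*combine_stats(intermediates))
print(result)  # [ 3.  3.  6. nan]
```

The intermediate sums and counts for an empty group are both zero, which matches the `fill_value=(0, 0)` comment in the hunk; only the final step decides what an empty group should report.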