Update docs.
dcherian committed Oct 3, 2023
1 parent 6715587 commit e92f483
Showing 1 changed file with 5 additions and 3 deletions.
8 changes: 5 additions & 3 deletions docs/source/engines.md
@@ -9,18 +9,20 @@
1. `engine="numba"` wraps `numpy_groupies.aggregate_numba`. This uses `numba` kernels for the core aggregation.
1. `engine="flox"` uses the `ufunc.reduceat` method after first argsorting the array so that all group members occur sequentially. This was copied from
a [gist by Stephan Hoyer](https://gist.github.com/shoyer/f538ac78ae904c936844)
+1. `engine="numbagg"` uses the reductions available in [`numbagg.grouped`](https://github.com/numbagg/numbagg/blob/main/numbagg/grouped.py)
+from the [numbagg](https://github.com/numbagg/numbagg) project.

See [](arrays) for more details.
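
The argsort-then-`reduceat` idea described for `engine="flox"` can be sketched as follows. This is a minimal illustration of the technique, not the code flox ships; the helper name `grouped_sum_reduceat` is hypothetical, and the sketch assumes a non-empty 1D input and a sum reduction.

```python
import numpy as np


def grouped_sum_reduceat(values, labels):
    """Hypothetical helper: sum `values` per label via argsort + reduceat."""
    values = np.asarray(values)
    labels = np.asarray(labels)

    # Sort so that all members of a group occur sequentially.
    order = np.argsort(labels, kind="stable")
    sorted_labels = labels[order]
    sorted_values = values[order]

    # Index where each run of identical labels starts.
    is_start = np.concatenate(([True], sorted_labels[1:] != sorted_labels[:-1]))
    starts = np.flatnonzero(is_start)

    # reduceat sums each contiguous slice [starts[i], starts[i + 1]).
    sums = np.add.reduceat(sorted_values, starts)
    return sorted_labels[starts], sums


values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
labels = np.array([2, 0, 1, 0, 2, 1])
groups, sums = grouped_sum_reduceat(values, labels)
```

Here `groups` holds the unique labels in sorted order and `sums` holds one total per label. The sort is the main extra cost of this approach compared with a scatter-style aggregation.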

## Tradeoffs

-For the common case of reducing a nD array by a 1D array of group labels (e.g. `groupby("time.month")`), `engine="flox"` *can* be faster.
+For the common case of reducing a nD array by a 1D array of group labels (e.g. `groupby("time.month")`), `engine="numbagg"` is almost always faster, and `engine="flox"` *can* be faster.

The reason is that `numpy_groupies` converts all groupby problems to a 1D problem, which can involve [some overhead](https://github.com/ml31415/numpy-groupies/pull/46).
It is possible to optimize this a bit in `flox` or `numpy_groupies`, but the work has not been done yet.
The advantage of `engine="numpy"` is that it tends to work for more array types, since it appears to be more common to implement `np.bincount`, and not `np.add.reduceat`.
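
To make the 1D conversion and the `np.bincount` dependence concrete, here is a rough sketch of reducing a 2D array by a 1D array of integer group codes. It is an assumption-laden illustration, not the `numpy_groupies` implementation: the helper name `grouped_sum_bincount` is hypothetical, and it assumes a 2D input, a sum reduction, and codes already factorized to `0..ngroups-1`.

```python
import numpy as np


def grouped_sum_bincount(array, codes, ngroups):
    """Hypothetical helper: grouped sum along the last axis of a 2D array."""
    array = np.asarray(array, dtype=float)
    codes = np.asarray(codes)
    nrows = array.shape[0]

    # Offset the codes per row so each (row, group) pair maps to its own 1D bin.
    # Building and raveling this array is the kind of overhead the 1D conversion adds.
    offset_codes = codes + ngroups * np.arange(nrows)[:, None]

    # A single bincount call over the flattened problem does the aggregation.
    flat = np.bincount(
        offset_codes.ravel(),
        weights=array.ravel(),
        minlength=nrows * ngroups,
    )
    return flat.reshape(nrows, ngroups)


array = np.array([[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]])
codes = np.array([0, 1, 0, 1])
result = grouped_sum_bincount(array, codes, ngroups=2)
```

Each row of `result` holds the per-group sums for the corresponding row of `array`. Only `np.bincount` and basic indexing are needed, which is consistent with the point above that this style tends to work for more array types.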

```{tip}
-Other potential engines we could add are [`numbagg`](https://github.com/numbagg/numbagg) ([stalled PR here](https://github.com/xarray-contrib/flox/pull/72)) and [`datashader`](https://github.com/xarray-contrib/flox/issues/142).
-Both use numba for high-performance aggregations. Contributions or discussion is very welcome!
+One other potential engine we could add is [`datashader`](https://github.com/xarray-contrib/flox/issues/142).
+Contributions or discussion is very welcome!
```
