Implement DataArray.idxmax() #60
Just as I am interested in having this functionality, and the new

from wherever import argmax, take  # numpy or dask

def gufunc_idxmax(x, y, axis=None):
    indx = argmax(x, axis)
    return take(y, indx)

def idxmax(obj, dim):
    sig = ([(dim,), (dim,)], [()])
    kwargs = {'axis': -1}
    # pass the function defined above (the original called gufunc_idxmin)
    return apply_ufunc(gufunc_idxmax, obj, obj[dim],
                       signature=sig, kwargs=kwargs,
                       dask_array='allowed')
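As a rough illustration of the idea in this proposal, here is a minimal NumPy-only sketch: take the integer position of the maximum along the last axis, then look that position up in a 1-D coordinate array. The function name `idxmax_last_axis` and the sample data are hypothetical and are not part of xarray.

```python
import numpy as np

def idxmax_last_axis(values, coords):
    """Return the coordinate label of the maximum along the last axis."""
    # integer positions of the maxima along the last axis
    positions = np.argmax(values, axis=-1)
    # map the positions to coordinate labels
    return np.take(coords, positions)

values = np.array([[1.0, 5.0, 2.0],
                   [7.0, 0.0, 3.0]])
coords = np.array([10, 20, 30])  # labels along the last axis

result = idxmax_last_axis(values, coords)
print(result)  # [20 10]
```

The result has one label per row, because the reduced axis is dropped.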
See http://stackoverflow.com/questions/40179593/how-to-get-the-coordinates-of-the-maximum-in-xarray for examples of how to do this with the current version of xarray. @MaximilianR's answer using

@jcmgray Your proposal looks pretty close to me. But to handle higher-dimensional arrays, instead of I think something like the following would work:

import functools

def _index_from_1d_array(array, indices):
    return array[indices,]

def gufunc_idxmax(x, y, axis=None):
    # note: y is always a numpy.ndarray, because IndexVariable objects
    # always have their data loaded into memory
    indx = argmax(x, axis)
    func = functools.partial(_index_from_1d_array, y)
    # check indx (the original checked an undefined name, `array`)
    if isinstance(indx, dask_array_type):
        import dask.array as da
        # the output holds labels taken from y, so it has y's dtype
        return da.map_blocks(func, indx, dtype=y.dtype)
    else:
        return func(indx)
So I thought
Regarding edge cases: multiple maxima are presumably fine as long as the user is aware that it just takes the first.
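To make the tie-breaking point concrete, a small NumPy sketch (the sample data is hypothetical): `np.argmax` returns the position of the first occurrence of the maximum, so any label lookup built on it reports the first matching label.

```python
import numpy as np

values = np.array([3, 7, 7, 1])           # the maximum 7 occurs twice
coords = np.array(["a", "b", "c", "d"])   # labels for each position

# argmax reports the first occurrence of the maximum
first_max_position = int(np.argmax(values))
label = coords[first_max_position]

print(first_max_position, label)  # 1 b
```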
I just merged #1237 -- see if it works with that.
Yeah, that's not a problem here, only for the
This behavior for nanargmax is unfortunate. The "right" behavior for xarray is probably to use
Ah yes, both ways are working now, thanks. Just had a little play around with timings, and this seems like a reasonably quick way to achieve correct NaN behaviour:

def xr_idxmax(obj, dim):
    sig = ([(dim,), (dim,)], [()])
    kwargs = {'axis': -1}
    allna = obj.isnull().all(dim)
    return apply_ufunc(gufunc_idxmax, obj.fillna(-np.inf), obj[dim],
                       signature=sig, kwargs=kwargs,
                       dask_array='allowed').where(~allna).fillna(np.nan)

i.e. originally replace all NaN values with -Inf, use the usual
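The NaN handling described in this comment (fill NaN with -Inf, take the usual argmax, then mask slices that were entirely NaN) can be sketched in plain NumPy as follows. The helper name `nan_safe_idxmax` is hypothetical, and the sketch assumes float data and numeric labels, so that NaN can serve as the missing-value marker.

```python
import numpy as np

def nan_safe_idxmax(values, coords):
    """Label of the maximum along the last axis, ignoring NaN values."""
    values = np.asarray(values, dtype=float)
    # remember which slices contain only NaN
    all_nan = np.isnan(values).all(axis=-1)
    # NaN can never win against real data once replaced with -inf
    filled = np.where(np.isnan(values), -np.inf, values)
    labels = np.take(coords, np.argmax(filled, axis=-1)).astype(float)
    # slices that were all NaN have no meaningful maximum
    return np.where(all_nan, np.nan, labels)

values = np.array([[np.nan, 2.0, 5.0],
                   [np.nan, np.nan, np.nan]])
coords = np.array([10, 20, 30])

result = nan_safe_idxmax(values, coords)
print(result)  # [30. nan]
```

Casting the labels to float is what lets the all-NaN slice be reported as NaN; other label dtypes would need a different missing-value marker.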
Yes, that looks pretty reasonable. Two minor concerns:
Would using
Ah yes true. I was slightly anticipating e.g. filling with NaT if the
Indeed,
Yes, ideally we would detect the dtype and find an appropriate fill or minimum value, similar to
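One way the dtype detection mentioned here might look, as a hedged sketch: pick -Inf for floating-point data and the smallest representable value for integer data. The helper `min_fill_value` is hypothetical and is not an xarray function; datetime and other dtypes are deliberately left unhandled.

```python
import numpy as np

def min_fill_value(dtype):
    """Return a value that cannot exceed any real entry of the given dtype."""
    dtype = np.dtype(dtype)
    if np.issubdtype(dtype, np.floating):
        return -np.inf
    if np.issubdtype(dtype, np.integer):
        return np.iinfo(dtype).min
    # assumption: other dtypes (e.g. datetime64) need their own handling
    raise TypeError(f"no minimum fill value defined for dtype {dtype}")

print(min_fill_value(np.float32))  # -inf
print(min_fill_value("int8"))      # -128
```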
In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity
This is still relevant
This is still very relevant
I got around this with some (masked) numpy operations. Perhaps it is useful? I was seeing the
The big piece here is modifying the mask directly and making sure that it is correct. The numpy docs advise against this approach, but it seems to be giving me what I want.
How would

import xarray as xr
xr.DataArray([1, 3, 2]).argmax()

Could this be closed?
e.g.,
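For context on this question, a small NumPy sketch of the distinction being discussed, using hypothetical labels: `argmax` yields an integer position, while an idxmax-style result is the coordinate label stored at that position. The two only coincide when the labels happen to be 0, 1, 2, and so on.

```python
import numpy as np

data = np.array([1, 3, 2])
labels = np.array(["a", "b", "c"])  # hypothetical coordinate labels

position = int(np.argmax(data))     # integer position of the maximum
label_at_max = labels[position]     # what an idxmax-style lookup reports

print(position, label_at_max)  # 1 b
```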
Should match the pandas function: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.idxmax.html