Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Silence some warnings. #2328

Merged
merged 16 commits into from
Sep 4, 2018
Merged

Silence some warnings. #2328

merged 16 commits into from
Sep 4, 2018

Conversation

dcherian
Copy link
Contributor

  • Tests passed (for all non-documentation changes)

Remove some warnings.

@dcherian dcherian force-pushed the fix-some-warnings branch from 58218ff to 5f0e251 Compare July 29, 2018 02:22
@dcherian
Copy link
Contributor Author

Most of the remaining warnings are:
RuntimeWarning: Cannot close a netcdf_file opened with mmap=True, when netcdf_variables or arrays referring to its data still exist. All data arrays obtained from such files refer directly to data on disk, and must be copied before the file can be cleanly closed. (See netcdf_file docstring for more information on mmap.)

Copy link
Member

@fujiisoup fujiisoup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR.
This is great.

Just a few comment regarding to isel_points and sel_points

@@ -693,7 +693,7 @@ def test_isel_fancy(self):
da.isel(time=(('points',), [1, 2]), x=(('points',), [2, 2]),
y=(('points',), [3, 4]))
np.testing.assert_allclose(
da.isel_points(time=[1], x=[2], y=[4]).values.squeeze(),
da.isel(time=[1], x=[2], y=[4]).values.squeeze(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not give the same result.
I think we can just ignore the warning by @pytest.mark.filterwarnings.

@@ -1417,7 +1420,7 @@ def test_sel_fancy(self):
assert_identical(actual['b'].drop('y'), idx_y['b'])

with pytest.raises(KeyError):
data.sel_points(x=[2.5], y=[2.0], method='pad', tolerance=1e-3)
data.sel(x=[2.5], y=[2.0], method='pad', tolerance=1e-3)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here also, we can just ignore the deprecation warning.

@jhamman
Copy link
Member

jhamman commented Jul 29, 2018

Most of the remaining warnings are:
RuntimeWarning: Cannot close a netcdf_file opened with mmap=True, when netcdf_variables or arrays referring to its data still exist. All data arrays obtained from such files refer directly to data on disk, and must be copied before the file can be cleanly closed. (See netcdf_file docstring for more information on mmap.)

@dcherian - take a look here: #2261 (comment). These may be fixed soon.

Also, xrefing: #1652, #1657

Copy link
Member

@shoyer shoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

@@ -2519,6 +2520,7 @@ class PyNioTestAutocloseTrue(PyNioTest):


@requires_pseudonetcdf
@pytest.mark.filterwarnings('ignore:IOAPI_ISPH is assumed to be 6370000')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is really nice! way better than using filterwarnings manually :)

@@ -464,7 +461,7 @@ def _rescale_imshow_rgb(darray, vmin, vmax, robust):
# After scaling, downcast to 32-bit float. This substantially reduces
# memory usage after we hand `darray` off to matplotlib.
darray = ((darray.astype('f8') - vmin) / (vmax - vmin)).astype('f4')
return minimum(maximum(darray, 0), 1)
return np.minimum(np.maximum(darray, 0), 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__array_ufunc__ (which enalbes this on was added new in numpy 1.13. For now, we only require numpy 1.12, so let's stick with using xarray.ufuncs for now.

@dcherian
Copy link
Contributor Author

@jhamman, @fujiisoup, @shoyer Thanks!

@@ -176,6 +176,15 @@ def get_clean_interp_index(arr, dim, use_coordinate=True, **kwargs):
# raise if index cannot be cast to a float (e.g. MultiIndex)
try:
index = index.values.astype(np.float64)
if method != 'nearest':
Copy link
Contributor Author

@dcherian dcherian Jul 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rescaling is necessary to avoid underflow/overflow RuntimeWarnings with datetime64 (seen in tests). I've made a similar change to interp() below.

@dcherian dcherian changed the title Fix some warnings. Silence some warnings. Jul 29, 2018
# Let's keep that compatitibility
index = (index - index.min())
if len(index) > 1:
index /= index.std()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Standard deviation is probably fine here, but I wonder if it would be better to guarantee that are all values fall within some fixed range [-MAX, MAX], e.g., by using max() instead:

index = index - index.mean()
index /= EPSILON * np.maximum(index.max(), 1)

where EPSILON is some constant, e.g., 1e-8.

if len(index) > 1:
index /= index.std()
index /= np.max(np.abs(index))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still needs a fix to account for possibly dividing by zero.

Also, I don't think you need abs() here because subtracting by mean() already centers the data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The abs() makes sure that it's always within [-1, 1] right? e.g. if index = [-2, 0, 0.9] and as long as index is not all zeroes, it should always work? 🤔
Sorry, I feel like I'm missing something.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this would work fine unless the array is all zeros/constant.

It still might be worth considering that possibility, though at this point I'm pretty sure we are just refining total edge case behavior.

Maybe:

min_value = index.min()
max_value = index.max()
range_ = max_value - min_value
if range_:
    midpoint = (min_value + max_value) / 2
    index = index - midpoint
    index /= range_

Copy link
Member

@shoyer shoyer Jul 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other option is to not really worry about this at all, besides silencing the warning. float64 is about the best we can do as far as precision, and it's a simple fact that you cannot represent every int64 value exactly in float64. In most cases this isn't a problem.

Arguably, this is the most predictable thing to do. I don't think it really helps with precision to rescale values in general, given that float64 can handle very large exponents just fine without any loss of precision.

I guess the only case where this would make a difference is if you really do care about nanosecond level precision and all your dates are within a single year or so. But that does seem a little unusual for xarray.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The people doing lightning research probably don't think that scenario is all that unusual (don't know for sure, I just remember a conversation with a lightning researcher when he complained about matplotlib's lack of support for nanosecond tickers).

Copy link
Contributor Author

@dcherian dcherian Aug 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other option is to not really worry about this at all, besides silencing the warning. float64 is about the best we can do as far as precision, and it's a simple fact that you cannot represent every int64 value exactly in float64. In most cases this isn't a problem.

Ya, so the choice is between using filterwarnings or rescaling to avoid the warning. I have a mild preference for rescaling because it "solves" the problem. @shoyer: your call...

Hopefully, the lightning research people will chime in / send in a PR if this is actually an issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just silence the warning here. I don't think rescaling actually fixes anything, and it's less efficient than silencing it.

dcherian added 13 commits August 20, 2018 17:35
These were being triggered by casting datetime64[ns] to float32.
We now rescale the co-ordinate before interpolating, except for
nearest-neighbour interpolation. The rescaling can change the
nearest neighbour, and so is avoided in this case to maintain
pandas compatibility.
This reverts commit 76f988f.
@@ -3532,6 +3532,8 @@ def test_rolling_reduce(da, center, min_periods, window, name):
@pytest.mark.parametrize('min_periods', (None, 1, 2, 3))
@pytest.mark.parametrize('window', (1, 2, 3, 4))
@pytest.mark.parametrize('name', ('sum', 'max'))
@pytest.mark.filterwarnings('ignore:Using a non-tuple sequence')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silences a warning that seems to be thrown by code in bottleneck

xarray/tests/test_dataarray.py::test_rolling_reduce_nonnumeric[sum-1-3-False]
  /home/travis/miniconda/envs/test_env/lib/python3.6/site-packages/bottleneck/slow/move.py:149: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
    nidx1 = n[idx1]
  /home/travis/miniconda/envs/test_env/lib/python3.6/site-packages/bottleneck/slow/move.py:150: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
    nidx1 = nidx1 - n[idx2]
  /home/travis/miniconda/envs/test_env/lib/python3.6/site-packages/bottleneck/slow/move.py:152: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
    idx[idx1] = nidx1 < min_count
  /home/travis/miniconda/envs/test_env/lib/python3.6/site-packages/bottleneck/slow/move.py:153: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
    idx[idx3] = n[idx3] < min_count

Copy link
Member

@shoyer shoyer Aug 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch -- can you file a report in bottleneck? I expect this would be easy to fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Filed a report upstream pydata/bottleneck#194 and reverted this commit.

Copy link
Member

@jhamman jhamman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. One small comment on dask version stuff but I'm excited to see this go in.

with dask.set_options(get=dask.get):

with (dask.config.set(get=dask.get) if hasattr(dask, 'config')
else dask.set_options(get=dask.get)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this just a version check? Generally, I prefer to see a version comparison so we can more obviously clean these things up when older versions are no longer supported.

(Same comment below)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya it's basically a version check. I've made it an explicit version check now.

Copy link
Member

@shoyer shoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let’s merge this!

@dcherian dcherian merged commit a3ca579 into pydata:master Sep 4, 2018
@dcherian dcherian deleted the fix-some-warnings branch September 4, 2018 15:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants