Make xr.corr and xr.map_blocks work without dask #5731

Gijom · 2021-08-23T08:53:04Z

Closes map_blocks doesn't work when dask isn't installed #3391
Closes Dask error on xarray.corr #5715
Tests added
Passes pre-commit run --all-files
User visible changes (including notable bug fixes) are documented in whats-new.rst

Calls to dask functions are replaced by calls to the pycompat functions

Gijom · 2021-08-23T08:57:36Z

Concerning the tests I added a test to check that lazy correlations have identical results to non-lazy correlations. But this is not directly related to the bug fix. Indeed there is a need to have a test that checks if the non-lazy correlation works without dask installed. The already existing test function test_corr does not have a @requires_dask decorator. But it did not fail as it should have. Not sure why.

About documenting the changes, I am not sure of the whats-new.rst format and I thus did not add it. Should I just add a bullet item on the top of the file ?

xarray/core/parallel.py

dcherian · 2021-08-23T14:46:28Z

Thanks @Gijom!

he already existing test function test_corr does not have a @requires_dask decorator. But it did not fail as it should have. Not sure why.

I can't tell either. It would be good to track this down.

I am not sure of the whats-new.rst format and I thus did not add it. Should I just add a bullet item on the top of the file ?

Yes just copy an existing entry and modify it.

The fonction is used to test for lazy datasets in map_blocks (parallel.py)

Gijom · 2021-08-24T07:43:28Z

I included the proposed changes, thanks for the help.

However the new test I implemented does not go through and I fail to understand why. Would you be able to check ? The idea of the test is to ensure that lazy computations give the same results than normal ones. I tracked down the problem to line 1377 of computation.py:

demeaned_da_a = da_a - da_a.mean(dim=dim)  # <-- this one returns nan upon computation (.compute())
demeaned_da_b = da_b - da_b.mean(dim=dim)  # <-- this one returns the correct value although the input is masked with nans in the same way

The values before the mean call are:

da_a.compute() = <xarray.DataArray <this-array> (x: 2, time: 2)>
array([[ 1.,  2.],
       [ 2., nan]])
da_b.compute() = <xarray.DataArray <this-array> (x: 2, time: 2)>
array([[ 1.,  2.],
       [ 1., nan]])

And for the means (dim is None):

da_a.mean(dim=dim).compute()=<xarray.DataArray <this-array> ()>
array(nan)
da_b.mean(dim=dim).compute()=<xarray.DataArray <this-array> ()>
array(1.33333333)

For non lazy computations everything seems fine.

keewis · 2021-08-30T05:24:57Z

The already existing test function test_corr does not have a @requires_dask decorator. But it did not fail as it should have. Not sure why.

the reason for this is that dask is imported using importorskip at the top of the module:

xarray/xarray/tests/test_computation.py

Line 27 in a96154a

dask = pytest.importorskip("dask")

Since most tests already have requires_dask I think it should be fine to just remove the importandskip call.

github-actions · 2021-08-30T05:26:55Z

Unit Test Results

        6 files         6 suites 49m 53s ⏱️
16 246 tests 14 512 ✔️ 1 734 💤 0 ❌
90 672 runs 82 496 ✔️ 8 176 💤 0 ❌

Results for commit 3dde39b.

♻️ This comment has been updated with latest results.

The importskip was canceling all tests if dask is not present. Some tests are relevant to be done without dask. @require_dask was added to one test since it import dask directly

The tests do not pass for arrays 5 and 6. More work is needed to indentify why

Gijom · 2021-08-31T10:10:15Z

@keewis indeed this was the problem. I removed it and corrected one test which did not have the @require_dask decorator.

There is still a test error with the test arrays 5 and 6: the lazy array version do not return the same value than the usual corr. I consider this a different bug and just removed the tests with a TODO in the code.

The last commit pass the tests on my machine and I added the whats-new text to version 0.19.1.

dcherian · 2021-08-31T14:36:19Z

Thanks @Gijom can you post the error for those tests please.

Gijom · 2021-08-31T14:40:24Z

I was referring to:
#5731 (comment)

Otherwise I am working on the conflict with the main branch. Should be done soon.

dcherian · 2021-08-31T14:51:43Z

Thanks @Gijom, I can't reproduce. Can you post the output of xr.show_versions()?

Here's mine

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.10 | packaged by conda-forge | (default, May 10 2021, 22:58:09) 
[Clang 11.1.0 ]
python-bits: 64
OS: Darwin
OS-release: 19.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.19.0
pandas: 1.3.1
numpy: 1.21.1
scipy: 1.5.3
netCDF4: 1.5.6
pydap: None
h5netcdf: 0.11.0
h5py: 3.3.0
Nio: None
zarr: 2.8.3
cftime: 1.5.0
nc_time_axis: 1.3.1
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: 3.0.4
bottleneck: 1.3.2
dask: 2021.07.2
distributed: 2021.07.2
matplotlib: 3.4.2
cartopy: 0.19.0.post1
seaborn: 0.11.1
numbagg: None
pint: 0.17
setuptools: 49.6.0.post20210108
pip: 21.2.3
conda: 4.10.3
pytest: 6.2.4
IPython: 7.26.0
sphinx: 4.1.2

dcherian · 2021-08-31T14:54:51Z

oh never mind: I can reproduce with

da_a = xr.DataArray([[1, 2], [2, 1]], dims=["x", "time"])
da_b = xr.DataArray([[1, 2], [1, np.nan]], dims=["x", "time"])

dim = None

xr.testing.assert_equal(
    xr.corr(da_a, da_b, dim=dim), xr.corr(da_a.chunk(), da_b.chunk(), dim=dim)
)
xr.testing.assert_equal(
    xr.corr(da_a, da_b, dim=dim), xr.corr(da_a, da_b.chunk(), dim=dim)
)

dcherian · 2021-08-31T15:19:57Z

Ouch here's the problem: da_a has dtype int if it has no NaNs. this dtype could change but need not change after the masking.

Importantly skipna=False by default for int dtypes (since they can't represent NaNs).

The fix is to specify skipna=True explicitly in the .mean calls.

Alternatively we could stick an .astype(float) in _get_valid_values

shoyer · 2021-11-23T22:47:30Z

Checking in here -- it seems like this PR surfaced an existing bug (incorrect calculation of correlation when using dask), at the same time as it corrected a different bug, which is removing the hard dependency on dask for calculating correlation?

If I'm understanding correctly, this is still a net win (fixing one bug), so it probably makes sense to merge it now and save the other bug fix for a later issue?

* upstream/main: (86 commits) Fixed a mispelling of dimension in dataarray documentation for from_dict (pydata#6020) [pre-commit.ci] pre-commit autoupdate (pydata#6014) [pre-commit.ci] pre-commit autoupdate (pydata#5990) Use set_options for asv bottleneck tests (pydata#5986) Fix module name retrieval in `backend.plugins.remove_duplicates()`, plugin tests (pydata#5959) Check for py version instead of try/except when importing entry_points (pydata#5988) Add "see also" in to_dataframe docs (pydata#5978) Alternate method using inline css to hide regular html output in an untrusted notebook (pydata#5880) Fix mypy issue with entry_points (pydata#5979) Remove pre-commit auto update (pydata#5958) Do not change coordinate inplace when throwing error (pydata#5957) Create CITATION.cff (pydata#5956) Add groupby & resample benchmarks (pydata#5922) Fix plot.line crash for data of shape (1, N) in _title_for_slice on format_item (pydata#5948) Disable unit test comments (pydata#5946) Publish test results from workflow_run only (pydata#5947) Generator for groupby reductions (pydata#5871) whats-new dev whats-new for 0.20.1 (pydata#5943) Docs: fix URL for PTSA (pydata#5935) ...

dcherian · 2021-11-24T03:18:36Z

so it probably makes sense to merge it now and save the other bug fix for a later issue?

It was easy .Should be good to go if tests pass.

mathause

Does this need a test for #3391? @dcherian? Could also add a what's new entry for #3391.

doc/whats-new.rst

xarray/tests/test_computation.py

Gijom · 2021-11-24T10:08:04Z

Sorry for the unresponsiveness. I confirm that using the asfloat conversion also works for me. Should I close the issue or should I let you do it ?

Co-authored-by: Mathias Hauser <[email protected]>

dcherian · 2021-11-24T17:25:17Z

Thanks @Gijom

shoyer · 2021-11-24T17:48:52Z

I have a follow-up PR that simplifies the internal logic a little bit more: #6025

* upstream/main: fix grammatical typo in docs (pydata#6034) Use condas dask-core in ci instead of dask to speedup ci and reduce dependencies (pydata#6007) Use complex nan by default when interpolating out of bounds (pydata#6019) Simplify missing value handling in xarray.corr (pydata#6025) Add pyXpcm to Related Projects doc page (pydata#6031) Make xr.corr and xr.map_blocks work without dask (pydata#5731)

Gijom added 2 commits August 23, 2021 08:46

Fix bug on duck access without dask (pydata#3391, pydata#5715)

e26fa13

Calls to dask functions are replaced by calls to the pycompat functions

Test if lazy corr is equal to corr

9089438

dcherian reviewed Aug 23, 2021

View reviewed changes

xarray/core/parallel.py Outdated Show resolved Hide resolved

Gijom added 2 commits August 23, 2021 17:33

Create is_dask_collection function

ee7a65b

The fonction is used to test for lazy datasets in map_blocks (parallel.py)

Correct test function name

a96154a

Gijom added 3 commits August 30, 2021 18:36

Remove dask importskip

bd30f07

The importskip was canceling all tests if dask is not present. Some tests are relevant to be done without dask. @require_dask was added to one test since it import dask directly

Remove tests for arrays 5 and 6

dbf5eec

The tests do not pass for arrays 5 and 6. More work is needed to indentify why

Add changes to what-new.rst

b9dfbe6

Gijom added 3 commits August 31, 2021 17:15

Merge branch 'main' into corr-bug

1787bb5

Return None

39cffd1

fix - add dimension to tests

3dde39b

TomNicholas mentioned this pull request Oct 23, 2021

Release v0.20? #5889

Closed

dcherian marked this pull request as draft October 24, 2021 09:56

dcherian added 3 commits November 23, 2021 20:06

update whats-new

2d86f2c

bugfix

32545a3

dcherian marked this pull request as ready for review November 24, 2021 03:17

dcherian changed the title ~~Fix bug on duck access without dask~~ Make xr.corr and xr.map_blocks work without dask Nov 24, 2021

mathause reviewed Nov 24, 2021

View reviewed changes

doc/whats-new.rst Outdated Show resolved Hide resolved

xarray/tests/test_computation.py Outdated Show resolved Hide resolved

Apply suggestions from code review

52d82c9

Co-authored-by: Mathias Hauser <[email protected]>

dcherian merged commit 21e8484 into pydata:main Nov 24, 2021

shoyer mentioned this pull request Nov 24, 2021

Simplify missing value handling in xarray.corr #6025

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make xr.corr and xr.map_blocks work without dask #5731

Make xr.corr and xr.map_blocks work without dask #5731

Gijom commented Aug 23, 2021 •

edited by mathause

Loading

Gijom commented Aug 23, 2021

dcherian commented Aug 23, 2021 •

edited

Loading

Gijom commented Aug 24, 2021 •

edited

Loading

keewis commented Aug 30, 2021

github-actions bot commented Aug 30, 2021 •

edited

Loading

Gijom commented Aug 31, 2021

dcherian commented Aug 31, 2021

Gijom commented Aug 31, 2021

dcherian commented Aug 31, 2021 •

edited

Loading

dcherian commented Aug 31, 2021

dcherian commented Aug 31, 2021

shoyer commented Nov 23, 2021

dcherian commented Nov 24, 2021

mathause left a comment

Gijom commented Nov 24, 2021

dcherian commented Nov 24, 2021

shoyer commented Nov 24, 2021

Make xr.corr and xr.map_blocks work without dask #5731

Make xr.corr and xr.map_blocks work without dask #5731

Conversation

Gijom commented Aug 23, 2021 • edited by mathause Loading

Gijom commented Aug 23, 2021

dcherian commented Aug 23, 2021 • edited Loading

Gijom commented Aug 24, 2021 • edited Loading

keewis commented Aug 30, 2021

github-actions bot commented Aug 30, 2021 • edited Loading

Unit Test Results

Gijom commented Aug 31, 2021

dcherian commented Aug 31, 2021

Gijom commented Aug 31, 2021

dcherian commented Aug 31, 2021 • edited Loading

dcherian commented Aug 31, 2021

dcherian commented Aug 31, 2021

shoyer commented Nov 23, 2021

dcherian commented Nov 24, 2021

mathause left a comment

Choose a reason for hiding this comment

Gijom commented Nov 24, 2021

dcherian commented Nov 24, 2021

shoyer commented Nov 24, 2021

Gijom commented Aug 23, 2021 •

edited by mathause

Loading

dcherian commented Aug 23, 2021 •

edited

Loading

Gijom commented Aug 24, 2021 •

edited

Loading

github-actions bot commented Aug 30, 2021 •

edited

Loading

dcherian commented Aug 31, 2021 •

edited

Loading