Make xr.corr and xr.map_blocks work without dask #5731
Conversation
Calls to dask functions are replaced by calls to the pycompat functions
Concerning the tests, I added a test to check that lazy correlations give identical results to non-lazy correlations, but this is not directly related to the bug fix. There is still a need for a test that checks that the non-lazy correlation works without dask installed; the already existing test function could be adapted for this. About documenting the changes, I am not sure of the whats-new.rst format, so I did not add an entry. Should I just add a bullet item at the top of the file?
Thanks @Gijom!
I can't tell either. It would be good to track this down.
Yes, just copy an existing entry and modify it.
The function is used to test for lazy datasets in map_blocks (parallel.py).
I included the proposed changes, thanks for the help. However, the new test I implemented does not pass and I fail to understand why. Would you be able to check? The idea of the test is to ensure that lazy computations give the same results as normal ones. I tracked down the problem to line 1377:
demeaned_da_a = da_a - da_a.mean(dim=dim)  # <-- this one returns nan upon computation (.compute())
demeaned_da_b = da_b - da_b.mean(dim=dim)  # <-- this one returns the correct value although the input is masked with nans in the same way
The values before the demeaning: …
And for the means (dim is None): …
For non-lazy computations everything seems fine.
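For reference, what xr.corr computes can be illustrated in plain NumPy; masking nans with a common validity mask before the demeaning step under discussion looks roughly like this. The arrays and helper name are made up for illustration, not taken from xarray's implementation:

```python
import numpy as np


def pearson_corr(a, b):
    # Mask positions where either input is nan, then demean and normalize.
    valid = ~(np.isnan(a) | np.isnan(b))
    a, b = a[valid], b[valid]
    da = a - a.mean()  # analogue of demeaned_da_a
    db = b - b.mean()  # analogue of demeaned_da_b
    return (da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum())


a = np.array([1.0, 2.0, np.nan, 4.0])
b = np.array([1.0, 0.5, 2.0, 3.0])
r = pearson_corr(a, b)
print(round(float(r), 4))  # 0.866
```

If the demeaning happens before the common mask is applied, nans propagate through `.mean()` and the result becomes nan, which matches the symptom described above for the lazy path.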
The reason for this is the importorskip in xarray/tests/test_computation.py (line 27 in a96154a), which requires dask. I think it should be fine to just remove the importorskip call.
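The distinction matters because a module-level importorskip skips the entire test file when dask is missing, while a per-test marker only skips the dask-dependent tests. A minimal sketch of the per-test pattern (names assumed for illustration, not copied from xarray's test suite):

```python
import pytest

# Per-test gating: only tests marked with this decorator are skipped
# when dask is missing; the rest of the file still runs.
try:
    import dask  # noqa: F401

    has_dask = True
except ImportError:
    has_dask = False

requires_dask = pytest.mark.skipif(not has_dask, reason="requires dask")


@requires_dask
def test_lazy_corr():
    # Would exercise the dask-backed code path here.
    pass


def test_eager_corr():
    # Runs even without dask installed.
    pass
```

A module-level `dask = pytest.importorskip("dask")`, by contrast, aborts collection of every test in the file, including the ones that never touch dask.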
Unit Test Results: 6 files, 6 suites, 49m 53s ⏱️. Results for commit 3dde39b.
The importorskip was skipping all tests if dask is not present, but some tests are relevant without dask. @requires_dask was added to one test since it imports dask directly.
The tests do not pass for arrays 5 and 6. More work is needed to identify why.
@keewis indeed this was the problem. I removed it and corrected one test which did not have the @requires_dask decorator. There is still a test error with test arrays 5 and 6: the lazy array version does not return the same value as the usual corr. I consider this a different bug and just removed those tests, leaving a TODO in the code. The last commit passes the tests on my machine, and I added the whats-new text to version 0.19.1.
Thanks @Gijom, can you post the error for those tests please?
I was referring to: … Otherwise I am working on the conflict with the main branch. Should be done soon.
Thanks @Gijom, I can't reproduce. Can you post the output of …? Here's mine:
Oh never mind: I can reproduce with …
Ouch, here's the problem: … Importantly, … The fix is to specify … Alternatively we could stick an …
Checking in here: it seems like this PR surfaced an existing bug (incorrect calculation of the correlation when using dask) at the same time as it corrected a different bug by removing the hard dependency on dask for calculating correlation. If I'm understanding correctly, this is still a net win (fixing one bug), so it probably makes sense to merge it now and address the other bug in a later issue?
* upstream/main: (86 commits)
  Fixed a mispelling of dimension in dataarray documentation for from_dict (pydata#6020)
  [pre-commit.ci] pre-commit autoupdate (pydata#6014)
  [pre-commit.ci] pre-commit autoupdate (pydata#5990)
  Use set_options for asv bottleneck tests (pydata#5986)
  Fix module name retrieval in `backend.plugins.remove_duplicates()`, plugin tests (pydata#5959)
  Check for py version instead of try/except when importing entry_points (pydata#5988)
  Add "see also" in to_dataframe docs (pydata#5978)
  Alternate method using inline css to hide regular html output in an untrusted notebook (pydata#5880)
  Fix mypy issue with entry_points (pydata#5979)
  Remove pre-commit auto update (pydata#5958)
  Do not change coordinate inplace when throwing error (pydata#5957)
  Create CITATION.cff (pydata#5956)
  Add groupby & resample benchmarks (pydata#5922)
  Fix plot.line crash for data of shape (1, N) in _title_for_slice on format_item (pydata#5948)
  Disable unit test comments (pydata#5946)
  Publish test results from workflow_run only (pydata#5947)
  Generator for groupby reductions (pydata#5871)
  whats-new dev
  whats-new for 0.20.1 (pydata#5943)
  Docs: fix URL for PTSA (pydata#5935)
  ...
It was easy. Should be good to go if tests pass.
Sorry for the unresponsiveness. I confirm that using the …
Co-authored-by: Mathias Hauser <[email protected]>
Thanks @Gijom
I have a follow-up PR that simplifies the internal logic a little bit more: #6025
* upstream/main:
  fix grammatical typo in docs (pydata#6034)
  Use condas dask-core in ci instead of dask to speedup ci and reduce dependencies (pydata#6007)
  Use complex nan by default when interpolating out of bounds (pydata#6019)
  Simplify missing value handling in xarray.corr (pydata#6025)
  Add pyXpcm to Related Projects doc page (pydata#6031)
  Make xr.corr and xr.map_blocks work without dask (pydata#5731)