
Add a fixture that provides numpy, masked, and dask array functions #1935

Merged: 9 commits from rpmanser:dask_tests_basic into Unidata:main on Aug 19, 2021

Conversation

@rpmanser (Contributor) commented Jun 24, 2021

Description Of Changes

This PR adds a fixture, array_type, that provides array types for tests. The fixture is used in test functions where appropriate (I think I covered all the test cases without unduly extending the test suite's run time). dask was added as a test requirement (the version is arbitrary).

Two bugs were discovered and fixed:

  • When wind_direction() was given masked arrays containing calm values (u = 0, v = 0), finding those values with boolean indexing and then assigning to them caused masked wind speeds to become unmasked (see the sketch after this list).
  • smooth_gaussian() returned a numpy.ndarray when given a numpy.ma.array as input.
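For illustration, here is a minimal sketch of the masked-array behavior behind the first bug (hypothetical values, not the MetPy code itself): assigning through a boolean index into a soft-masked array clears the mask for the assigned elements.

import numpy as np

speed = np.ma.array([0.0, 5.0, 10.0], mask=[True, False, False])
calm = np.array([True, False, False])

# Assigning through a boolean index writes to the underlying data and clears
# the (soft) mask for those elements, so the previously masked calm value
# becomes unmasked.
speed[calm] = 0.0
print(speed.mask)  # [False False False]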

The following functions are xfailed for dask arrays:

  • wind_direction(): the boolean mask used to index values needs to be the same size as the array it is indexing, which is a problem when correcting negative direction values (see the sketch after this list).
  • heat_index(): when adding values to a boolean indexed dask array, dask doesn't know the dimensions of the indexed array, so it can't broadcast the values.
  • apparent_temperature(): I'm not completely sure what is happening here, but the error suggests that the shape of the indexed values of hi does not match the shape of the array they are being assigned to. Specifically, this happens here:
sel = (temperature <= units.Quantity(40., 'degF'))
if np.any(sel):
    hi[sel] = temperature[sel].to(units.degF)
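As noted for wind_direction() and heat_index() above, the failing pattern is boolean-index assignment. Here is a rough NumPy illustration of that pattern (hypothetical values, not the actual MetPy implementation); with a dask-backed array, the equivalent assignment is where the xfailed tests break, because dask cannot determine the shape of the selected elements when building its task graph:

import numpy as np

# Correct negative direction values in place via a boolean mask. NumPy can
# broadcast the right-hand side because it knows how many elements the mask
# selects; dask does not have that information at graph-construction time.
wdir = np.array([-90.0, 45.0, -10.0])
wdir[wdir < 0] += 360.0
print(wdir)  # [270.  45. 350.]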

I noted the TODO in smooth_window() that discusses issues with lazy-loading and dask, but I did not attempt to fix the problem here.

Finally, the test suite's run time increases from 119 seconds to 145 seconds on my machine. As it stands, testing functions against multiple array types will significantly increase the run time if extended to all of the calc modules.

Checklist

@rpmanser force-pushed the dask_tests_basic branch 2 times, most recently from 177cf45 to 645fa9a, on July 22, 2021 at 00:36
Review thread on conftest.py (outdated, resolved)
@dopplershift (Member)

Overall, I like the simplicity here and how easily we can invoke this. One thing that comes to mind: instead of just returning some kind of function like np.array, would it be better to have the function do all of it given data and units? That way we could potentially (here or in a future PR) extend this more easily to xarray (or 😱 xarray + dask)?

That would look something like:

import numpy
import pytest
import xarray as xr

from metpy.units import units


@pytest.fixture(params=['dask', 'masked', 'numpy'])
def array_type(request):
    """Return an array type for testing calc functions."""
    if request.param == 'dask':
        dask_array = pytest.importorskip('dask.array', reason='dask.array is not available')
        return lambda d, u: units.Quantity(dask_array.array(d), u)
    elif request.param == 'masked':
        return lambda d, u: units.Quantity(numpy.ma.array(d), u)
    elif request.param == 'xarray':
        # Only reached if 'xarray' is added to params above; shown as a possible extension.
        return lambda d, u: xr.DataArray(d, attrs={'units': u})
    else:
        return lambda d, u: units.Quantity(numpy.array(d), u)
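For context, a hedged sketch of how a test might consume such a fixture (the test name, function under test, and expected values are illustrative, not taken from this PR):

import numpy as np

from metpy.calc import wind_speed


def test_wind_speed(array_type):
    """Illustrative only: the test body is agnostic to the array backend."""
    u = array_type([4., 2., 0., 0.], 'm/s')
    v = array_type([0., 2., 4., 0.], 'm/s')
    speed = wind_speed(u, v)
    truth = np.array([4., 2.8284271, 4., 0.])
    np.testing.assert_array_almost_equal(np.asarray(speed.m), truth, 5)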

@rpmanser force-pushed the dask_tests_basic branch 2 times, most recently from 77f4813 to 9003c52, on August 14, 2021 at 21:00
* Use `array_type` as needed in `test_basic.py`
* Fix bug in `wind_direction()` for masked arrays when calm. Boolean
  indexing of a masked array resulted in modification of the mask
  itself.
* Fix bug in `smooth_gaussian()` where function returned a
  `numpy.ndarray` when given a `numpy.ma.array`
* Remove `test_direction_masked()`
@rpmanser marked this pull request as ready for review on August 16, 2021 at 14:46
@rpmanser requested a review from dcamron as a code owner on August 16, 2021 at 14:46
@dopplershift (Member) left a comment

This is awesome! Heartening to have it catch a few things, but not an overwhelming amount.

Regarding the increased time, that's not unexpected to me. I'd rather have the coverage and the longer run time.

Review threads on src/metpy/calc/basic.py, tests/calc/test_basic.py, and src/metpy/testing.py (resolved)
@dopplershift (Member)

I should also add that I really like how you handled having a mask available. Is there any reason you didn't add a mask for the other smoothing functions?

@dopplershift (Member)

So I think we could use a custom marker to mark tests that we expect to fail with Dask.

First, we need to register the custom marker in setup.cfg (we can remove the mpl_image_compare stuff that's there now):

[tool:pytest]
markers = xfail_dask: marks tests as expected fail with Dask arrays

Then the fixture needs to look for the marker and use that to mark that particular function run as xfail:

...
    if request.param == 'dask':
        dask_array = pytest.importorskip('dask.array', reason='dask.array is not available')
        marker = request.node.get_closest_marker('xfail_dask')
        if marker is not None:
            request.applymarker(pytest.mark.xfail(reason=marker.args[0]))
        return lambda d, u, *, mask=None: units.Quantity(dask_array.array(d), u)
...

Then, instead of having the xfail_dask function in testing.py, use pytest.mark.xfail_dask (with a similar call signature) as a decorator on any of the relevant test functions:

@pytest.mark.xfail_dask('Boolean index assignment in Dask expects equally shaped arrays')
def test_direction_with_north_and_calm(array_type):
...

@dopplershift (Member)

I also looked briefly at using just pytest.mark.parametrize instead of fixtures, but that would get messy trying to use importorskip (even though I don't think what we're doing at this point is really a fixture).

@rpmanser (Contributor, Author)

Thanks for the feedback @dopplershift! I aim to look at your comments more closely and (hopefully) make changes tomorrow evening.

@rpmanser (Contributor, Author) commented Aug 19, 2021

I should also add that I really like how you handled having a mask available. Is there any reason you didn't add a mask for the other smoothing functions?

No, I just missed adding masks for those. Thanks for catching that.

I also looked briefly at using just pytest.mark.parametrize instead of fixtures, but that would get messy trying to use importorskip (even though I don't think what we're doing at this point is really a fixture).

Right, if I understand fixtures correctly, they're only supposed to provide data or a setup for the test. Parametrization would be preferable, but I don't see a way to use importorskip without creating a separate test module or unit tests in that case.
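For comparison, here is a sketch of how the parametrize route might look, assuming a module-level availability check stands in for importorskip (names are illustrative):

import pytest

try:
    import dask.array  # noqa: F401
    HAS_DASK = True
except ImportError:
    HAS_DASK = False


@pytest.mark.parametrize('backend', [
    'numpy',
    'masked',
    pytest.param('dask', marks=pytest.mark.skipif(
        not HAS_DASK, reason='dask.array is not available')),
])
def test_example(backend):
    """Placeholder body; each backend would build its arrays accordingly."""
    ...

The fixture approach in this PR keeps the importorskip call in one place instead of repeating this check in every test module.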

@dopplershift (Member) left a comment

I'm happy to see this infrastructure go in. This will make it much easier for us to check what we're doing.

@dopplershift merged commit dd8e492 into Unidata:main on Aug 19, 2021
@dopplershift added the Area: Tests and Type: Feature labels on Aug 19, 2021
@dopplershift added this to the 1.2.0 milestone on Aug 19, 2021
@dopplershift added the Type: Bug label on Aug 19, 2021