
Add a fixture that provides numpy, masked, and dask array functions #1935

Merged: 9 commits from rpmanser:dask_tests_basic into Unidata:main on Aug 19, 2021

Conversation

@rpmanser (Contributor) commented Jun 24, 2021

Description Of Changes

This PR adds a fixture, array_type, that provides array types for tests. The fixture is used in test functions where appropriate (I think I covered all the test cases without unduly extending the test suite's run time). dask was added as a test requirement (the version is arbitrary).

Two bugs were discovered and fixed:

  • When wind_direction() was given masked arrays containing calm values (u = 0, v = 0), finding those values with boolean indexing and then assigning to them caused masked wind speeds to become unmasked (see the sketch after this list).
  • smooth_gaussian() returned a numpy.ndarray when given a numpy.ma.array as input.
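For illustration, here is a minimal sketch of the masked-array behavior behind the first bug (hypothetical values, not the MetPy code itself): assigning through a boolean index into a soft-masked array clears the mask for the assigned elements.

import numpy as np

speed = np.ma.array([0.0, 5.0, 10.0], mask=[True, False, False])
calm = np.array([True, False, False])

# Assigning through a boolean index writes to the underlying data and clears
# the (soft) mask for those elements, so the previously masked calm value
# becomes unmasked.
speed[calm] = 0.0
print(speed.mask)  # [False False False]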

The following functions are xfailed for dask arrays:

  • wind_direction(): the boolean mask used to index values needs to be the same size as the array it is indexing, which is a problem when correcting negative direction values (see the sketch after this list).
  • heat_index(): when adding values to a boolean indexed dask array, dask doesn't know the dimensions of the indexed array, so it can't broadcast the values.
  • apparent_temperature(): I'm not completely sure what is happening here, but the error suggests that the shape of the indexed values of hi does not match the shape of the array they are being assigned to. Specifically, this happens here:
sel = (temperature <= units.Quantity(40., 'degF'))
if np.any(sel):
    hi[sel] = temperature[sel].to(units.degF)
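As noted for wind_direction() and heat_index() above, the failing pattern is boolean-index assignment. Here is a rough NumPy illustration of that pattern (hypothetical values, not the actual MetPy implementation); with a dask-backed array, the equivalent assignment is where the xfailed tests break, because dask cannot determine the shape of the selected elements when building its task graph:

import numpy as np

# Correct negative direction values in place via a boolean mask. NumPy can
# broadcast the right-hand side because it knows how many elements the mask
# selects; dask does not have that information at graph-construction time.
wdir = np.array([-90.0, 45.0, -10.0])
wdir[wdir < 0] += 360.0
print(wdir)  # [270.  45. 350.]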

I noted the TODO in smooth_window() that discusses issues with lazy-loading and dask, but I did not attempt to fix the problem here.

Finally, the test suite's run time increases from 119 seconds to 145 seconds on my machine. As it stands, testing functions against multiple array types will significantly increase the run time if extended to all of the calc modules.

Checklist

@rpmanser force-pushed the dask_tests_basic branch 2 times, most recently from 177cf45 to 645fa9a, on July 22, 2021 at 00:36
Review thread on conftest.py (outdated, resolved)
@dopplershift (Member)

Overall, I like the simplicity here and how easily we can invoke this. One thing that comes to mind: instead of just returning some kind of function like np.array, would it be better to have the function do all of it given data and units? That way we could potentially (here or in a future PR) extend this more easily to xarray (or 😱 xarray + dask)?

That would look something like:

import numpy
import pytest
import xarray as xr

from metpy.units import units


@pytest.fixture(params=['dask', 'masked', 'numpy'])
def array_type(request):
    """Return an array type for testing calc functions."""
    if request.param == 'dask':
        dask_array = pytest.importorskip('dask.array', reason='dask.array is not available')
        return lambda d, u: units.Quantity(dask_array.array(d), u)
    elif request.param == 'masked':
        return lambda d, u: units.Quantity(numpy.ma.array(d), u)
    elif request.param == 'xarray':
        # Only reached if 'xarray' is added to params above; shown as a possible extension.
        return lambda d, u: xr.DataArray(d, attrs={'units': u})
    else:
        return lambda d, u: units.Quantity(numpy.array(d), u)
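For context, a hedged sketch of how a test might consume such a fixture (the test name, function under test, and expected values are illustrative, not taken from this PR):

import numpy as np

from metpy.calc import wind_speed


def test_wind_speed(array_type):
    """Illustrative only: the test body is agnostic to the array backend."""
    u = array_type([4., 2., 0., 0.], 'm/s')
    v = array_type([0., 2., 4., 0.], 'm/s')
    speed = wind_speed(u, v)
    truth = np.array([4., 2.8284271, 4., 0.])
    np.testing.assert_array_almost_equal(np.asarray(speed.m), truth, 5)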

@rpmanser force-pushed the dask_tests_basic branch 2 times, most recently from 77f4813 to 9003c52, on August 14, 2021 at 21:00
* Use `array_type` as needed in `test_basic.py`
* Fix bug in `wind_direction()` for masked arrays when calm. Boolean
  indexing of a masked array resulted in modification of the mask
  itself.
* Fix bug in `smooth_gaussian()` where function returned a
  `numpy.ndarray` when given a `numpy.ma.array`
* Remove `test_direction_masked()`
@rpmanser marked this pull request as ready for review on August 16, 2021 at 14:46
@rpmanser requested a review from dcamron as a code owner on August 16, 2021 at 14:46
@dopplershift (Member) left a comment

This is awesome! Heartening to have it catch a few things, but not an overwhelming amount.

Regarding the increased time, that's not unexpected to me. I'd rather have the coverage and the longer run time.

Review threads on src/metpy/calc/basic.py, tests/calc/test_basic.py, and src/metpy/testing.py (resolved)
@dopplershift (Member)

I should also add that I really like how you handled having a mask available. Is there any reason you didn't add a mask for the other smoothing functions?

@dopplershift (Member)

So I think we could use a custom marker to mark tests that we expect to fail with Dask.

First, we need to register the custom marker in setup.cfg (we can remove the mpl_image_compare stuff that's there now):

[tool:pytest]
markers = xfail_dask: marks tests as expected fail with Dask arrays

Then the fixture needs to look for the marker and use that to mark that particular function run as xfail:

...
    if request.param == 'dask':
        dask_array = pytest.importorskip('dask.array', reason='dask.array is not available')
        marker = request.node.get_closest_marker('xfail_dask')
        if marker is not None:
            request.applymarker(pytest.mark.xfail(reason=marker.args[0]))
        return lambda d, u, *, mask=None: units.Quantity(dask_array.array(d), u)
...

Then, instead of having the xfail_dask function in testing.py, use pytest.mark.xfail_dask (with a similar call signature) as a decorator on any of the relevant test functions:

@pytest.mark.xfail_dask('Boolean index assignment in Dask expects equally shaped arrays')
def test_direction_with_north_and_calm(array_type):
...

@dopplershift (Member)

I also looked briefly at using just pytest.mark.parametrize instead of fixtures, but that would get messy trying to use importorskip (even though I don't think what we're doing at this point is really a fixture).

@rpmanser (Contributor, Author)

Thanks for the feedback @dopplershift! I aim to look at your comments more closely and (hopefully) make changes tomorrow evening.

@rpmanser (Contributor, Author) commented Aug 19, 2021

I should also add that I really like how you handled having a mask available. Is there any reason you didn't add a mask for the other smoothing functions?

No, I just missed adding masks for those. Thanks for catching that.

I also looked briefly at using just pytest.mark.parametrize instead of fixtures, but that would get messy trying to use importorskip (even though I don't think what we're doing at this point is really a fixture).

Right, if I understand fixtures correctly, they're only supposed to provide data or a setup for the test. Parametrization would be preferable, but I don't see a way to use importorskip without creating a separate test module or unit tests in that case.
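For comparison, here is a sketch of how the parametrize route might look, assuming a module-level availability check stands in for importorskip (names are illustrative):

import pytest

try:
    import dask.array  # noqa: F401
    HAS_DASK = True
except ImportError:
    HAS_DASK = False


@pytest.mark.parametrize('backend', [
    'numpy',
    'masked',
    pytest.param('dask', marks=pytest.mark.skipif(
        not HAS_DASK, reason='dask.array is not available')),
])
def test_example(backend):
    """Placeholder body; each backend would build its arrays accordingly."""
    ...

The fixture approach in this PR keeps the importorskip call in one place instead of repeating this check in every test module.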

@dopplershift (Member) left a comment

I'm happy to see this infrastructure go in. This will make it much easier for us to check what we're doing.

@dopplershift merged commit dd8e492 into Unidata:main on Aug 19, 2021
@dopplershift added the Area: Tests and Type: Feature labels on Aug 19, 2021
@dopplershift added this to the 1.2.0 milestone on Aug 19, 2021
@dopplershift added the Type: Bug label on Aug 19, 2021