Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add dask-friendly EWA resampler class (DaskEWAResampler) #284

Merged
merged 39 commits into from
Feb 4, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
39 commits
Select commit Hold shift + click to select a range
5009b28
Add initial working dask-friendly EWA functions
djhoese Jun 1, 2020
e22f22c
Add dask-based EWA resampler class (DaskEWAResampler)
djhoese Jun 1, 2020
19134fc
Switch dask EWA to use dask reduction function
djhoese Jun 2, 2020
f634825
Allow dask EWA to persist ll2cr results
djhoese Jun 2, 2020
06779ee
Add xarray recreation to EWA dask resampler
djhoese Jun 3, 2020
8ae2a05
Replace more proj_dict usage in AreaDefinition with crs_wkt
djhoese Jun 3, 2020
b8b92c6
Add ability to run multi-band datasets through EWA dask resampling
djhoese Jun 3, 2020
42beea2
Fix EWA dask dimensions being incorrect
djhoese Jun 3, 2020
6cfd3ed
Fix geostationary extents tests not mocking crs properly
djhoese Jun 3, 2020
45aba2f
Fix stickler styling issues
djhoese Jun 3, 2020
f61451c
Use partial functions to pass keyword arguments to dask fornav
djhoese Jun 4, 2020
8129b81
Add maximum_weight_mode support for EWA dask resampler
djhoese Jun 8, 2020
6f47381
Merge branch 'master' into feature-dask-ewa
djhoese Jan 15, 2021
775d6fd
Add legacy dask-based EWA resampler from Satpy
djhoese Jan 17, 2021
0fce0de
Switch to pytest for EWA tests
djhoese Jan 18, 2021
85aadd4
Switch legacy EWA resampler tests to use parametrize
djhoese Jan 18, 2021
d84ef1f
Update legacy ewa tests to use real ll2cr/fornav calls
djhoese Jan 24, 2021
1d562a8
Add tests for new dask ewa code
djhoese Jan 25, 2021
f27ed5d
Remove leftover TODO comment
djhoese Jan 25, 2021
41f3618
Add basic numpy support to DaskEWAResampler
djhoese Jan 25, 2021
38140db
Fix integer handling in DaskEWAResampler
djhoese Jan 25, 2021
e475842
Fix stickler issues
djhoese Jan 25, 2021
4ff07a3
Add resampler method to DaskEWAResampler and move documentation there
djhoese Jan 28, 2021
50be51e
Refactor dask ewa to be a little cleaner
djhoese Jan 29, 2021
61c4b3b
Fix pyproj warnings related to EWA and AreaDefinition hashing
djhoese Jan 29, 2021
1c15abc
Fix dask ewa not calling the right function for reductions
djhoese Jan 29, 2021
fdff3fe
Fix stickler line too long
djhoese Jan 29, 2021
9cdc62b
Add resampler specific documentation about EWA
djhoese Feb 1, 2021
a385bac
Fix styling issues
djhoese Feb 1, 2021
262017b
Cleanup if branch in add_xy_coords
djhoese Feb 1, 2021
4d84edd
Resolve one set of reviewer comments
djhoese Feb 2, 2021
4c7f7d8
Refactor resampler and ewa tests
djhoese Feb 2, 2021
7eb28dc
Refactor dask ewa tests to be more parametrized
djhoese Feb 2, 2021
5e1f163
More dask ewa test refactoring
djhoese Feb 2, 2021
142f6ad
Fix flake8 issues in ewa
djhoese Feb 2, 2021
b1aee57
Refactor crs handling in AreaDefinition to be a property
djhoese Feb 2, 2021
6ba4847
Fix redundant imports in dask ewa tests
djhoese Feb 2, 2021
0348f3c
Merge branch 'master' into feature-dask-ewa
djhoese Feb 3, 2021
bf4b07b
Remove .coveragerc since setup.cfg was already being used
djhoese Feb 3, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 0 additions & 2 deletions .coveragerc

This file was deleted.

3 changes: 2 additions & 1 deletion MANIFEST.in
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,8 @@ include docs/Makefile
recursive-include docs/source *
include pyresample/test/test_files/*
include LICENSE.txt
include pyresample/ewa/*.h
include pyresample/ewa/_fornav_templates.*
include versioneer.py
include pyresample/version.py
include README.md
recursive-include pyresample *.pyx
146 changes: 84 additions & 62 deletions docs/source/swath.rst
Original file line number Diff line number Diff line change
Expand Up @@ -449,14 +449,16 @@ algorithms that pyresample provides. The KDTree-based algorithms
process each output grid pixel by searching for all "nearby" input
pixels and applying a certain interpolation (nearest neighbor, gaussian, etc).
The EWA algorithm processes each input pixel mapping it to one or more output
pixels. Once each input pixel has been analyzed the intermediate results are
pixels. Once each input pixel has been analyzed, the intermediate results are
averaged to produce the final gridded result.

The EWA algorithm also has limitations on how the input data is structured
The EWA algorithm also has limitations on how the input data are structured
compared to the generic KDTree algorithms. EWA assumes that data in the array
is organized geographically; adjacent data in the array is adjacent data
geographically. The algorithm uses this to configure parameters based on the
size and location of the swath pixels.
size and location of the swath pixels. It also assumes that data are
scan-based, recorded by a orbiting satellite scan by scan, and the user must
provide scan size with the ``rows_per_scan`` option.

The EWA algorithm consists of two
steps: ll2cr and fornav. The algorithm was originally part of the
Expand All @@ -467,73 +469,93 @@ quality images from VIIRS and AVHRR data with the right parameters.

.. note::

This code was originally part of the Polar2Grid project. This
This code was originally part of the CSPP Polar2Grid project. This
documentation and the API documentation for this algorithm may still
use references or concepts from Polar2Grid until everything can
be updated.

Gridding
********

The first step is called 'll2cr' which stands for "longitude/latitude to
column/row". This step maps the pixel location (lon/lat space) into area (grid)
space. Areas in pyresample are defined by a PROJ.4 projection specification.
An area is defined by the following parameters:

- Grid Name
- PROJ.4 String (either lat/lon or metered projection space)
- Width (number of pixels in the X direction)
- Height (number of pixels in the Y direction)
- Cell Width (pixel size in the X direction in grid units)
- Cell Height (pixel size in the Y direction in grid units)
- X Origin (upper-left X coordinate in grid units)
- Y Origin (upper-left Y coordinate in grid units)

Resampling
**********

The second step of EWA remapping is called "fornav", short for
"forward navigation". This EWA algorithm processes one input scan line
at a time. The algorithm weights the effect of an input pixel on an output
pixel based on its location in the scan line and other calculated
coefficients. It can also handle swaths that are not scan based by specifying
`rows_per_scan` as the number of rows in the entire swath.
How the algorithm treats the data can be configured with various
keyword arguments, see the API documentation for more information.
Both steps provide additional information to inform the user how much data
was used in the result. The first returned value of ll2cr tells you how many
of the input swath pixels overlap the grid. The first returned value of fornav
tells you how many grid points have valid data values in them.

Example
*******
Resampler
*********

The :class:`~pyresample.ewa.DaskEWAResampler` is the easiest way to use the
EWA resampling algorithm. Internally this resampler uses the ``dask`` library
to perform all of its operations in parallel. This will typically provide
better performance than any of the below methods, but does require the
``dask`` library to be installed. The below code assumes you have a
``swath_def`` object (:class:`~pyresample.geometry.SwathDefinition`), an
``area_def`` object (:class:`~pyresample.geometry.AreaDefinition`), and
some array data in ``data``. Data can be a numpy array, a dask array, or
an xarray DataArray object.

.. testsetup::

import numpy as np
from pyresample import geometry
import dask.array as da
import xarray as xr
area_def = geometry.AreaDefinition('areaD', 'Europe (3km, HRV, VTC)', 'areaD',
{'a': '6378144.0', 'b': '6356759.0',
'lat_0': '50.00', 'lat_ts': '50.00',
'lon_0': '8.00', 'proj': 'stere'},
800, 800,
[-1370912.72, -909968.64,
1029087.28, 1490031.36])
data = np.fromfunction(lambda y, x: y*x, (50, 10))
dask_data = da.from_array(data, chunks='auto')
xr_data = xr.DataArray(dask_data)
lons = np.fromfunction(lambda y, x: 3 + x, (50, 10))
lats = np.fromfunction(lambda y, x: 75 - y, (50, 10))
swath_def = geometry.SwathDefinition(lons=lons, lats=lats)
dask_lons = da.from_array(lons, chunks='auto')
dask_lats = da.from_array(lats, chunks='auto')
xr_lons = xr.DataArray(dask_lons)
xr_lats = xr.DataArray(dask_lats)
xr_swath_def = geometry.SwathDefinition(lons=xr_lons, lats=xr_lats)

.. testcode::

from pyresample.ewa import DaskEWAResampler
resampler = DaskEWAResampler(swath_def, area_def)

# if the data are scan based, specify how many data rows make up one scan
rows_per_scan = 5
result = resampler.resample(data, rows_per_scan=rows_per_scan)

Legacy Dask Resampler
*********************

.. note::
This resampler is similar to the above, but only works on xarray DataArray
objects backed by dask arrays. Although it uses dask underneath, it doesn't
use it optimally and in most cases will use a lot of memory.

EWA resampling in pyresample is still in an alpha stage. As development
continues, EWA resampling may be called differently.
.. testcode::

.. doctest::
from pyresample.ewa import LegacyDaskEWAResampler
resampler = LegacyDaskEWAResampler(xr_swath_def, area_def)

# if the data are scan based, specify how many data rows make up one scan
rows_per_scan = 5
result = resampler.resample(xr_data, rows_per_scan=rows_per_scan)

Legacy Function Interface
*************************

It is recommended to use the Resampler interfaces described above whenever
possible. However, the low-level ``ll2cr`` and ``fornav`` functions can be
used if desired. These functions will only work on basic numpy arrays and
although fast, they will use a lot of memory for large input arrays or large
target areas.

.. testcode::

from pyresample.ewa import ll2cr, fornav
# ll2cr converts swath longitudes and latitudes to grid columns and rows
swath_points_in_grid, cols, rows = ll2cr(swath_def, area_def)
# if the data are scan based, specify how many data rows make up one scan
rows_per_scan = 5
# fornav resamples the swath data to the gridded area
num_valid_points, gridded_data = fornav(cols, rows, area_def, data, rows_per_scan=rows_per_scan)

>>> import numpy as np
>>> from pyresample.ewa import ll2cr, fornav
>>> area_def = geometry.AreaDefinition('areaD', 'Europe (3km, HRV, VTC)', 'areaD',
... {'a': '6378144.0', 'b': '6356759.0',
... 'lat_0': '50.00', 'lat_ts': '50.00',
... 'lon_0': '8.00', 'proj': 'stere'},
... 800, 800,
... [-1370912.72, -909968.64,
... 1029087.28, 1490031.36])
>>> data = np.fromfunction(lambda y, x: y*x, (50, 10))
>>> lons = np.fromfunction(lambda y, x: 3 + x, (50, 10))
>>> lats = np.fromfunction(lambda y, x: 75 - y, (50, 10))
>>> swath_def = geometry.SwathDefinition(lons=lons, lats=lats)
>>> # ll2cr converts swath longitudes and latitudes to grid columns and rows
>>> swath_points_in_grid, cols, rows = ll2cr(swath_def, area_def)
>>> # if the data is scan based, specify how many data rows make up one scan
>>> rows_per_scan = 5
>>> # fornav resamples the swath data to the gridded area
>>> num_valid_points, gridded_data = fornav(cols, rows, area_def, data, rows_per_scan=rows_per_scan)

pyresample.bucket
-----------------
Expand Down
Loading