Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Harmonize CRS handling across notebooks/utilities #600

Closed
Kirill888 opened this issue Jul 6, 2020 · 7 comments
Closed

Harmonize CRS handling across notebooks/utilities #600

Kirill888 opened this issue Jul 6, 2020 · 7 comments
Labels
help wanted Extra attention is needed priority Issues that should be fixed ASAP

Comments

@Kirill888
Copy link
Contributor

History

The way CRS information is stored and accessed in datacube has changed over the years. Those changes were driven primarily by two factors: better interoperability with xarray/rasterio and other geospatial libraries and better "handling of various xarray operations".

Initially CRS information was stored as a datacube CRS class in an attribute named crs on each datavariable and on dataset object returned from dc.load(..). This approach was problematic as it prevented methods like xarray.to_netcdf(..) from working (didn't know what to do with crs attribute that is not a string). It also suffered from "missing attribute" problem, some common xarray operations like type casting could drop crs attribute.

Datacube code was then changed for a brief period to store crs attribute on the x,y coordinates, and to be a string and not a CRS class, this allowed for better "survivability" of CRS data across per-pixel operations, as coordinates are essentially copied from source to destination including all the attributes attached to them. Turning crs into a string allowed for ability to save data with xarray.to_netcdf method.

Then we had a proposal to change representation yet again to be inline with what NetCDF CF convention is using. This had a lot of merit and provided with cleaner solution than what we had at that point in time, so we went along and implemented that as part of 1.8.0 release.

Current State

  • CRS information is stored in "NetCDF CF"-like convention as text attributes on a dimensionless coordinate named spatial_ref
  • Since this a coordinate and not an attribute it doesn't go missing when performing per pixel operations, it's always propagated from source to destination.
  • You access that information via .geobox "dynamic property", available after importing datacube
  • .geobox can also detect CRS information as returned by xarray.open_(dataset|rasterio)
  • .crs property is no longer populated with CRS class for data returned by dc.load, one must use .geobox.crs
  • .crs property is populated by xarray.open_rasterio and is set to a string representation of the CRS, usually WKT or EPSG code
  • If CRS information can not be found .geobox should return None, if it raises an Error of any sort, this should be considered a bug and fixed.
  • .geobox survives round trip to netcdf and back
  • As of 1.8.1, there is now datacube.utils.geometry.assign_crs method. This is a preferred mechanism to add CRS to unregistered data and to convert simple attribute based representation to a more robust coordinates based representation.

Changes Required to Notebooks

  • Do not access xx.crs, instead access xx.geobox.crs, add assert xx.geobox is not None to indicate to people reading code and to check at run-time that crs information is required/available
  • Do not inject crs attribute directly into dataset/dataarray. You probably don't need to do it anymore, it probably survived computation just fine, or has been detected if data was returned from one of xarray load functions
  • If loading data directly from files via xarray methods, use assign_crs to convert to Datacube format that survives per pixel operations better
  • Do not assume that spatial dimensions of a dataset are the same as xx.geobox.dims, this does not work for external datasets in ESPG:4326 loaded via xarray open functions (it names coordinates as x,y always), use datacube.utils.spatial_dims instead.

Those changes are difficult to make in a way that will work with older and newer versions of datacube, so I propose we just change to support newest version only 1.8.1 and forward. Probably 1.8.2 that is yet to be released as there has been some more fixes to .geobox handling for external sources since 1.8.1.

I have been able to run following with current datacube code after some changes to notebooks and to xr_rasterize function

Applying_WOfS_bitmasking.ipynb .................                                     [  6%]
Calculating_band_indices.ipynb .........                                             [ 10%]
Contour_extraction.ipynb ................                                            [ 16%]
Exporting_GeoTIFFs.ipynb ..........                                                  [ 20%]
Exporting_NetCDFs.ipynb .........                                                    [ 24%]
Image_segmentation.ipynb ...............                                             [ 30%]
Imagery_on_web_map.ipynb ............                                                [ 35%]
Integrating_external_data.ipynb ............                                         [ 40%]
Machine_learning_with_ODC.ipynb ....................                                 [ 48%]
Masking_data.ipynb ..............                                                    [ 53%]
Opening_GeoTIFFs_NetCDFs.ipynb ...................                                   [ 61%]
Pan_sharpening_Brovey.ipynb .............                                            [ 66%]
Rasterize_vectorize.ipynb .............                                              [ 71%]
Using_load_ard.ipynb ................                                                [ 78%]
Virtual_products.ipynb ...........................................                   [ 95%]
Working_with_time.ipynb ............                                                 [100%]

but there are way more notebooks than these...

The tricky part is synchronising updates to sandbox/dea on NCI/notebooks repos.

@robbibt robbibt closed this as completed Jul 6, 2020
@robbibt robbibt reopened this Jul 6, 2020
@Kirill888
Copy link
Contributor Author

Needed changes look something like this 39c0532, except that there are more places

Scripts/notebookapp_miningrehab.py:85:    ds_resampled.attrs["crs"] = dataset_fc.crs
Scripts/notebookapp_changefilmstrips.py:214:        ds_geomedian.attrs['crs'] = ds.crs
Frequently_used_code/Contour_extraction.ipynb:738:    "Another way to get around this issue would be to assign a `crs` attribute to `s2_ds.NDWI` before we attempt to extract contours (e.g. `s2_ds.NDWI.attrs['crs'] = s2_ds.crs`)"
Frequently_used_code/Machine_learning_with_ODC.ipynb:780:    "predicted.attrs['crs'] = geometry.CRS(\"EPSG:3577\")\n",
Real_world_examples/Intertidal_elevation.ipynb:863:    "intertidal_dem_ds.attrs['crs'] = landsat_ds.crs\n",

@robbibt robbibt added help wanted Extra attention is needed priority Issues that should be fixed ASAP labels Jul 7, 2020
@Kirill888
Copy link
Contributor Author

Using unstable image on AU sandbox and this branch I have all notebooks in Frequently_used_code passing

===================================================== test session starts ======================================================
platform linux -- Python 3.6.9, pytest-5.4.3, py-1.9.0, pluggy-0.13.1
rootdir: /home/jovyan/wk/dea-notebooks/Frequently_used_code
plugins: nbval-0.9.5
collected 292 items                                                                                                            

Analyse_multiple_polygons.ipynb .............                                                                            [  4%]
Animated_timeseries.ipynb .................                                                                              [ 10%]
Applying_WOfS_bitmasking.ipynb .................                                                                         [ 16%]
Calculating_band_indices.ipynb .........                                                                                 [ 19%]
Contour_extraction.ipynb ..............                                                                                  [ 23%]
Exporting_GeoTIFFs.ipynb ..........                                                                                      [ 27%]
Exporting_NetCDFs.ipynb .........                                                                                        [ 30%]
Geomedian_composites.ipynb ........                                                                                      [ 33%]
Image_segmentation.ipynb ...............                                                                                 [ 38%]
Imagery_on_web_map.ipynb ............                                                                                    [ 42%]
Integrating_external_data.ipynb ............                                                                             [ 46%]
Machine_learning_with_ODC.ipynb ....................                                                                     [ 53%]
Masking_data.ipynb ..............                                                                                        [ 58%]
Opening_GeoTIFFs_NetCDFs.ipynb ............                                                                              [ 62%]
Pan_sharpening_Brovey.ipynb .............                                                                                [ 66%]
Rasterize_vectorize.ipynb ...........                                                                                    [ 70%]
Tidal_modelling.ipynb ................                                                                                   [ 76%]
Using_load_ard.ipynb ...............                                                                                     [ 81%]
Virtual_products.ipynb ...........................................                                                       [ 95%]
Working_with_time.ipynb ............                                                                                     [100%]

======================================================= warnings summary =======================================================
/env/lib/python3.6/site-packages/nbval/plugin.py:115: 20 tests with warnings
  /env/lib/python3.6/site-packages/nbval/plugin.py:115: PytestDeprecationWarning: direct construction of IPyNbFile has been deprecated, please use IPyNbFile.from_parent
    return IPyNbFile(path, parent)

/env/lib/python3.6/site-packages/nbval/plugin.py:312: 292 tests with warnings
  /env/lib/python3.6/site-packages/nbval/plugin.py:312: PytestDeprecationWarning: direct construction of IPyNbCell has been deprecated, please use IPyNbCell.from_parent
    cell, options)

Analyse_multiple_polygons.ipynb::Cell 0
  /env/lib/python3.6/site-packages/jupyter_client/manager.py:66: DeprecationWarning: KernelManager._kernel_spec_manager_changed is deprecated in traitlets 4.1: use @observe and @unobserve instead.
    def _kernel_spec_manager_changed(self):

Analyse_multiple_polygons.ipynb: 1 test with warning
Animated_timeseries.ipynb: 1 test with warning
Applying_WOfS_bitmasking.ipynb: 1 test with warning
Calculating_band_indices.ipynb: 1 test with warning
Contour_extraction.ipynb: 1 test with warning
Exporting_GeoTIFFs.ipynb: 1 test with warning
Exporting_NetCDFs.ipynb: 1 test with warning
Geomedian_composites.ipynb: 1 test with warning
Image_segmentation.ipynb: 1 test with warning
Imagery_on_web_map.ipynb: 1 test with warning
Integrating_external_data.ipynb: 1 test with warning
Machine_learning_with_ODC.ipynb: 1 test with warning
Masking_data.ipynb: 1 test with warning
Opening_GeoTIFFs_NetCDFs.ipynb: 1 test with warning
Pan_sharpening_Brovey.ipynb: 1 test with warning
Rasterize_vectorize.ipynb: 1 test with warning
Tidal_modelling.ipynb: 1 test with warning
Using_load_ard.ipynb: 1 test with warning
Virtual_products.ipynb: 1 test with warning
Working_with_time.ipynb: 1 test with warning
  /env/lib/python3.6/site-packages/jupyter_client/manager.py:358: FutureWarning: Method cleanup(connection_file=True) is deprecated, use cleanup_resources(restart=False).
    FutureWarning)

-- Docs: https://docs.pytest.org/en/latest/warnings.html
======================================== 292 passed, 333 warnings in 1407.62s (0:23:27) ========================================

@Kirill888
Copy link
Contributor Author

We should probably combine this work with replacing calls to write_geotiff with write_cog (#586)

@cbur24
Copy link
Collaborator

cbur24 commented Jul 7, 2020

I didn't see @Kirill888 comment above.....but I agree, this issue dovetails with our need to update write_geotiff examples to write_cog, as many of the instances in the repository of adding crs to attributes (ie ds['attrs'].crs = ds.crs) occurs during the conversion of dataarrays to datasets to satisfy write_geotiff.

@Kirill888
Copy link
Contributor Author

write_geotiff should not need crs injection, it accesses that information via .geobox, and .geobox probably survives in those notebooks just fine. Also swapping out write_geotiff does not need to wait for 1.8.1, only 1.8.0, which is already everywhere. Only code that must use assign_crs is dependent on 1.8.1.

@Kirill888
Copy link
Contributor Author

Kirill888 commented Jul 8, 2020

Looks like synchronizing everything would be rather difficult, so I have pushed changes to datacube-core that take care of the current state. So we can proceed with updating sandbox images independently of this work. Still I'd rather see these fixes done earlier than later as notebooks are an essential part of documentation for datacube and related libs so keeping them up to date is a priority.

@robbibt
Copy link
Member

robbibt commented Jun 30, 2023

This will be resolved better by switching to odc.geo tools, will deal with that in more specific issues

@robbibt robbibt closed this as completed Jun 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed priority Issues that should be fixed ASAP
Projects
None yet
Development

No branches or pull requests

3 participants