Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow custom dataset names in 'generic_image' reader and fix nodata handling #1560

Merged
merged 20 commits into from
Apr 30, 2021
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions satpy/readers/generic_image.py
Original file line number Diff line number Diff line change
Expand Up @@ -102,8 +102,8 @@ def end_time(self):

def get_dataset(self, key, info):
"""Get a dataset from the file."""
logger.debug("Reading %s.", key)
return self.file_content[key['name']]
logger.debug("Reading 'image' (ignoring provided key %s).", key)
return self.file_content['image']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about you either:

  1. Copy self.file_content['image'] (with .copy() method) and update the attributes with the info dictionary. Like img_copy.attrs.update(info). I think this should update the name and any other DataID information that someone might include in the YAML.
  2. Keep the old way of doing key['name'] but update the read method to write the file content to the right file_content key.

Option 1 is easiest. Option 2 provides the most flexibility in the future. @pnuu let us know if there is a reason not to move .read to the get_dataset call as that also seems like it could help here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's wait for @pnuu 's input and I'll change it accordingly. Option 2 seems nicer to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 'image' is there just to make things work with the existing way Satpy is designed to load data with a named dataset. In an image, there's only one dataset and scn.load() should probably just load the image. But yeah, when loading data with multiple readers that'd not work.

I think option 2 is the better way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to remove the read() completely as it is only used in the ctor and move the reading stuff to get_dataset(). Then there's also no need to keep an attribute self.file_content any longer. The "How-To for custom readers" (https://satpy.readthedocs.io/en/stable/dev_guide/custom_reader.html) only demands implementing get_dataset().
And I couldn't find any other readers that use GenericImageFileHandler as base class.
Keeping self.file_content seems only necessary for caching reasons. But I think there are other mechanisms in satpy that care about this, right?
Okay?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding moving the functionality: sounds good as long as the code is clean.

Regarding file_content, that is likely just a common practice used from other readers. It also makes writing tests easier, at least in other readers, because complex NetCDF/HDF files can be described by a dictionary and put in file_content. For this reader, maybe that isn't necessary. As long as the tests can be made to work with whatever you do then 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've decided to lease reading in ctor as it is because otherwise get_area_def() needs special treatment. The area is set during reading. Although it seems that get_area_def() is called after get_dataset() and therefore area should be already available in my tests - it don't know if it's the case in all situations..
The latest commit introduces a class attribute dataset_name that is set to default image in the read function. In derived fake readers you can provide own file_content and leave ds_name empty to work with multiple datasets - if somebody really wants to do this..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI there was a semi-recent change in the base reader that changed the order of when the dataset was loaded and when the area was loaded. It used to be that the area was always loaded first (if I remember correctly), but now it is loaded after.



def mask_image_data(data):
Expand Down
15 changes: 15 additions & 0 deletions satpy/tests/reader_tests/test_generic_image.py
Original file line number Diff line number Diff line change
Expand Up @@ -187,3 +187,18 @@ def test_GenericImageFileHandler(self):
self.assertTrue(data.bands.size == 4)
data = mask_image_data(data)
self.assertTrue(data.bands.size == 3)

def test_GenericImageFileHandler_datasetid(self):
"""Test direct use of the reader."""
from satpy.readers.generic_image import GenericImageFileHandler
from satpy.readers.generic_image import mask_image_data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

F401 'satpy.readers.generic_image.mask_image_data' imported but unused


fname = os.path.join(self.base_dir, 'test_rgba.tif')
fname_info = {'start_time': self.date}
ftype_info = {}
reader = GenericImageFileHandler(fname, fname_info, ftype_info)

foo = make_dataid(name='image-custom')
self.assertTrue(reader.file_content)
dataset = reader.get_dataset(foo, None)
self.assertTrue(isinstance(dataset, xr.DataArray))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

W292 no newline at end of file