Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow custom dataset names in 'generic_image' reader and fix nodata handling #1560

Merged
merged 20 commits into from
Apr 30, 2021

Conversation

ch-k
Copy link
Contributor

@ch-k ch-k commented Feb 22, 2021

This PR is about two issues. Sorry about mixing it, but it concerns the same file.

At the moment it is not possible to use the GenericImageFileHandler for reader configs that define multiple dataset names. This is an example:

reader:
    name: world_generic_image
    description: generic image reader
    reader: !!python/name:satpy.readers.yaml_reader.FileYAMLReader
    sensors: [world-images]
    default_channels: [image_goes16, image_goes17]

datasets:
  image_goes16:
    name: image_goes16
    file_type: graphic_goes16
  image_goes17:
    name: image_goes17
    file_type: graphic_goes17
  image_zds:
    name: image_zds
    file_type: graphic_zds
  image_iodc:
    name: image_iodc
    file_type: graphic_iodc
  image_himawari8:
    name: image_himawari8
    file_type: graphic_himawari8

file_types:
    graphic_goes16:
        file_reader: !!python/name:satpy.readers.generic_image.GenericImageFileHandler
        file_patterns:
         - 'goes16_{filename}{start_time:%Y%m%d%H%M}.tif'
         - 'goes16_{filename}.tif'
    graphic_goes17:
        file_reader: !!python/name:satpy.readers.generic_image.GenericImageFileHandler
        file_patterns:
         - 'goes17_{filename}{start_time:%Y%m%d%H%M}.tif'
         - 'goes17_{filename}.tif'
    graphic_zds:
        file_reader: !!python/name:satpy.readers.generic_image.GenericImageFileHandler
        file_patterns:
         - 'ZDS_{filename}{start_time:%Y%m%d%H%M}.tif'
         - 'ZDS_{filename}.tif'
    graphic_iodc:
        file_reader: !!python/name:satpy.readers.generic_image.GenericImageFileHandler
        file_patterns:
         - 'IODC_{filename}{start_time:%Y%m%d%H%M}.tif'
         - 'IODC_{filename}.tif'
    graphic_himawari8:
        file_reader: !!python/name:satpy.readers.generic_image.GenericImageFileHandler
        file_patterns:
         - 'himawari8_{filename}{start_time:%Y%m%d%H%M}.tif'
         - 'himawari8_{filename}.tif'

That doesn't work because a hardcoded name image (https://github.com/pytroll/satpy/blob/master/satpy/readers/generic_image.py#L85) is used to set the data and the name used in the reader config is used to get the data. Therefore only one dataset with name image is possible. Otherwise the reader complains that the dataset is not available.

The fill value of GeoTIFF files is not used to mask the data. With this PR, fill value pixels are set to nan as it is already the case when there are alpha channels present.

  • Closes #xxxx
  • Tests added
  • Fully documented
  • Add your name to AUTHORS.md if not there already

@codecov
Copy link

codecov bot commented Feb 22, 2021

Codecov Report

Merging #1560 (0f93397) into master (b564e0c) will increase coverage by 0.02%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1560      +/-   ##
==========================================
+ Coverage   92.62%   92.65%   +0.02%     
==========================================
  Files         258      258              
  Lines       37847    37929      +82     
==========================================
+ Hits        35057    35142      +85     
+ Misses       2790     2787       -3     
Flag Coverage Δ
behaviourtests 4.83% <0.00%> (-0.02%) ⬇️
unittests 92.78% <100.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
satpy/readers/generic_image.py 97.56% <100.00%> (+4.22%) ⬆️
satpy/tests/reader_tests/test_generic_image.py 98.90% <100.00%> (+0.57%) ⬆️
satpy/tests/test_scene.py 99.44% <0.00%> (+<0.01%) ⬆️
satpy/scene.py 92.29% <0.00%> (+0.11%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b564e0c...0f93397. Read the comment docs.

def test_GenericImageFileHandler_datasetid(self):
"""Test direct use of the reader."""
from satpy.readers.generic_image import GenericImageFileHandler
from satpy.readers.generic_image import mask_image_data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

F401 'satpy.readers.generic_image.mask_image_data' imported but unused

foo = make_dataid(name='image-custom')
self.assertTrue(reader.file_content)
dataset = reader.get_dataset(foo, None)
self.assertTrue(isinstance(dataset, xr.DataArray))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

W292 no newline at end of file

@mraspaud
Copy link
Member

Could you explain the motivation behind this change?

@ch-k
Copy link
Contributor Author

ch-k commented Feb 22, 2021

Could you explain the motivation behind this change?

Sorry, should be a draft only. My primary objectiv was to create the DWD world composite with a new trollflow2/satpy chain. I have geotiff files of all satellites on disk and segment gatherer should be used to publish an event when everything is available. Then trollflow2/satpy should use the generic image reader to read all images together into the same scene object. I've created a custom reader.yml there I specified each satellites' image as channel. Afterwards I'll try to create a new compositor to stich the images. To read all images together, this patch is necessary. I can provide the reader.yml when I'm back in office.

@djhoese
Copy link
Member

djhoese commented Feb 22, 2021

Ah I see, the main problem is:

self.file_content['image'] = data

This doesn't use the key provided in the YAML at all. Maybe it would be better to copy the file_content['image'] DataArray and reassign the .attrs['name'] to the provided key's name. Maybe there should be a key in the info (that comes from the YAML) that specifies the file_key. This could default to image and would provide flexibility in the future if we wanted to make it possible to load other things besides the image (the alpha channel maybe, I don't know).

Just some brainstorming.

@ch-k
Copy link
Contributor Author

ch-k commented Feb 23, 2021

Here is an example of the reader config that I'd like to use to read the images into the same scene (GOES16 + 17 is only the first step):

reader:
    name: world_generic_image
    description: generic image reader
    reader: !!python/name:satpy.readers.yaml_reader.FileYAMLReader
    sensors: [images]
    default_channels: [image_goes16, image_goes17]

datasets:
  image_goes16:
    name: image_goes16
    file_type: graphic_goes16
  image_goes17:
    name: image_goes17
    file_type: graphic_goes17

file_types:
    graphic_goes16:
        file_reader: !!python/name:satpy.readers.generic_image.GenericImageFileHandler
        file_patterns:
         - 'goes16_{filename}{start_time:%Y%m%d%H%M}.tif'
         - 'goes16_{filename}.tif'
    graphic_goes17:
        file_reader: !!python/name:satpy.readers.generic_image.GenericImageFileHandler
        file_patterns:
         - 'goes17_{filename}{start_time:%Y%m%d%H%M}.tif'
         - 'goes17_{filename}.tif'

@ch-k
Copy link
Contributor Author

ch-k commented Feb 23, 2021

Ah I see, the main problem is:

self.file_content['image'] = data

This doesn't use the key provided in the YAML at all. Maybe it would be better to copy the file_content['image'] DataArray and reassign the .attrs['name'] to the provided key's name. Maybe there should be a key in the info (that comes from the YAML) that specifies the file_key. This could default to image and would provide flexibility in the future if we wanted to make it possible to load other things besides the image (the alpha channel maybe, I don't know).

Just some brainstorming.
At first, I tried to change the static 'image' to something like

    def read(self):
        """Read the image."""
        dataset = rasterio.open(self.finfo['filename'])
...
        self.file_content[key['name']] = data

... until I noticed that it's not available in this stage. I've thought about moving the whole reading stuff to get_dataset but I'm unsure if that's reasonable.

@djhoese
Copy link
Member

djhoese commented Feb 23, 2021

Yeah hard to say. @pnuu I think wrote this reader originally and mind have ideas for what can and can't be done on __init__ of the file handler. I don't think my idea or your idea would be bad. No reason we couldn't do both (from the little I know about this reader). We definitely want to support your use case as I think this was kind of the intention of making this functionality a reader in the first place.

@pnuu
Copy link
Member

pnuu commented Feb 24, 2021

Sorry, this might not be very helpful in respect to the changes needed to do this purely in Satpy.

My primary objectiv was to create the DWD world composite with a new trollflow2/satpy chain. I have geotiff files of all satellites on disk and segment gatherer should be used to publish an event when everything is available.

This sounds very much like create_global_mosaic.py we had in pytroll-collectors still few years back, but didn't want to keep it working with the current Satpy and decided to remove it all together. If you want to have a look, see this and this and this.

It collected the images similar to segment gatherer, then cut the input images very naïvely at the GEO satellites' mid-point longitudes and put the images together. It required the input images to be in latlon grid, didn't do any blending, and had some other simplifications.

My current solution is to rely on putting the images (in what ever projection they are generated in) to WMS and let the client software do the requests so that the images are in correct projection, and let it also handle (possible) blending, arrangement and everything else.

@ch-k
Copy link
Contributor Author

ch-k commented Feb 24, 2021

Sorry, this might not be very helpful in respect to the changes needed to do this purely in Satpy.

My primary objectiv was to create the DWD world composite with a new trollflow2/satpy chain. I have geotiff files of all satellites on disk and segment gatherer should be used to publish an event when everything is available.

This sounds very much like create_global_mosaic.py we had in pytroll-collectors still few years back, but didn't want to keep it working with the current Satpy and decided to remove it all together. If you want to have a look, see this and this and this.

It collected the images similar to segment gatherer, then cut the input images very naïvely at the GEO satellites' mid-point longitudes and put the images together. It required the input images to be in latlon grid, didn't do any blending, and had some other simplifications.

My current solution is to rely on putting the images (in what ever projection they are generated in) to WMS and let the client software do the requests so that the images are in correct projection, and let it also handle (possible) blending, arrangement and everything else.

@pnuu I remember that we discussed about it in the past. Currently we still have a similar solution running. It uses some reading and writing functions provided by good old mpop. It has to be migrated. I hope to use as much satpy/trollflow2 functionality as possible. It should be parametrized as all the other stuff. The core mosaic creation would be done in a new satpy compositor. Geoserver is not an option for our use case.

@pnuu
Copy link
Member

pnuu commented Feb 24, 2021

So would the use be like this?

fnames = ["goes16_true_color_202102241200.tif", "goes17_true_color_202102241200.tif"]
scn = Scene(reader="world_generic_reader", filenames=fnames)
scn.load(["image_goes16", "image_goes17"])

And another question: how are you planning to handle resampling so that the intertwined pixels do not cause weird effects where the images overlap? The plain nearest neighbour resampling will just blindly use the closest pixel to each target location, so the illumination, parallax and (very slightly) calibration will cause artifacts on the overlapping areas. Maybe MultiScene (several separate scenes) and MultiScene.blend() would handle this readily in a nice way?

@ch-k
Copy link
Contributor Author

ch-k commented Feb 24, 2021

So would the use be like this?

fnames = ["goes16_true_color_202102241200.tif", "goes17_true_color_202102241200.tif"]
scn = Scene(reader="world_generic_reader", filenames=fnames)
scn.load(["image_goes16", "image_goes17"])

Yes, exactly. But we start with single channel images..

And another question: how are you planning to handle resampling so that the intertwined pixels do not cause weird effects where the images overlap? The plain nearest neighbour resampling will just blindly use the closest pixel to each target location, so the illumination, parallax and (very slightly) calibration will cause artifacts on the overlapping areas. Maybe MultiScene (several separate scenes) and MultiScene.blend() would handle this readily in a nice way?
In our use case, no resampling is necessary. The input images are already resampled to the target area:
goes16_IR_1035_ir_wwwWcm40km_202102181630
goes17_IR_1035_ir_wwwWcm40km_202102181530

@ch-k
Copy link
Contributor Author

ch-k commented Feb 24, 2021

Maybe MultiScene (several separate scenes) and MultiScene.blend() would handle this readily in a nice way?

Does trollflow2 work with MultiScene?

@pnuu
Copy link
Member

pnuu commented Feb 24, 2021

Does trollflow2 work with MultiScene?

No, not at the moment. Creating a plugin that supports the usage of it should be pretty simple.

@ch-k ch-k changed the title Fix: use same hardcoded dataset name for set and get Fix: GenericImageFileHandler, hardcoded dataset name for set and get + fillvalue Feb 26, 2021
@ch-k
Copy link
Contributor Author

ch-k commented Feb 26, 2021

I've added some code to handle the fill_value when reading GeoTIFFs.
Sorry for mixing it with the other issue. It's difficult to manage changes in different branches when I need all of them to get my feature running (world comp).

@pnuu I think wrote this reader originally and mind have ideas for what can and can't be done on init of the file handler. I don't think my idea or your idea would be bad. No reason we couldn't do both (from the little I know about this reader). We definitely want to support your use case as I think this was kind of the intention of making this functionality a reader in the first place.

@pnuu: What do think? I see my PR more as a fix than an enhancement.

@ch-k ch-k marked this pull request as ready for review March 1, 2021 10:51
@ch-k ch-k requested review from djhoese and mraspaud as code owners March 1, 2021 10:51
Copy link
Member

@djhoese djhoese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple suggestions to make this more flexible...I think.

@@ -117,4 +117,9 @@ def mask_image_data(data):
for i in range(data.shape[0])])
data.data = masked_data
data = data.sel(bands=BANDS[data.bands.size - 1])
elif hasattr(data, 'nodatavals') and data.nodatavals:
data = data.astype(np.float64)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have to convert to float here? If we can keep the initial data type and use a fill value dependent on that data type then it may work for more situations. What if you copied the nodatavals to .attrs['_FillValue'] for integer types, but replaced them with np.nan for floats? This would match what we do in other readers for "category" products.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure what's better option. Unfortunately, NaN is only possible with floats and I have the feeling that most of the compositors only work with NaNs as masks. It seems that only ColormapCompositor cares about "_FillValue". That's why I've done the float conversation as it is already the case if alpha channels are present.
Maybe I should introduce a new reader parameter and let the user decide how to handle nodatavals?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just stumbled across the following paragraph in the satpy docs (https://satpy.readthedocs.io/en/latest/dev_guide/custom_reader.html)

Single precision floats (np.float32) is a good compromise, as it has 23 significant bits (mantissa) and can thus represent 16 bit integers exactly, as well as keeping the memory footprint half of a double precision float.
One commonly used method in readers is xarray.DataArray.where() (to mask invalid data) which can be coercing the data to np.float64. To ensure for example that integer data is coerced to np.float32 when xarray.DataArray.where() is used, you can do ..
So converting to floats to mask data with NaN seems a common practice. As noted in the docs, float32 would be better. Conversion to floats is already done few lines above when alpha channels are handled.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is more specifically talking about integer data that is in the file as integers and needs to be scaled (ex. data * scale_factor + add_offset) to floats, but also has a fill value. In that case you'd need to do the .where call to mask out the floats with some other value. In Satpy that is NaN for floating point data and any _FillValue for integer data. In older versions of numpy (I think newer versions have fixed this) if you do .where with np.nan it would cast 32-bit floats to 64-bit floats. Now it knows that you really mean 32-bit float -> 32-bit float NaN. For integers, they would always be cast to 64-bit floats so the note in the documentation is trying to say "there is no reason for that so force 32-bit float conversion".

The note should probably be updated to refer to integer fields that are meant to stay integers like cloud type products or in this case 8-bit unsigned integer RGB images. If an RGB is being read in by Satpy and is provided as floating point data (instead of as an integer) then it should probably be normalized to a 0-1 range as that is the assumption by the trollimage package when saving images.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how to go one here. When I have an image (i.e. 1 band, 8bit int), I do not know without prior knowledge whether the values represent categories or not. Additionally, maybe scaling (according to parameters store in the file itself) is wanted or normalization to [0..1] - or nothing at all. At the moment (without the changes in this PR) no scaling/normalization/calibration is applied. They are returned as stored in the file.
I think the behaviour of the reader should be parametrized. I'm not very familiar with this but looking at other readers, the dataset sections in the reader yaml might be a way. Stuff like calibration, units, fill_value, etc. seems to be specified there.
Maybe we can introduce a parameter like mask_type: floatNan to define what the reader should do.

I don't want to risk any problems for other users of this reader. Therefore default behaviour would be to do nothing (no type conversion and nan-masking) and the "float-nan-masking" is only active when it's enabled in the reader config.
Are you okay with that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Things like scale factor and add offset in a geotiff are not standard so I don't think we should add any support for that. Support for that should be added in a separate reader (a custom reader created by the user).

Indeed there are a lot of attributes that are not standarized (geotiff and beyond). I do not really care about the scaling or offsets at the moment (only masks :-) ). But I think that readers in general should provide the possibility to output data that is comparable to data provided by other readers. They should intruduce a level of abstraction from the files. It's not very convenient for users of satpy when you have to adjust/modify compositors when switching between different readers. IMHO readers (that share the same API) should also return similar data structures or at least can be configured to do so.

In other readers, things like units are just metadata to fill in the space of missing information in the file. In those cases, the reader is specific to a particular implementation of a data file (VIIRS SDR in the HDF5 format versus generic HDF5 files) or are representing a standard (CF compliant NetCDF files). This generic image reader should try to use standard geotiff features.

This reader is not only used to read GeoTiff. As far as I see, there are filetypes (png, ..) that this reader can handle. But then it is even more necessary to provide more reader options to specified how to handle this data.

Maybe these kind of reader parametrizations are then highly use case dependent and do not belong to "core" satpy (aka satpy/etc/readers/xxx.yaml) but the reader implementation in satpy (satpy/readers/xxx.py) should offer the set of tools (parameters) to control the handling of the input data.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO readers (that share the same API) should also return similar data structures or at least can be configured to do so.

Yeah, sure, but the question is what is the point of this pull request and how much work do you want to include in it? I am totally fine with the python code allowing for all sorts of flexibility as long as it is documented and that the defaults work. This is the "generic" image reader so none of this extra functionality should be required to get something out of the reader. Adding options to a static list of datasets in the YAML sounds fine.

I think this reader should fix the dataset naming so it can be overridden in the YAML. If it includes some minimal nodata handling for geotiffs, then great. Anything beyond that should probably go in a separate PR, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My original use for which I wrote this reader was to read any image as an image and use it as a background for BackgroundCompositor and to make it possible to resample them to a specific area. So in a way, I never considered it for anything else than categorical data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, sure, but the question is what is the point of this pull request and how much work do you want to include in it? I am totally fine with the python code allowing for all sorts of flexibility as long as it is documented and that the defaults work. This is the "generic" image reader so none of this extra functionality should be required to get something out of the reader. Adding options to a static list of datasets in the YAML sounds fine.

I think this reader should fix the dataset naming so it can be overridden in the YAML. If it includes some minimal nodata handling for geotiffs, then great. Anything beyond that should probably go in a separate PR, right?

My latest commit should address this points. There's now an option to control the handling of nodata. Default is to do nothing with the data but copying the nodata value (first element) to dataset.attrs['_FillValue'].
Beside that you can add the option "nodata_handling" to the readers yaml dataset sections. If it is set to "nan_mask", the data is converted to float32 and masked with np.nan.

Anything beyond that should probably go in a separate PR, right?

Absolutely right. Anything else should be in another PR. But I need this together do get my stuff working and it's hard to handle multiple PRs addressing the same file.

If you have any proposals for better naming of this option - you're welcome :-)
Hope this commit fulfills the demands.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My original use for which I wrote this reader was to read any image as an image and use it as a background for BackgroundCompositor and to make it possible to resample them to a specific area. So in a way, I never considered it for anything else than categorical data.

It's good to have such reader. Everything starts small. I hope that your needs are also covered with this modifications and the code still doesn't get too complex. These are only small changes and it may be more confusing to implement this in a separate reader.

logger.debug("Reading %s.", key)
return self.file_content[key['name']]
logger.debug("Reading 'image' (ignoring provided key %s).", key)
return self.file_content['image']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about you either:

  1. Copy self.file_content['image'] (with .copy() method) and update the attributes with the info dictionary. Like img_copy.attrs.update(info). I think this should update the name and any other DataID information that someone might include in the YAML.
  2. Keep the old way of doing key['name'] but update the read method to write the file content to the right file_content key.

Option 1 is easiest. Option 2 provides the most flexibility in the future. @pnuu let us know if there is a reason not to move .read to the get_dataset call as that also seems like it could help here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's wait for @pnuu 's input and I'll change it accordingly. Option 2 seems nicer to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 'image' is there just to make things work with the existing way Satpy is designed to load data with a named dataset. In an image, there's only one dataset and scn.load() should probably just load the image. But yeah, when loading data with multiple readers that'd not work.

I think option 2 is the better way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to remove the read() completely as it is only used in the ctor and move the reading stuff to get_dataset(). Then there's also no need to keep an attribute self.file_content any longer. The "How-To for custom readers" (https://satpy.readthedocs.io/en/stable/dev_guide/custom_reader.html) only demands implementing get_dataset().
And I couldn't find any other readers that use GenericImageFileHandler as base class.
Keeping self.file_content seems only necessary for caching reasons. But I think there are other mechanisms in satpy that care about this, right?
Okay?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding moving the functionality: sounds good as long as the code is clean.

Regarding file_content, that is likely just a common practice used from other readers. It also makes writing tests easier, at least in other readers, because complex NetCDF/HDF files can be described by a dictionary and put in file_content. For this reader, maybe that isn't necessary. As long as the tests can be made to work with whatever you do then 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've decided to lease reading in ctor as it is because otherwise get_area_def() needs special treatment. The area is set during reading. Although it seems that get_area_def() is called after get_dataset() and therefore area should be already available in my tests - it don't know if it's the case in all situations..
The latest commit introduces a class attribute dataset_name that is set to default image in the read function. In derived fake readers you can provide own file_content and leave ds_name empty to work with multiple datasets - if somebody really wants to do this..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI there was a semi-recent change in the base reader that changed the order of when the dataset was loaded and when the area was loaded. It used to be that the area was always loaded first (if I remember correctly), but now it is loaded after.

@pnuu
Copy link
Member

pnuu commented Mar 24, 2021

Could you add a description )to the description field) of what this PR really is about? Looking at the the code, I'm also unsure about the title of the PR. Maybe clarify the title too?

@ch-k
Copy link
Contributor Author

ch-k commented Mar 24, 2021

Could you add a description )to the description field) of what this PR really is about? Looking at the the code, I'm also unsure about the title of the PR. Maybe clarify the title too?

I've added a description text (sorry for that). But I don't have a better title..

@djhoese djhoese changed the title Fix: GenericImageFileHandler, hardcoded dataset name for set and get + fillvalue Allow custom dataset names in 'generic_image' reader and fix nodata handling Mar 24, 2021
@djhoese djhoese added component:readers enhancement code enhancements, features, improvements labels Mar 24, 2021
@djhoese djhoese requested review from pnuu and djhoese March 25, 2021 17:32
@ghost
Copy link

ghost commented Apr 23, 2021

DeepCode's analysis on #0f9339 found:

  • ℹ️ 1 minor issue. 👇
  • ✔️ 1 issue was fixed.

Top issues

Description Example fixes
First argument to super() should be the enclosing class FakeGenericImageFileHandler Occurrences: 🔧 Example fixes

👉 View analysis in DeepCode’s Dashboard | Configure the bot

Copy link
Member

@djhoese djhoese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good. Thanks for the work. I have a couple suggestions and changes I'd like to talk about that I've made comments about.



def mask_image_data(data):
"""Mask image data if alpha channel is present."""
def mask_image_data(data, info=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When is info ever not provided?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know. This function is used in test_generic_image.py. That's why I fear it is part of the public interface and used by somebody somewhere..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good catch. Unless @pnuu says otherwise, I'd say update the tests to pass {} for info and make info a requirement parameter here.

If you have the time it would be nice to make this function private (_ prefix) and update the tests to not test this function specifically, but the higher level expected output of the reader instead.

Copy link
Member

@pnuu pnuu Apr 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What Dave said. And indeed, this ought to be private.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 140 to 143
try:
nodata_handling = info['nodata_handling'] if info else NODATA_HANDLING_FILLVALUE
except KeyError:
nodata_handling = NODATA_HANDLING_FILLVALUE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming info is never None as it is always passed, then this block could be simplified to:

Suggested change
try:
nodata_handling = info['nodata_handling'] if info else NODATA_HANDLING_FILLVALUE
except KeyError:
nodata_handling = NODATA_HANDLING_FILLVALUE
nodata_handling = info.get('nodata_handling', NODATA_HANDLING_FILLVALUE)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, would be much nicer. If we consider mask_image_data as private function and info as not none, I can change this.

satpy/readers/generic_image.py Outdated Show resolved Hide resolved
Comment on lines +173 to +184
fname = os.path.join(self.base_dir, 'test_l_nan_fillvalue.tif')
scn = Scene(reader='generic_image', filenames=[fname])
scn.load(['image'])
self.assertEqual(scn['image'].shape, (1, self.y_size, self.x_size))
self.assertEqual(np.sum(scn['image'].data[0][:10, :10].compute()), 0)

fname = os.path.join(self.base_dir, 'test_l_nan_nofillvalue.tif')
scn = Scene(reader='generic_image', filenames=[fname])
scn.load(['image'])
self.assertEqual(scn['image'].shape, (1, self.y_size, self.x_size))
self.assertTrue(np.all(np.isnan(scn['image'].data[0][:10, :10].compute())))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind separating these out into their own test functions. I think we've generally switched to not combining a ton of test cases into one function whenever possible. It makes it easier to debug.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


return data

def __mask_image_data(self, data, info):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this become a class method? Couldn't/shouldn't this remain a function outside the class? Also, please use a single underscore. Double underscore has special handling in python and isn't needed here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, too much java in the last months. I'll fix it tomorrow morning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed it.


def __init__(self, filename, filename_info, filetype_info, file_content, **kwargs):
"""Get fake file content from 'get_test_content'."""
if GenericImageFileHandler is object:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When will this import ever return object? Also, does this class need to be inside the test function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's basically a copy of an other fake reader (from file test_hdf5_utils.py). I supposed it is best practice t odo so (without knowning why..). Of cource I can remove it. I've already noted that some the checkers marked this as issue.
It is an inner class because all the other test routines in this file do their imports in the method itself. I thought, it would be better to do the same here too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think now that we are using pytest for tests it isn't as important to have the various imports in the test functions themselves. @mraspaud thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just removed if GenericImageFileHandler is object.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tihnk it's fine without the check

@djhoese
Copy link
Member

djhoese commented Apr 29, 2021

Last request: Could you look at fixing some of the style issues from codebeat and deepcode? It looks like one of the deepcode ones will be unfixable (the one regarding super) as I think what you've done is intended.

@ch-k
Copy link
Contributor Author

ch-k commented Apr 30, 2021

Last request: Could you look at fixing some of the style issues from codebeat and deepcode? It looks like one of the deepcode ones will be unfixable (the one regarding super) as I think what you've done is intended.

Fixed some of the issues. But I don't really know how to make codebeat happy. These two remaining issues are classified as info. I can't imagine that further reducing of ACB or branch nesting makes the code more readable and maintainable.

Copy link
Member

@djhoese djhoese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks.

@djhoese
Copy link
Member

djhoese commented Apr 30, 2021

Thanks for making the attempt. And yeah, removing all the issues identified by these tools is just something you get used to and readability is arguable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component:readers enhancement code enhancements, features, improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants