Adding image driver and restructuring #25

jsignell · 2018-12-13T18:51:04Z

This PR adds an image reader for xarray and restructures intake-xarray so that all of its plugins share a common structure. In particular, it adds the ability to return a dataset rather than a dataarray using the merge_dim option.

I think this will help justify the existence of an intake-xarray plugin at all, since it can now be used directly and the other plugins are just helpers for using specific readers.

It is likely that this needs lots of work. I rewrote the example notebook to try to explain some of the behavior.


martindurant · 2018-12-22T22:31:16Z

Are you around the coming week to take me through the proposal here?

philippjfr · 2019-01-06T17:05:02Z

Just tried playing with this, but I can't figure out why intake.open_image doesn't exist for me.

martindurant · 2019-01-06T17:15:21Z

@philippjfr , I believe @jsignell is not yet back from holidays. Maybe you get an explicit error when trying to import ImageSource?

philippjfr · 2019-01-06T17:19:05Z

Thanks for letting me know. No error when I try to import ImageSource directly. At some point I should read up on the internals of the plugin system to understand how the function gets registered.

martindurant · 2019-01-06T17:23:30Z

That I can answer: Intake tries to import any package named intake_* and looks for subclasses of DataSource at its top level. For each one found, it registers the class under its name class attribute and generates the corresponding open_<name> function. They should appear in the top-level registry dict, and the functions that do the importing are in source.discovery.
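A simplified, standard-library-only sketch of the discovery mechanism described above. It is not intake's actual code: `BaseSource`, `register_from_module`, `make_open_functions` and `discover_plugins` are hypothetical names, and the real logic lives in intake's `source.discovery`. It shows the three steps: import packages named `intake_*`, collect top-level subclasses of the base class, and register each under its `name` together with a generated `open_<name>` function.

```python
import importlib
import inspect
import pkgutil
import types


class BaseSource:
    """Hypothetical stand-in for intake's DataSource base class."""

    name = None  # plugins set this; it becomes the registry key

    def __init__(self, urlpath, **kwargs):
        self.urlpath = urlpath
        self.kwargs = kwargs


def register_from_module(module, base, registry):
    """Register every top-level subclass of `base` found in `module` under its name."""
    for _, obj in inspect.getmembers(module, inspect.isclass):
        if issubclass(obj, base) and obj is not base and obj.name:
            registry[obj.name] = obj
    return registry


def make_open_functions(registry):
    """Generate an open_<name> function for each registered class."""
    funcs = {}
    for name, cls in registry.items():
        # Bind cls as a default argument so each function keeps its own class.
        def opener(*args, _cls=cls, **kwargs):
            return _cls(*args, **kwargs)

        opener.__name__ = "open_" + name
        funcs[opener.__name__] = opener
    return funcs


def discover_plugins(base, prefix="intake_"):
    """Import installed top-level packages named <prefix>* and register their sources."""
    registry = {}
    for info in pkgutil.iter_modules():
        if not info.name.startswith(prefix):
            continue
        try:
            module = importlib.import_module(info.name)
        except Exception:
            # A plugin whose import fails is skipped silently,
            # so its open_<name> function simply never appears.
            continue
        register_from_module(module, base, registry)
    return registry


if __name__ == "__main__":
    # An in-memory module standing in for an installed intake_* package.
    fake = types.ModuleType("intake_fake")

    class FakeImageSource(BaseSource):
        name = "xarray_image"

    fake.FakeImageSource = FakeImageSource
    openers = make_open_functions(register_from_module(fake, BaseSource, {}))
    print(sorted(openers))  # ['open_xarray_image']
```

The silent skip in `discover_plugins` illustrates why a missing `open_*` function can occur without any visible error.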

jsignell · 2019-01-09T15:59:18Z

@martindurant I forgot to mention in our chat that this PR also adds the ability to return datasets using the kwarg merge_dim rather than only dataarrays. I am not sure if that is an overstep for intake or not, but it does seem like a bit of munging that it would be very handy to be able to encode in a catalog.

martindurant · 2019-01-09T16:00:58Z

I would say that adding extra capabilities that may be useful for some is totally in scope, so long as it doesn't add complexity for those that don't need it.

jsignell · 2019-01-09T16:34:42Z

intake_xarray/base.py

+            for k, values in field_values.items() if k != self.merge_dim
+        }
+
+    def _open_files(self, files):


A lot of logic is in this method. Not sure how to split it up better though. Essentially there are 4 paths through it: no pattern with merge_dim, no pattern with concat_dim, pattern with concat_dim, and pattern with merge_dim. The first two are pretty straightforward, and then they increase in complexity.

Doesn't look too hairy. Perhaps could do with a comment on each branch, saying what it does, and a docstring with your comment above

added comments
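A hedged sketch of the pattern handling that the `field_values` comprehension in the diff hints at, assuming each filename is parsed against a pattern such as `{variable}_{year}.nc`, that the field named by merge_dim supplies the output variable names, and that the remaining fields become coordinates. `pattern_to_regex`, `field_values` and `split_fields` are hypothetical helpers written for illustration; the PR's own parsing code is not shown here.

```python
import re


def pattern_to_regex(pattern):
    """Compile a pattern like '{variable}_{year}.nc' into a regex with named groups."""
    regex = r"(?:^|/)"  # the pattern must start at the beginning or right after a '/'
    # re.split with a capturing group keeps the '{field}' placeholders in the output.
    for part in re.split(r"(\{\w+\})", pattern):
        if part.startswith("{") and part.endswith("}"):
            # A field matches one or more characters within a single path segment.
            regex += "(?P<%s>[^/]+?)" % part[1:-1]
        else:
            regex += re.escape(part)
    return re.compile(regex + "$")


def field_values(pattern, files):
    """Map each field name to the values parsed from each file, in file order."""
    regex = pattern_to_regex(pattern)
    values = {}
    for f in files:
        match = regex.search(f)
        if match is None:
            raise ValueError("%r does not match pattern %r" % (f, pattern))
        for key, value in match.groupdict().items():
            values.setdefault(key, []).append(value)
    return values


def split_fields(values, merge_dim=None):
    """Separate the merge_dim field (variable names) from the coordinate fields."""
    names = values.get(merge_dim) if merge_dim else None
    coords = {k: v for k, v in values.items() if k != merge_dim}
    return names, coords
```

For `data/air_2018.nc` and `data/rh_2018.nc`, the merge_dim field `variable` would yield the names `air` and `rh`, while `year` is kept as a coordinate.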

martindurant

Overall, I like the restructure and the new image driver. This is blog-worthy!

I have questions around local versus remote files, and comments/questions elsewhere.

I have not gone through the example notebooks yet.

martindurant · 2019-01-09T16:58:46Z

intake_xarray/base.py

+        self._multireader = multireader or xr.open_mfdataset
+        super(XarraySource, self).__init__(metadata=metadata)
+
+    def reader(self, filename, chunks, **kwargs):


This feels like an attribute

I was trying to set a default while still enforcing that plugins define the reader and multireader. But maybe this isn't the right way...
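One way to reconcile both points, sketched under assumptions: make `reader` a class attribute (as suggested) and enforce its presence when a plugin class is defined, while `multireader` stays optional with a per-file fallback. `XarraySourceSketch` and `read_files` are hypothetical names; in the real plugins the default multireader in the diff is `xr.open_mfdataset`.

```python
class XarraySourceSketch:
    """Hypothetical base class: plugins provide ``reader`` as a class attribute."""

    # Required: callable(filename, chunks=None, **kwargs) -> xarray object.
    reader = None
    # Optional: callable(files, chunks=None, **kwargs) reading many files at once.
    multireader = None

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Fail when the plugin class is defined, not when data is first read.
        if not callable(cls.reader):
            raise TypeError("%s must define a callable 'reader' attribute" % cls.__name__)

    def __init__(self, urlpath, chunks=None, **kwargs):
        self.urlpath = urlpath
        self.chunks = chunks
        self.kwargs = kwargs

    def read_files(self, files):
        """Use the plugin's multireader if set, otherwise call reader once per file."""
        # Look the callables up on the class so a plain function stored as an
        # attribute is not turned into a bound method that receives `self`.
        cls = type(self)
        if cls.multireader is not None:
            return cls.multireader(files, chunks=self.chunks, **self.kwargs)
        return [cls.reader(f, chunks=self.chunks, **self.kwargs) for f in files]
```

Checking in `__init_subclass__` surfaces a missing reader as an error at import time rather than as a confusing failure later.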

martindurant · 2019-01-09T16:59:26Z

intake_xarray/base.py


+try:
+    import xarray as xr


Can the import be deferred until it is needed? This module will be imported upon `import intake`.

Won't that just make these unavailable which is what we want?

In the case that xarray is available, it'll make the import of intake slower. In the case it isn't, the module won't load, but the exception will get swallowed.
If deferred, the module would load OK, but when the user tries to access the data, then they'll get the message, saying that they need to install something if they want to use that source.
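A minimal sketch of the deferred-import pattern described above: the module imports cheaply, and the user sees an actionable message only when they actually try to read data. `require` and `LazyNetCDFSource` are hypothetical names used for illustration.

```python
import importlib


def require(module_name, purpose):
    """Import an optional dependency on first use, with an actionable error message."""
    try:
        # import_module caches in sys.modules, so repeated calls are cheap.
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            "%s needs the optional dependency %r; please install it to use this source"
            % (purpose, module_name)
        ) from err


class LazyNetCDFSource:
    """Hypothetical source whose module can be imported without xarray installed."""

    def __init__(self, urlpath):
        self.urlpath = urlpath  # nothing heavy is imported here

    def read(self):
        # xarray is imported only when the user actually asks for data.
        xr = require("xarray", type(self).__name__)
        return xr.open_dataset(self.urlpath)
```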

martindurant · 2019-01-09T17:02:30Z

intake_xarray/base.py

+                                  'which takes at least filename, and chunks '
+                                  'and returns an xarray object')
+
+    def multireader(self, filename, chunks, **kwargs):



martindurant · 2019-01-09T17:05:19Z

intake_xarray/base.py

+    def _open_files(self, files):
+        das = [self.reader(f, chunks=self.chunks, **self.kwargs)
+               for f in files]
+        if not self.pattern:


It would help to document which of these attributes are mutually exclusive.
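A hedged sketch of up-front option validation, assuming (as the four paths listed earlier suggest) that concat_dim and merge_dim cannot both be set, and that with a pattern the merge_dim must name one of its fields. `choose_path` is a hypothetical helper; it only reports which path applies.

```python
def choose_path(pattern=None, concat_dim=None, merge_dim=None):
    """Check option combinations and report which of the four paths applies."""
    if concat_dim is not None and merge_dim is not None:
        raise ValueError("concat_dim and merge_dim are mutually exclusive; set only one")
    if pattern and merge_dim is not None and "{%s}" % merge_dim not in pattern:
        # With a pattern, the merge_dim field is assumed to supply the variable names.
        raise ValueError("merge_dim %r is not a field in pattern %r" % (merge_dim, pattern))
    how = "merge" if merge_dim is not None else "concat"
    return ("pattern" if pattern else "no pattern", how)
```

Raising early keeps the branches in `_open_files` free of defensive checks.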

martindurant · 2019-01-09T17:29:15Z

intake_xarray/image.py

+    elif os.path.isfile(filename):
+        filenames = [filename]
+    else:
+        filenames = [filename]


What is the expected use-case here? We want to allow passing a directory?

Remote file, most likely a URL. I added a comment.
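A sketch of how a user-supplied path could be expanded into a list of files, covering the cases raised here: a remote URL passed through untouched, a local directory, a glob, or a single file. The branch that precedes `os.path.isfile` in the diff is not shown, so this ordering is an assumption, and `expand_paths` is a hypothetical helper.

```python
import glob
import os
import tempfile


def expand_paths(urlpath):
    """Turn a user-supplied path into a list of filenames (hypothetical helper)."""
    if "://" in urlpath:
        # Remote file, most likely a URL: pass it through untouched and let the
        # reader (or a caching layer) fetch it.
        return [urlpath]
    if os.path.isdir(urlpath):
        # A local directory: take every regular file directly inside it, sorted.
        return sorted(
            os.path.join(urlpath, name)
            for name in os.listdir(urlpath)
            if os.path.isfile(os.path.join(urlpath, name))
        )
    if any(ch in urlpath for ch in "*?["):
        # A glob pattern: expand it locally, sorted for a stable order.
        return sorted(glob.glob(urlpath))
    # A single local file (or a missing path, which the reader will report).
    return [urlpath]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("b.png", "a.png"):
            open(os.path.join(tmp, name), "w").close()
        print([os.path.basename(p) for p in expand_paths(tmp)])  # ['a.png', 'b.png']
    print(expand_paths("https://example.com/image.png"))  # ['https://example.com/image.png']
```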

martindurant · 2019-01-09T17:30:04Z

intake_xarray/image.py

+     http://docs.dask.org/en/latest/array-api.html#dask.array.image.imread
+    for possible extra arguments.
+
+    NOTE: Although ``skimage.io.imread`` is used by default, any reader function which


This should give an example somewhere of how that works.

(or just remove the capability??)

The capability came from dask, but yeah, I think you are right. Given how hard it is to write lambda functions in YAML, it is probably better to just use skimage.io.imread until someone asks to be able to use another reader.

I think if the function is importable, it's easy to do: !!python/name:mymodule.process. Could be seen as a future enhancement?

That seems reasonable.
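`!!python/name:` is a PyYAML tag that resolves a dotted path to the Python object it names; `yaml.safe_load` rejects it, and whether intake's catalog loader accepts it is an assumption here. A minimal sketch of the same resolution step, which a driver could perform itself if a catalog passed the reader as a plain string. `resolve_callable` and `make_reader` are hypothetical helpers.

```python
import importlib


def resolve_callable(dotted):
    """Resolve 'package.module.function' to the object it names."""
    module_name, _, attr = dotted.rpartition(".")
    if not module_name:
        raise ValueError("expected a dotted path like 'module.function', got %r" % dotted)
    obj = getattr(importlib.import_module(module_name), attr)
    if not callable(obj):
        raise TypeError("%r does not name a callable" % dotted)
    return obj


def make_reader(reader=None):
    """Default to skimage.io.imread, but accept a callable or a dotted-path string."""
    if reader is None:
        reader = "skimage.io.imread"  # the default reader discussed in this PR
    if isinstance(reader, str):
        return resolve_callable(reader)
    return reader
```

Resolving by string keeps the catalog loadable with a safe YAML loader, at the cost of importing the module only when the source is opened.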

martindurant · 2019-01-09T17:32:48Z

intake_xarray/netcdf.py

+    for the file formats supported and possible extra arguments.
+
+    NOTE: When reading from OpenDAP URLs do not set the ``chunks`` option to
+    use provided default chunking.


There is an explanation under the opendap driver that it handles the authentication part; this docstring should say so explicitly, and note that the other driver may be necessary.

martindurant · 2019-01-09T17:36:07Z

intake_xarray/netcdf.py

+        Some examples:
+            - ``s3://data/*.nc``
+            - ``http://thredds.ucar.edu/thredds/dodsC/grib/FNMOC/WW3/Global_1p0deg/Best``
+            - ``https://github.com/pydata/xarray-data/blob/master/air_temperature.nc?raw=true``


So does this handle remote URLs directly or not? I assume if it is opendap, then yes, in which case the thing about needing caching is wrong (and in fact won't work).
The thredds URL gives Unrecognized Request for me.

Yes, that is terrible wording. I think I meant that OpenDAP URLs can/should be used directly and all others with caching. You can't GET THREDDS URLs like that directly, so the 400 is correct.
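A heuristic sketch of the rule stated above (OpenDAP URLs are used directly, everything else goes through caching). It assumes OpenDAP endpoints can be recognized by THREDDS's `/dodsC/` service path, which holds for the THREDDS example in the docstring but not for every OpenDAP server. `needs_local_copy` is a hypothetical helper.

```python
from urllib.parse import urlparse


def needs_local_copy(url):
    """Guess whether a netCDF path must be downloaded/cached before opening.

    Heuristic only: THREDDS serves OpenDAP under '/dodsC/', and such URLs are
    opened directly by the netCDF backend; other remote URLs point at whole
    files, which are assumed to need caching.
    """
    parsed = urlparse(url)
    if parsed.scheme in ("", "file"):
        return False  # local path: open directly
    if parsed.scheme in ("http", "https") and "/dodsC/" in parsed.path:
        return False  # THREDDS OpenDAP endpoint: open directly
    return True
```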

martindurant · 2019-01-09T17:37:46Z

tests/test_intake_xarray.py

@@ -16,7 +16,7 @@ def test_discover(source, cdf_source, zarr_source, dataset):
    r = source.discover()

    assert r['datashape'] is None
-    assert r['dtype'] is None
+    assert r['dtype'] == 'float32'


The whole dataset has a single dtype?

jsignell · 2019-01-11T17:39:48Z

After conversation with Martin this chunk of work seems unreasonably big. So I am going to make a new PR that just adds an ImagePlugin.

martindurant · 2019-01-14T20:23:25Z

Closed in favour of #28 . The refactor may become necessary again at a later stage.

jsignell added 2 commits December 13, 2018 13:46

Restructuring intake-xarray to have a common structure across all drivers

41cc256

Fixing tests to skip if skimage not available

238d534

This was referenced Dec 14, 2018

Using s3 and new intake-xarray plugin holoviz-topics/EarthML#76

Merged

RGB holoviz/hvplot#130

Closed

jsignell added 2 commits January 9, 2019 11:28

Moving imports into methods and renaming image ->xarray_image

7274088

Fixing examples to reflect the rename from image-> xarray_image

8cf4566

jsignell requested a review from martindurant January 9, 2019 16:29

jsignell commented Jan 9, 2019


martindurant reviewed Jan 9, 2019


Responding to comments

789f1ff

jsignell mentioned this pull request Jan 11, 2019

Adding path_as_pattern for netcdf #22

Merged

Using open_files to get list of paths

c10221c

jsignell mentioned this pull request Jan 14, 2019

Just adding xarray image plugin #28

Merged

martindurant closed this Jan 14, 2019

jsignell deleted the jsignell/image branch January 24, 2019 17:55
