DataSource.open_dataset() performance and code review #645
@forman @kbernat It appears that the OPeNDAP query feature is not used. I mean that the remote URL should be extended with query constraints, e.g. ?var1[start:step:stop], before asking xarray to open it. I think we should find a way to get the structure information and then build an OPeNDAP query, which would considerably reduce the response size when the user asks for only a spatial/time subset. The response of an OPeNDAP server to a DAS request looks something like this: DAS Response. Does anyone know a library we can use to access/parse it?
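The kind of constraint expression described above is plain string formatting over the URL. As a minimal sketch (the helper name and the endpoint URL are hypothetical, not part of the Cate codebase), a DAP2 projection could be composed like this:

```python
def dap2_constraint(var, slices):
    """Build a DAP2 constraint expression such as ?cfc[0:1:0][0:1:359][0:1:719].

    `slices` is a list of (start, step, stop) index triples, one per dimension,
    using OPeNDAP's inclusive start:step:stop index syntax.
    """
    indices = "".join(f"[{start}:{step}:{stop}]" for start, step, stop in slices)
    return f"?{var}{indices}"


# Hypothetical OPeNDAP endpoint, for illustration only.
url = "http://example.org/thredds/dodsC/cloud.nc"
query_url = url + dap2_constraint("cfc", [(0, 1, 0), (0, 1, 359), (0, 1, 719)])
```

A subsetting client would then request `query_url` instead of the bare dataset URL, so the server only transfers the selected index ranges.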
@papesci That was also my first thought when I started using the OPeNDAP protocol. In fact, xarray (and the netcdf4 driver underneath) builds the OPeNDAP query automatically, and that feature works efficiently. Line 70 in fe1308f
For test purposes I used the datasource esacci.CLOUD.mon.L3C.CLD_PRODUCTS.MODIS.Aqua.MODIS_AQUA.2-0.r1 with the single variable cfc, the time range constraint 2003-01-01,2003-01-01, and the spatial coordinates 0,0,1,1.
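The mechanism behind this is that xarray's indexing stays lazy: a `.sel()` on a dataset opened from an OPeNDAP URL is translated by the netCDF4 backend into index-range requests, so only the selected subset is transferred. The sketch below demonstrates the same selection on a small synthetic stand-in for the CLOUD dataset (all values and grid shapes here are invented for illustration; with a real OPeNDAP URL the `.sel()` call is identical):

```python
import numpy as np
import xarray as xr

# Synthetic monthly 1-degree global grid standing in for the remote dataset.
ds = xr.Dataset(
    {"cfc": (("time", "lat", "lon"), np.zeros((12, 180, 360)))},
    coords={
        "time": np.array(
            ["2003-%02d-01" % m for m in range(1, 13)], dtype="datetime64[ns]"
        ),
        "lat": np.arange(-89.5, 90.5, 1.0),
        "lon": np.arange(-179.5, 180.5, 1.0),
    },
)

# The constraint used in the test above: single time step, 0..1 degree box.
subset = ds["cfc"].sel(time="2003-01-01", lat=slice(0, 1), lon=slice(0, 1))
```

Over OPeNDAP, only the cells covered by `subset` would be fetched from the server when the values are actually accessed.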
L746-L748: While dropped_variables excludes the specified variables, it also does that internally, e.g. for broken or unsupported variables. L756-L759: L766:
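For reference, xarray's `drop_variables` keyword excludes variables at open time, before they are decoded, which is why it can be used to skip broken or unsupported variables. A small local round-trip illustrates this (file name is arbitrary; the same keyword works unchanged when the path is an OPeNDAP URL, and writing requires a netCDF backend such as netCDF4 or scipy to be installed):

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Write a tiny file containing a "good" and a "bad" variable.
ds = xr.Dataset(
    {"cfc": ("x", np.zeros(3)), "broken_var": ("x", np.zeros(3))}
)
path = os.path.join(tempfile.mkdtemp(), "demo.nc")
ds.to_netcdf(path)

# Reopen while excluding the unwanted variable at open time.
reopened = xr.open_dataset(path, drop_variables=["broken_var"])
```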
Thanks Chris, I have checked it also on Linux; you are right. I wonder where and when exactly xarray accesses the remote data and builds the query. It doesn't seem to be the open_dataset method. L766:
L756-L759: That comes from my changes to that code at some point. Initially the values were taken directly from the dataset, in which case it could not be guaranteed that they are correct. The whole point of having Line 300 in 107dbfc
Note that we drop spatial attributes that can't be reliably found through introspection. EDIT: Yes, the current iteration of
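The introspection idea discussed here boils down to deriving the geospatial attributes from the coordinate variables themselves rather than trusting attributes stored in the file. A simplified, self-contained sketch (the function name is hypothetical; real code would also need to handle bounds variables, descending axes, and irregular grids):

```python
def spatial_attrs_from_coords(lat, lon):
    """Derive geospatial extent and resolution attributes from 1-D
    cell-center coordinate sequences, assuming a regular ascending grid.

    The extent is widened by half a cell on each side so that it describes
    cell boundaries, not cell centers.
    """
    lat_res = abs(lat[1] - lat[0])
    lon_res = abs(lon[1] - lon[0])
    return {
        "geospatial_lat_min": min(lat) - lat_res / 2,
        "geospatial_lat_max": max(lat) + lat_res / 2,
        "geospatial_lon_min": min(lon) - lon_res / 2,
        "geospatial_lon_max": max(lon) + lon_res / 2,
        "geospatial_lat_resolution": lat_res,
        "geospatial_lon_resolution": lon_res,
    }
```

When the coordinates are too short or missing, such a helper would have to return nothing rather than guess, which is exactly the case where attributes get dropped.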
@JanisGailis We already had a lengthy discussion on this. The current implementation is tolerant w.r.t. not being able to derive the spatial attributes if this is not possible. The former implementation just raised exceptions and frustrated users. I thought we had already agreed that Cate should be data-tolerant in general. (However, I agree that normalisation of datasets should make sure that coordinate variables …) Just look at #655, where Cate fails because the geo-spatial resolution attribute also contains the units.
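The #655 failure mode, a resolution attribute like "0.05 degrees" instead of a plain number, can be handled with a tolerant parser. A minimal sketch (hypothetical helper, not the Cate implementation) that returns None instead of raising when no number can be found:

```python
import re

def parse_resolution(value):
    """Tolerantly extract a numeric resolution from attribute values such
    as 0.05, "0.05", or "0.05 degrees".

    Returns the resolution as a float, or None if no number is present.
    """
    if isinstance(value, (int, float)):
        return float(value)
    match = re.search(r"[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?", str(value))
    return float(match.group()) if match else None
```

Returning None here leaves the decision of how to proceed (skip the attribute, fall back to introspection) to the caller, in line with the data-tolerant behaviour argued for above.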
Sure, sure. I just explained how those lines came to be, as neither @papesci nor @kbernat would know. The reasoning was that we shouldn't do the same thing in many ways and places all over the codebase. So, if there is code that determines the spatial attributes of a dataset by introspection, it should be used whenever we need to find out things like spatial extents and resolution, especially because it can then be updated to deal with weird datasets and corner cases in a single place. So in this case, in my opinion, it should then check for
I agree. And rather than relying on specific attributes and/or coordinates, we should consistently use a single Cate API function that fetches the spatio-temporal ranges and resolutions in the most flexible and tolerant way, including CF conventions and, later on, other frequently seen encodings, e.g. from HDF-EOS or GeoTIFF.
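Such a single entry point would try sources in a fixed order: CF-style coordinate variables first, then global attributes, and return nothing rather than raise. A rough sketch of the lookup order for the latitude range only (function name and structure are hypothetical, working on plain dicts for self-containment):

```python
def latitude_range(coords, attrs):
    """Hypothetical unified lookup of the latitude range of a dataset.

    `coords` maps coordinate names to sequences of values; `attrs` is the
    dict of global attributes. CF-style coordinates are preferred, global
    attributes are the fallback, and None signals "could not be derived"
    instead of an exception (data-tolerant behaviour).
    """
    # 1. Prefer CF-style coordinate variables.
    for name in ("lat", "latitude"):
        values = coords.get(name)
        if values is not None and len(values) > 0:
            return float(min(values)), float(max(values))
    # 2. Fall back to conventional global attributes.
    lat_min = attrs.get("geospatial_lat_min")
    lat_max = attrs.get("geospatial_lat_max")
    if lat_min is not None and lat_max is not None:
        return float(lat_min), float(lat_max)
    # 3. Tolerant failure: the caller decides what to do.
    return None
```

Further fallbacks (HDF-EOS, GeoTIFF metadata) would slot in as additional steps before the final return.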
I'm closing this as the initial questions are all answered. All software issues should be further discussed in #644.
I have several questions regarding the current implementation of ODP wrt OPeNDAP performance and efficiency. The background is that I am about to address #644.
They are:

- dropped_variables from the beginning?
- adjust_spatial_attrs_impl does not and can not guarantee that spatial attributes are present!

@papesci and @kbernat, could you please investigate.
Specifications
Cate master as of 2018-05-10