Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modularity #48

Closed
DPeterK opened this issue Jul 21, 2020 · 5 comments · Fixed by #69
Closed

Modularity #48

DPeterK opened this issue Jul 21, 2020 · 5 comments · Fixed by #69

Comments

@DPeterK
Copy link
Contributor

DPeterK commented Jul 21, 2020

I shouldn't need to have Iris, Xarray and Zarr all available in my environment to be able to use the core of this library. Instead the core of the library should import with a minimum of installed libraries. Consider a refactor of the library to make this possible.

@grumbling-tom
Copy link
Contributor

Just had a brief look at this one. Off the top of my head, there's 3 use cases we could be covering:

  1. I want to convert some NetCDF files to a TileDB format (core deps. only, e.g. netcdf4 and tiledb)
  2. I want to convert some NetCDF files to a TileDB format and then read them into an xarray.Dataset/xarray.DataArray object (core deps. + xarray)
  3. I want to convert some NetCDF files to a TileDB format and then read them into an iris.cube.Cube object (core deps. + iris)

I think a good first step for 2. might be to change the TileDBReader.to_xarray method. At the moment, to reading into an xarray DataArray requires all the Iris dependencies, since that method constructs an Iris Cube before casting it to an xarray.DataArray. If we can change that behaviour, we could make dependencies on xarray and Iris optional, letting the user decide between iris and xarray for reading from the TileDB array.

What do you think, @DPeterK ?

@DPeterK
Copy link
Contributor Author

DPeterK commented Oct 21, 2020

that method constructs an Iris Cube before casting it to an xarray.DataArray

Indeed it does - to make the maximum amount of functionality in the minimum time! In its way it's a neat solution as the Xarray - Iris interoperability in Xarray is pretty good. The downside is, of course, that you're reliant on Iris to read a TileDB array into Xarray.

A direct TileDB --> Xarray DataArray operation is definitely a gap in this library's functionality. Very happy for you to take a look into this, @grumbling-tom - but it would definitely be worth checking the Xarray and tiledb-py issue trackers to see if there's ongoing work (probably in Xarray) that's already working to solve this problem. If there is then you may be able to either borrow what's already been done, or contribute to that effort as well! It's also worth remembering that the TileDB arrays written by this library are opinionated, in that they make specialised TileDB arrays which effectively follow the NetCDF data model - so not all TileDB arrays will look like these TileDB arrays!

@DPeterK
Copy link
Contributor Author

DPeterK commented Oct 21, 2020

The broader bit of work to do here is to instil separation of concerns into the structure of the library, which might be something like the following:

+ tiledb_netcdf
  - __init__.py
  - data_model.py
  - grid_mappings.py
  - paths.py
  - utils.py
  + readers
    - __init__.py
    - core.py
    - iris.py
    - xarray.py
  + writers
    - __init__.py
    - tiledb.py
    - zarr.py

@grumbling-tom
Copy link
Contributor

A direct TileDB --> Xarray DataArray operation is definitely a gap in this library's functionality. Very happy for you to take a look into this, @grumbling-tom - but it would definitely be worth checking the Xarray and tiledb-py issue trackers to see if there's ongoing work (probably in Xarray) that's already working to solve this problem

That's a great idea - there's a few places where it's been discussed. In xarray, it looks like they're waiting on labelled axes in TileDB before implementing it as a backend. That way, the implementation can be driven by the standard, rather then vice-versa (as mentioned here).

In the meantime, an opinionated TileDB->xarray operation here could be of more immediate use, as well as potentially informing the later xarray implementation. I reckon a separate issue to track that makes sense.

The broader bit of work to do here is to instil separation of concerns into the structure of the library

Ahhh, I see. That should probably come before any work on the TileDB --> Xarray DataArray operation, so that we don't then have to refactor that code as well.

If you think that makes sense, I can look into restructuring this library a little!

@DPeterK
Copy link
Contributor Author

DPeterK commented Oct 23, 2020

In xarray, it looks like they're waiting on labelled axes in TileDB before implementing it as a backend.

Yup, that makes sense! I know TileDB has a form of this available in the latest versions of TileDB, but I think they're still finalising the APIs for it.

an opinionated TileDB->xarray operation here could be of more immediate use

Agreed - everything this library does is opinionated right now, and to greater or lesser extent I think it will need to remain so. I feel that doesn't prevent this library's opinions coming in line with the opinions of others - not least TileDB and Xarray, in the future!

I can look into restructuring this library a little

By all means, if you're happy to do so!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants