Get data from a variable based on a range in associated coordinate variables #143
Hi @lhmarsden, currently there is no easy way to specify array subsets using data values, because the … If a user knows the names of the coordinate variables (e.g. from …). Otherwise, they can use … As you can see, RNetCDF is still quite a low-level interface to NetCDF. It would be possible to add your requested feature, although it may be more suitable for a higher-level package like …
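As a sketch of the manual approach (read the coordinate variable first, then translate a value range into index arguments), the helper below converts a range on a 1-D coordinate into the 1-based `start` and `count` that `var.get.nc()` accepts. The name `coord_range_to_start_count` is hypothetical and not part of RNetCDF. It assumes the coordinate is monotonic, so that the matching values form one contiguous run.

```r
# Hypothetical helper, not part of RNetCDF: translate a value range on a
# 1-D coordinate variable into the 1-based `start` and `count` arguments
# accepted by RNetCDF::var.get.nc().
coord_range_to_start_count <- function(coord, lower, upper) {
  # Positions whose coordinate value lies inside the closed interval
  idx <- which(coord >= lower & coord <= upper)
  if (length(idx) == 0) {
    stop("no coordinate values fall within [lower, upper]")
  }
  # A single start/count hyperslab matches the filter only if the selected
  # positions are contiguous, which holds for a monotonic coordinate.
  if (any(diff(idx) != 1)) {
    stop("selected positions are not contiguous; is the coordinate monotonic?")
  }
  list(start = min(idx), count = length(idx))
}

# Example with an in-memory pressure coordinate (illustrative values)
pres <- c(0, 5, 10, 15, 20, 25, 30)
sc <- coord_range_to_start_count(pres, 10, 20)
# sc$start is 3 and sc$count is 3 (the values 10, 15 and 20)
```

With a file open as `nc`, the result could then be used as `var.get.nc(nc, "TEMP", start = sc$start, count = sc$count)`. For multi-dimensional variables, `start` and `count` need one entry per dimension, so a helper like this would be applied to each coordinate in turn.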
There's only one dimension in this file, PRES. You could read PRES first and use it to work out the start and count ranges for TEMP, but you might as well read both in whole and do your own filtering in memory:

```r
# `data` is the file handle returned by open.nc()
TEMP <- var.get.nc(data, "TEMP")
PRES <- var.get.nc(data, "PRES")
```

There are higher-level tools in tidync, which uses ncmeta (tidync is similar to, but nowhere near as powerful as, xarray). It's possible to explore what will be returned from hyper_filter (via the start/min/max values in the printout), but note we only have one grid in this file; when there are more, you need …

```r
library(tidync)
library(dplyr)  # provides between()
nc <- tidync(netcdf_file)  # netcdf_file: path or URL of the dataset
## here we stop being lazy and pull data
hyper_tibble(hyper_filter(nc, PRES = between(PRES, 15, 30)), select_var = c("PRES", "TEMP"))
```
```
... attempting remote connection
Connection succeeded.
# A tibble: 16 × 2
    PRES  TEMP
   <dbl> <dbl>
 1    15  3.83
 2    16  3.81
 3    17  3.79
 4    18  3.79
 5    19  3.81
 6    20  3.85
 7    21  3.86
 8    22  3.83
 9    23  3.82
10    24  3.82
11    25  3.81
12    26  3.81
13    27  3.79
14    28  3.71
15    29  3.70
16    30  3.64
```

That's quite convenient, but it doesn't take long before you run into limitations (compared to xarray). It can also be plagued by library or build variations across operating systems... it could do with a refactor, but frankly I'm more inclined to use xarray via reticulate these days :) HTH
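A minimal base-R sketch of the read-in-whole-and-filter approach suggested above. To keep it runnable without the file, the values are copied from the tidync output in place of the `var.get.nc()` calls; the 20 to 25 range is an arbitrary example.

```r
# Stand-ins for var.get.nc(data, "PRES") and var.get.nc(data, "TEMP"),
# copied from the tidync printout above
PRES <- 15:30
TEMP <- c(3.83, 3.81, 3.79, 3.79, 3.81, 3.85, 3.86, 3.83,
          3.82, 3.82, 3.81, 3.81, 3.79, 3.71, 3.70, 3.64)

# Logical mask: TRUE where the pressure lies in the closed interval [20, 25]
keep <- PRES >= 20 & PRES <= 25

# Apply the same mask to both variables so they stay aligned
subset_df <- data.frame(PRES = PRES[keep], TEMP = TEMP[keep])
print(subset_df)
```

Because the whole variable is read first, this works for any condition on the coordinate (including non-contiguous selections), at the cost of loading data you then discard.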
Thanks @mdsumner. I have often considered adding higher-level capabilities to RNetCDF, but given limited time, my main focus has been on improving support for features of NetCDF datasets (e.g. nested user-defined types) while maintaining backward compatibility and portability. I also doubt that RNetCDF is the right starting point for a user-friendly, high-level interface to NetCDF. Although most of the code was rewritten for version 2.0, the core API of this package is almost 20 years old this year! I suspect that it would be easier to provide a convenient API from a more modern package, running RNetCDF under the hood. The …

Part of the challenge is that NetCDF is so flexible. It provides a variety of features, with few restrictions on how they are used. RNetCDF aims to support the low-level capabilities, no matter how they are used or abused. A higher-level API will often need to assume that NetCDF conventions have been followed, but there are several different conventions, they continue to evolve, and they are sometimes misused or misinterpreted. Where RNetCDF has strayed into supporting conventions, it has become messy, as in the numerous options for handling missing values. For this reason, a separation of concerns may be beneficial, with one or more packages implementing the NetCDF conventions, all leveraging the lower-level (and hopefully more stable) RNetCDF interface.

I would appreciate your thoughts on this issue. Should RNetCDF be expanded, or should it be enhanced by other packages?
I don't think it should expand; matching the API is good. Big picture, it's a shame there are a few downstream high-level APIs, and a shame that work wasn't done at a lower level. But that's where we're at. xarray is still very young and so far very Python-focused, but I think its impact is here to stay and will deepen. stars is good, but it's not fleshed out enough for the RNetCDF-as-engine workflow; perhaps effort there would be most effective.
Food for thought, Michael. Thanks again.
I still haven't decided whether or not to add the requested feature. I think the changes involved are relatively simple, and they should apply to most NetCDF datasets, although they would only be meaningful for numeric data types. One problem is that I am unlikely to work on the changes before @lhmarsden needs to run the tutorial series. If someone submits a PR with suitable changes, I would consider including them in the package.
It was interesting to read through the discussion above and to learn about how these things develop. Thanks! You may be interested in this video tutorial that I have now posted to YouTube; use it however you like. In it, I explain how to open a NetCDF file using RNetCDF, understand its contents, and extract the data and metadata. I also discuss the CF and ACDD conventions.

https://youtu.be/Xer1XBm3sns?si=VfrDZIrkrzuKcy4F

This will be the first in a series of videos on working with NetCDF files in R, including … Each video is accompanied by a chapter in this Jupyter book. You can get a sneak peek at the code and explanations for the next few chapters here: …
How about this: …
Note that …
Note that …
@mjwoods Also check out package …
This looks very interesting, @pvanlaake. I look forward to seeing how the project progresses! |
Many thanks to you both, @lhmarsden and @pvanlaake. It is very satisfying to see the training materials and advanced packages building on RNetCDF.
I think RNetCDF contains the functionality needed by @pvanlaake to create …
On the matter of time conversions, I have mainly continued supporting …
Hi Milton, package … On time format conversions, … Best, …
Note that much of what is needed is baked into …
Hi Dave @dblodgett-usgs, the focus of … Similar easy access to Zarr would be great; I'm looking forward to seeing that come to fruition.
Sounds great, @pvanlaake! Have you seen https://hypertidy.github.io/ncmeta/reference/index.html? In particular, we put some handling for finding coordinate variables and grid mappings in … My hope is that …
Append "#mode=zarr" when using RNetCDF … I don't know if that works for real remote sources, like the dodsC/ vs fileServer/ thing 🤔
Firstly, apologies if what I am asking for is already possible...
When loading in data from a NetCDF file, it is often useful to be able to select only a subset of the data. Using RNetCDF, I can do this:
This gives me 3 values from the TEMP variable, starting with the 5th value in the array.
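A self-contained sketch of this kind of index-based read, assuming a one-dimensional TEMP variable: it writes a small file with made-up values to a temporary location, then reads it back with `var.get.nc()` using `start = 5` and `count = 3`. The file layout and values are hypothetical, and the demo only runs if RNetCDF is installed.

```r
# Guarded so the script still runs where RNetCDF is not installed
if (requireNamespace("RNetCDF", quietly = TRUE)) {
  library(RNetCDF)

  # Build a small, hypothetical file: one dimension PRES, a coordinate
  # variable PRES and a data variable TEMP defined along it
  path <- tempfile(fileext = ".nc")
  nc <- create.nc(path)
  dim.def.nc(nc, "PRES", 10)
  var.def.nc(nc, "PRES", "NC_DOUBLE", "PRES")
  var.def.nc(nc, "TEMP", "NC_DOUBLE", "PRES")
  var.put.nc(nc, "PRES", seq(0, 45, by = 5))
  var.put.nc(nc, "TEMP", as.numeric(11:20))
  close.nc(nc)

  # Read 3 values of TEMP, starting at the 5th element (indices are 1-based)
  nc <- open.nc(path)
  temp_sub <- var.get.nc(nc, "TEMP", start = 5, count = 3)
  close.nc(nc)
  print(temp_sub)
}
```

Here `start` and `count` are array indices, not coordinate values, which is exactly the limitation this issue is about.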
However, often I want to be able to extract data based on some coordinate range. For example, working with the data shown above, I might want to retrieve the data between 10 and 20 metres depth.
In Python, xarray makes it possible to select a subset of a whole xarray object using sel:
https://docs.xarray.dev/en/latest/generated/xarray.DataArray.sel.html
This has a number of useful features, and even allows you to select the nearest data to the coordinate that you have provided as an argument.
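For a 1-D coordinate, a nearest-value lookup similar in spirit to xarray's `sel(..., method="nearest")` can be sketched in base R. The name `nearest_index` is hypothetical and not part of RNetCDF or any other package; ties go to the first (lower-index) match, because `which.min()` returns the first minimum.

```r
# Hypothetical helper: for each target value, return the 1-based index of
# the closest coordinate value (first match wins on ties)
nearest_index <- function(coord, target) {
  vapply(target, function(t) which.min(abs(coord - t)), integer(1))
}

pres <- c(0, 5, 10, 15, 20, 25, 30)
idx <- nearest_index(pres, c(12, 18, 29))
# idx is c(3, 5, 7), i.e. the coordinate values 10, 20 and 30
```

Each resulting index could be passed to `var.get.nc()` as `start`, with `count = 1`, to read the single nearest value.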
I think something similar would be very helpful in RNetCDF, and I haven't seen how to do this in the documentation - though I might have missed something.
I am currently writing a tutorial series showing people how to work with NetCDF files using R, and I would like to use RNetCDF throughout the course. Coordinate-based subsetting is something I would like to include.
For reference, here is my equivalent course to teach people how to work with NetCDF files in Python, predominantly using xarray:
https://lhmarsden.github.io/NetCDF_in_Python_from_beginner_to_pro/intro.html
A course in R will follow the same structure with most of the same chapters.