Tweaks for opening datasets #895
When using the PyNIO engine, if you pass open_dataset() a file name that doesn't end with the expected extension, PyNIO will complain. The complaint can't be silenced with the built-in warnings filter, but passing PyNIO a format turns it off.
Loading only the variables you'll need speeds up I/O for large datasets.
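On the first point, the thread does not say how PyNIO emits its complaint. One plausible reason a warnings filter has no effect is that the message is written directly to stderr rather than raised through Python's warnings module. The sketch below illustrates that difference under that assumption; both noisy functions are hypothetical stand-ins, not PyNIO code.

```python
import contextlib
import io
import sys
import warnings


def noisy_python_warning():
    # Goes through the warnings module, so warning filters apply to it.
    warnings.warn("unrecognized file extension", UserWarning)


def noisy_direct_write():
    # Hypothetical stand-in for a library that prints its own message
    # without using the warnings module.
    print("warning: unrecognized file extension", file=sys.stderr)


def captured_stderr(func):
    """Run func with all Python warnings ignored; return what reached stderr."""
    buffer = io.StringIO()
    with contextlib.redirect_stderr(buffer), warnings.catch_warnings():
        warnings.simplefilter("ignore")
        func()
    return buffer.getvalue()
```

With the filter set to "ignore", the first function produces no output while the second still does, which is why an explicit option that avoids triggering the message can be the more practical fix.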
I like (2) - I once needed this functionality and had to infer the
Maybe
@@ -81,7 +81,8 @@ def check_name(name):
 def open_dataset(filename_or_obj, group=None, decode_cf=True,
                  mask_and_scale=True, decode_times=True,
                  concat_characters=True, decode_coords=True, engine=None,
-                 chunks=None, lock=None, drop_variables=None):
+                 chunks=None, lock=None, drop_variables=None,
+                 only_variables=None, format=''):
The default should be format=None.
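A minimal sketch of why None is the more conventional default: it lets the function tell "no format given" apart from any real value, and forward the argument to the backend only when the caller supplied one. The helper name build_backend_kwargs and the "grib" value are hypothetical and used only for illustration; this is not the code from the PR.

```python
def build_backend_kwargs(format=None, **other):
    """Hypothetical helper: collect keyword arguments for a backend opener."""
    kwargs = dict(other)
    # Forward `format` only when the caller explicitly supplied one, so the
    # backend's own default behavior is left untouched otherwise.
    if format is not None:
        kwargs["format"] = format
    return kwargs


# No format given: nothing is forwarded.
default_kwargs = build_backend_kwargs()
# Format given: it is passed through unchanged ("grib" is illustrative).
explicit_kwargs = build_backend_kwargs(format="grib")
```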
This looks like nice functionality! The main thing it needs is tests. Let me know if you have any questions about how to go about writing those. I also like the name
Would anyone like to take this up? I was going to close as stale, but it's fairly close!
Great, @mathause - all yours, given no response from OP
I tweaked the open_dataset() and open_mfdataset() functions for better performance with the PyNIO engine. The changes add two options:
(1) format, an option for the PyNIO engine.
(2) only_variables, which specifies which variables to load from the dataset when you don't want to load all of them. Say, for example, I have a dataset with 47 variables in it, but I only need 3 of them. If the data are not cached, then loading only the 3 I need cuts the I/O time in half. If they are cached, then loading only the 3 takes 20% of the time needed to load the full dataset. The only_variables option behaves much like drop_variables. The default is to load all variables.
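Since only_variables is described as the counterpart of drop_variables, one way to picture the relationship is as a set complement: keeping a few names is the same as dropping all the others. The helper below is a hypothetical sketch of that idea, not the PR's implementation; it assumes the full list of variable names is already known.

```python
def variables_to_drop(all_variables, only_variables=None):
    """Return the names to drop so that only `only_variables` remain."""
    # None mirrors the described default: load everything, drop nothing.
    if only_variables is None:
        return []
    keep = set(only_variables)
    missing = keep - set(all_variables)
    if missing:
        raise KeyError(f"unknown variables: {sorted(missing)}")
    # Preserve the original ordering of the remaining names.
    return [name for name in all_variables if name not in keep]


to_drop = variables_to_drop(["t", "u", "v", "q"], only_variables=["t", "q"])
```

The resulting list could then be passed as drop_variables, the existing argument shown in the diff above.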