Remove code related to deprecated "autoclose" option #501
Conversation
xarray no longer supports this option.
@pwolfram, I'm getting warnings that this option will be removed in the next version of xarray, and I want to take care of this before we run into trouble down the road. I'm not sure what this means for xarray functionality, but by moving to NCO for several of the operations we were once using xarray for (e.g. the MOC), I don't think we need to worry about autoclose anymore.
Just reiterating a comment I have made in the past: the xarray developers would be very keen to hear about what motivated this change and how xarray was falling short. This will help us continue to improve xarray.
I took a quick look at this PR. Our code has largely moved away from use of xarray in favor of NCO for many applications so this may not be a big deal. However, I'm a little cautious about completely removing this capability and would recommend replacing it with the current best practice approach advocated by the xarray team.
I'm following up at pydata/xarray#2592 on suggestions for handling a large number of datasets in xarray in terms of "best practices" so that we may be able to maintain some of this functionality if needed.
I suspect in the future we will still want to have these hooks to use xarray as long as they aren't too difficult to maintain in the short term.
@rabernat, a key thing was performance on things like climatological anomalies, which was noted in the past as a serious issue needing rectification. Some other issues related to complicated computation stacks also came into play, e.g., multi-dimensional groupby operations. I'm not sure if those were implemented, but a quick look suggests some degree of this functionality has been added, e.g., http://xarray.pydata.org/en/stable/groupby.html#multidimensional-grouping. Another issue was the need to get this up and running and produce stable results, so we had to go the NCO route over the last 18 months. It sounds like we are due a revisiting of this conversation. I'm not sure whether GitHub or a Skype call would be better, but I am more than happy to follow up with you, @mrocklin, and @jhamman on these issues. I just didn't have adequate time to follow up in the last 3-6 months or so, but will make the time now.
@xylar, I'm thinking replacing instances of autoclose=True with file_cache_maxsize=1200 would be a good default (100 years of monthly files) that could be modified via the config files we use. I would recommend we do not remove support for xarray's opening of many datasets in favor of this more general approach.
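The suggested replacement maps onto xarray's set_options API; a minimal sketch, assuming a recent xarray release (file_cache_maxsize was introduced around v0.11):

```python
import xarray as xr

# Cap xarray's global least-recently-used file cache at 1200 open
# files -- roughly 100 years of monthly output, per the suggestion above.
xr.set_options(file_cache_maxsize=1200)
```

Unlike autoclose=True, which was passed per open_dataset/open_mfdataset call, this is a single global setting for the whole process.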
@rabernat, there was also the …
Thanks for the feedback, folks. Very helpful. If you can boil any of these down to reproducible examples, it would be great to have xarray issues opened so we can keep track of them. But it sounds like these issues are a bit more complex and involve specific aspects of the datasets / computational environment. The recent refactoring of xarray's file management will hopefully totally solve the "too many open files" issue. Also, the … Of course you have a deadline and have to make things work with the tools as they are. In the long term, I hope xarray can improve to the point that it becomes useful for your needs.
...and use it to set the file_cache_maxsize option in xarray.
@pwolfram, let me know if my latest commit addresses your concern.
I'm happy now that we have transitioned from autoclose to file_cache_maxsize=1200. Thanks @xylar for keeping the xarray door open.
# maximum number of open files to hold in xarray's global least-recently-used
# cache. This should be smaller than your system's per-process file descriptor
# limit, e.g., ulimit -n on Linux
file_cache_maxsize = 1200
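A hedged sketch of how a config value like the one above could be read with the standard-library configparser and then handed to xarray; the [input] section name is hypothetical, not necessarily the project's actual config layout:

```python
import configparser

# Hypothetical config fragment mirroring the option discussed above.
CONFIG_TEXT = """
[input]
# keep below the per-process file descriptor limit (ulimit -n on Linux)
file_cache_maxsize = 1200
"""

config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)
maxsize = config.getint("input", "file_cache_maxsize")
print(maxsize)  # prints 1200
```

The parsed value would then be applied globally via xr.set_options(file_cache_maxsize=maxsize).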
Perfect!
Worked for me when I tested with Python 3.7 (with the bug fix from #500).
Thanks, @pwolfram
Fix unsupported file_cache_maxsize for older xarray

Older versions of xarray are not supported after #501, which is undesirable and unnecessary. All that is needed to fix this is to not raise an exception if file_cache_maxsize is not supported in a given xarray version.
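One way to implement the "don't raise" behavior described above is a small wrapper that swallows the error older xarray releases raise for an unrecognized option. The helper name is hypothetical; recent xarray raises ValueError for unknown set_options keys, and catching TypeError as well is an assumption for older releases:

```python
def apply_file_cache_maxsize(set_options, size):
    """Try to set file_cache_maxsize; return False instead of raising
    when the running xarray version does not know the option."""
    try:
        set_options(file_cache_maxsize=size)
        return True
    except (TypeError, ValueError):
        # Older xarray rejects unknown options; skip rather than fail.
        return False
```

With a real xarray import, this would be called as apply_file_cache_maxsize(xr.set_options, 1200), leaving older versions to fall back silently to their default behavior.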