Remove code related to deprecated "autoclose" option #501

Merged
merged 2 commits into MPAS-Dev:develop from remove_autoclose
Dec 5, 2018

Conversation

xylar
Collaborator

@xylar xylar commented Dec 2, 2018

xarray no longer supports this option.

@xylar xylar added the clean up label Dec 2, 2018
@xylar xylar self-assigned this Dec 2, 2018
@xylar xylar requested a review from pwolfram December 2, 2018 07:34
@xylar
Collaborator Author

xylar commented Dec 2, 2018

@pwolfram, I'm getting warnings that this option will be removed in the next version of xarray, and I want to get this taken care of before we run into trouble down the road.

I'm not sure what this means for xarray functionality but by moving to NCO for several of the operations we were once using xarray for (e.g. the MOC) I don't think we need to worry about autoclose anymore.

@rabernat

rabernat commented Dec 5, 2018

I'm not sure what this means for xarray functionality but by moving to NCO for several of the operations we were once using xarray for (e.g. the MOC) I don't think we need to worry about autoclose anymore.

Just reiterating a comment that I have made in the past that the xarray developers would be very keen to hear about what motivated this change and how xarray was falling short. This will help us continue to improve xarray.

Contributor

@pwolfram pwolfram left a comment

I took a quick look at this PR. Our code has largely moved away from using xarray in favor of NCO for many applications, so this may not be a big deal. However, I'm a little cautious about completely removing this capability and would recommend replacing it with the current best-practice approach advocated by the xarray team.

I'm following up at pydata/xarray#2592 on suggestions for handling a large number of datasets in xarray in terms of "best practices" so that we may be able to maintain some of this functionality if needed.

I suspect in the future we will still want to have these hooks to use xarray as long as they aren't too difficult to maintain in the short term.

@pwolfram
Contributor

pwolfram commented Dec 5, 2018

@rabernat, a key thing was performance on things like climatological anomalies, which was noted in the past as a serious issue needing rectification. Some other issues related to complicated computation stacks also came into play, e.g., having multi-dimensional groupby operations. I'm not sure if those were implemented at the time, but a quick look suggests some degree of this functionality has been added, e.g., http://xarray.pydata.org/en/stable/groupby.html#multidimensional-grouping. Another issue was the need to get this up and running and produce stable results, so we had to go the NCO route over the last 18 months.
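
For reference, xarray's multidimensional grouping (linked above) looks roughly like the following minimal sketch; this is purely illustrative and not code from MPAS-Analysis:

import numpy as np
import xarray as xr

# a small DataArray with 2-D latitude/longitude coordinates
da = xr.DataArray(
    np.arange(4.0).reshape(2, 2),
    dims=("ny", "nx"),
    coords={
        "lat": (("ny", "nx"), [[10.0, 10.0], [20.0, 20.0]]),
        "lon": (("ny", "nx"), [[30.0, 40.0], [40.0, 50.0]]),
    },
)

# group over the multi-dimensional "lon" coordinate and reduce
print(da.groupby("lon").sum())

# or bin the 2-D coordinate into intervals before reducing
print(da.groupby_bins("lon", bins=[25, 45, 65]).mean())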

It sounds like we are due a revisiting of this conversation. I'm not sure whether GitHub or a Skype call would be better, but I am more than happy to follow up with you, @mrocklin, and @jhamman on these issues. I just haven't had adequate time to follow up over the last 3-6 months or so, but I will make the time now.

Contributor

@pwolfram pwolfram left a comment

@xylar, I'm thinking that replacing instances of autoclose=True with file_cache_maxsize=1200 would be a good default (100 years of monthly files) that could be modified via the config files we use. I would recommend that we not remove support for xarray's opening of many datasets, in favor of this more general approach.
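
As a rough sketch of what that replacement might look like (the value here is just the suggested default and would come from our config files in practice):

import xarray as xr

# suggested default: room for 100 years of monthly files; in practice this
# would be read from the analysis config file
file_cache_maxsize = 1200

# replaces the old per-call autoclose=True: xarray keeps at most this many
# files open in its global least-recently-used cache
xr.set_options(file_cache_maxsize=file_cache_maxsize)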

@milenaveneziani
Collaborator

@rabernat, there was also the "Too many open files" issue that we tried to solve, but the solution was machine/memory dependent.

@rabernat

rabernat commented Dec 5, 2018

Thanks for the feedback folks. Very helpful.

If you can boil any of these down to reproducible examples, it would be great to have xarray issues opened so we can keep track of them. But it sounds like these issues are a bit more complex and involve specific aspects of the datasets / computational environment.

The recent refactoring of xarray's file management will hopefully solve the "too many open files" issue entirely. Also, the parallel=True option on open_mfdataset can really speed up opening large filesets (http://xarray.pydata.org/en/stable/generated/xarray.open_mfdataset.html#xarray.open_mfdataset).
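
For example, a minimal sketch (the file pattern here is hypothetical, and parallel=True requires dask):

import xarray as xr

# open many files as a single dataset; with parallel=True the per-file open
# and preprocessing steps are performed in parallel using dask
ds = xr.open_mfdataset("timeSeries.*.nc", parallel=True)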

Of course you have a deadline and have to make things work with the tools as they are. In the long term, I hope xarray can improve to the point that it becomes useful for your needs.

@jhamman

jhamman commented Dec 5, 2018

@pwolfram - happy to revisit this conversation in the near future.

@xylar - it looks like you'll be at AGU (link). @rabernat and I will be there as well. Maybe worth intersecting at some point.

...and use it to set the file_cache_maxsize option in xarray.
@xylar
Collaborator Author

xylar commented Dec 5, 2018

@pwolfram, let me know if my latest commit addresses your concern.

@xylar
Collaborator Author

xylar commented Dec 5, 2018

@jhamman and @rabernat, yes, it would be great to chat at AGU next week.

Contributor

@pwolfram pwolfram left a comment

I'm happy now that we have transitioned from autoclose to file_cache_maxsize=1200. Thanks @xylar for keeping the xarray door open.

# maximum number of open files to hold in xarray's global least-recently-used
# cache. This should be smaller than your system's per-process file descriptor
# limit, e.g., ulimit -n on Linux
file_cache_maxsize = 1200
Contributor

Perfect!

@xylar
Collaborator Author

xylar commented Dec 5, 2018

This worked for me when I tested with Python 3.7 (with the bug fix from #500).

@xylar xylar merged commit 9fe4b14 into MPAS-Dev:develop Dec 5, 2018
@xylar xylar deleted the remove_autoclose branch December 5, 2018 22:47
@xylar
Collaborator Author

xylar commented Dec 5, 2018

Thanks, @pwolfram

xylar added a commit that referenced this pull request Jan 10, 2019
Fix unsupported file_cache_maxsize for older xarray

Older versions of xarray are not supported after #501, which is undesirable and unnecessary. All that is needed to fix this is to not raise an exception if file_cache_maxsize is not supported in a given xarray version.
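
A minimal sketch of that approach (the value would come from the config file, as above):

import xarray as xr

file_cache_maxsize = 1200  # read from the config file in practice

try:
    xr.set_options(file_cache_maxsize=file_cache_maxsize)
except ValueError:
    # older xarray versions do not recognize this option; fall back to their
    # default file handling instead of raising an exception
    pass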