Xarray crashes when too many files are opened #49
This issue is related to pydata/xarray#463 and is discussed in #48.
@milenaveneziani and @pwolfram, I wanted to move discussion of this issue out of #48, since it is not directly related to that PR.
Following up on a comment in #48, @pwolfram wrote:
@kmpaul wrote:
@pwolfram and @milenaveneziani, this is a fair point. Is there a reason we don't generate a large number of streams, each with a small number of variables but with the complete time series of those variables? This is what we often want for analysis. Some of our tools (e.g., the ParaView extractor) would need to take a list of files, each with separate variables instead of (or perhaps in addition to) separate time stamps. But I don't see any reason not to consider moving in that direction. I won't be at this week's ACME ice/ocean meeting, but maybe this could be brought up for discussion.
@milenaveneziani and @pwolfram, has there been any further discussion of this at LANL? Is there a possibility of storing yearly output files instead of monthly ones, for example? This would be one of the easiest ways around the problem.
@xylar: that is a possibility, although we have to be careful not to create output files that are too large. That should be OK for sea-ice output, but for MPAS-O 3D variables it may be less desirable, depending on the model resolution.
A solution there might be to break streams into smaller sub-streams. I think this is rather easy to do -- I've created new streams for land-ice variables, for example. For analysis, it is far better to have many time indices and few variables in a given file than our current layout. (For writing files from the model, presumably the opposite is true.) As I understand it, this is the essence of @kmpaul's suggestion. In the absence of such an approach, we will presumably need a tool like PyReshaper, but that adds an extra step that would be nice to avoid.
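To illustrate the two layouts, here is a toy, in-memory Python sketch that regroups time-slice records (one record per month holding many variables) into per-variable time series (one record per variable holding every month). It only shows the regrouping idea: the function name slices_to_series and the variable names sst and ice_area are hypothetical, it is not PyReshaper's API, and a real tool would read and write netCDF files rather than Python dicts.

```python
from collections import defaultdict


def slices_to_series(time_slices):
    """Regroup per-time-step records into per-variable time series.

    ``time_slices`` is a list of (time, {variable: value}) pairs, mimicking
    one file per month that holds many variables. The result maps each
    variable to its list of (time, value) pairs, mimicking one file per
    variable that holds the whole time series. (Hypothetical helper.)
    """
    series = defaultdict(list)
    # Sort by time so each output series comes out in chronological order.
    for time, variables in sorted(time_slices, key=lambda item: item[0]):
        for name, value in variables.items():
            series[name].append((time, value))
    return dict(series)


# Hypothetical toy data: two monthly "files" with two variables each.
monthly = [
    ("0001-02", {"sst": 1.5, "ice_area": 0.8}),
    ("0001-01", {"sst": 1.0, "ice_area": 0.9}),
]
by_variable = slices_to_series(monthly)
print(by_variable["sst"])  # [('0001-01', 1.0), ('0001-02', 1.5)]
```

With the time-series layout, reading the full record of one variable touches one file instead of one file per month.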
This sounds like a reasonable alternative. That is where #48 will actually be important. It should also be possible to read 100 years of data, compute with it, and store it in a temporary array, then do the same with the next 100 years of data. But that will require some non-trivial editing.
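The chunked approach can be sketched generically: split the list of files into batches, reduce each batch to a small result, and keep only those results. The helper process_in_batches below is a hypothetical sketch, not part of xarray; it assumes the caller supplies a reduce_batch function that opens, processes, and closes the files of one batch. For this to avoid the error, the batch size would need to stay below the process's open-file limit.

```python
def process_in_batches(paths, batch_size, reduce_batch):
    """Apply ``reduce_batch`` to consecutive groups of at most ``batch_size`` paths.

    Only one batch is handed to ``reduce_batch`` at a time, so if that
    function closes its files before returning, no more than ``batch_size``
    files are open simultaneously. (Hypothetical helper.)
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    results = []
    for start in range(0, len(paths), batch_size):
        batch = paths[start:start + batch_size]
        # Keep only the (small) reduced result, not the opened files.
        results.append(reduce_batch(batch))
    return results


# Toy usage with hypothetical file names; len stands in for a real reducer.
files = [f"month_{i:04d}.nc" for i in range(25)]
print(process_in_batches(files, 12, len))  # [12, 12, 1]
```

With xarray, reduce_batch could, for example, open a batch with xarray.open_mfdataset inside a with block, compute and .load() the quantity of interest, and return it; the per-batch results could then be joined with xarray.concat along the time dimension (whose name depends on the files).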
@xylar and @milenaveneziani, pydata/xarray#1198 should fix this issue provided that use of
Thanks @pwolfram: this is great!
Addressed by pydata/xarray#1198 and #151.
We can always re-open this if something else needs to be done after the new xarray release.
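As a generic illustration of one way to bound the number of simultaneously open files (not the code from pydata/xarray#1198), a least-recently-used (LRU) cache of file handles closes the oldest file once a limit is reached and reopens files on demand. FileHandleCache is a hypothetical, standard-library-only sketch, not an xarray class.

```python
import os
import tempfile
from collections import OrderedDict


class FileHandleCache:
    """Keep at most ``maxsize`` files open, closing the least recently used.

    Hypothetical illustration, not xarray's implementation.
    """

    def __init__(self, maxsize):
        if maxsize < 1:
            raise ValueError("maxsize must be at least 1")
        self.maxsize = maxsize
        self._handles = OrderedDict()  # path -> open file object

    def get(self, path):
        """Return an open handle for ``path``, reopening it if it was evicted."""
        if path in self._handles:
            self._handles.move_to_end(path)  # mark as most recently used
            return self._handles[path]
        if len(self._handles) >= self.maxsize:
            # Evict and close the least recently used handle.
            _, oldest = self._handles.popitem(last=False)
            oldest.close()
        handle = open(path, "rb")
        self._handles[path] = handle
        return handle

    def close_all(self):
        while self._handles:
            _, handle = self._handles.popitem()
            handle.close()

    def __len__(self):
        return len(self._handles)


# Toy usage: five one-byte files read through a cache that allows two open files.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i in range(5):
        path = os.path.join(tmp, f"month_{i}.bin")
        with open(path, "wb") as f:
            f.write(bytes([i]))
        paths.append(path)

    cache = FileHandleCache(maxsize=2)
    values = []
    for path in paths:
        handle = cache.get(path)
        handle.seek(0)  # a cached handle may already have been read
        values.append(handle.read()[0])
    print(values, len(cache))  # [0, 1, 2, 3, 4] 2
    cache.close_all()
```

Recent xarray releases expose a related setting, xarray.set_options(file_cache_maxsize=...), which bounds how many files xarray keeps open at once; check the documentation of the installed version.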
While testing the ACME script on rhea, @milenaveneziani hit the 'too many open files' error from xarray's open_mfdataset. This happened while trying to open 100 years of monthly files. Not being able to display time series longer than 100 years is a big limitation for ACME analysis.
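The error comes from the per-process limit on open file descriptors. 100 years of monthly output is 100 × 12 = 1200 files, which exceeds a soft limit of 1024, a common default on Linux (the actual limit on rhea is not stated here). On Unix-like systems, the soft limit can be checked, and raised up to the hard limit, from Python with the standard resource module. This is a sketch that treats the symptom only; the function name and the target of 4096 are arbitrary illustrative choices.

```python
import resource


def raise_open_file_limit(target=4096):
    """Raise the soft RLIMIT_NOFILE toward ``target`` without exceeding the hard limit.

    Returns the (soft, hard) limits in effect afterwards. Unix-only: the
    ``resource`` module is not available on Windows. (Hypothetical helper;
    4096 is an arbitrary illustrative target.)
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # The hard limit may be "unlimited"; otherwise it caps the soft limit.
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if soft != resource.RLIM_INFINITY and soft < new_soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = raise_open_file_limit()
print(f"open-file limit: soft={soft}, hard={hard}")
```

Raising the limit only moves the ceiling; reading the files in batches, or keeping fewer files open at once, addresses the underlying problem.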