Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix time-series scripts use time bounds properly #48

Merged
merged 2 commits into from
Dec 12, 2016

Conversation

xylar
Copy link
Collaborator

@xylar xylar commented Nov 28, 2016

This merge updates the 3 time-series analysis scripts to make use
of the time series start and end dates, computed via timeseries_yr1
and timeseries_yr2, allowing analysis of a subset of the output
data.

To accomplish this, this merge adds the method to_datetime to
the Date class. This method can be used (optionally with an offset
year) to convert a Date to a datetime object. The date is
clamped to the valid range supported by numpy's datetime64[ns],
used by xarray and pandas.

@xylar
Copy link
Collaborator Author

xylar commented Nov 28, 2016

This PR should not be merged until after #47 and is intended to address #45.

@xylar
Copy link
Collaborator Author

xylar commented Nov 28, 2016

Testing

I verified that the results of the 3 time-series analysis scripts are bit-for-bit identical to those from #47 for both ACME alpha7 and alpha8 output. I didn't test the model vs. observations scripts, since they are unchanged.

I ran both alpha7 and alpha8 tests with just years 1-3 and an alpha8 test with years 3-5, see results in the following directories:
http://portal.nersc.gov/project/acme/xylar/fix_time_series_subset_1-3/
http://portal.nersc.gov/project/acme/xylar/fix_time_series_subset_3-5/
The results are limited to the first 2 years in the first case and the 2 years after that in the second case, as expcted.

@xylar xylar force-pushed the fix_time_series_subset branch 3 times, most recently from 17db7a9 to 7cf12f9 Compare November 28, 2016 16:08
@xylar
Copy link
Collaborator Author

xylar commented Nov 28, 2016

I just added additional tests to make sure the to_datetime and to_timedelta method of Date behave as expected.

@milenaveneziani
Copy link
Collaborator

@xylar, @pwolfram: just wanted to let you know that, while testing the ACME script on rhea, I hit the xarray mfdatasets error: 'too many open files'... this was while trying to open 100 years of monthly files.
This PR will help with that in the sense that we can reduce the number of files to open, but I think it is a big limitation to not be able to display time series longer than 100 years.
I am aware of the open issue on the xarray side: @pwolfram, do you think it will be solved some time soon?

@xylar
Copy link
Collaborator Author

xylar commented Nov 28, 2016

@milenaveneziani,

This PR will help with that in the sense that we can reduce the number of files to open

That's only true if you choose to plot a time series from a subset of files.

do you think it will be solved some time soon?

I took a look a the issue on the xarray GitHub site (pydata/xarray#463) and it doesn't look like there's been any activity on this issue since June. We may need to find our own workaround such as concatenating files together (see this recommended tool: https://github.com/NCAR/PyReshaper)

@pwolfram
Copy link
Contributor

@milenaveneziani and @xylar, note that Kevin Paul (@kmpaul) of https://github.com/NCAR/PyReshaper was at the AOSPY workshop and it would be great to be able to work with him because it is likely that we will need PyReshaper, if not now, at some point.

I'm also following up with the xarray folks (namely @shoyer and @mrocklin) to get some advice on the best way to proceed. See also pydata/xarray#444

@pwolfram
Copy link
Contributor

It sounds like the time to live with the open file limitation of xarray has passed and we should champion a solution. Do you both agree @milenaveneziani and @xylar?

@milenaveneziani
Copy link
Collaborator

Yes, I agree @pwolfram: I ran into that problem with the full alpha6 run, and future runs are likely to span a period longer than 100 years. Plus, it is a machine-memory-dependent problem so perhaps on smaller machines the cut is even on a smaller number of files.

@kmpaul
Copy link

kmpaul commented Nov 29, 2016

@milenaveneziani, @xylar, @pwolfram: I'm happy to help you with the PyReshaper, if you need it. I'm hoping to have a new release before 2017, incidentally, but that won't prevent it from helping you now.

@pwolfram
Copy link
Contributor

Thanks @kmpaul for following up with us. Ideally we would fix this issue upstream in xarray / dask but the best path forward is not yet clear to me yet and I need to look into this further.

@kmpaul
Copy link

kmpaul commented Nov 29, 2016 via email

@milenaveneziani
Copy link
Collaborator

@xylar: once we merge #47, it will be difficult to test this PR until we also merge #50 and #52. So, does it seem OK with you that we do all bunch of testing after those PR's are also merged?
I would do so under the MAPS-Analysis framework first, and then in ACME before we update the submodule.

@xylar
Copy link
Collaborator Author

xylar commented Dec 5, 2016

@milenaveneziani, I think this PR should be independent of #50 and #52. I tested it myself before making those changes. So I think this PR should be tested independently of those two.

I agree to some extent that we will test this PR more rigorously as we test those other PRs but I do not think it will be helpful to us to try to test all three at once. Instead, I think it is important to make sure this PR is robust without those other changes, at least to the extent we can.

@xylar
Copy link
Collaborator Author

xylar commented Dec 5, 2016

@milenaveneziani wrote:

I would do so under the MAPS-Analysis framework first, and then in ACME before we update the submodule.

I don't think were at the point of worrying about ACME just yet. Once #52 has been completed, tested and merged, we can think about testing develop in ACME/PreAndPostProcessingScripts but for now I don't think we need to worry about testing in ACME analysis until we get things working with MPAS-only analysis.

@milenaveneziani
Copy link
Collaborator

@xylar: I guess what I meant is that, once we merge #47, we won't be able to test on alpha7 or alpha8 because #47 is not backward compatible as is (it will be once #50 is merged as well). And we can't test using alpha9 data either because we are missing #20.
So one option would be to test on some standalone mpas-o output.

@xylar
Copy link
Collaborator Author

xylar commented Dec 5, 2016

@milenaveneziani, I'm not quite sure why this would be the case. This PR is intended to be compatible with both alpha7 and alpha8. What aspect are you concerned is not backward compatible? It supports both time variables of type daysSinceStartOfSim (either with a units = days as in MPAS-O or without it as in current MPAS-SI versions) and of type xtime, as required for alpha7 and alpha8. I tested on both, as I state above.

@milenaveneziani
Copy link
Collaborator

@xylar: you are right. I recalled that xtime_start and xtime_end are indeed supported in alpha, so we can test #47 and this PR on alpha output.
About the testing in ACME: yes, what I meant is that, after we merge all these PR's (up to #52, and perhaps including one for the SSS analysis that @vanroekel added), we should then do the testing in ACME.

@xylar
Copy link
Collaborator Author

xylar commented Dec 5, 2016

Okay, sounds good. I think we're on the same page.

@xylar xylar force-pushed the fix_time_series_subset branch 2 times, most recently from 3e166dc to 3c52f04 Compare December 7, 2016 23:28
@xylar xylar removed the in progress label Dec 7, 2016
@xylar xylar force-pushed the fix_time_series_subset branch from 3c52f04 to 77d2e3a Compare December 8, 2016 06:53
@milenaveneziani
Copy link
Collaborator

@xylar: this needs to be rebased too before testing, right?

@xylar
Copy link
Collaborator Author

xylar commented Dec 8, 2016

@milenaveneziani, I rebased it already, but the answer in general is, no, branches don't need to be rebased for testing. It's just convenient on GitHub itself that the commits that have already been merged disappear.

To test, you would want to do a test merge into your local checkout of develop and that would work whether I rebased or not.

Anyway, I did rebase already this morning, so you're good to go.

hour=0, minute=0, second=0)
maxDate = datetime.datetime(year=2262, month=1, day=1,
hour=0, minute=0, second=0)
outDate = max(minDate, min(maxDate, outDate))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking we should issue a warning if we are on the clipping boundary so the unawares user will know there may be a potential problem.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree that this should be a warning. Typically want the end year of the time series analysis to be 9999 so we don't have to think about it. This is the function that makes sure that 9999 + 1849 still gives you a reasonable year that the analysis can handle. If you didn't want the clamping and the year offset, you wouldn't bother calling this function.

If I add a warning, it's going to be annoying that there are warnings throughout our output even though I'm intentionally using this function to clamp the date. The whole point of clamping the date here is so I don't have to do it elsewhere and so I can have better code reuse.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I see what you mean. We really aren't at a point where our simulations will go past 2262 anyway so I'm ok leaving this as-is for now. However, I'm not sure if we can necessarily plan on this for the future (in general) because I know Jeremy runs multi-century simulations. But I agree this is probably sufficient for now.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pwolfram, we don't support going past 100 years right now (because of limited numbers of files) so I think 2262 is the least of our concerns...

But also fixing Date with a new min/max allowed date will be super easy once xarray and pandas support a wider range. For now, we're at their mercy and this function is really only a side effect of that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Thanks for the reminder about the file limitation issue. I need to take a closer look at that once I get a chance.

timedelta2 = datetime.timedelta(days=20)
self.assertEqual(timedelta1, timedelta2)

date = Date(dateString='0001-01-01', isInterval=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put a comment above this line reminding the reader that the date is specified based on the quasi-arbitrary xarray datetime boundary.

datetime2 = datetime.datetime(year=1850, month=1, day=1)
self.assertEqual(datetime1, datetime2)

date = Date(dateString='9999-01-01', isInterval=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tests the clipping capability, right? I think we should note this here and if there is a warning thrown we should make sure it is what we expect to get.

xylar added 2 commits December 8, 2016 23:39
The to_datetime method can be used (optionally with an offset year)
to convert a Date to a datetime.  The date is clamped to a valid
range supported by numpy's datetime64[ns], used by xarray and
pandas.  This method will be useful for computing datetime bounds on
time-series data sets.

The to_timedelta method converts a Date object with
isInterval==True to a timedelta object.
This merge updates the 3 time-series analysis scripts to make use
of the time series start and end dates, computed via timeseries_yr1
and timeseries_yr2, allowing analysis of a subset of the output
data.
@xylar xylar force-pushed the fix_time_series_subset branch from 419df44 to 9f2a10c Compare December 8, 2016 22:40
@xylar
Copy link
Collaborator Author

xylar commented Dec 8, 2016

This branch has been rebased yet again following #57

@xylar
Copy link
Collaborator Author

xylar commented Dec 9, 2016

@milenaveneziani, please review and merge at your earliest convenience.

@milenaveneziani
Copy link
Collaborator

@xylar, @pwolfram: I tested this on alpha7 and alpha8, changing yr1 and yr2 in various ways (starting from 1 or not, ending with 9999 or other number). I always get the plots I'd expect. One thing I noticed is that, when starting from a year different from 1, the plots x-axis still starts from year 1: we will probably want to change this in the future (for example, when splitting the time series plots for sea-ice, when the time series are too long for xarray to handle or for seasonal cycles to be visible).
Merging this now.

@xylar
Copy link
Collaborator Author

xylar commented Dec 12, 2016

@milenaveneziani, could you report the incorrect x axis as an issue? I'll fix it as soon as I have some time.

@xylar xylar deleted the fix_time_series_subset branch December 12, 2016 15:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants