Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New failure in test_apply_time_offset #320

Closed
spencerahill opened this issue Mar 30, 2019 · 5 comments
Closed

New failure in test_apply_time_offset #320

spencerahill opened this issue Mar 30, 2019 · 5 comments

Comments

@spencerahill
Copy link
Owner

After reverting to xarray 0.11, c.f. #319, the test suite loads without error but has one failure, which I've pasted below. @spencerkclark, since this seems cftime-related (or adjacent), I'm going to punt this one to you.

test_apply_time_offset ___________________________________________________________________

cls = <class 'pandas.core.arrays.datetimes.DatetimeArray'>
index = <DatetimeArray>
['1898-06-30 00:00:00', '1898-07-30 00:00:00']
Length: 2, dtype: datetime64[ns], freq = <MonthEnd>, kwargs = {}, inferred = None
on_freq = <DatetimeArray>
['1898-06-30 00:00:00', '1898-07-31 00:00:00']
Length: 2, dtype: datetime64[ns]

    @classmethod
    def _validate_frequency(cls, index, freq, **kwargs):
        """
        Validate that a frequency is compatible with the values of a given
        Datetime Array/Index or Timedelta Array/Index

        Parameters
        ----------
        index : DatetimeIndex or TimedeltaIndex
            The index on which to determine if the given frequency is valid
        freq : DateOffset
            The frequency to validate
        """
        if is_period_dtype(cls):
            # Frequency validation is not meaningful for Period Array/Index
            return None

        inferred = index.inferred_freq
        if index.size == 0 or inferred == freq.freqstr:
            return None

        try:
            on_freq = cls._generate_range(start=index[0], end=None,
                                          periods=len(index), freq=freq,
                                          **kwargs)
            if not np.array_equal(index.asi8, on_freq.asi8):
>               raise ValueError
E               ValueError

../../pandas/core/arrays/datetimelike.py:884: ValueError

During handling of the above exception, another exception occurred:

    def test_apply_time_offset():
        start = datetime.datetime(1900, 5, 10)
        years, months, days, hours = -2, 1, 7, 3
        # test lengths 0, 1, and >1 of input time array
        for periods in range(3):
            times = pd.date_range(start=start, freq='M', periods=periods)
            actual = apply_time_offset(xr.DataArray(times), years=years,
                                       months=months, days=days, hours=hours)
            desired = (times + pd.tseries.offsets.DateOffset(
>               years=years, months=months, days=days, hours=hours
            ))

test_utils_times.py:55:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../pandas/core/indexes/datetimelike.py:489: in __add__
    result = self._data.__add__(maybe_unwrap_index(other))
../../pandas/core/arrays/datetimelike.py:1190: in __add__
    result = self._add_offset(other)
../../pandas/core/arrays/datetimes.py:737: in _add_offset
    result = offset.apply_index(values)
pandas/_libs/tslibs/offsets.pyx:116: in pandas._libs.tslibs.offsets.apply_index_wraps.wrapper
    ???
../../pandas/tseries/offsets.py:278: in apply_index
    i = type(i)(shifted, freq=i.freq, dtype=i.dtype)
../../pandas/core/arrays/datetimes.py:351: in __init__
    type(self)._validate_frequency(self, freq)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

cls = <class 'pandas.core.arrays.datetimes.DatetimeArray'>
index = <DatetimeArray>
['1898-06-30 00:00:00', '1898-07-30 00:00:00']
Length: 2, dtype: datetime64[ns], freq = <MonthEnd>, kwargs = {}, inferred = None
on_freq = <DatetimeArray>
['1898-06-30 00:00:00', '1898-07-31 00:00:00']
Length: 2, dtype: datetime64[ns]

    @classmethod
    def _validate_frequency(cls, index, freq, **kwargs):
        """
        Validate that a frequency is compatible with the values of a given
        Datetime Array/Index or Timedelta Array/Index

        Parameters
        ----------
        index : DatetimeIndex or TimedeltaIndex
            The index on which to determine if the given frequency is valid
        freq : DateOffset
            The frequency to validate
        """
        if is_period_dtype(cls):
            # Frequency validation is not meaningful for Period Array/Index
            return None

        inferred = index.inferred_freq
        if index.size == 0 or inferred == freq.freqstr:
            return None

        try:
            on_freq = cls._generate_range(start=index[0], end=None,
                                          periods=len(index), freq=freq,
                                          **kwargs)
            if not np.array_equal(index.asi8, on_freq.asi8):
                raise ValueError
        except ValueError as e:
            if "non-fixed" in str(e):
                # non-fixed frequencies are not meaningful for timedelta64;
                #  we retain that error message
                raise e
            # GH#11587 the main way this is reached is if the `np.array_equal`
            #  check above is False.  This can also be reached if index[0]
            #  is `NaT`, in which case the call to `cls._generate_range` will
            #  raise a ValueError, which we re-raise with a more targeted
            #  message.
            raise ValueError('Inferred frequency {infer} from passed values '
                             'does not conform to passed frequency {passed}'
>                            .format(infer=inferred, passed=freq.freqstr))
E           ValueError: Inferred frequency None from passed values does not conform to passed frequency M

../../pandas/core/arrays/datetimelike.py:897: ValueError
@spencerahill
Copy link
Owner Author

Actually there are two failing tests, both time-related; see e.g. https://travis-ci.org/spencerahill/aospy/jobs/513359222#L1440

@spencerkclark
Copy link
Collaborator

Huh...this seems to be a bug in the latest version of pandas.

In [1]: import pandas as pd

In [2]: times = pd.date_range('2000', periods=2, freq='M')

In [3]: times + pd.DateOffset(months=1)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
//anaconda/envs/aospy_dev/lib/python3.6/site-packages/pandas/core/arrays/datetimelike.py in _validate_frequency(cls, index, freq, **kwargs)
    883             if not np.array_equal(index.asi8, on_freq.asi8):
--> 884                 raise ValueError
    885         except ValueError as e:

ValueError:

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-3-846e5412af1b> in <module>()
----> 1 times + pd.DateOffset(months=1)

//anaconda/envs/aospy_dev/lib/python3.6/site-packages/pandas/core/indexes/datetimelike.py in __add__(self, other)
    487         def __add__(self, other):
    488             # dispatch to ExtensionArray implementation
--> 489             result = self._data.__add__(maybe_unwrap_index(other))
    490             return wrap_arithmetic_op(self, other, result)
    491

//anaconda/envs/aospy_dev/lib/python3.6/site-packages/pandas/core/arrays/datetimelike.py in __add__(self, other)
   1188         elif isinstance(other, DateOffset):
   1189             # specifically _not_ a Tick
-> 1190             result = self._add_offset(other)
   1191         elif isinstance(other, (datetime, np.datetime64)):
   1192             result = self._add_datetimelike_scalar(other)

//anaconda/envs/aospy_dev/lib/python3.6/site-packages/pandas/core/arrays/datetimes.py in _add_offset(self, offset)
    735             else:
    736                 values = self
--> 737             result = offset.apply_index(values)
    738             if self.tz is not None:
    739                 result = result.tz_localize(self.tz)

pandas/_libs/tslibs/offsets.pyx in pandas._libs.tslibs.offsets.apply_index_wraps.wrapper()

//anaconda/envs/aospy_dev/lib/python3.6/site-packages/pandas/tseries/offsets.py in apply_index(self, i)
    276             if months:
    277                 shifted = liboffsets.shift_months(i.asi8, months)
--> 278                 i = type(i)(shifted, freq=i.freq, dtype=i.dtype)
    279
    280             weeks = (kwds.get('weeks', 0)) * self.n

//anaconda/envs/aospy_dev/lib/python3.6/site-packages/pandas/core/arrays/datetimes.py in __init__(self, values, dtype, freq, copy)
    349
    350         if inferred_freq is None and freq is not None:
--> 351             type(self)._validate_frequency(self, freq)
    352
    353     @classmethod

//anaconda/envs/aospy_dev/lib/python3.6/site-packages/pandas/core/arrays/datetimelike.py in _validate_frequency(cls, index, freq, **kwargs)
    895             raise ValueError('Inferred frequency {infer} from passed values '
    896                              'does not conform to passed frequency {passed}'
--> 897                              .format(infer=inferred, passed=freq.freqstr))
    898
    899     # monotonicity/uniqueness properties are called via frequencies.infer_freq,

ValueError: Inferred frequency None from passed values does not conform to passed frequency M

We're actually already working around it implicitly internally in apply_time_offset, because we automatically cast the DatetimeIndex with a specified frequency to one without a specified frequency. When I get a chance, I'll report this upstream, but I don't think it's a major concern. In the meantime I'll submit a PR to fix the failing test with a workaround.

@spencerkclark
Copy link
Collaborator

Actually there are two failing tests, both time-related; see e.g. https://travis-ci.org/spencerahill/aospy/jobs/513359222#L1440

Oddly I was not able to reproduce this other failing test, and when I re-ran the build that failure disappeared. We'll see if it comes up again.

@spencerkclark
Copy link
Collaborator

It turns out the other bug was in fact real :). I locally created a new environment based on our CI setup for the failing build and was able to reproduce the problem. It was made harder to find because we were mutating the state of the input argument, and the order of the variables in the set time_indexed_vars ended up dictating whether the function would succeed or not (so sometimes it would, and sometimes it wouldn't).

Ultimately I think this stems from the following change in behavior in xarray between 0.11.3 and 0.12.0. For a little more context on why this is relevant to this issue, see this gist.

xarray 0.11.3

In [1]: import xarray as xr

In [2]: da = xr.DataArray([1], [('x', [0])], name='a')

In [3]: da.indexes
Out[3]: x: Int64Index([0], dtype='int64', name='x')

In [4]: da.isel(x=0).expand_dims('x').indexes
Out[4]: x: Int64Index([0], dtype='int64', name='x')

In [5]: da.to_dataset().isel(x=0).expand_dims('x').a.indexes
Out[5]: x: Int64Index([0], dtype='int64', name='x')

xarray 0.12

Note the difference in behavior between line 5 here and line 5 above.

In [1]: import xarray as xr

In [2]: da = xr.DataArray([1], [('x', [0])], name='a')

In [3]: da.indexes
Out[3]: x: Int64Index([0], dtype='int64', name='x')

In [4]: da.isel(x=0).expand_dims('x').indexes
Out[4]: x: Int64Index([0], dtype='int64', name='x')

In [5]: da.to_dataset().isel(x=0).expand_dims('x').a.indexes
Out[5]:

I'm pretty sure this is a regression in xarray, but I'll sleep on it before reporting it.

@spencerkclark
Copy link
Collaborator

Interestingly the following simpler variant works in xarray 0.12:

In [18]: da = xr.DataArray(1, coords={'x': 0}, name='a')

In [19]: da.expand_dims('x').indexes
Out[19]: x: Int64Index([0], dtype='int64', name='x')

In [20]: da.to_dataset().expand_dims('x').a.indexes
Out[20]: x: Int64Index([0], dtype='int64', name='x')

So apparently the isel step is important.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants