Unexpected decoded time in xarray >= 0.10.1 #2002

Closed
JanisGailis opened this issue Mar 21, 2018 · 8 comments

JanisGailis commented Mar 21, 2018

Problem description

Given the original time dimension:

ds = xr.open_mfdataset("C:\\Users\\janis\\.cate\\data_stores\\local\\local.SST_should_fail\\*.nc", decode_cf=False)
<xarray.DataArray 'time' (time: 32)>
array([788961600, 789048000, 789134400, 789220800, 789307200, 789393600,
       789480000, 789566400, 789652800, 789739200, 789825600, 789912000,
       789998400, 790084800, 790171200, 790257600, 790344000, 790430400,
       790516800, 790603200, 790689600, 790776000, 790862400, 790948800,
       791035200, 791121600, 791208000, 791294400, 791380800, 791467200,
       791553600, 791640000], dtype=int64)
Coordinates:
  * time     (time) int64 788961600 789048000 789134400 789220800 789307200 ...
Attributes:
    standard_name:  time
    axis:           T
    comment:        
    bounds:         time_bnds
    long_name:      reference time of sst file
    _ChunkSizes:    1
    units:          seconds since 1981-01-01
    calendar:       gregorian

Produces this decoded time dimension with xarray >= 0.10.1:

ds = xr.open_mfdataset("C:\\Users\\janis\\.cate\\data_stores\\local\\local.SST_should_fail\\*.nc", decode_cf=True)
<xarray.DataArray 'time' (time: 32)>
array(['1981-01-01T00:00:00.627867648', '1980-12-31T23:59:58.770774016',
       '1981-01-01T00:00:01.208647680', '1980-12-31T23:59:59.351554048',
       '1981-01-01T00:00:01.789427712', '1980-12-31T23:59:59.932334080',
       '1980-12-31T23:59:58.075240448', '1981-01-01T00:00:00.513114112',
       '1980-12-31T23:59:58.656020480', '1981-01-01T00:00:01.093894144',
       '1980-12-31T23:59:59.236800512', '1981-01-01T00:00:01.674674176',
       '1980-12-31T23:59:59.817580544', '1980-12-31T23:59:57.960486912',
       '1981-01-01T00:00:00.398360576', '1980-12-31T23:59:58.541266944',
       '1981-01-01T00:00:00.979140608', '1980-12-31T23:59:59.122046976',
       '1981-01-01T00:00:01.559920640', '1980-12-31T23:59:59.702827008',
       '1981-01-01T00:00:02.140700672', '1981-01-01T00:00:00.283607040',
       '1980-12-31T23:59:58.426513408', '1981-01-01T00:00:00.864387072',
       '1980-12-31T23:59:59.007293440', '1981-01-01T00:00:01.445167104',
       '1980-12-31T23:59:59.588073472', '1981-01-01T00:00:02.025947136',
       '1981-01-01T00:00:00.168853504', '1980-12-31T23:59:58.311759872',
       '1981-01-01T00:00:00.749633536', '1980-12-31T23:59:58.892539904'],
      dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 1981-01-01T00:00:00.627867648 ...
Attributes:
    standard_name:  time
    axis:           T
    comment:        
    bounds:         time_bnds
    long_name:      reference time of sst file
    _ChunkSizes:    1

Expected Output

With xarray == 0.10.0 the output is as expected:

ds = xr.open_mfdataset("C:\\Users\\janis\\.cate\\data_stores\\local\\local.SST_should_fail\\*.nc",
                       decode_cf=True)
<xarray.DataArray 'time' (time: 32)>
array(['2006-01-01T12:00:00.000000000', '2006-01-02T12:00:00.000000000',
       '2006-01-03T12:00:00.000000000', '2006-01-04T12:00:00.000000000',
       '2006-01-05T12:00:00.000000000', '2006-01-06T12:00:00.000000000',
       '2006-01-07T12:00:00.000000000', '2006-01-08T12:00:00.000000000',
       '2006-01-09T12:00:00.000000000', '2006-01-10T12:00:00.000000000',
       '2006-01-11T12:00:00.000000000', '2006-01-12T12:00:00.000000000',
       '2006-01-13T12:00:00.000000000', '2006-01-14T12:00:00.000000000',
       '2006-01-15T12:00:00.000000000', '2006-01-16T12:00:00.000000000',
       '2006-01-17T12:00:00.000000000', '2006-01-18T12:00:00.000000000',
       '2006-01-19T12:00:00.000000000', '2006-01-20T12:00:00.000000000',
       '2006-01-21T12:00:00.000000000', '2006-01-22T12:00:00.000000000',
       '2006-01-23T12:00:00.000000000', '2006-01-24T12:00:00.000000000',
       '2006-01-25T12:00:00.000000000', '2006-01-26T12:00:00.000000000',
       '2006-01-27T12:00:00.000000000', '2006-01-28T12:00:00.000000000',
       '2006-01-29T12:00:00.000000000', '2006-01-30T12:00:00.000000000',
       '2006-01-31T12:00:00.000000000', '2006-02-01T12:00:00.000000000'],
      dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 2006-01-01T12:00:00 2006-01-02T12:00:00 ...
Attributes:
    standard_name:  time
    axis:           T
    comment:        
    bounds:         time_bnds
    long_name:      reference time of sst file
    _ChunkSizes:    1
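
As a sanity check on these expected values, the raw integers can be decoded by hand with NumPy alone. This is only a sketch: it assumes the `seconds since 1981-01-01` units shown in the undecoded output, uses the first two raw values from that array, and the variable names are illustrative.

```python
import numpy as np

# First two raw values from the undecoded time coordinate (seconds since 1981-01-01).
raw = np.array([788961600, 789048000], dtype="int64")

# Reference epoch taken from the `units` attribute.
epoch = np.datetime64("1981-01-01T00:00:00", "s")

# Interpret the integers as second offsets and add them to the epoch.
decoded = epoch + raw.astype("timedelta64[s]")
print(decoded)
```

The result is 2006-01-01T12:00:00 and 2006-01-02T12:00:00, which matches the first two values decoded by xarray 0.10.0.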

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.4.final.0
python-bits: 32
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

xarray: 0.10.1
pandas: 0.22.0
numpy: 1.14.2
scipy: 0.19.1
netCDF4: 1.3.1
h5netcdf: 0.5.0
h5py: 2.7.1
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.17.1
distributed: 1.21.3
matplotlib: 2.2.2
cartopy: 0.16.0
seaborn: None
setuptools: 39.0.1
pip: 9.0.2
conda: None
pytest: 3.1.3
IPython: 6.2.1
sphinx: None

@JanisGailis
Author

#1859 seems to be related.

@JanisGailis
Author

Actual data can be retrieved from here:
ftp://anon-ftp.ceda.ac.uk/neodc/esacci/sst/data/lt/Analysis/L4/v01.1

@fujiisoup
Member

I think it is related to #1932.
Can you try xarray=0.10.2?

@JanisGailis
Author

Thanks for looking into this!

I did try 0.10.2, same result as 0.10.1.

@fujiisoup
Member

Could you try to put together a minimal working example?
I tried with two of your files, but the issue was not reproduced:

In [9]: ds = xr.open_mfdataset(['Desktop/19910901120000-ESACCI-L4_GHRSST-SSTdepth-OSTIA-GLOB_LT-v02.0-fv01.1.nc', 'Desktop/19910902120000-ESACCI-L4_GHRSST-SSTdepth-OSTIA-GLOB_LT-v02.0-fv01.1.nc'], decode_cf=True)
In [11]: ds['time']
Out[11]: 
<xarray.DataArray 'time' (time: 2)>
array(['1981-01-01T00:00:00.564166656', '1980-12-31T23:59:58.707073024'],
      dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 1981-01-01T00:00:00.564166656 ...
Attributes:
    standard_name:  time
    axis:           T
    bounds:         time_bnds
    comment:        
    long_name:      reference time of sst file

@JanisGailis
Author

You have just reproduced the issue.

The correct datetime values are in the filenames. So, you open two files, one from 1991-09-01T12:00:00.00 and the other from 1991-09-02T12:00:00.00, but the decoded time dimension becomes:

array(['1981-01-01T00:00:00.564166656', '1980-12-31T23:59:58.707073024'], dtype='datetime64[ns]')

Which is exactly the problem I'm facing.
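
To make the comparison with the file names concrete, here is a small sketch that parses the leading timestamp of one of the file names above and computes the offset the file would be expected to store. It assumes the first 14 characters of the name are always a `YYYYMMDDHHMMSS` stamp and that the units are `seconds since 1981-01-01`; the helper name `time_from_filename` is hypothetical.

```python
from datetime import datetime


def time_from_filename(name: str) -> datetime:
    """Parse the leading YYYYMMDDHHMMSS stamp of a file name (assumed naming scheme)."""
    return datetime.strptime(name[:14], "%Y%m%d%H%M%S")


fname = "19910901120000-ESACCI-L4_GHRSST-SSTdepth-OSTIA-GLOB_LT-v02.0-fv01.1.nc"
expected = time_from_filename(fname)

# Offset the file should store, given units "seconds since 1981-01-01".
offset_seconds = int((expected - datetime(1981, 1, 1)).total_seconds())
print(expected.isoformat(), offset_seconds)
```

A correctly decoded time coordinate for this file should equal `expected`, not a value within a few seconds of the 1981 epoch.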

@fujiisoup
Member

Oh, thanks. I misunderstood the problem.
Is this related to #1803? I did not follow that one closely.
Can anyone look into this issue?
A minimal working example would be:
1. Download one file from here.
2. Load it and inspect xr.open_dataset(path/to/file)['time'].
3. Compare with netCDF4.Dataset(path/to/file)['time'].
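
The three steps above could be sketched roughly as follows. The function name `compare_time` is hypothetical, and the path you pass in must point at one of the downloaded files; nothing here is specific to this dataset beyond the variable name `time`.

```python
def compare_time(path):
    """Return the time coordinate as decoded by xarray and as stored on disk."""
    # Imported here so the sketch can be loaded even where the libraries are missing.
    import netCDF4
    import xarray as xr

    with xr.open_dataset(path) as ds:
        decoded = ds["time"].values  # xarray's CF-decoded datetimes

    with netCDF4.Dataset(path) as nc:
        var = nc.variables["time"]
        raw = var[:]        # integers exactly as stored in the file
        units = var.units   # e.g. "seconds since 1981-01-01"
        dtype = var.dtype   # on-disk integer type

    return decoded, raw, units, dtype
```

Calling `compare_time("path/to/file.nc")` and printing the four results side by side shows whether the decoded values agree with the raw offsets and units, and which integer type the file uses.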

@fmaussion
Member

The problem is that the times in the files are stored as int32. This MWE reproduces the error:

import xarray as xr
import numpy as np
ds = xr.Dataset()
ds['time'] = xr.DataArray([np.int32(788961600)], dims=['time'])
ds['time'].attrs['units'] = 'seconds since 1981-01-01'
xr.decode_cf(ds)['time']

This is due to #1414; I'm looking into it.
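
The bad values are consistent with plain int32 wraparound when seconds are scaled to nanoseconds. The sketch below does not trace the actual code path in xarray; it only shows, with NumPy, that multiplying the int32 value by 10^9 without widening gives the nanosecond part of the first wrong timestamp, and that widening to int64 first gives the expected date. Variable names are illustrative.

```python
import numpy as np

raw = np.array([788961600], dtype=np.int32)           # value as stored in the file
ns_per_s = np.array([1_000_000_000], dtype=np.int32)

# int32 * int32 stays int32, so the product silently wraps around modulo 2**32.
wrapped = raw * ns_per_s
print(wrapped)

# Widening to int64 before scaling keeps the full product.
exact = raw.astype(np.int64) * 1_000_000_000
fixed = np.datetime64("1981-01-01", "ns") + exact.astype("timedelta64[ns]")
print(fixed)
```

`wrapped` is 627867648, the same number of nanoseconds that appears in the first bad value `1981-01-01T00:00:00.627867648`, while `fixed` is 2006-01-01T12:00:00.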

@fmaussion fmaussion added this to the 0.10.3 milestone Mar 24, 2018
spencerkclark added a commit to spencerkclark/xarray that referenced this issue May 1, 2018
I must have inadvertently removed it during a merge.
jhamman pushed a commit that referenced this issue May 13, 2018
* Start on implementing and testing NetCDFTimeIndex

* TST Move to using pytest fixtures to structure tests

* Address initial review comments

* Address second round of review comments

* Fix failing python3 tests

* Match test method name to method name

* First attempts at integrating NetCDFTimeIndex into xarray

This is a first pass at the following:
- Resetting the logic for decoding datetimes such that `np.datetime64` objects
are never used for non-standard calendars
- Adding logic to use a `NetCDFTimeIndex` whenever `netcdftime.datetime`
objects are used in an array being cast as an index (so if one reads in a
Dataset from a netCDF file or creates one in Python, which is indexed by a time
coordinate that uses `netcdftime.datetime` objects a NetCDFTimeIndex will be
used rather than a generic object-based index)
- Adding logic to encode `netcdftime.datetime` objects when saving out to
netCDF files

* Cleanup

* Fix DataFrame and Series test failures for NetCDFTimeIndex

These were related to a recent minor upstream change in pandas:
https://github.com/pandas-dev/pandas/blame/master/pandas/core/indexing.py#L1433

* First pass at making NetCDFTimeIndex compatible with #1356

* Address initial review comments

* Restore test_conventions.py

* Fix failing test in test_utils.py

* flake8

* Update for standalone netcdftime

* Address stickler-ci comments

* Skip test_format_netcdftime_datetime if netcdftime not installed

* A start on documentation

* Fix failing zarr tests related to netcdftime encoding

* Simplify test_decode_standard_calendar_single_element_non_ns_range

* Address a couple review comments

* Use else clause in _maybe_cast_to_netcdftimeindex

* Start on adding enable_netcdftimeindex option

* Continue parametrizing tests in test_coding_times.py

* Update time-series.rst for enable_netcdftimeindex option

* Use :py:func: in rst for xarray.set_options

* Add a what's new entry and test that resample raises a TypeError

* Move what's new entry to the version 0.10.3 section

* Add version-dependent pathway for importing netcdftime.datetime

* Make NetCDFTimeIndex and date decoding/encoding compatible with datetime.datetime

* Remove logic to make NetCDFTimeIndex compatible with datetime.datetime

* Documentation edits

* Ensure proper enable_netcdftimeindex option is used under lazy decoding

Prior to this, opening a dataset with enable_netcdftimeindex set to True
and then accessing one of its variables outside the context manager would
lead to it being decoded with the default enable_netcdftimeindex
(which is False).  This makes sure that lazy decoding takes into account
the context under which it was called.

* Add fix and test for concatenating variables with a NetCDFTimeIndex

Previously when concatenating variables indexed by a NetCDFTimeIndex
the index would be wrongly converted to a generic pd.Index

* Further namespace changes due to netcdftime/cftime renaming

* NetCDFTimeIndex -> CFTimeIndex

* Documentation updates

* Only allow use of CFTimeIndex when using the standalone cftime

Also only allow for serialization of cftime.datetime objects when
using the standalone cftime package.

* Fix errant what's new changes

* flake8

* Fix skip logic in test_cftimeindex.py

* Use only_use_cftime_datetimes option in num2date

* Require standalone cftime library for all new functionality

Add tests/fixes for dt accessor with cftime datetimes

* Improve skipping logic in test_cftimeindex.py

* Fix skipping logic in test_cftimeindex.py for when cftime or netcdftime
are not available.  Use existing requires_cftime decorator where possible
(i.e. only on tests that are not parametrized via pytest.mark.parametrize)

* Fix skip logic in Python 3.4 build for test_cftimeindex.py

* Improve error messages for when the standalone cftime is not installed

* Tweak skip logic in test_accessors.py

* flake8

* Address review comments

* Temporarily remove cftime from py27 build environment on windows

* flake8

* Install cftime via pip for Python 2.7 on Windows

* flake8

* Remove unnecessary new lines; simplify _maybe_cast_to_cftimeindex

* Restore test case for #2002 in test_coding_times.py

I must have inadvertently removed it during a merge.

* Tweak dates out of range warning logic slightly to preserve current default

* Address review comments