
Source encoding always set when opening datasets #2626

Merged (12 commits) on Dec 30, 2018
doc/whats-new.rst (3 additions, 0 deletions)

@@ -61,6 +61,9 @@ Enhancements
- :py:meth:`DataArray.resample` and :py:meth:`Dataset.resample` now supports the
``loffset`` kwarg just like Pandas.
By `Deepak Cherian <https://github.com/dcherian>`_
- Datasets are now guaranteed to have a ``'source'`` encoding, so the source
Contributor (suggested change):
- Datasets are now guaranteed to have a ``'source'`` encoding, so the source
- Datasets are now guaranteed to have an ``encoding.source`` attribute, so the source

Member:

You can't do attribute lookups in encoding, so encoding.source isn't valid.

Contributor:

Oh right. So encoding['source'] then?

Member:

Yep, encoding['source'] would work
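The distinction the reviewers are drawing can be shown in plain Python: a Dataset's ``encoding`` is an ordinary dict, so item lookup works but attribute lookup does not. The dict below is a stand-in for a real Dataset's ``encoding``, not xarray itself:

```python
# `encoding` on a Dataset behaves like a plain dict (stand-in shown here),
# so item lookup works but attribute lookup does not.
encoding = {'source': '/tmp/example.nc'}

print(encoding['source'])  # item lookup works: /tmp/example.nc

try:
    encoding.source        # attribute lookup on a dict fails
except AttributeError:
    print("dicts have no '.source' attribute")
```

This is why the whats-new entry was ultimately worded around ``encoding['source']`` rather than ``encoding.source``.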

file name is always stored (:issue:`2550`).
By `Tom Nicholas <http://github.com/TomNicholas>`_.
- 0d slices of ndarrays are now obtained directly through indexing, rather than
extracting and wrapping a scalar, avoiding unnecessary copying. By `Daniel
Wennberg <https://github.com/danielwe>`_.
11 changes: 9 additions & 2 deletions xarray/backends/api.py (9 additions, 2 deletions)

@@ -300,6 +300,7 @@ def maybe_decode_store(store, lock=False):

if isinstance(filename_or_obj, backends.AbstractDataStore):
store = filename_or_obj
ds = maybe_decode_store(store)
elif isinstance(filename_or_obj, basestring):

if (isinstance(filename_or_obj, bytes) and
@@ -340,15 +341,21 @@ def maybe_decode_store(store, lock=False):
% engine)

with close_on_error(store):
return maybe_decode_store(store)
ds = maybe_decode_store(store)
else:
if engine is not None and engine != 'scipy':
raise ValueError('can only read file-like objects with '
"default engine or engine='scipy'")
# assume filename_or_obj is a file-like object
store = backends.ScipyDataStore(filename_or_obj)
ds = maybe_decode_store(store)

return maybe_decode_store(store)
# Ensure source filename always stored in dataset object (GH issue #2550)
if 'source' not in ds.encoding:
if isinstance(filename_or_obj, basestring):
ds.encoding['source'] = filename_or_obj

return ds
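The shape of the refactor above can be sketched in isolation: each branch now assigns ``ds`` instead of returning early, so a single post-processing step can record the source filename. This sketch uses a hypothetical ``_decode`` helper and a minimal ``FakeDataset`` as stand-ins; it illustrates the control flow only and is not xarray's actual implementation:

```python
class FakeDataset:
    """Minimal stand-in for xarray.Dataset: only carries an encoding dict."""
    def __init__(self):
        self.encoding = {}


def _decode(obj):
    # Hypothetical stand-in for maybe_decode_store(store).
    return FakeDataset()


def open_dataset_sketch(filename_or_obj):
    # Every branch produces `ds` instead of returning immediately...
    if isinstance(filename_or_obj, str):
        ds = _decode(filename_or_obj)   # filename / path branch
    else:
        ds = _decode(filename_or_obj)   # store or file-like branch

    # ...so the source filename can be recorded in one place (GH issue #2550).
    if 'source' not in ds.encoding and isinstance(filename_or_obj, str):
        ds.encoding['source'] = filename_or_obj
    return ds


ds = open_dataset_sketch('/data/example.nc')
print(ds.encoding['source'])  # /data/example.nc
```

Collapsing the three early returns into a shared exit point is what lets the ``'source'`` guarantee live in one place rather than being duplicated per backend branch.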


def open_dataarray(filename_or_obj, group=None, decode_cf=True,
xarray/tests/test_backends.py (11 additions, 0 deletions)

@@ -3423,3 +3423,14 @@ def test_no_warning_from_dask_effective_get():
ds = Dataset()
ds.to_netcdf(tmpfile)
assert len(record) == 0


@requires_scipy_or_netCDF4
def test_source_encoding_always_present():
# Test for GH issue #2550.
rnddata = np.random.randn(10)
original = Dataset({'foo': ('x', rnddata)})
with create_tmp_file() as tmp:
original.to_netcdf(tmp)
with open_dataset(tmp) as ds:
assert ds.encoding['source'] == tmp