Merge branch 'master' into groupy_plot2
Maximilian Maahn committed Aug 14, 2018
2 parents 98bc369 + df4a4b1 commit 87ef1cc
Showing 10 changed files with 128 additions and 79 deletions.
63 changes: 1 addition & 62 deletions doc/faq.rst
@@ -160,71 +160,10 @@ methods for converting back and forth between xarray and these libraries. See
:py:meth:`~xarray.DataArray.to_iris` and :py:meth:`~xarray.DataArray.to_cdms2`
for more details.

.. _faq.other_projects:

What other projects leverage xarray?
------------------------------------

Here are several existing libraries that build functionality upon xarray.

Geosciences
~~~~~~~~~~~

- `aospy <https://aospy.readthedocs.io>`_: Automated analysis and management of gridded climate data.
- `infinite-diff <https://github.com/spencerahill/infinite-diff>`_: xarray-based finite-differencing, focused on gridded climate/meterology data
- `marc_analysis <https://github.com/darothen/marc_analysis>`_: Analysis package for CESM/MARC experiments and output.
- `MPAS-Analysis <http://mpas-analysis.readthedocs.io>`_: Analysis for simulations produced with Model for Prediction Across Scales (MPAS) components and the Accelerated Climate Model for Energy (ACME).
- `OGGM <http://oggm.org/>`_: Open Global Glacier Model
- `Oocgcm <https://oocgcm.readthedocs.io/>`_: Analysis of large gridded geophysical datasets
- `Open Data Cube <https://www.opendatacube.org/>`_: Analysis toolkit of continental scale Earth Observation data from satellites.
- `Pangaea: <https://pangaea.readthedocs.io/en/latest/>`_: xarray extension for gridded land surface & weather model output).
- `Pangeo <https://pangeo-data.github.io>`_: A community effort for big data geoscience in the cloud.
- `PyGDX <https://pygdx.readthedocs.io/en/latest/>`_: Python 3 package for
accessing data stored in GAMS Data eXchange (GDX) files. Also uses a custom
subclass.
- `Regionmask <https://regionmask.readthedocs.io/>`_: plotting and creation of masks of spatial regions
- `salem <https://salem.readthedocs.io>`_: Adds geolocalised subsetting, masking, and plotting operations to xarray's data structures via accessors.
- `Spyfit <https://spyfit.readthedocs.io/en/master/>`_: FTIR spectroscopy of the atmosphere
- `windspharm <https://ajdawson.github.io/windspharm/index.html>`_: Spherical
harmonic wind analysis in Python.
- `wrf-python <https://wrf-python.readthedocs.io/>`_: A collection of diagnostic and interpolation routines for use with output of the Weather Research and Forecasting (WRF-ARW) Model.
- `xarray-simlab <https://xarray-simlab.readthedocs.io>`_: xarray extension for computer model simulations.
- `xarray-topo <https://gitext.gfz-potsdam.de/sec55-public/xarray-topo>`_: xarray extension for topographic analysis and modelling.
- `xbpch <https://github.com/darothen/xbpch>`_: xarray interface for bpch files.
- `xESMF <https://xesmf.readthedocs.io>`_: Universal Regridder for Geospatial Data.
- `xgcm <https://xgcm.readthedocs.io/>`_: Extends the xarray data model to understand finite volume grid cells (common in General Circulation Models) and provides interpolation and difference operations for such grids.
- `xmitgcm <http://xgcm.readthedocs.io/>`_: a python package for reading `MITgcm <http://mitgcm.org/>`_ binary MDS files into xarray data structures.
- `xshape <https://xshape.readthedocs.io/>`_: Tools for working with shapefiles, topographies, and polygons in xarray.
- `xskillscore <https://github.com/raybellwaves/xskillscore>`_: Metrics for verifying forecasts.

Machine Learning
~~~~~~~~~~~~~~~~
- `cesium <http://cesium-ml.org/>`_: machine learning for time series analysis
- `Elm <https://ensemble-learning-models.readthedocs.io>`_: Parallel machine learning on xarray data structures
- `sklearn-xarray (1) <https://phausamann.github.io/sklearn-xarray>`_: Combines scikit-learn and xarray (1).
- `sklearn-xarray (2) <https://sklearn-xarray.readthedocs.io/en/latest/>`_: Combines scikit-learn and xarray (2).

Extend xarray capabilities
~~~~~~~~~~~~~~~~~~~~~~~~~~
- `Collocate <https://github.com/cistools/collocate>`_: Collocate xarray trajectories in arbitrary physical dimensions
- `eofs <https://ajdawson.github.io/eofs/>`_: EOF analysis in Python.
- `xarray_extras <https://github.com/crusaderky/xarray_extras>`_: Advanced algorithms for xarray objects (e.g. intergrations/interpolations).
- `xrft <https://github.com/rabernat/xrft>`_: Fourier transforms for xarray data.
- `xr-scipy <https://xr-scipy.readthedocs.io>`_: A lightweight scipy wrapper for xarray.
- `X-regression <https://github.com/kuchaale/X-regression>`_: Multiple linear regression from Statsmodels library coupled with Xarray library.
- `xyzpy <http://xyzpy.readthedocs.io>`_: Easily generate high dimensional data, including parallelization.

Visualization
~~~~~~~~~~~~~
- `Datashader <https://datashader.org>`_, `geoviews <http://geo.holoviews.org>`_, `holoviews <http://holoviews.org/>`_, : visualization packages for large data
- `psyplot <https://psyplot.readthedocs.io>`_: Interactive data visualization with python.

Other
~~~~~
- `ptsa <https://pennmem.github.io/ptsa_new/html/index.html>`_: EEG Time Series Analysis
- `pycalphad <https://pycalphad.org/docs/latest/>`_: Computational Thermodynamics in Python

More projects can be found at the `"xarray" Github topic <https://github.com/topics/xarray>`_.
See section :ref:`related-projects`.

How should I cite xarray?
-------------------------
2 changes: 2 additions & 0 deletions doc/index.rst
@@ -76,6 +76,7 @@ Documentation
* :doc:`internals`
* :doc:`roadmap`
* :doc:`contributing`
* :doc:`related-projects`

.. toctree::
:maxdepth: 1
@@ -87,6 +88,7 @@ Documentation
internals
roadmap
contributing
related-projects

See also
--------
68 changes: 68 additions & 0 deletions doc/related-projects.rst
@@ -0,0 +1,68 @@
.. _related-projects:

Xarray related projects
-----------------------

Below is a list of existing libraries that build functionality upon
xarray. See also section :ref:`internals` for more details on how to
build xarray extensions.

Geosciences
~~~~~~~~~~~

- `aospy <https://aospy.readthedocs.io>`_: Automated analysis and management of gridded climate data.
- `infinite-diff <https://github.com/spencerahill/infinite-diff>`_: xarray-based finite-differencing, focused on gridded climate/meteorology data
- `marc_analysis <https://github.com/darothen/marc_analysis>`_: Analysis package for CESM/MARC experiments and output.
- `MPAS-Analysis <http://mpas-analysis.readthedocs.io>`_: Analysis for simulations produced with Model for Prediction Across Scales (MPAS) components and the Accelerated Climate Model for Energy (ACME).
- `OGGM <http://oggm.org/>`_: Open Global Glacier Model
- `Oocgcm <https://oocgcm.readthedocs.io/>`_: Analysis of large gridded geophysical datasets
- `Open Data Cube <https://www.opendatacube.org/>`_: Analysis toolkit of continental scale Earth Observation data from satellites.
- `Pangaea <https://pangaea.readthedocs.io/en/latest/>`_: xarray extension for gridded land surface & weather model output.
- `Pangeo <https://pangeo-data.github.io>`_: A community effort for big data geoscience in the cloud.
- `PyGDX <https://pygdx.readthedocs.io/en/latest/>`_: Python 3 package for
accessing data stored in GAMS Data eXchange (GDX) files. Also uses a custom
subclass.
- `Regionmask <https://regionmask.readthedocs.io/>`_: plotting and creation of masks of spatial regions
- `salem <https://salem.readthedocs.io>`_: Adds geolocalised subsetting, masking, and plotting operations to xarray's data structures via accessors.
- `Spyfit <https://spyfit.readthedocs.io/en/master/>`_: FTIR spectroscopy of the atmosphere
- `windspharm <https://ajdawson.github.io/windspharm/index.html>`_: Spherical
harmonic wind analysis in Python.
- `wrf-python <https://wrf-python.readthedocs.io/>`_: A collection of diagnostic and interpolation routines for use with output of the Weather Research and Forecasting (WRF-ARW) Model.
- `xarray-simlab <https://xarray-simlab.readthedocs.io>`_: xarray extension for computer model simulations.
- `xarray-topo <https://gitext.gfz-potsdam.de/sec55-public/xarray-topo>`_: xarray extension for topographic analysis and modelling.
- `xbpch <https://github.com/darothen/xbpch>`_: xarray interface for bpch files.
- `xESMF <https://xesmf.readthedocs.io>`_: Universal Regridder for Geospatial Data.
- `xgcm <https://xgcm.readthedocs.io/>`_: Extends the xarray data model to understand finite volume grid cells (common in General Circulation Models) and provides interpolation and difference operations for such grids.
- `xmitgcm <http://xmitgcm.readthedocs.io/>`_: a python package for reading `MITgcm <http://mitgcm.org/>`_ binary MDS files into xarray data structures.
- `xshape <https://xshape.readthedocs.io/>`_: Tools for working with shapefiles, topographies, and polygons in xarray.
- `xskillscore <https://github.com/raybellwaves/xskillscore>`_: Metrics for verifying forecasts.

Machine Learning
~~~~~~~~~~~~~~~~
- `cesium <http://cesium-ml.org/>`_: machine learning for time series analysis
- `Elm <https://ensemble-learning-models.readthedocs.io>`_: Parallel machine learning on xarray data structures
- `sklearn-xarray (1) <https://phausamann.github.io/sklearn-xarray>`_: Combines scikit-learn and xarray (1).
- `sklearn-xarray (2) <https://sklearn-xarray.readthedocs.io/en/latest/>`_: Combines scikit-learn and xarray (2).

Extend xarray capabilities
~~~~~~~~~~~~~~~~~~~~~~~~~~
- `Collocate <https://github.com/cistools/collocate>`_: Collocate xarray trajectories in arbitrary physical dimensions
- `eofs <https://ajdawson.github.io/eofs/>`_: EOF analysis in Python.
- `xarray_extras <https://github.com/crusaderky/xarray_extras>`_: Advanced algorithms for xarray objects (e.g. integrations/interpolations).
- `xrft <https://github.com/rabernat/xrft>`_: Fourier transforms for xarray data.
- `xr-scipy <https://xr-scipy.readthedocs.io>`_: A lightweight scipy wrapper for xarray.
- `X-regression <https://github.com/kuchaale/X-regression>`_: Multiple linear regression from Statsmodels library coupled with Xarray library.
- `xyzpy <http://xyzpy.readthedocs.io>`_: Easily generate high dimensional data, including parallelization.

Visualization
~~~~~~~~~~~~~
- `Datashader <https://datashader.org>`_, `geoviews <http://geo.holoviews.org>`_, `holoviews <http://holoviews.org/>`_: visualization packages for large data.
- `hvplot <https://hvplot.pyviz.org/>`_: A high-level plotting API for the PyData ecosystem built on HoloViews.
- `psyplot <https://psyplot.readthedocs.io>`_: Interactive data visualization with python.

Other
~~~~~
- `ptsa <https://pennmem.github.io/ptsa_new/html/index.html>`_: EEG Time Series Analysis
- `pycalphad <https://pycalphad.org/docs/latest/>`_: Computational Thermodynamics in Python

More projects can be found at the `"xarray" Github topic <https://github.com/topics/xarray>`_.
12 changes: 11 additions & 1 deletion doc/whats-new.rst
@@ -61,10 +61,20 @@ Bug fixes
attribute being set.
(:issue:`2201`)
By `Thomas Voigt <https://github.com/tv3141>`_.
- Fixed a bug in ``zarr`` backend which prevented use with datasets with
invalid chunk size encoding after reading from an existing store
(:issue:`2278`).
By `Joe Hamman <https://github.com/jhamman>`_.

- Tests can be run in parallel with pytest-xdist
- Follow up the renamings in dask; from dask.ghost to dask.overlap
By `Tony Tung <https://github.com/ttung>`_.

- Now raises a ValueError when there is a conflict between dimension names and
level names of MultiIndex. (:issue:`2299`)
By `Keisuke Fujii <https://github.com/fujiisoup>`_.

- Follow up the renamings in dask; from dask.ghost to dask.overlap
By `Keisuke Fujii <https://github.com/fujiisoup>`_.

- Now :py:func:`xr.apply_ufunc` raises a ValueError when the size of
``input_core_dims`` is inconsistent with the number of arguments.
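The last entry above describes an up-front validation of ``input_core_dims`` in ``apply_ufunc``. The following is a minimal sketch of that kind of check, assuming ``input_core_dims`` is either ``None`` or one list of core dimension names per positional argument. ``normalize_core_dims`` is a hypothetical helper, and its error text is illustrative rather than xarray's exact message.

```python
from typing import Optional, Sequence, Tuple


def normalize_core_dims(input_core_dims: Optional[Sequence[Sequence[str]]],
                        n_args: int) -> Tuple[Tuple[str, ...], ...]:
    """Hypothetical sketch: validate and normalize per-argument core dims."""
    if input_core_dims is None:
        # Default: no core dimensions for any argument.
        return ((),) * n_args
    if len(input_core_dims) != n_args:
        # One entry is required per positional argument; fail early.
        raise ValueError(
            'input_core_dims must be None or have the same length as the '
            'number of arguments. Given input_core_dims: {!r}, '
            'number of args: {}.'.format(input_core_dims, n_args))
    return tuple(tuple(dims) for dims in input_core_dims)


print(normalize_core_dims(None, 2))        # ((), ())
print(normalize_core_dims([['x'], []], 2))  # (('x',), ())
```

Raising before any computation starts gives a clear error at the call site instead of a confusing failure deep inside the broadcasting logic.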
7 changes: 0 additions & 7 deletions xarray/backends/rasterio_.py
@@ -259,13 +259,6 @@ def open_rasterio(filename, parse_coordinates=None, chunks=None, cache=None,
# Is the TIF tiled? (bool)
# We cast it to an int for netCDF compatibility
attrs['is_tiled'] = np.uint8(riods.value.is_tiled)
with warnings.catch_warnings():
# casting riods.value.transform to a tuple makes this future proof
warnings.simplefilter('ignore', FutureWarning)
if hasattr(riods.value, 'transform'):
# Affine transformation matrix (tuple of floats)
# Describes coefficients mapping pixel coordinates to CRS
attrs['transform'] = tuple(riods.value.transform)
if hasattr(riods.value, 'nodatavals'):
# The nodata values for the raster bands
attrs['nodatavals'] = tuple([np.nan if nodataval is None else nodataval
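The removed comment describes ``transform`` as the affine coefficients that map pixel coordinates to the CRS, and the updated tests assert that ``attrs['transform']`` is a 6-element tuple. The following is a minimal sketch of how such a tuple can be applied, assuming the ``(a, b, c, d, e, f)`` ordering used by the ``affine`` package, so that ``x = a*col + b*row + c`` and ``y = d*col + e*row + f``. ``pixel_to_crs`` is a hypothetical helper, not part of xarray or rasterio.

```python
from typing import Tuple

# Six affine coefficients (a, b, c, d, e, f); ordering assumed to follow the
# `affine` package: x = a*col + b*row + c and y = d*col + e*row + f.
Transform = Tuple[float, float, float, float, float, float]


def pixel_to_crs(transform: Transform, col: float, row: float) -> Tuple[float, float]:
    """Map (col, row) pixel coordinates to (x, y) in the raster's CRS.

    Hypothetical helper for illustration only.
    """
    if len(transform) != 6:
        raise ValueError('expected 6 affine coefficients, got %d' % len(transform))
    a, b, c, d, e, f = transform
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y


# A north-up grid with 1000 m pixels whose upper-left corner is at (100000, 200000).
transform = (1000.0, 0.0, 100000.0, 0.0, -1000.0, 200000.0)

print(pixel_to_crs(transform, 0, 0))      # (100000.0, 200000.0): upper-left corner
print(pixel_to_crs(transform, 2.5, 3.5))  # (102500.0, 196500.0): centre of col 2, row 3
```

Adding 0.5 to the column and row indices moves from a pixel's upper-left corner to its centre, which is the usual convention when building 1-D ``x``/``y`` coordinates for a rectilinear raster.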
5 changes: 2 additions & 3 deletions xarray/backends/zarr.py
@@ -102,9 +102,8 @@ def _determine_zarr_chunks(enc_chunks, var_chunks, ndim):
enc_chunks_tuple = tuple(enc_chunks)

if len(enc_chunks_tuple) != ndim:
raise ValueError("zarr chunks tuple %r must have same length as "
"variable.ndim %g" %
(enc_chunks_tuple, ndim))
# throw away encoding chunks, start over
return _determine_zarr_chunks(None, var_chunks, ndim)

for x in enc_chunks_tuple:
if not isinstance(x, int):
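After this change, ``_determine_zarr_chunks`` no longer raises when the encoded chunk tuple has a different length than the variable's ``ndim``. Instead, it discards the stale encoding and recomputes the chunks from the dask chunks. The GH2278 regression test triggers this case by writing a 3-D variable chunked as ``(2, 2, 1)`` and then writing ``ds1.isel(t=0)``, which is 2-D but still carries the 3-element encoding. The following is a simplified sketch of that fallback. ``choose_chunks`` is hypothetical, and it omits the real function's checks on integer types and on compatibility with the dask chunks.

```python
from typing import Optional, Sequence, Tuple

ChunkSpec = Optional[Tuple[int, ...]]


def choose_chunks(enc_chunks: Optional[Sequence[int]],
                  var_chunks: Optional[Sequence[Sequence[int]]],
                  ndim: int) -> ChunkSpec:
    """Simplified, hypothetical version of the fallback in _determine_zarr_chunks."""
    if enc_chunks is None:
        if var_chunks is None:
            # No dask chunks and no encoding: let the store decide.
            return None
        # Use the first dask chunk length along each dimension.
        return tuple(dim_chunks[0] for dim_chunks in var_chunks)

    enc_tuple = tuple(enc_chunks)
    if len(enc_tuple) != ndim:
        # Stale encoding (e.g. left over after isel dropped a dimension):
        # throw it away and start over from the dask chunks.
        return choose_chunks(None, var_chunks, ndim)
    return enc_tuple


# 3-D variable (x, y, t) chunked with t=1, x=2, y=2, as in the regression test
print(choose_chunks(None, ((2, 2), (2, 2), (1,) * 5), ndim=3))  # (2, 2, 1)
# After .isel(t=0) the encoding still says (2, 2, 1) but ndim is 2
print(choose_chunks((2, 2, 1), ((2, 2), (2, 2)), ndim=2))       # (2, 2)
```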
9 changes: 9 additions & 0 deletions xarray/core/variable.py
@@ -1876,12 +1876,15 @@ def assert_unique_multiindex_level_names(variables):
objects.
"""
level_names = defaultdict(list)
all_level_names = set()
for var_name, var in variables.items():
if isinstance(var._data, PandasIndexAdapter):
idx_level_names = var.to_index_variable().level_names
if idx_level_names is not None:
for n in idx_level_names:
level_names[n].append('%r (%s)' % (n, var_name))
if idx_level_names:
all_level_names.update(idx_level_names)

for k, v in level_names.items():
if k in variables:
@@ -1892,3 +1895,9 @@ def assert_unique_multiindex_level_names(variables):
conflict_str = '\n'.join([', '.join(v) for v in duplicate_names])
raise ValueError('conflicting MultiIndex level name(s):\n%s'
% conflict_str)
# Check for conflicts between level names and dimensions GH:2299
for k, v in variables.items():
for d in v.dims:
if d in all_level_names:
raise ValueError('conflicting level / dimension names. {} '
'already exists as a level name.'.format(d))
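The new check collects every MultiIndex level name and raises if any variable uses one of them as a dimension (GH2299). ``test_merge_multiindex_level`` exercises this check through ``data.merge(other)``, where ``other`` has a variable on dimension ``level_1``. The following is a plain-Python sketch of the same two-pass logic. ``check_level_dim_conflicts`` and the ``(dims, level_names)`` tuples are hypothetical stand-ins for xarray variables, and the error text mirrors the message in the diff.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical stand-in for an xarray variable: (dims, MultiIndex level names or None)
VarInfo = Tuple[Sequence[str], Optional[Sequence[str]]]


def check_level_dim_conflicts(variables: Dict[str, VarInfo]) -> None:
    """Raise ValueError if any dimension name is also a MultiIndex level name."""
    # First pass: gather level names from every MultiIndex-backed variable.
    all_level_names = set()
    for _dims, level_names in variables.values():
        if level_names:
            all_level_names.update(level_names)

    # Second pass: no variable may use a level name as a dimension.
    for dims, _levels in variables.values():
        for d in dims:
            if d in all_level_names:
                raise ValueError('conflicting level / dimension names. {} '
                                 'already exists as a level name.'.format(d))


ok = {'x': (('x',), ('level_1', 'level_2'))}
check_level_dim_conflicts(ok)  # passes silently

bad = dict(ok, z=(('level_1',), None))  # 'z' uses a level name as its dimension
try:
    check_level_dim_conflicts(bad)
except ValueError as err:
    print(err)
```

The level names are collected before any dimensions are checked, so the result does not depend on the order of the variables in the mapping.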
26 changes: 20 additions & 6 deletions xarray/tests/test_backends.py
@@ -1404,12 +1404,6 @@ def test_chunk_encoding_with_dask(self):
with self.roundtrip(ds_chunk4) as actual:
self.assertEqual((4,), actual['var1'].encoding['chunks'])

# specify incompatible encoding
ds_chunk4['var1'].encoding.update({'chunks': (5, 5)})
with pytest.raises(ValueError) as e_info:
with self.roundtrip(ds_chunk4) as actual:
pass
assert e_info.match('chunks')

# TODO: remove this failure once synchronized overlapping writes are
# supported by xarray
@@ -1522,6 +1516,21 @@ def test_to_zarr_compute_false_roundtrip(self):
with self.open(store) as actual:
assert_identical(original, actual)

def test_encoding_chunksizes(self):
# regression test for GH2278
# see also test_encoding_chunksizes_unlimited
nx, ny, nt = 4, 4, 5
original = xr.Dataset({}, coords={'x': np.arange(nx),
'y': np.arange(ny),
't': np.arange(nt)})
original['v'] = xr.Variable(('x', 'y', 't'), np.zeros((nx, ny, nt)))
original = original.chunk({'t': 1, 'x': 2, 'y': 2})

with self.roundtrip(original) as ds1:
assert_equal(ds1, original)
with self.roundtrip(ds1.isel(t=0)) as ds2:
assert_equal(ds2, original.isel(t=0))


@requires_zarr
class ZarrDictStoreTest(BaseZarrTest, TestCase):
@@ -2809,6 +2818,7 @@ def test_utm(self):
assert isinstance(rioda.attrs['res'], tuple)
assert isinstance(rioda.attrs['is_tiled'], np.uint8)
assert isinstance(rioda.attrs['transform'], tuple)
assert len(rioda.attrs['transform']) == 6
np.testing.assert_array_equal(rioda.attrs['nodatavals'],
[np.NaN, np.NaN, np.NaN])

@@ -2830,6 +2840,7 @@ def test_non_rectilinear(self):
assert isinstance(rioda.attrs['res'], tuple)
assert isinstance(rioda.attrs['is_tiled'], np.uint8)
assert isinstance(rioda.attrs['transform'], tuple)
assert len(rioda.attrs['transform']) == 6

# See if a warning is raised if we force it
with self.assertWarns("transformation isn't rectilinear"):
@@ -2849,6 +2860,7 @@ def test_platecarree(self):
assert isinstance(rioda.attrs['res'], tuple)
assert isinstance(rioda.attrs['is_tiled'], np.uint8)
assert isinstance(rioda.attrs['transform'], tuple)
assert len(rioda.attrs['transform']) == 6
np.testing.assert_array_equal(rioda.attrs['nodatavals'],
[-9765.])

@@ -2886,6 +2898,7 @@ def test_notransform(self):
assert isinstance(rioda.attrs['res'], tuple)
assert isinstance(rioda.attrs['is_tiled'], np.uint8)
assert isinstance(rioda.attrs['transform'], tuple)
assert len(rioda.attrs['transform']) == 6

def test_indexing(self):
with create_tmp_geotiff(8, 10, 3, transform_args=[1, 2, 0.5, 2.],
@@ -3080,6 +3093,7 @@ def test_ENVI_tags(self):
assert isinstance(rioda.attrs['res'], tuple)
assert isinstance(rioda.attrs['is_tiled'], np.uint8)
assert isinstance(rioda.attrs['transform'], tuple)
assert len(rioda.attrs['transform']) == 6
# from ENVI tags
assert isinstance(rioda.attrs['description'], basestring)
assert isinstance(rioda.attrs['map_info'], basestring)
3 changes: 3 additions & 0 deletions xarray/tests/test_dataarray.py
@@ -2915,6 +2915,7 @@ def test_to_masked_array(self):
ma = da.to_masked_array()
assert len(ma.mask) == N

@pytest.mark.xfail # GH:2332 TODO fix this in upstream?
def test_to_and_from_cdms2_classic(self):
"""Classic with 1D axes"""
pytest.importorskip('cdms2')
@@ -2949,6 +2950,7 @@ def test_to_and_from_cdms2_classic(self):
assert_array_equal(original.coords[coord_name],
back.coords[coord_name])

@pytest.mark.xfail # GH:2332 TODO fix this in upstream?
def test_to_and_from_cdms2_sgrid(self):
"""Curvilinear (structured) grid
@@ -2975,6 +2977,7 @@ def test_to_and_from_cdms2_sgrid(self):
assert_array_equal(original.coords['lat'], back.coords['lat'])
assert_array_equal(original.coords['lon'], back.coords['lon'])

@pytest.mark.xfail # GH:2332 TODO fix this in upstream?
def test_to_and_from_cdms2_ugrid(self):
"""Unstructured grid"""
pytest.importorskip('cdms2')
12 changes: 12 additions & 0 deletions xarray/tests/test_dataset.py
@@ -2456,6 +2456,18 @@ def test_assign_multiindex_level(self):
with raises_regex(ValueError, 'conflicting MultiIndex'):
data.assign(level_1=range(4))
data.assign_coords(level_1=range(4))
# raise an Error when any level name is used as dimension GH:2299
with pytest.raises(ValueError):
data['y'] = ('level_1', [0, 1])

def test_merge_multiindex_level(self):
data = create_test_multiindex()
other = Dataset({'z': ('level_1', [0, 1])}) # conflict dimension
with pytest.raises(ValueError):
data.merge(other)
other = Dataset({'level_1': ('x', [0, 1])}) # conflict variable name
with pytest.raises(ValueError):
data.merge(other)

def test_setitem_original_non_unique_index(self):
# regression test for GH943