[skip-ci] Small updates to IO docs. #8452

Merged 2 commits on Nov 16, 2023
38 changes: 21 additions & 17 deletions doc/user-guide/io.rst
@@ -44,9 +44,9 @@ __ https://www.unidata.ucar.edu/software/netcdf/

.. _netCDF FAQ: https://www.unidata.ucar.edu/software/netcdf/docs/faq.html#What-Is-netCDF

Reading and writing netCDF files with xarray requires scipy or the
`netCDF4-Python`__ library to be installed (the latter is required to
read/write netCDF V4 files and use the compression options described below).
Reading and writing netCDF files with xarray requires scipy, h5netcdf, or the
`netCDF4-Python`__ library to be installed. SciPy only supports reading and writing
of netCDF V3 files.

__ https://github.com/Unidata/netcdf4-python
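
As a rough illustration of the backend requirement described in this hunk, the sketch below checks which of the three libraries are importable and maps each one to its xarray ``engine`` name and the netCDF format families it handles. The helper ``available_netcdf_engines`` and the ``BACKENDS`` table are hypothetical names used only for illustration; they are not part of xarray.

```python
from importlib.util import find_spec

# engine name -> (module to look for, netCDF format families it can read/write)
# This table is an illustrative assumption, not an xarray data structure.
BACKENDS = {
    "netcdf4": ("netCDF4", ("netCDF3", "netCDF4")),
    "h5netcdf": ("h5netcdf", ("netCDF4",)),
    "scipy": ("scipy", ("netCDF3",)),
}


def available_netcdf_engines(need_v4=False):
    """Return engine names whose backing library is installed.

    With need_v4=True, engines that cannot handle netCDF V4 files
    (that is, scipy) are skipped.
    """
    engines = []
    for engine, (module, formats) in BACKENDS.items():
        if need_v4 and "netCDF4" not in formats:
            continue
        # find_spec returns None when the top-level module is not installed
        if find_spec(module) is not None:
            engines.append(engine)
    return engines


print(available_netcdf_engines())
print(available_netcdf_engines(need_v4=True))
```

A name returned by this helper could then be passed as ``engine=`` to ``open_dataset`` or ``to_netcdf``.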

@@ -675,8 +675,8 @@ the same as the one that was saved.

.. note::

xarray does not write NCZarr attributes. Therefore, NCZarr data must be
opened in read-only mode.
xarray does not write `NCZarr <https://docs.unidata.ucar.edu/nug/current/nczarr_head.html>`_ attributes.
Therefore, NCZarr data must be opened in read-only mode.

To store variable length strings, convert them to object arrays first with
``dtype=object``.
@@ -696,10 +696,10 @@ It is possible to read and write xarray datasets directly from / to cloud
storage buckets using zarr. This example uses the `gcsfs`_ package to provide
an interface to `Google Cloud Storage`_.

From v0.16.2: general `fsspec`_ URLs are parsed and the store set up for you
automatically when reading, such that you can open a dataset in a single
call. You should include any arguments to the storage backend as the
key ``storage_options``, part of ``backend_kwargs``.
General `fsspec`_ URLs, those that begin with ``s3://`` or ``gcs://`` for example,
are parsed and the store set up for you automatically when reading.
You should include any arguments to the storage backend as the
key ``storage_options``, part of ``backend_kwargs``.

.. code:: python

@@ -715,7 +715,7 @@ key ``storage_options``, part of ``backend_kwargs``.
This also works with ``open_mfdataset``, allowing you to pass a list of paths or
a URL to be interpreted as a glob string.

For older versions, and for writing, you must explicitly set up a ``MutableMapping``
For writing, you must explicitly set up a ``MutableMapping``
instance and pass this, as follows:

.. code:: python
@@ -769,10 +769,10 @@ Consolidated Metadata
~~~~~~~~~~~~~~~~~~~~~

Xarray needs to read all of the zarr metadata when it opens a dataset.
In some storage mediums, such as with cloud object storage (e.g. amazon S3),
In some storage mediums, such as with cloud object storage (e.g. `Amazon S3`_),
this can introduce significant overhead, because two separate HTTP calls to the
object store must be made for each variable in the dataset.
As of xarray version 0.18, xarray by default uses a feature called
By default Xarray uses a feature called
*consolidated metadata*, storing all metadata for the entire dataset with a
single key (by default called ``.zmetadata``). This typically drastically speeds
up opening the store. (For more information on this feature, consult the
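
The difference between the two access patterns can be sketched with a toy, dict-backed store that counts reads. This is a simplified model, not zarr's implementation: it assumes the Zarr v2 key layout (one ``.zarray`` and one ``.zattrs`` key per variable), the ``CountingStore`` class and the helper functions are hypothetical, and a real ``.zmetadata`` document carries additional fields.

```python
import json


class CountingStore(dict):
    """Dict-backed stand-in for an object store that counts key reads."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.reads = 0

    def __getitem__(self, key):
        self.reads += 1  # each lookup models one HTTP request
        return super().__getitem__(key)


def build_store(variables):
    """Create per-variable metadata keys plus one consolidated key."""
    store = CountingStore()
    for name in variables:
        store[f"{name}/.zarray"] = json.dumps({"shape": [10]})
        store[f"{name}/.zattrs"] = json.dumps({"units": "1"})
    # Simplified consolidated document: all metadata under a single key.
    consolidated = {key: json.loads(value) for key, value in store.items()}
    store[".zmetadata"] = json.dumps({"metadata": consolidated})
    store.reads = 0
    return store


def read_unconsolidated(store, variables):
    """Two reads per variable: its .zarray and its .zattrs."""
    return {
        f"{name}/{key}": json.loads(store[f"{name}/{key}"])
        for name in variables
        for key in (".zarray", ".zattrs")
    }


def read_consolidated(store):
    """One read for the whole dataset."""
    return json.loads(store[".zmetadata"])["metadata"]


variables = ["temperature", "precipitation", "pressure"]
store = build_store(variables)

meta_a = read_unconsolidated(store, variables)
reads_unconsolidated = store.reads

store.reads = 0
meta_b = read_consolidated(store)
reads_consolidated = store.reads

print(reads_unconsolidated, reads_consolidated)
```

Both paths recover the same metadata; only the number of round trips differs, and the gap grows with the number of variables.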
@@ -796,16 +796,20 @@ reads. Because this fall-back option is so much slower, xarray issues a

.. _io.zarr.appending:

Appending to existing Zarr stores
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Modifying existing Zarr stores
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Xarray supports several ways of incrementally writing variables to a Zarr
store. These options are useful for scenarios when it is infeasible or
undesirable to write your entire dataset at once.

1. Use ``mode='a'`` to add or overwrite entire variables,
2. Use ``append_dim`` to resize and append to existing variables, and
3. Use ``region`` to write to limited regions of existing arrays.

.. tip::

If you can load all of your data into a single ``Dataset`` using dask, a
For ``Dataset`` objects containing dask arrays, a
single call to ``to_zarr()`` will write all of your data in parallel.

.. warning::
@@ -876,8 +880,8 @@ and then calling ``to_zarr`` with ``compute=False`` to write only metadata
ds.to_zarr(path, compute=False)

Now, a Zarr store with the correct variable shapes and attributes exists that
can be filled out by subsequent calls to ``to_zarr``. ``region`` can be
specified as ``"auto"``, which opens the existing store and determines the
can be filled out by subsequent calls to ``to_zarr``.
Setting ``region="auto"`` will open the existing store and determine the
correct alignment of the new data with the existing coordinates. Alternatively,
``region`` can be an explicit mapping from dimension names to Python ``slice``
objects indicating where the data should be written (in index space, not label
space), e.g.,
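
To make the "index space, not label space" distinction concrete, here is a minimal pure-Python sketch of how coordinate labels could be translated into such a mapping. It only illustrates the idea behind aligning new data with existing coordinates; ``region_for`` is a hypothetical helper, not xarray's implementation of ``region="auto"``.

```python
def region_for(existing_labels, new_labels, dim):
    """Map a contiguous block of labels to a {dim: slice} in index space."""
    existing = list(existing_labels)
    new = list(new_labels)
    if not new:
        raise ValueError("new_labels must not be empty")
    # list.index raises ValueError if the first label is not in the store
    start = existing.index(new[0])
    stop = start + len(new)
    if existing[start:stop] != new:
        raise ValueError(
            "new labels do not match a contiguous block of the existing coordinate"
        )
    return {dim: slice(start, stop)}


# Coordinate values already present in the store (labels).
time = [2000, 2001, 2002, 2003, 2004, 2005]

# The new data covers the labels 2002 and 2003, i.e. positions 2 and 3.
region = region_for(time, [2002, 2003], "time")
print(region)
```

The slice bounds are positions along the dimension, not coordinate values: the labels 2002 and 2003 become ``slice(2, 4)``. A mapping of this shape is what would be passed as ``region=`` to ``to_zarr``.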
4 changes: 3 additions & 1 deletion doc/whats-new.rst
@@ -35,7 +35,7 @@ Breaking changes
~~~~~~~~~~~~~~~~
- drop support for `cdms2 <https://github.com/CDAT/cdms>`_. Please use
`xcdat <https://github.com/xCDAT/xcdat>`_ instead (:pull:`8441`).
By `Justus Magin <https://github.com/keewis`_.
By `Justus Magin <https://github.com/keewis>`_.

- Bump minimum tested pint version to ``>=0.22``. By `Deepak Cherian <https://github.com/dcherian>`_.

@@ -75,6 +75,8 @@ Bug fixes

Documentation
~~~~~~~~~~~~~
- Small updates to documentation on distributed writes to Zarr: see :ref:`io.zarr.appending`.
By `Deepak Cherian <https://github.com/dcherian>`_.


Internal Changes