Skip to content

Commit

Permalink
guide for config
Browse files Browse the repository at this point in the history
  • Loading branch information
normanrz committed Dec 20, 2024
1 parent 71ff5e7 commit d4c8d79
Show file tree
Hide file tree
Showing 3 changed files with 84 additions and 69 deletions.
41 changes: 41 additions & 0 deletions docs/user-guide/config.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
Runtime configuration
=====================

The :mod:`zarr.core.config` module is responsible for managing the configuration of zarr
and is based on the `donfig <https://github.com/pytroll/donfig>`_ Python library.

Configuration values can be set using code like the following:

.. code-block:: python
import zarr
zarr.config.set({"array.order": "F"})
Alternatively, configuration values can be set using environment variables, e.g.
``ZARR_ARRAY__ORDER=F``.

The configuration can also be read from a YAML file in standard locations.
For more information, see the
`donfig documentation <https://donfig.readthedocs.io/en/latest/>`_.

Configuration options include the following:

- Default Zarr format ``default_zarr_version``
- Default array order in memory ``array.order``
- Default codecs ``array.v3_default_codecs`` and ``array.v2_default_compressor``
- Whether empty chunks are written to storage ``array.write_empty_chunks``
- Async and threading options, e.g. ``async.concurrency`` and ``threading.max_workers``
- Selections of implementations of codecs, codec pipelines and buffers

For selecting custom implementations of codecs, pipelines, buffers and ndbuffers,
first register the implementations in the registry and then select them in the config.
For example, an implementation of the bytes codec in a class "custompackage.NewBytesCodec",
requires the value of ``codecs.bytes.name`` to be "custompackage.NewBytesCodec".

This is the current default configuration:

.. ipython:: python
import zarr
zarr.config.pprint()
1 change: 1 addition & 0 deletions docs/user-guide/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ User Guide
arrays
groups
storage
config
v3_migration
todo

Expand Down
111 changes: 42 additions & 69 deletions src/zarr/core/config.py
Original file line number Diff line number Diff line change
@@ -1,62 +1,30 @@
"""
The config module is responsible for managing the configuration of zarr
and is based on the `donfig <https://github.com/pytroll/donfig>`_ Python library.
Configuration values can be set using code like the following:
.. code-block:: python
import zarr
zarr.config.set({"array.order": "F"})
Alternatively, configuration values can be set using environment variables, e.g.
``ZARR_ARRAY__ORDER=F``.
The configuration can also be read from a YAML file in standard locations.
For more information, see the
`donfig documentation <https://donfig.readthedocs.io/en/latest/>`_.
Configuration options include the following:
- Default Zarr format ``default_zarr_version``
- Default array order in memory ``array.order``
- Async and threading options, e.g. ``async.concurrency`` and ``threading.max_workers``
- Selections of implementations of codecs, codec pipelines and buffers
For selecting custom implementations of codecs, pipelines, buffers and ndbuffers,
first register the implementations in the registry and then select them in the config.
For example, an implementation of the bytes codec in a class "custompackage.NewBytesCodec",
requires the value of ``codecs.bytes.name`` to be "custompackage.NewBytesCodec".
This is the current default configuration:
.. code-block:: python
{
"default_zarr_version": 3,
"array": {"order": "C"},
"async": {"concurrency": 10, "timeout": None},
"threading": {"max_workers": None},
"json_indent": 2,
"codec_pipeline": {
"path": "zarr.core.codec_pipeline.BatchedCodecPipeline",
"batch_size": 1,
},
"codecs": {
"blosc": "zarr.codecs.blosc.BloscCodec",
"gzip": "zarr.codecs.gzip.GzipCodec",
"zstd": "zarr.codecs.zstd.ZstdCodec",
"bytes": "zarr.codecs.bytes.BytesCodec",
"endian": "zarr.codecs.bytes.BytesCodec",
"crc32c": "zarr.codecs.crc32c_.Crc32cCodec",
"sharding_indexed": "zarr.codecs.sharding.ShardingCodec",
"transpose": "zarr.codecs.transpose.TransposeCodec",
"vlen-utf8": "zarr.codecs.vlen_utf8.VLenUTF8Codec",
"vlen-bytes": "zarr.codecs.vlen_utf8.VLenBytesCodec",
},
"buffer": "zarr.core.buffer.cpu.Buffer",
"ndbuffer": "zarr.core.buffer.cpu.NDBuffer",
}
The config module is responsible for managing the configuration of zarr and is based on the Donfig python library.
For selecting custom implementations of codecs, pipelines, buffers and ndbuffers, first register the implementations
in the registry and then select them in the config.
Example:
An implementation of the bytes codec in a class ``your.module.NewBytesCodec`` requires the value of ``codecs.bytes``
to be ``your.module.NewBytesCodec``. Donfig can be configured programmatically, by environment variables, or from
YAML files in standard locations.
.. code-block:: python
from your.module import NewBytesCodec
from zarr.core.config import register_codec, config
register_codec("bytes", NewBytesCodec)
config.set({"codecs.bytes": "your.module.NewBytesCodec"})
Instead of setting the value programmatically with ``config.set``, you can also set the value with an environment
variable. The environment variable ``ZARR_CODECS__BYTES`` can be set to ``your.module.NewBytesCodec``. The double
underscore ``__`` is used to indicate nested access.
.. code-block:: bash
export ZARR_CODECS__BYTES="your.module.NewBytesCodec"
For more information, see the Donfig documentation at https://github.com/pytroll/donfig.
"""

from __future__ import annotations
Expand All @@ -71,7 +39,7 @@ class BadConfigError(ValueError):


class Config(DConfig): # type: ignore[misc]
"""Will collect configuration from config files and environment variables
"""The Config will collect configuration from config files and environment variables
Example environment variables:
Grabs environment variables of the form "ZARR_FOO__BAR_BAZ=123" and
Expand All @@ -89,21 +57,26 @@ def reset(self) -> None:
self.refresh()


# The config module is responsible for managing the configuration of zarr and is based on the Donfig python library.
# For selecting custom implementations of codecs, pipelines, buffers and ndbuffers, first register the implementations
# in the registry and then select them in the config.
# e.g. an implementation of the bytes codec in a class "NewBytesCodec", requires the value of codecs.bytes.name to be
# "NewBytesCodec".
# Donfig can be configured programmatically, by environment variables, or from YAML files in standard locations
# e.g. export ZARR_CODECS__BYTES__NAME="NewBytesCodec"
# (for more information see github.com/pytroll/donfig)
# Default values below point to the standard implementations of zarr-python
# The default configuration for zarr
config = Config(
"zarr",
defaults=[
{
"default_zarr_version": 3,
"array": {"order": "C"},
"array": {
"order": "C",
"write_empty_chunks": False,
"v2_default_compressor": {
"numeric": "zstd",
"string": "vlen-utf8",
"bytes": "vlen-bytes",
},
"v3_default_codecs": {
"numeric": ["bytes", "zstd"],
"string": ["vlen-utf8"],
"bytes": ["vlen-bytes"],
},
},
"async": {"concurrency": 10, "timeout": None},
"threading": {"max_workers": None},
"json_indent": 2,
Expand Down

0 comments on commit d4c8d79

Please sign in to comment.