Skip to content

Commit

Permalink
Merge branch 'main' into doc-warnings
Browse files Browse the repository at this point in the history
  • Loading branch information
joshmoore authored Oct 31, 2023
2 parents 0e0bec9 + cce501a commit 36773ed
Show file tree
Hide file tree
Showing 3 changed files with 14 additions and 8 deletions.
1 change: 1 addition & 0 deletions docs/conf.py
Original file line number Diff line number Diff line change
Expand Up @@ -332,6 +332,7 @@ def setup(app):
intersphinx_mapping = {
"python": ("https://docs.python.org/", None),
"numpy": ("https://numpy.org/doc/stable/", None),
"numcodecs": ("https://numcodecs.readthedocs.io/en/stable/", None),
}


Expand Down
4 changes: 4 additions & 0 deletions docs/release.rst
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,8 @@ Docs
* The documentation build now fails if there are any warnings.
By :user:`David Stansby <dstansby>` :issue:`1548`.

* Add links to ``numcodecs`` docs in the tutorial.
By :user:`David Stansby <dstansby>` :issue:`1535`.

Maintenance
~~~~~~~~~~~
Expand All @@ -40,6 +42,8 @@ Maintenance
* Allow ``black`` code formatter to be run with any Python version.
By :user:`David Stansby <dstansby>` :issue:`1549`.



.. _release_2.16.1:

2.16.1
Expand Down
17 changes: 9 additions & 8 deletions docs/tutorial.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1175,8 +1175,9 @@ A fixed-length unicode dtype is also available, e.g.::
For variable-length strings, the ``object`` dtype can be used, but a codec must be
provided to encode the data (see also :ref:`tutorial_objects` below). At the time of
writing there are four codecs available that can encode variable length string
objects: :class:`numcodecs.VLenUTF8`, :class:`numcodecs.JSON`, :class:`numcodecs.MsgPack`.
and :class:`numcodecs.Pickle`. E.g. using ``VLenUTF8``::
objects: :class:`numcodecs.vlen.VLenUTF8`, :class:`numcodecs.json.JSON`,
:class:`numcodecs.msgpacks.MsgPack`. and :class:`numcodecs.pickles.Pickle`.
E.g. using ``VLenUTF8``::

>>> import numcodecs
>>> z = zarr.array(text_data, dtype=object, object_codec=numcodecs.VLenUTF8())
Expand All @@ -1201,8 +1202,8 @@ is a short-hand for ``dtype=object, object_codec=numcodecs.VLenUTF8()``, e.g.::
'Helló, világ!', 'Zdravo svete!', 'เฮลโลเวิลด์'], dtype=object)

Variable-length byte strings are also supported via ``dtype=object``. Again an
``object_codec`` is required, which can be one of :class:`numcodecs.VLenBytes` or
:class:`numcodecs.Pickle`. For convenience, ``dtype=bytes`` (or ``dtype=str`` on Python
``object_codec`` is required, which can be one of :class:`numcodecs.vlen.VLenBytes` or
:class:`numcodecs.pickles.Pickle`. For convenience, ``dtype=bytes`` (or ``dtype=str`` on Python
2.7) can be used as a short-hand for ``dtype=object, object_codec=numcodecs.VLenBytes()``,
e.g.::

Expand All @@ -1218,7 +1219,7 @@ e.g.::
b'\xe0\xb9\x80\xe0\xb8\xae\xe0\xb8\xa5\xe0\xb9\x82\xe0\xb8\xa5\xe0\xb9\x80\xe0\xb8\xa7\xe0\xb8\xb4\xe0\xb8\xa5\xe0\xb8\x94\xe0\xb9\x8c'], dtype=object)

If you know ahead of time all the possible string values that can occur, you could
also use the :class:`numcodecs.Categorize` codec to encode each unique string value as an
also use the :class:`numcodecs.categorize.Categorize` codec to encode each unique string value as an
integer. E.g.::

>>> categorize = numcodecs.Categorize(greetings, dtype=object)
Expand All @@ -1245,7 +1246,7 @@ The best codec to use will depend on what type of objects are present in the arr

At the time of writing there are three codecs available that can serve as a general
purpose object codec and support encoding of a mixture of object types:
:class:`numcodecs.JSON`, :class:`numcodecs.MsgPack`. and :class:`numcodecs.Pickle`.
:class:`numcodecs.json.JSON`, :class:`numcodecs.msgpacks.MsgPack`. and :class:`numcodecs.pickles.Pickle`.

For example, using the JSON codec::

Expand All @@ -1258,7 +1259,7 @@ For example, using the JSON codec::
array([42, 'foo', list(['bar', 'baz', 'qux']), {'a': 1, 'b': 2.2}, None], dtype=object)

Not all codecs support encoding of all object types. The
:class:`numcodecs.Pickle` codec is the most flexible, supporting encoding any type
:class:`numcodecs.pickles.Pickle` codec is the most flexible, supporting encoding any type
of Python object. However, if you are sharing data with anyone other than yourself, then
Pickle is not recommended as it is a potential security risk. This is because malicious
code can be embedded within pickled data. The JSON and MsgPack codecs do not have any
Expand All @@ -1270,7 +1271,7 @@ Ragged arrays

If you need to store an array of arrays, where each member array can be of any length
and stores the same primitive type (a.k.a. a ragged array), the
:class:`numcodecs.VLenArray` codec can be used, e.g.::
:class:`numcodecs.vlen.VLenArray` codec can be used, e.g.::

>>> z = zarr.empty(4, dtype=object, object_codec=numcodecs.VLenArray(int))
>>> z
Expand Down

0 comments on commit 36773ed

Please sign in to comment.