Skip to content

Commit

Permalink
JSON/TOML backend: introduce abbreviated IO modes (#1493)
Browse files Browse the repository at this point in the history
* Introduce dataset template mode to JSON backend

* Write used mode to JSON file

* Use Attribute::getOptional for snapshot attribute

* Introduce attribute mode

* Add example 14_toml_template.cpp

* Use Datatype::UNDEFINED to indicate no dataset definition in template

* Extend example

* Test short attribute mode

* Copy datatypeToString to JSON implementation

* Fix after rebase: Init JSON config in parallel mode

* Fix after rebase: Don't erase JSON datasets when writing

* openpmd-pipe: use short modes for test

* Less intrusive warnings, allow disabling them

* TOML: Use short modes by default

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Documentation

* Short mode in default in openPMD >= 2.

* Short value by default in TOML

* Store the openPMD version information in the IOHandler

* Fixes

* Adapt test to recent rebase

Reading the chunk table requires NOT using template mode, otherwise the
string just consists of '\0' bytes.

* toml11 4.0 compatibility

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* wip: cleanup

* wip: cleanup

* Cleanup

* Extensive testing

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
  • Loading branch information
franzpoeschel and pre-commit-ci[bot] authored Dec 16, 2024
1 parent c639257 commit 2e246cc
Show file tree
Hide file tree
Showing 18 changed files with 1,560 additions and 160 deletions.
4 changes: 4 additions & 0 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -703,6 +703,7 @@ set(openPMD_EXAMPLE_NAMES
10_streaming_read
12_span_write
13_write_dynamic_configuration
14_toml_template
)
set(openPMD_PYTHON_EXAMPLE_NAMES
2_read_serial
Expand Down Expand Up @@ -1327,6 +1328,9 @@ if(openPMD_BUILD_TESTING)
${openPMD_RUNTIME_OUTPUT_DIRECTORY}/openpmd-pipe \
--infile ../samples/git-sample/thetaMode/data_%T.bp \
--outfile ../samples/git-sample/thetaMode/data%T.json \
--outconfig ' \
json.attribute.mode = \"short\" \n\
json.dataset.mode = \"template_no_warn\"' \
"
WORKING_DIRECTORY ${openPMD_RUNTIME_OUTPUT_DIRECTORY}
)
Expand Down
37 changes: 32 additions & 5 deletions docs/source/backends/json.rst
Original file line number Diff line number Diff line change
Expand Up @@ -38,20 +38,47 @@ when working with the JSON backend.
Datasets and groups have the same namespace, meaning that there may not be a subgroup
and a dataset with the same name contained in one group.

Any **openPMD dataset** is a JSON object with three keys:
Datasets
........

* ``attributes``: Attributes associated with the dataset. May be ``null`` or not present if no attributes are associated with the dataset.
* ``datatype``: A string describing the type of the stored data.
* ``data`` A nested array storing the actual data in row-major manner.
Datasets can be stored in two modes, either as actual datasets or as dataset templates.
The mode is selected by the :ref:`JSON/TOML parameter<backendconfig>` ``json.dataset.mode`` (resp. ``toml.dataset.mode``) with possible values ``["dataset", "template"]`` (default: ``"dataset"``).

Stored as an actual dataset, an **openPMD dataset** is a JSON object with three JSON keys:

* ``datatype`` (required): A string describing the type of the stored data.
* ``data`` (required): A nested array storing the actual data in row-major manner.
The data needs to be consistent with the fields ``datatype`` and ``extent``.
Checking whether this key points to an array can be (and is internally) used to distinguish groups from datasets.
* ``attributes``: Attributes associated with the dataset. May be ``null`` or not present if no attributes are associated with the dataset.

Stored as a **dataset template**, an openPMD dataset is represented by three JSON keys:

* ``datatype`` (required): As above.
* ``extent`` (required): A list of integers, describing the extent of the dataset.
This replaces the ``data`` key from the non-template representation.
* ``attributes``: As above.

**Attributes** are stored as a JSON object with a key for each attribute.
This mode stores only the dataset metadata.
Chunk load/store operations are ignored.

Attributes
..........

In order to avoid name clashes, attributes are generally stored within a separate subgroup ``attributes``.

Attributes can be stored in two formats.
The format is selected by the :ref:`JSON/TOML parameter<backendconfig>` ``json.attribute.mode`` (resp. ``toml.attribute.mode``) with possible values ``["long", "short"]`` (default: ``"long"`` for JSON in openPMD 1.*, ``"short"`` otherwise, i.e. generally in openPMD 2.*, but always in TOML).

Attributes in **long format** store the datatype explicitly, by representing attributes as JSON objects.
Every such attribute is itself a JSON object with two keys:

* ``datatype``: A string describing the type of the value.
* ``value``: The actual value of type ``datatype``.

Attributes in **short format** are stored as just the simple value corresponding with the attribute.
Since JSON/TOML values are pretty-printed into a human-readable format, byte-level type details can be lost when reading those values again later on (e.g. the distinction between different integer types).

TOML File Format
----------------

Expand Down
23 changes: 19 additions & 4 deletions docs/source/details/backendconfig.rst
Original file line number Diff line number Diff line change
Expand Up @@ -104,6 +104,8 @@ The key ``rank_table`` allows specifying the creation of a **rank table**, used
Configuration Structure per Backend
-----------------------------------

Please refer to the respective backends' documentations for further information on their configuration.

.. _backendconfig-adios2:

ADIOS2
Expand Down Expand Up @@ -231,8 +233,21 @@ The parameters eligible for being passed to flush calls may be configured global

.. _backendconfig-other:

Other backends
^^^^^^^^^^^^^^
JSON/TOML
^^^^^^^^^

Do currently not read the configuration string.
Please refer to the respective backends' documentations for further information on their configuration.
A full configuration of the JSON backend:

.. literalinclude:: json.json
:language: json

The TOML backend is configured analogously, replacing the ``"json"`` key with ``"toml"``.

All keys found under ``json.dataset`` are applicable globally as well as per dataset.
Explanation of the single keys:

* ``json.dataset.mode`` / ``toml.dataset.mode``: One of ``"dataset"`` (default) or ``"template"``.
In "dataset" mode, the dataset will be written as an n-dimensional (recursive) array, padded with nulls (JSON) or zeroes (TOML) for missing values.
In "template" mode, only the dataset metadata (type, extent and attributes) are stored and no chunks can be written or read (i.e. write/read operations will be skipped).
* ``json.attribute.mode`` / ``toml.attribute.mode``: One of ``"long"`` (default in openPMD 1.*) or ``"short"`` (default in openPMD 2.* and generally in TOML).
The long format explicitly encodes the attribute type in the dataset on disk, the short format only writes the actual attribute as a JSON/TOML value, requiring readers to recover the type.
10 changes: 10 additions & 0 deletions docs/source/details/json.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
{
"json": {
"dataset": {
"mode": "template"
},
"attribute": {
"mode": "short"
}
}
}
111 changes: 111 additions & 0 deletions examples/14_toml_template.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,111 @@
#include <openPMD/openPMD.hpp>

std::string backendEnding()
{
auto extensions = openPMD::getFileExtensions();
if (auto it = std::find(extensions.begin(), extensions.end(), "toml");
it != extensions.end())
{
return *it;
}
else
{
// Fallback for buggy old NVidia compiler
return "json";
}
}

void write()
{
std::string config = R"(
{
"iteration_encoding": "variable_based",
"json": {
"dataset": {"mode": "template"},
"attribute": {"mode": "short"}
},
"toml": {
"dataset": {"mode": "template"},
"attribute": {"mode": "short"}
}
}
)";

openPMD::Series writeTemplate(
"../samples/tomlTemplate." + backendEnding(),
openPMD::Access::CREATE,
config);
auto iteration = writeTemplate.writeIterations()[0];

openPMD::Dataset ds{openPMD::Datatype::FLOAT, {5, 5}};

auto temperature =
iteration.meshes["temperature"][openPMD::RecordComponent::SCALAR];
temperature.resetDataset(ds);

auto E = iteration.meshes["E"];
E["x"].resetDataset(ds);
E["y"].resetDataset(ds);
/*
* Don't specify datatype and extent for this one to indicate that this
* information is not yet known.
*/
E["z"].resetDataset({});

ds.extent = {10};

auto electrons = iteration.particles["e"];
electrons["position"]["x"].resetDataset(ds);
electrons["position"]["y"].resetDataset(ds);
electrons["position"]["z"].resetDataset(ds);

electrons["positionOffset"]["x"].resetDataset(ds);
electrons["positionOffset"]["y"].resetDataset(ds);
electrons["positionOffset"]["z"].resetDataset(ds);
electrons["positionOffset"]["x"].makeConstant(3.14);
electrons["positionOffset"]["y"].makeConstant(3.14);
electrons["positionOffset"]["z"].makeConstant(3.14);

ds.dtype = openPMD::determineDatatype<uint64_t>();
electrons.particlePatches["numParticles"][openPMD::RecordComponent::SCALAR]
.resetDataset(ds);
electrons
.particlePatches["numParticlesOffset"][openPMD::RecordComponent::SCALAR]
.resetDataset(ds);
electrons.particlePatches["offset"]["x"].resetDataset(ds);
electrons.particlePatches["offset"]["y"].resetDataset(ds);
electrons.particlePatches["offset"]["z"].resetDataset(ds);
electrons.particlePatches["extent"]["x"].resetDataset(ds);
electrons.particlePatches["extent"]["y"].resetDataset(ds);
electrons.particlePatches["extent"]["z"].resetDataset(ds);
}

void read()
{
/*
* The config is entirely optional, these things are also detected
* automatically when reading
*/

// std::string config = R"(
// {
// "iteration_encoding": "variable_based",
// "toml": {
// "dataset": {"mode": "template"},
// "attribute": {"mode": "short"}
// }
// }
// )";

openPMD::Series read(
"../samples/tomlTemplate." + backendEnding(),
openPMD::Access::READ_LINEAR);
read.parseBase();
openPMD::helper::listSeries(read);
}

int main()
{
write();
read();
}
28 changes: 25 additions & 3 deletions include/openPMD/Dataset.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -41,18 +41,40 @@ class Dataset
public:
enum : std::uint64_t
{
JOINED_DIMENSION = std::numeric_limits<std::uint64_t>::max()
/**
* Setting one dimension of the extent as JOINED_DIMENSION means that
* the extent along that dimension will be defined by the sum of all
* parallel processes' contributions.
* Only one dimension can be joined. For store operations, the offset
* should be an empty array and the extent should give the actual
* extent of the chunk (i.e. the number of joined elements along the
* joined dimension, equal to the global extent in all other
* dimensions). For more details, refer to
* docs/source/usage/workflow.rst.
*/
JOINED_DIMENSION = std::numeric_limits<std::uint64_t>::max(),
/**
* Some backends (i.e. JSON and TOML in template mode) support the
* creation of dataset with undefined datatype and extent.
* The extent should be given as {UNDEFINED_EXTENT} for that.
*/
UNDEFINED_EXTENT = std::numeric_limits<std::uint64_t>::max() - 1
};

Dataset(Datatype, Extent, std::string options = "{}");

/**
* @brief Constructor that sets the datatype to undefined.
*
* Helpful for resizing datasets, since datatypes need not be given twice.
* Helpful for:
*
* 1. Resizing datasets, since datatypes need not be given twice.
* 2. Initializing datasets as undefined, as used by template mode in the
* JSON/TOML backend. In this case, the default (undefined) specification
* for the Extent may be used.
*
*/
Dataset(Extent);
Dataset(Extent = {UNDEFINED_EXTENT});

Dataset &extend(Extent newExtent);

Expand Down
1 change: 1 addition & 0 deletions include/openPMD/IO/AbstractIOHandler.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -201,6 +201,7 @@ class AbstractIOHandler
{
friend class Series;
friend class ADIOS2IOHandlerImpl;
friend class JSONIOHandlerImpl;
friend class detail::ADIOS2File;

private:
Expand Down
1 change: 1 addition & 0 deletions include/openPMD/IO/JSON/JSONIOHandler.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@

#include "openPMD/IO/AbstractIOHandler.hpp"
#include "openPMD/IO/JSON/JSONIOHandlerImpl.hpp"
#include "openPMD/auxiliary/JSON_internal.hpp"

#if openPMD_HAVE_MPI
#include <mpi.h>
Expand Down
Loading

0 comments on commit 2e246cc

Please sign in to comment.