Custom Hierarchies #1432

franzpoeschel · 2023-05-03T12:04:33Z

The openPMD standard works by defining "what must be there", but does not impose restrictions as to "what must not be there". By this principle, openPMD is an extensible standard.
So far, standard extensions have relied mostly on additional metadata in the form of attributes, e.g. storing the name of the employed field solver in the ED-PIC extension. Custom hierarchies and custom n-dimensional datasets ("heavy" data, as opposed to lightweight metadata) have not been used so far, even though the openPMD standard permits them in principle. The major obstacle to such data organization has been the lack of support in the openPMD-api, i.e. the implementation of the standard.

As the first part of this PR, the openPMD-api now supports writing custom-defined hierarchies and datasets within the basepath, i.e. within Iterations. This change requires no modification of the standard, since it only uses the freedom that the standard already grants, as explained in the introduction.

This alone finds useful applications already:

Data that has been marked up according to another standard can be embedded side-by-side with openPMD-formatted particle-mesh data. A short example is given as part of this PR that writes an openPMD-formatted temperature mesh side by side with a simple NeXus example. The resulting dataset is shown below:

  string       /basePath                                                attr   = "/data/%T/"
  string       /date                                                    attr   = "2024-08-12 16:58:01 +0200"
  string       /iterationEncoding                                       attr   = "groupBased"
  string       /iterationFormat                                         attr   = "/data/%T/"
  string       /meshesPath                                              attr   = "meshes/"
  string       /openPMD                                                 attr   = "1.1.0"
  uint32_t     /openPMDextension                                        attr   = 0
  string       /software                                                attr   = "openPMD-api"
  string       /softwareVersion                                         attr   = "0.16.0-dev"
  double       /data/100/dt                                             attr   = 1
  double       /data/100/time                                           attr   = 0
  double       /data/100/timeUnitSI                                     attr   = 1
  string       /data/100/Scan/NX_class                                  attr   = "NXentry"
  string       /data/100/Scan/data/NX_class                             attr   = "NXdata"
  string       /data/100/Scan/data/axes                                 attr   = {"two_theta"}
  int64_t      /data/100/Scan/data/counts                               {15} = 0 / 0
  string       /data/100/Scan/data/counts/long_name                     attr   = "photodiode counts"
  string       /data/100/Scan/data/counts/units                         attr   = "counts"
  string       /data/100/Scan/data/signal                               attr   = "counts"
  double       /data/100/Scan/data/two_theta                            {15} = 0 / 0
  string       /data/100/Scan/data/two_theta/long_name                  attr   = "two_theta (degrees)"
  string       /data/100/Scan/data/two_theta/units                      attr   = "degrees"
  uint8_t      /data/100/Scan/data/two_theta_indices                    attr   = {0}
  string       /data/100/Scan/default                                   attr   = "data"
  double       /data/100/meshes/temperature                             {5, 5} = 0 / 0
  string       /data/100/meshes/temperature/axisLabels                  attr   = {"x", "y"}
  string       /data/100/meshes/temperature/dataOrder                   attr   = "C"
  string       /data/100/meshes/temperature/geometry                    attr   = "cartesian"
  double       /data/100/meshes/temperature/gridGlobalOffset            attr   = {0, 0}
  double       /data/100/meshes/temperature/gridSpacing                 attr   = {1, 1}
  double       /data/100/meshes/temperature/gridUnitSI                  attr   = 1
  long double  /data/100/meshes/temperature/position                    attr   = {0.5, 0.5}
  float        /data/100/meshes/temperature/timeOffset                  attr   = 0
  double       /data/100/meshes/temperature/unitDimension               attr   = {0, 0, 1, 0, 0, 0, 0}
  double       /data/100/meshes/temperature/unitSI                      attr   = 1

Embedding non-physical information into output files. An example is the particle-in-cell simulation PIConGPU, which uses openPMD for regular output as well as for checkpoint-restart output. For checkpoint-restart, internal program state must be serialized along with the physical state of the simulation. Currently, this is only possible by pretending that the internal state is a mesh, which confuses many post-processing tools such as visualizers. PIConGPU has been adapted to make use of this change on this Git tree, check here for a diff. A shortened example output is pasted below, demonstrating that internal state information is now cleanly separated from physical data:

  float     /data/100/fields/E/x                                      {192, 1024, 192}
  float     /data/100/fields/E/y                                      {192, 1024, 192}
  float     /data/100/fields/E/z                                      {192, 1024, 192}
  float     /data/100/particles/e/momentum/x                          {71958528}
  float     /data/100/particles/e/momentum/y                          {71958528}
  float     /data/100/particles/e/momentum/z                          {71958528}
  float     /data/100/particles/e/position/x                          {71958528}
  float     /data/100/particles/e/position/y                          {71958528}
  float     /data/100/particles/e/position/z                          {71958528}
  int32_t   /data/100/particles/e/positionOffset/x                    {71958528}
  int32_t   /data/100/particles/e/positionOffset/y                    {71958528}
  int32_t   /data/100/particles/e/positionOffset/z                    {71958528}
  float     /data/100/particles/e/weighting                           {71958528}
  char      /data/100/picongpu_internal/RNG/RNGProvider3XorMin        {48, 128, 147456}
  uint64_t  /data/100/picongpu_internal/idProvider/nextId             {1, 1, 1}
  uint64_t  /data/100/picongpu_internal/idProvider/startId            {1, 1, 1}

Building on top of this, the second logical component of this PR is support for this standard extension. While the PR as described so far brings custom hierarchies and datasets to the openPMD-api in a way that is transparent to the standard itself, the purpose of the standard extension is to make the standard aware of these hierarchies by embedding openPMD markup within them.

The schematic idea behind this is pictured below:

With this, the data organization can step back into openPMD markup from anywhere within a custom-defined hierarchy. This further extends the use of this PR to:

Using openPMD markup within another standard, rather than merely beside it. This is currently being applied exploratively in this script for a sample dataset collected in the POLARIS laboratory.
For more complex setups, this permits a better organization of output data. As an example, meshes can be of different kinds such as 3-dimensional physical fields or 2-dimensional images; also there might be similar kinds of dependencies between particle data. It is desirable to group such data in a way that reflects the logical adjacencies and interdependencies between them.

A particular instance of the above is mesh refinement, currently proposed in a standard extension as a suffix-based naming scheme. This comment details an approach to mesh refinement that is based on custom hierarchies instead and that is more natural and easier to parse. A mesh-refined dataset of this type might be structured as follows:

/data/0/refined_mesh_levels/0/meshes/E
/data/0/refined_mesh_levels/0/meshes/B
/data/0/refined_mesh_levels/1/meshes/E
/data/0/refined_mesh_levels/1/meshes/B
/data/0/refined_mesh_levels/2/meshes/E
/data/0/refined_mesh_levels/2/meshes/B
+++++++ ––––––––––––––––––––– ++++++++
standard        custom        standard

/data/0/simulation_internal/some_checkpointing_info
+++++++ –––––––––––––––––––––––––––––––––––––––––––
standard                  custom
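
To illustrate why the refinement layout above is easy to parse, here is a minimal sketch in plain C++ that splits such a path into iteration, refinement level and mesh name. It does not use the openPMD-api. The names RefinedMesh, splitPath, isNumber and parseRefinedMeshPath are hypothetical, and the group name refined_mesh_levels is only the illustrative name taken from the layout above, not something fixed by the standard.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type for this sketch (not part of openPMD-api).
struct RefinedMesh
{
    unsigned long iteration;
    unsigned long level;
    std::string mesh;
};

// Split a path at '/' and drop empty components.
std::vector<std::string> splitPath(std::string const &path)
{
    std::vector<std::string> parts;
    std::istringstream stream(path);
    std::string part;
    while (std::getline(stream, part, '/'))
    {
        if (!part.empty())
        {
            parts.push_back(part);
        }
    }
    return parts;
}

// True if s consists of decimal digits only.
bool isNumber(std::string const &s)
{
    return !s.empty() &&
        s.find_first_not_of("0123456789") == std::string::npos;
}

// Accepts paths of the form
//   /data/<iteration>/refined_mesh_levels/<level>/meshes/<mesh>
// and returns std::nullopt for everything else.
// Assumption: the numbers are small enough for std::stoul.
std::optional<RefinedMesh> parseRefinedMeshPath(std::string const &path)
{
    auto const parts = splitPath(path);
    if (parts.size() != 6 || parts[0] != "data" || !isNumber(parts[1]) ||
        parts[2] != "refined_mesh_levels" || !isNumber(parts[3]) ||
        parts[4] != "meshes")
    {
        return std::nullopt;
    }
    return RefinedMesh{std::stoul(parts[1]), std::stoul(parts[3]), parts[5]};
}
```

With this sketch, the refinement level is read directly from one path component, whereas a suffix-based scheme would need to pick it out of the mesh name. The purely custom path /data/0/simulation_internal/some_checkpointing_info is rejected by the same function.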

TODO

Diff: https://github.com/franzpoeschel/openPMD-api/compare/topic-remove-scalar-component..topic-custom-hierarchies

include/openPMD/backend/BaseRecord.hpp

+
+private:
+    template <typename... Arg>
+    iterator makeIterator(Arg &&...arg)


include/openPMD/backend/BaseRecord.hpp

+        return iterator{this, std::forward<Arg>(arg)...};
+    }
+    template <typename... Arg>
+    const_iterator makeIterator(Arg &&...arg) const


test/CoreTest.cpp

+    REQUIRE(r["x"].resetDataset(dset).numAttributes() == 0); /* unitSI */
+    // REQUIRE(r["y"].unitSI() == 1);
+    REQUIRE(r["y"].resetDataset(dset).numAttributes() == 0); /* unitSI */
+    // REQUIRE(r["z"].unitSI() == 1);


test/CoreTest.cpp

+    // unitSI is set upon flushing
+    // REQUIRE(r["x"].unitSI() == 1);
+    REQUIRE(r["x"].resetDataset(dset).numAttributes() == 0); /* unitSI */
+    // REQUIRE(r["y"].unitSI() == 1);


test/CoreTest.cpp

@@ -966,6 +968,27 @@
 #endif
 }

+TEST_CASE("baserecord_test", "[core]")


include/openPMD/backend/BaseRecord.hpp

+    // for (auto it = this->container().begin(); it != end; ++it)
+    // {
+    //     if (it->first == RecordComponent::SCALAR)
+    //     {
+    //         this->container().erase(it);
+    //         throw error::WrongAPIUsage(detail::NO_SCALAR_INSERT);
+    //     }
+    // }


test/CoreTest.cpp

@@ -1353,3 +1378,44 @@
    UniquePtrWithLambda<int[]> arrptrFilledCustom{
        new int[5]{}, [](int const *p) { delete[] p; }};
 }
+
+TEST_CASE("scalar_and_vector", "[core]")


test/CoreTest.cpp

@@ -156,6 +159,39 @@
    }
 }

+TEST_CASE("custom_hierarchies", "[core]")


test/CoreTest.cpp

@@ -156,6 +159,129 @@
    }
 }

+TEST_CASE("custom_hierarchies", "[core]")


franzpoeschel · 2023-06-19T09:22:25Z

comment removed, updated version in comments below

franzpoeschel · 2023-07-13T12:53:54Z

For the meshesPath (equivalently for particlesPath), I have now implemented a prototype that does the following:

A path /data/0/custom/group/meshes/E is a mesh if the meshesPath contains any of the following:

Full path to the group containing the mesh: /custom/group/meshes/
Full path to the mesh itself: /custom/group/meshes/E (no longer supported)
Shorthand notation: meshes/

The underlying rule: Full paths are denoted by a leading slash and are relative to the data path (/data/%T).

Remark: The shorthand notation achieves backwards compatibility with old openPMD files.
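
The matching rule described in this comment can be sketched in plain C++ as follows. This is not the prototype's actual code: the helper isMeshesGroup is hypothetical, it only covers the two supported notations (full path to the containing group, and shorthand), and it assumes that all paths carry a trailing slash as in the examples above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper, not part of openPMD-api.
//
// groupPath:   path of the group that contains the candidate mesh, relative
//              to the data path (/data/%T), with leading and trailing slash,
//              e.g. "/custom/group/meshes/" for /data/0/custom/group/meshes/E
// meshesPaths: entries of the meshesPath attribute, each with trailing slash
bool isMeshesGroup(
    std::string const &groupPath, std::vector<std::string> const &meshesPaths)
{
    assert(
        !groupPath.empty() && groupPath.front() == '/' &&
        groupPath.back() == '/');
    for (auto const &entry : meshesPaths)
    {
        if (entry.empty())
        {
            continue;
        }
        if (entry.front() == '/')
        {
            // Full path (leading slash): must name the group exactly.
            if (entry == groupPath)
            {
                return true;
            }
        }
        else
        {
            // Shorthand: matches every group whose last component is entry.
            std::string const suffix = "/" + entry;
            if (groupPath.size() >= suffix.size() &&
                groupPath.compare(
                    groupPath.size() - suffix.size(),
                    suffix.size(),
                    suffix) == 0)
            {
                return true;
            }
        }
    }
    return false;
}
```

In this sketch, the shorthand meshes/ matches both /meshes/ (the traditional layout) and /custom/group/meshes/, which is what makes old files readable without changes. A full path such as /meshes/ matches only that one group.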

franzpoeschel · 2023-07-13T13:11:50Z

One nontrivial design question is how to deal with the traditional openPMD hierarchy, especially with the paths /data/%T/meshes and /data/%T/particles. The openPMD standard defines no physical data of any form for those groups themselves: a normal openPMD file contains no attributes /data/%T/meshes/<attr_name>.

This suggests to me that in the extended openPMD standard with custom hierarchies these paths should be treated as "nothing special". Rather, they become the canonical, but not mandatory layout/organization of a simple openPMD dataset.

Two somewhat tricky consequences from this point of view:

1. There might be more than one meshes path in the same group
E.g. the paths /data/%T/meshes and /data/%T/images might exist side by side. In the openPMD standard, this is no problem, but in the openPMD-api it becomes challenging.
The problem is with the member Iteration::meshes (made even worse by the fact that it's not a getter method, but a data member). Should it point to /data/%T/meshes? To a union of both? What about writing?

In my opinion, the best solution is to consider Iteration::meshes a shorthand API that should not be used in more complex setups. Rather, since /data/%T/meshes is now just another normal path in the custom Iteration hierarchy, one should access iteration["meshes"].asContainerOf<Mesh>() for clarity.

Iteration::meshes will point to the first user-specified meshes path that takes the form of a shorthand notation. E.g., after series.setMeshesPath({"fields/"}), the call iteration.meshes will be the same as iteration["fields"].asContainerOf<Mesh>(). This ensures backwards compatibility.

(Note: Since Iteration::meshes is unfortunately a member and not a method, this means that the meshes path must be set before creating or opening any Iteration. And it was enough fighting with pointers to get things to that state.)
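
The selection rule for Iteration::meshes described above can be sketched in plain C++ as follows. The helper defaultMeshesGroup is hypothetical and not part of the openPMD-api. Returning an empty optional when no shorthand entry exists is an assumption of this sketch, since the comment does not say what happens in that case.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper, not part of openPMD-api.
// Returns the name of the group that Iteration::meshes would alias: the
// first user-specified meshes path in shorthand notation, i.e. the first
// entry without a leading slash. The trailing slash is stripped.
std::optional<std::string>
defaultMeshesGroup(std::vector<std::string> const &meshesPaths)
{
    for (auto const &entry : meshesPaths)
    {
        if (!entry.empty() && entry.front() != '/')
        {
            std::string name = entry;
            if (name.back() == '/')
            {
                name.pop_back(); // "fields/" -> "fields"
            }
            return name;
        }
    }
    // Assumption of this sketch: no shorthand entry means no default group.
    return std::nullopt;
}
```

For the meshes path list {"fields/"}, the sketch yields "fields", matching the statement that iteration.meshes is then the same as iteration["fields"].asContainerOf<Mesh>(). Full paths are skipped, so they can only be reached through the explicit access.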

2. There might be custom data inside /data/%T/meshes
This is not really a problem, but could be unexpected. When setting series.setMeshesPath({"/meshes/E"}), you state that only the E field is a mesh. Since /data/%T/meshes is otherwise "just a regular group" with no special meaning, there might be other data in there, too, e.g. /data/%T/meshes/custom/hierarchy. It's the job of the user to create a meaningful data layout here.
Update: With the more restricted definition of meshesPath and particlesPath, specifying the full path to an individual mesh is no longer supported.

test/CoreTest.cpp

+    }
+}
+
+TEST_CASE("custom_hierarchies_no_rw", "[core]")


franzpoeschel added discussion api: new additions to the API feature request labels May 3, 2023

github-advanced-security bot found potential problems May 3, 2023

franzpoeschel force-pushed the topic-custom-hierarchies branch from 6c7f23a to c692dc7 Compare May 8, 2023 09:22

github-advanced-security bot found potential problems May 8, 2023

test/CoreTest.cpp

@@ -156,6 +159,39 @@
    }
 }

+TEST_CASE("custom_hierarchies", "[core]")

Check notice · Code scanning / CodeQL · Unused static function (Note, test)

Static function C_A_T_C_H_T_E_S_T_4 is unreachable (autoRegistrar5 must be removed at the same time)

franzpoeschel force-pushed the topic-custom-hierarchies branch 2 times, most recently from c8a68a5 to 6c87958 Compare May 11, 2023 09:19

franzpoeschel force-pushed the topic-custom-hierarchies branch 2 times, most recently from 86d8a73 to 399e6cd Compare May 30, 2023 12:43

github-advanced-security bot found potential problems May 30, 2023

test/CoreTest.cpp

@@ -156,6 +159,129 @@
    }
 }

+TEST_CASE("custom_hierarchies", "[core]")

Check warning · Code scanning / CodeQL · Poorly documented large function (Warning, test)

Poorly documented function: fewer than 2% comments for a function of 194 lines.

franzpoeschel force-pushed the topic-custom-hierarchies branch 2 times, most recently from 8c28fab to 605bd55 Compare June 29, 2023 11:11

github-advanced-security bot found potential problems Jun 29, 2023

test/CoreTest.cpp (fixed)

franzpoeschel commented Jun 29, 2023

src/CustomHierarchy.cpp (outdated, resolved)

franzpoeschel force-pushed the topic-custom-hierarchies branch from bef9c6b to b4779a3 Compare July 13, 2023 12:24

github-advanced-security bot found potential problems Jul 13, 2023

src/CustomHierarchy.cpp (fixed)

franzpoeschel force-pushed the topic-custom-hierarchies branch from b0d370e to 4873e21 Compare July 24, 2023 14:34

franzpoeschel force-pushed the topic-custom-hierarchies branch 2 times, most recently from 53f968c to ba10099 Compare August 1, 2023 13:37

github-advanced-security bot found potential problems Aug 1, 2023

test/CoreTest.cpp

+    }
+}
+
+TEST_CASE("custom_hierarchies_no_rw", "[core]")

Check notice · Code scanning / CodeQL · Unused static function (Note, test)

Static function C_A_T_C_H_T_E_S_T_6 is unreachable (autoRegistrar7 must be removed at the same time)

franzpoeschel force-pushed the topic-custom-hierarchies branch from ba10099 to d86fa69 Compare August 1, 2023 14:43

franzpoeschel mentioned this pull request Aug 1, 2023

Update Pybind to 2.11.1 #1489

Merged

1 task

franzpoeschel force-pushed the topic-custom-hierarchies branch 4 times, most recently from 31c7a25 to 1d47d17 Compare August 3, 2023 09:25

franzpoeschel force-pushed the topic-custom-hierarchies branch from 20f2cd5 to ce5704d Compare November 15, 2024 14:23

franzpoeschel mentioned this pull request Nov 26, 2024

Scientific default values #1439

Open

franzpoeschel force-pushed the topic-custom-hierarchies branch from 904e952 to 03ba4bc Compare December 10, 2024 16:32

franzpoeschel and others added 26 commits December 17, 2024 11:57

JSON backend: Fail when trying to open non-existing groups

c3522f0

Insert CustomHierarchy class to Iteration

506363f

Help older compilers deal with this

a57bf26

Add vector variants of meshes/particlesPath

56363a5

Move meshes and particles over to CustomHierarchies class

83d2e12

Move dirtyRecursive to CustomHierarchy

0ee50f4

Move Iteration reading logic to CustomHierarchy

10b6b9e

Move Iteration flushing logic to CustomHierarchy class

0b3bb43

Support for custom datasets

90cd659

Treat "meshes"/"particles" as normal subgroups

b321772

Introduction of iteration["meshes"].asContainerOf<Mesh>() as a more explicit variant for iteration.meshes.

Regex-based list of meshes/particlesPaths

9335e13

More extended testing

1415e90

Fix Python bindings without adding new functionality yet

6303b14

Overload resolution

Add simple Python bindings and an example

9468600

Replace Regexes with Globbing

1876759

TODO: Since meshes/particles can no longer be directly addressed with this, maybe adapt the class hierarchy to disallow mixed groups that contain meshes, particles, groups and datasets at the same time. Only maybe though..

Move .meshes and .particles back to Iteration class

01e59a7

They have their own meaning now and are no longer just carefully maintained for backwards compatibility. Instead, they are supposed to serve as a shortcut to all openPMD data found further down the hierarchy.

Some fixes in read error handling

3849c74

More symmetric design for container types

4acd1f8

Don't write unitSI in custom datasets

0637683

Discouraged support for custom datasets inside the particlesPath

a869c3d

Fix after rebase: dirtyRecursive

8dbe1c2

Fixes to the dirty/dirtyRecursive logic

a4512d7

[pre-commit.ci] auto fixes from pre-commit.com hooks

59a4a1c

for more information, see https://pre-commit.ci

Some cleanup in CustomHierarchies class

583ec1b

Use polymorphism for meshes/particlesPath in Python

317cc0f

Remove hasMeshes / hasParticles logic

1032573

franzpoeschel force-pushed the topic-custom-hierarchies branch from 03ba4bc to 1032573 Compare December 17, 2024 11:00
