Expose internal buffers to writers #901

Merged · 20 commits · Apr 21, 2021

Conversation

@franzpoeschel (Contributor) commented on Jan 14, 2021

Based on #904 (see below for the reason), for comparison: franzpoeschel/openPMD-api@topic-pipe-executable...franzpoeschel:topic-span

This makes the functionality provided by adios2::Variable::Span<T> available to users of the openPMD API.

Use case: if the writing application allocates write buffers only right before calling RecordComponent::storeChunk() (as opposed to passing buffers to storeChunk() that already exist in the application), this PR allows the buffer to be allocated by the openPMD backend instead (e.g. ADIOS2 can provide a view into its serialization buffers). In such a case, this may avoid memory copies, ideally cutting memory usage in half.
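
For illustration, a span-based write looks roughly like this. This is a minimal sketch modeled on examples/12_span_write.cpp added in this PR; the dataset name and extent are made up, and method names follow that example:

```cpp
#include <openPMD/openPMD.hpp>

#include <cstddef>

using namespace openPMD;

int main()
{
    Series series( "spanWrite.bp", Access::CREATE );
    auto rc = series.iterations[ 0 ].meshes[ "E" ][ "x" ];
    Extent extent{ 1000 };
    rc.resetDataset( { Datatype::DOUBLE, extent } );

    // Instead of allocating our own buffer and passing it to storeChunk(),
    // let the backend allocate it (in ADIOS2: a view into the serialization
    // buffer) and write directly into that memory, avoiding one memcopy.
    auto dynamicView = rc.storeChunk< double >( Offset{ 0 }, extent );
    auto span = dynamicView.currentBuffer();
    for( std::size_t i = 0; i < span.size(); ++i )
    {
        span[ i ] = double( i );
    }

    series.flush();
}
```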

Applications that may benefit:

  • PIConGPU: Before storing record components, the openPMD plugin in PIConGPU converts data from PIConGPU's internal AoS representation into the SoA representation used by openPMD. Using this PR, data can be converted directly into backend buffers.
    UPDATE: I've implemented this in a PIConGPU branch; this diff gives an example of usage.
  • openPMD-pipe: Data can be directly loaded from the reading backend into the writing backend, avoiding the detour through user-space buffers.

TODO:

  • Parallel testing, more testing. Note: our use of this in openpmd-pipe already tests this exhaustively in parallel.
  • Generic fallback implementation for backends that do not support this mode
  • ADIOS2 implementation
  • Failing test: I can reproduce this with ADIOS 2.6.0; ADIOS 2.7.0 does not have this issue. Hence, this needs to wait until we bump the required ADIOS2 version. (Update: nah, that one was on me actually.)
  • There seems to be some failure in ADIOS2 for larger datasets (running openpmd-pipe.py on the HDF5 git sample crashes)
  • Add IOTask to ADIOS1 queues
  • Python bindings
  • Documentation, cleanup
  • Fix handling of scalar record components
  • Fix intermittent resizing of serialization buffer in ADIOS2
  • Make old spans reusable?
  • Merge "Add openpmd-pipe.py command line tool" (#904) first
  • This one reshuffles parts of our flushing logic. Properly document and check that things are fine.
  • Allow deferred calls to avoid buffer reallocations in ADIOS2.
  • I've rebased this onto "Particle Patches: Do not emplace patch records if they don't exist in the file being read" (#945) to ease benchmarking with openpmd-pipe. So, merge that first.
  • Avoid buffer reallocations in ADIOS – impossible without explicit support from ADIOS
  • Should we update the Span interface in C++ to be more similar to the one in Python? Maybe silent buffer reallocation is a bit too much to hide automatically behind a Span interface?

@franzpoeschel force-pushed the topic-span branch 3 times, most recently from 5d6a2f1 to a5b5413 on January 15, 2021
@franzpoeschel changed the title from "Expose internal buffers for writers" to "Expose internal buffers to writers" on Jan 18, 2021
@franzpoeschel force-pushed the topic-span branch 8 times, most recently from 9c830a3 to 5586e73 on January 22, 2021
@ax3l self-requested a review on January 22, 2021
@franzpoeschel force-pushed the topic-span branch 2 times, most recently from c823946 to d3e8f15 on February 22, 2021
@franzpoeschel (Contributor, Author) commented on Feb 23, 2021

I've benchmarked this for a PIConGPU simulation that dumped 124 GB of data in 4 IO passes from 1 GPU [1].
Note that PIConGPU already saves memory by reusing store buffers and flushing them to the backend anew for each single dataset, which reduces the improvement attainable with the span-based API.

The current memory usage, as profiled by KDE Heaptrack, peaks at 87.1 GB:
[screenshot: Heaptrack memory profile]
Simulation times for the same run, but without Heaptrack:

PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: IO plugin ran for 32sec 737msec (average: 35sec 126msec)
calculation  simulation time:  3min 43sec 439msec = 223 sec
full simulation time:  4min 54sec 746msec = 294 sec

After using this PR in PIConGPU, it peaks at 85.4 GB, saving roughly 1.7 to 1.8 GB of heap memory:
[screenshot: Heaptrack memory profile]
Simulation times for the same run, but without Heaptrack:

PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: IO plugin ran for 28sec 372msec (average: 31sec 969msec)
calculation  simulation time:  3min 30sec 703msec = 210 sec
full simulation time:  5min 24sec 709msec = 324 sec

So, we saved roughly 3 seconds of memory allocation time per run of the IO plugin (average plugin runtime: 35.1 s before, 32.0 s after).

The remaining memory spikes stem from the DoubleBuffer allocation strategy in the openPMD plugin.
I expect more pronounced differences in our little openpmd-pipe tool; I will come back to that.

[1] KelvinHelmholtz example called with picongpu -s 150 -d 1 1 1 -g 256 512 128 --openPMD.period 50 --openPMD.file dump --openPMD.infix NULL --openPMD.ext bp --openPMD.json '{ "adios2": { "engine": { "usesteps": true, "parameters": { "InitialBufferSize": "34Gb", "Profile": "On" } }, "dataset": { "operators": [ ] } } } ' --versionOnce

@franzpoeschel force-pushed the topic-span branch 2 times, most recently from 1368ef8 to 9db8694 on February 23, 2021
@franzpoeschel force-pushed the topic-span branch 2 times, most recently from 7da46a0 to f8d2366 on March 2, 2021
Resolved review threads: include/openPMD/RecordComponent.hpp (3 threads, 2 outdated), include/openPMD/IO/AbstractIOHandler.hpp
UserFlush,
/**
* Default mode, flush everything that can be flushed
* Does not need to uphold user-level guarantees about clearing and filling
Member:
I don't understand the 2nd sentence yet:
Are you saying that this cannot check the API contract we have with the user, aka that the user must have given us valid buffers and they are ready to use at this point?

Contributor (Author):

I've clarified the documentation now by introducing a concept of flush points. The difference between the two modes is that UserFlush defines a flush point and InternalFlush (renamed from FlushEverything) does not.
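
For reference, the documented distinction might be sketched as follows (the two level names appear in this thread; the doc wording is illustrative):

```cpp
enum class FlushLevel
{
    /** Defines a flush point: user-level guarantees about clearing
     *  and filling buffers must be upheld at this point.
     */
    UserFlush,
    /** Default mode: flush everything that can be flushed.
     *  Defines no flush point, so user-level guarantees about
     *  clearing and filling buffers need not be upheld.
     */
    InternalFlush
};
```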

Comment on lines 3321 to 3471
* Hijack the functor that is called for buffer creation if the
* backend doesn't support the task to see whether the backend
* did support it or not.
Member:
Can you please split this into two sentences? It's a bit hard to parse for me :)

Member:

Should we make this lambda a helper function? Looks like a generally useful implementation that people would use?

Contributor (Author):

> Can you please split this into two sentences? It's a bit hard to parse for me :)

Will do

> Should we make this lambda a helper function? Looks like a generally useful implementation that people would use?

I'm not sure about it... This side-channels the return value (the boolean) through a capture-by-reference. Can we write this in a way that would actually make a good API?
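
For context, the side-channel pattern in question looks roughly like this (a self-contained sketch; all names are illustrative, not the actual implementation):

```cpp
#include <cstddef>
#include <memory>

int main()
{
    // The fallback functor for buffer creation is "hijacked": the backend
    // only invokes it if it cannot provide a buffer view itself, so the
    // functor records that fact through a reference capture.
    bool backendProvidedBuffer = true;
    auto fallbackAllocate = [ &backendProvidedBuffer ]( std::size_t size ) {
        backendProvidedBuffer = false; // the side-channeled boolean
        return std::shared_ptr< void >(
            new char[ size ],
            []( void *ptr ) { delete[] static_cast< char * >( ptr ); } );
    };

    // A backend without buffer-view support would call the functor:
    auto buffer = fallbackAllocate( 1024 );
    // backendProvidedBuffer is now false; with a supporting backend it
    // would have stayed true.
}
```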

Contributor (Author):

> Will do

Done

Resolved review thread: include/openPMD/RecordComponent.hpp (outdated)
@ax3l (Member) commented on Apr 5, 2021

@franzpoeschel can you please rebase this one? :)

@ax3l (Member) left a comment:
Looks already in great shape! Notes inline :)

Resolved review threads: include/openPMD/IO/AbstractIOHandlerImpl.hpp (outdated), docs/source/usage/workflow.rst (3 threads, 2 outdated), examples/12_span_write.cpp, examples/12_span_write.py (outdated)
@@ -1041,7 +1068,9 @@ namespace detail
std::vector< std::unique_ptr< BufferedAction > > m_buffer;
std::map< std::string, BufferedAttributeWrite > m_attributeWrites;
std::vector< BufferedAttributeRead > m_attributeReads;
std::vector< std::unique_ptr< BufferedAction > > m_alreadyEnqueued;
Member:
Should we add further doxygen strings to these member variables?
The number of member variables indicates we are doing quite involved things here :)

Contributor (Author):
Good point, will do

Resolved review threads: include/openPMD/RecordComponent.tpp (outdated), include/openPMD/Span.hpp (outdated)
{
switch( iterationEncoding() )
IOHandler()->m_flushLevel = level;
try
Member:

Here and below: why do we use exception handling at this point and rethrow another exception?
I am just concerned that we absorb legit exceptions from backends, e.g. if a low-level ADIOS/HDF5 operation fails.

The try-catch pattern on this high level looks a bit duct-tape-y to me.

Contributor (Author):

The purpose of this is to emulate what the Python or Java finally keyword does. This does not absorb exceptions from a layer below; the construct does some cleanup and passes the exception on:

> Rethrows the currently handled exception. Abandons the execution of the current catch block and passes control to the next matching exception handler (but not to another catch clause after the same try block: its compound-statement is considered to have been 'exited'), reusing the existing exception object: no new objects are made.

(https://en.cppreference.com/w/cpp/language/throw)

The purpose is to ensure that the base state is restored even if an exception is thrown.
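
Schematically, as a self-contained sketch (the FlushLevel names follow this PR; everything else is illustrative):

```cpp
#include <iostream>
#include <stdexcept>

enum class FlushLevel { InternalFlush, UserFlush };
FlushLevel flushLevel = FlushLevel::InternalFlush;

// Stand-in for backend work that may throw (ADIOS2, HDF5, ...).
void flushImpl() { throw std::runtime_error( "backend failure" ); }

// Emulate `finally`: restore the base state on both the normal and the
// exceptional path; the catch block rethrows the very same exception
// object, so nothing from the backend layer is absorbed.
void flush( FlushLevel level )
{
    flushLevel = level;
    try
    {
        flushImpl();
    }
    catch( ... )
    {
        flushLevel = FlushLevel::InternalFlush; // cleanup
        throw; // rethrow, no new exception object is created
    }
    flushLevel = FlushLevel::InternalFlush; // cleanup on normal return
}

int main()
{
    try
    {
        flush( FlushLevel::UserFlush );
    }
    catch( std::exception const &e )
    {
        std::cout << "propagated: " << e.what() << "\n";
    }
}
```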

Member:

Perfect, thanks!

@ax3l (Member) left a comment:

Thank you for the great PR, this is ready to go! 🚀 ✨
