
[Discussion] Auxiliary backend for statistics? #937

Closed
steindev opened this issue Mar 1, 2021 · 4 comments

Comments

@steindev

steindev commented Mar 1, 2021

Hi, in PIConGPU we encountered an issue with ADIOS2 (ComputationalRadiationPhysics/picongpu#3506) which may find a generally applicable solution within openPMD-api.
ADIOS2 uses a std::vector as its buffer and keeps resizing it from a configured initial size as data is written into it.
As a consequence, only half of a compute node's memory can be used for data output, since the other half must stay free to allow resizing (a std::vector reallocation needs the old and the new buffer to coexist while the data is copied over); see also the ADIOS2 issue ornladios/ADIOS2#2629.
In addition, repeated resizing incurs a performance penalty.

The question came up whether it would be generally useful to implement an auxiliary statistics backend in openPMD (or ADIOS2) which, for example, counts the required memory.
With that information, one could allocate a buffer of the correct size right from the start and avoid resizing.
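
To make the idea concrete, here is a minimal sketch of what such a statistics pass might count per MPI rank. `PlannedChunk` and `requiredBufferBytes` are hypothetical names, and the tally deliberately ignores ADIOS2-internal metadata and operator (e.g. compression) overhead:

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical description of one dataset chunk a rank intends to write.
struct PlannedChunk
{
    std::size_t elements;    // number of elements in the chunk
    std::size_t elementSize; // sizeof(datatype), e.g. sizeof(double)
};

// Sum the raw payload bytes of all planned chunks so the output buffer
// could be sized once up front instead of growing by repeated resizes.
std::size_t requiredBufferBytes(std::vector<PlannedChunk> const &plannedChunks)
{
    return std::accumulate(
        plannedChunks.begin(),
        plannedChunks.end(),
        std::size_t{0},
        [](std::size_t sum, PlannedChunk const &c) {
            return sum + c.elements * c.elementSize;
        });
}
```

The resulting number could then be handed to ADIOS2 as its initial buffer size.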

cc'ing @franzpoeschel @psychocoderHPC

@psychocoderHPC
Member

I am not sure this helps a lot. It could be useful for debugging, but it requires deep knowledge of the openPMD backend in use, e.g. ADIOS2.
We know that ADIOS uses this internal buffer, but each backend can be implemented differently: it may flush in between, resize the buffer without needing additional memory, and so on. ADIOS also uses temporary buffer space for operations such as compression. We would not get such information out of an auxiliary backend.
In that case, knowing how much data is written will IMO not provide very useful information.

@ax3l
Member

ax3l commented Mar 9, 2021

We discussed this last week with the ADIOS team, since we have been encountering this for a while (ornladios/ADIOS2#1814) and had a rather verbose solution in ADIOS1; @franzpoeschel will post the minutes here.

@franzpoeschel
Contributor

Summing up what was discussed in VCs so far:

  • The workflow suggested here would add a huge overhead in development, maintenance, usage difficulty and runtime cost for what would still essentially be a workaround.
  • Any proper solution will require code changes in ADIOS, openPMD and PIConGPU, so this will stay a relevant topic for at least a while.
  • The current (non-)solution is to set the ADIOS2 engine parameter InitialBufferSize correctly (see the configuration sketch after this list).
  • A short/mid-term solution is extending the adios2::Engine::Flush() call to MPI-collectively dump all data currently held in ADIOS2 buffers to disk. This would only work for file-based engines.
  • As the long-term solution, the ADIOS2 team is planning to implement a new serializer by late 2021. Apparently, it is planned to use more flexible data structures for serialization that allow resizing without reallocating old data.
  • Orthogonal approaches to saving host memory:
    • Use --openPMD.dataPreparationStrategy mappedMemory in PIConGPU. This comes at a cost in runtime.
    • Use the span-based adios2::Engine::Put API (a sketch follows after this list). Requires development in openPMD API and PIConGPU (already usable on WIP branches, see here for some benchmark results).
    • Set the ADIOS2 engine parameter AggregatorRatio to 1 to avoid extra memory usage by aggregation. Comes at the cost of bottlenecking the file system at large scale and slower reading.
    • There is WIP in ADIOS2 to support GPU Direct, allowing us to bypass in-between buffers. This will take a while to be usable.
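
As a concrete illustration of the InitialBufferSize/AggregatorRatio workarounds, here is a minimal sketch of passing both engine parameters through openPMD-api's JSON options. The key layout follows the openPMD-api ADIOS2 backend configuration; the file name and values are placeholders that would need tuning per machine and per run:

```cpp
#include <openPMD/openPMD.hpp>

#include <string>

int main()
{
    // Pre-size the ADIOS2 buffer and disable aggregation via the ADIOS2
    // engine parameters exposed through openPMD-api's JSON configuration.
    std::string const options = R"({
        "adios2": {
            "engine": {
                "parameters": {
                    "InitialBufferSize": "4Gb",
                    "AggregatorRatio": "1"
                }
            }
        }
    })";

    openPMD::Series series("simData_%T.bp", openPMD::Access::CREATE, options);
    // ... define iterations, meshes and particle records as usual ...
}
```

In PIConGPU, the same JSON string can be handed to the openPMD plugin via its JSON configuration option (assuming the plugin version in use supports it).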
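And, at the ADIOS2 level, a minimal sketch of the span-based Put that the openPMD-api WIP branches wrap: the engine hands out a view into its own serialization buffer, so the application fills the data in place instead of staging a separate copy. Engine/IO setup is abbreviated, and the variable name and extents are placeholders:

```cpp
#include <adios2.h>

#include <cstddef>

void writeWithSpan(adios2::IO &io, adios2::Engine &engine, std::size_t n)
{
    // Local (non-MPI) shape for brevity; shape/start/count are placeholders.
    auto var = io.DefineVariable<double>("field", {n}, {0}, {n});

    engine.BeginStep();
    // Put(variable) without user data returns a span that aliases the
    // ADIOS2-internal buffer reserved for this variable.
    adios2::Variable<double>::Span span = engine.Put(var);
    double *data = span.data();
    for (std::size_t i = 0; i < span.size(); ++i)
        data[i] = static_cast<double>(i); // fill directly, no extra copy
    engine.EndStep();
}
```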

@steindev
Author

It seems the solution to our original problem will be found within ADIOS2.
And since there is a working workaround, I will close this issue.

Thanks to @franzpoeschel and @ax3l for picking up the issue!
