Quadratic increasing costs for Engine::Put() during one step when using the SST Engine #1731

franzpoeschel · 2019-09-11T11:33:19Z

While using ADIOS2 in PIConGPU for streaming IO, I noticed that during one step, each call to Engine::Put() took at least as long as the previous call. While investigating this, I found out the following:

The SST engine does not distinguish between sync and deferred mode, so each written dataset is immediately marshalled and written to an internal buffer.
The total incoming data is not known up front, so reallocation becomes necessary.
Reallocation in the BP3 serializer (which is used by SST) intentionally overrides the GNU STL's default power-of-2 reallocation and instead enforces linear behavior.

This means that, within one step, the SST engine's buffer is reallocated once for every chunk written, and each reallocation has to copy everything written so far, so the total cost grows quadratically with the number of chunks. This is probably fine for BP3, where deferred workflows that avoid this behavior are encouraged, but it does not work for SST.
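To make the cost argument concrete, here is a toy model (not ADIOS2 code) that counts how many bytes get moved by reallocations while equally sized chunks are appended to a growing buffer. The function names `bytes_copied`, `exact_fit` and `doubling` are made up for this sketch, and it assumes that every reallocation copies the full current contents.

```python
def bytes_copied(chunk_sizes, grow):
    """Total bytes moved by reallocations while appending the chunks in order.

    `grow(capacity, needed)` returns the new capacity and is only called
    when `needed` exceeds the current capacity.
    """
    capacity = 0
    used = 0
    copied = 0
    for size in chunk_sizes:
        needed = used + size
        if needed > capacity:
            copied += used  # existing contents move to the new allocation
            capacity = grow(capacity, needed)
        used = needed
    return copied


def exact_fit(capacity, needed):
    # Linear behavior: grow to exactly what the next chunk requires.
    return needed


def doubling(capacity, needed):
    # Geometric behavior: keep doubling until the request fits.
    new_capacity = max(capacity, 1)
    while new_capacity < needed:
        new_capacity *= 2
    return new_capacity


if __name__ == "__main__":
    for n in (100, 200, 400):
        chunks = [1024] * n
        print(n, bytes_copied(chunks, exact_fit), bytes_copied(chunks, doubling))
```

In this model, exact-fit growth copies roughly four times as many bytes when the number of chunks doubles, while doubling growth copies less than twice the total payload.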

By uncommenting this line, I was able to avoid this issue for now.

williamfgc · 2019-09-11T14:13:04Z

@franzpoeschel your assumptions are correct, thanks for reporting this. As you guessed, we need to limit the power-of-2 growth, since it is something implementations decide upon (it is not standardized) and it can create problems when large memory chunks are allocated. There are buffer settings you can try on the BP serializer, specifically InitialBufferSize and BufferGrowthFactor. Let us know if it helps. Thanks!
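As a rough illustration of why these two settings matter, the following toy model counts reallocations for a given workload. The arguments `initial_size` and `growth_factor` are illustrative stand-ins for InitialBufferSize and BufferGrowthFactor; the exact growth rule inside the BP serializer is not shown in this thread, so the rule used here (multiply the capacity by the factor, but never allocate less than what is needed) is an assumption.

```python
import math


def count_reallocations(chunk_sizes, initial_size, growth_factor):
    """Count capacity increases while appending chunks to one buffer.

    Assumed growth rule (hypothetical, for illustration only):
    new capacity = max(bytes needed, ceil(old capacity * growth_factor)).
    """
    capacity = initial_size
    used = 0
    reallocations = 0
    for size in chunk_sizes:
        needed = used + size
        if needed > capacity:
            capacity = max(needed, math.ceil(capacity * growth_factor))
            reallocations += 1
        used = needed
    return reallocations


if __name__ == "__main__":
    chunks = [1024] * 100
    for initial, factor in ((0, 1.0), (1024, 2.0), (100 * 1024, 1.0)):
        print(initial, factor, count_reallocations(chunks, initial, factor))
```

Under this rule, a growth factor of 1 with no initial allocation reallocates on every chunk, a larger factor brings the count down to a logarithmic number, and an initial size that covers the whole step removes reallocations entirely.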

franzpoeschel · 2019-09-11T16:01:27Z

I'll check it out, thanks for the hint!

franzpoeschel · 2019-09-13T12:49:16Z

I tested it out and it helped fix my issues. I think it would be helpful if the documentation mentioned that engines using the BP3 serializer also accept those parameters (or did I miss that?).
Also, the current solution boils down to tuning reallocation instead of avoiding it. In the long run, it might be desirable to support a deferred Put mode – is something like this planned?

williamfgc · 2019-09-14T14:09:44Z

@franzpoeschel point taken, @eisenhauer @JasonRuonanWang maintain SST. Can you take a look at the docs? Thanks.

JasonRuonanWang · 2019-09-14T15:24:49Z

@franzpoeschel Yes it is planned. We are working on replacing BP3 with BP4 in SST, and we will re-write many things during this process, including a true deferred mode on the writer side, and optimizations on the reader side as well.

franzpoeschel · 2019-09-18T11:19:45Z

This is good to know, thanks!
I haven't used ADIOS1 myself, but from what I have heard, in ADIOS1 it was required to announce all chunks to store before storing the first one (correct me if I'm wrong on that).
While this style is more tedious to write in, it still has an advantage over a deferred Put mode as a solution to the issue above: in a deferred Put mode, all data needs to be allocated at the same time (at EndStep time), whereas if all chunks are announced beforehand, the SST engine can allocate a correctly sized buffer at the beginning and then accept incoming chunks one after another. I thought about solving this issue in memory-critical applications by using the initial buffer size parameter, but this parameter is global for one engine and cannot be changed across steps (or can it?).
Long story short: Is it possible (or planned) for memory-critical applications to avoid reallocations without having to give ADIOS2 all data at once, as would become necessary in deferred mode?
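The memory argument can be put into a back-of-the-envelope model. It assumes that the engine copies every chunk into its own buffer, that a deferred mode forces the application to keep all chunks alive until EndStep, and that with announced sizes the application only holds one chunk at a time. Both function names are hypothetical, and real engines may behave differently (for example by avoiding the copy).

```python
def peak_memory_deferred(chunk_sizes):
    # All chunks stay alive in application memory until EndStep, at which
    # point the engine also needs a buffer large enough to hold all of them.
    total = sum(chunk_sizes)
    return total + total


def peak_memory_announced(chunk_sizes):
    # The engine allocates the full buffer up front; the application keeps
    # only the chunk that is currently being copied.
    return sum(chunk_sizes) + max(chunk_sizes, default=0)


if __name__ == "__main__":
    chunks = [10, 20, 30]
    print(peak_memory_deferred(chunks), peak_memory_announced(chunks))
```

Under these assumptions the deferred variant peaks at twice the payload, while the announced variant peaks at the payload plus the largest single chunk.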

williamfgc · 2019-09-19T17:50:31Z

@franzpoeschel your concerns are valid. The current issue is that the BP serializer inside SST is created and destroyed at every step, unlike in the file engines. Refactoring should take care of this problem, since the buffer allocated with InitialBufferSize would then persist across steps, thus avoiding frequent reallocations.
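The effect of keeping the serializer buffer alive across steps can be sketched with a small simulation. `StepBuffer` and `run` are hypothetical names for this sketch, not ADIOS2 classes, and the buffer uses exact-fit growth as described at the top of the issue.

```python
class StepBuffer:
    """Append-only buffer model that counts how often its capacity grows."""

    def __init__(self, initial_size=0):
        self.capacity = initial_size
        self.used = 0
        self.reallocations = 0

    def put(self, size):
        needed = self.used + size
        if needed > self.capacity:
            self.capacity = needed  # exact-fit growth
            self.reallocations += 1
        self.used = needed

    def end_step(self):
        # Discard the contents but keep the allocation for the next step.
        self.used = 0


def run(steps, chunks_per_step, chunk_size, persistent, initial_size=0):
    """Return the total number of reallocations over all steps."""
    reallocations = 0
    buffer = StepBuffer(initial_size)
    for _ in range(steps):
        if not persistent:
            buffer = StepBuffer(initial_size)  # fresh buffer every step
        before = buffer.reallocations
        for _ in range(chunks_per_step):
            buffer.put(chunk_size)
        reallocations += buffer.reallocations - before
        buffer.end_step()
    return reallocations


if __name__ == "__main__":
    print(run(10, 50, 1024, persistent=False))
    print(run(10, 50, 1024, persistent=True))
    print(run(10, 50, 1024, persistent=True, initial_size=50 * 1024))
```

In this model a persistent buffer only pays for growth during the first step, and a sufficiently large initial size removes the growth altogether.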

franzpoeschel · 2019-09-23T10:52:23Z

So, the solution would be setting the buffer size once to the maximum needed size initially? If so, that's good to know. Thanks!

pnorbert · 2019-10-07T20:28:57Z

Suggestion: a new function after BeginStep, for buffer allocation. @ax3l can then handle this situation in PIConGPU.

franzpoeschel · 2019-10-10T21:52:27Z

I think this is a good idea, since it would also cover engines that do not reuse buffers across steps.

franzpoeschel · 2021-02-22T17:11:22Z

Coming back to this, I think that serialization engines that persist across steps (such as in BP4) are the most elegant way to solve this, and that specifying InitialBufferSize is a sufficient way to give more hints to the backend. Since those methods allow me to avoid the problems described in the issue, I think I can close this.

williamfgc mentioned this issue Nov 27, 2019

Large memory consumption in BufferSTL when using PerformPuts and Flush instead of Begin/EndStep (BP4) #1891

Closed

franzpoeschel closed this as completed Feb 22, 2021
