Quadratic increasing costs for Engine::Put() during one step when using the SST Engine #1731
Comments
@franzpoeschel your assumptions are correct, thanks for reporting this. As you guessed, we need to limit the power-of-2 growth, since it is something implementations decide upon (not standardized) and it can create problems when large memory chunks are allocated. There are buffer settings you can try on the BP serializer, specifically InitialBufferSize and BufferGrowthFactor. Let us know if it helps. Thanks!
I'll check it out, thanks for the hint!
I tested it out and it helped fix my issues. I think it would be helpful if the documentation mentioned that engines using the BP3 serializer also accept those parameters (or did I miss that?).
@franzpoeschel point taken, @eisenhauer @JasonRuonanWang maintain SST. Can you take a look at the docs? Thanks.
@franzpoeschel Yes it is planned. We are working on replacing BP3 with BP4 in SST, and we will re-write many things during this process, including a true deferred mode on the writer side, and optimizations on the reader side as well.
This is good to know, thanks!
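For readers landing here, a minimal sketch of how the two parameters mentioned above might be assembled. It assumes that adios2::Params has the shape of a std::map<std::string, std::string> and that the values are passed via IO::SetParameters before the engine is opened. The helper name MakeBufferParams, the IO name "stream" and the values "1Gb" and "2" are made up for illustration and are not recommendations.

```cpp
#include <cassert>
#include <map>
#include <string>

// Assumed to mirror adios2::Params (a string-to-string map).
using Params = std::map<std::string, std::string>;

// Hypothetical helper: collects the two BP serializer buffer settings named
// in this thread. Both values are passed as strings, as engine parameters are.
Params MakeBufferParams(const std::string &initialSize,
                        const std::string &growthFactor)
{
    assert(!initialSize.empty() && !growthFactor.empty());
    return Params{{"InitialBufferSize", initialSize},
                  {"BufferGrowthFactor", growthFactor}};
}

// With ADIOS2 available, the map would be handed to the IO object before
// the engine is opened, roughly like this (not compiled here):
//   adios2::IO io = adios.DeclareIO("stream"); // "stream" is a placeholder
//   io.SetEngine("SST");
//   io.SetParameters(MakeBufferParams("1Gb", "2"));
```

Choosing an initial size close to the total payload of one step should avoid most reallocations, at the price of allocating that memory up front.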
@franzpoeschel your concerns are valid, the current issue is that the BP serializer inside SST is created and destroyed at every step, unlike inside the file engines. Refactoring should take care of this problem, since the allocated buffer with |
So, the solution would be setting the buffer size once to the maximum needed size initially? If so, that's good to know. Thanks! |
Suggestion: a new function after BeginStep, for buffer allocation. @ax3l can then handle this situation in PIConGPU. |
I think this is a good idea, since it would also cover engines that do not reuse buffers across steps. |
Coming back to this, I think that serialization engines that are persisted across steps (such as in BP4) are the most elegant way to solve this and specifying |
While using ADIOS2 in PIConGPU for streaming IO, I noticed that during one step, each call to Engine::Put() took at least as long as the previous call. While investigating this, I found out the following: The total incoming data is not known up front, so reallocation becomes necessary.
This means that data written during one step in the SST engine will be reallocated as many times as subsequent chunks are written. This is probably fine for BP3 where deferred workflows are encouraged that avoid this behavior, but this will not work for SST.
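To illustrate why this becomes quadratic, here is a small standalone model. It is not ADIOS2 code: CopiedBytes is a hypothetical function that only counts how many bytes a growing buffer would have to move if every reallocation copies the existing payload. A growth factor of 1.0 stands for exact-fit growth (one reallocation per chunk), a larger factor for geometric growth.

```cpp
#include <cassert>
#include <cstddef>

// Counts the bytes moved by reallocations when `chunks` pieces of
// `chunkBytes` each are appended one after another (one per modeled Put()).
// growthFactor == 1.0: capacity grows to exactly the required size.
// growthFactor  > 1.0: capacity grows geometrically.
std::size_t CopiedBytes(std::size_t chunks, std::size_t chunkBytes,
                        double growthFactor, std::size_t initialCapacity)
{
    assert(growthFactor >= 1.0);
    std::size_t capacity = initialCapacity;
    std::size_t used = 0;
    std::size_t copied = 0;
    for (std::size_t i = 0; i < chunks; ++i)
    {
        const std::size_t required = used + chunkBytes;
        if (required > capacity)
        {
            const std::size_t grown =
                static_cast<std::size_t>(capacity * growthFactor);
            capacity = grown > required ? grown : required;
            copied += used; // the reallocation moves everything written so far
        }
        used = required;
    }
    return copied;
}
```

In this model, 1000 chunks of 10 bytes with exact-fit growth move 4,995,000 bytes in total, while a growth factor of 2 moves 10,230 bytes and a sufficiently large initial capacity moves none.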
By uncommenting this line, I was able to avoid this issue for now.