Large memory consumption in BufferSTL when using PerformPuts and Flush instead of Begin/EndStep (BP4) #1891
@franzpoeschel first of all, thanks for providing numbers, as that always enriches the conversation. The memory growth you're seeing is intended within a single adios2 step, and as you reported, you are only using one adios2 step.
adios2 Put can do two things in terms of memory management:

- In sync mode (adios2::Mode::Sync), the data is copied into the engine's internal buffer immediately, so the application pointer can be reused as soon as Put returns.
- In deferred mode (adios2::Mode::Deferred, the default), only the pointer is recorded; the actual copy into the buffer happens at PerformPuts, EndStep, or Close, so the data must remain valid until then.
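A minimal sketch of the two modes (not from the original thread; the file name and sizes are arbitrary placeholders):

```cpp
#include <adios2.h>
#include <vector>

int main()
{
    adios2::ADIOS adios;
    adios2::IO io = adios.DeclareIO("IO");
    adios2::Engine engine = io.Open("modes.bp", adios2::Mode::Write);

    std::vector<double> data(100, 1.0);
    auto var = io.DefineVariable<double>("var", {100}, {0}, {100});

    engine.BeginStep();
    // Sync: the data is copied into the internal buffer right away,
    // so `data` may be modified or freed as soon as Put returns.
    engine.Put(var, data.data(), adios2::Mode::Sync);
    engine.EndStep();

    engine.BeginStep();
    // Deferred (the default): only the pointer is recorded; the copy
    // happens at PerformPuts/EndStep/Close, so `data` must stay valid
    // until then.
    engine.Put(var, data.data(), adios2::Mode::Deferred);
    engine.PerformPuts(); // after this, the data lives in the engine buffer
    engine.EndStep();

    engine.Close();
}
```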
Let me know if you have any questions. Thanks!
It seems I made an error yesterday when investigating the behavior of the SST engine. In my latest test run, which used the SST engine – where I definitely use ADIOS steps – I saw the same issue, but with slightly lower numbers. I will investigate this further, but do you have an idea what the issue could be? EDIT: I think this measurement was simply due to having no queue limit and a slow reader. I will run the test again.
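Not part of the original exchange, but for reference, a hedged sketch of how a writer can bound SST's step queue so that a slow reader does not cause unbounded writer-side buffering (QueueLimit and QueueFullPolicy are documented SST engine parameters; the values here are arbitrary):

```cpp
#include <adios2.h>

int main()
{
    adios2::ADIOS adios;
    adios2::IO io = adios.DeclareIO("IO");
    io.SetEngine("SST");
    // Keep at most 3 steps queued on the writer; block the writer
    // (rather than discard steps) when the queue is full because the
    // reader lags behind.
    io.SetParameters({{"QueueLimit", "3"}, {"QueueFullPolicy", "Block"}});

    adios2::Engine engine = io.Open("stream", adios2::Mode::Write);
    // ... the usual BeginStep / Put / EndStep loop goes here ...
    engine.Close();
}
```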
@franzpoeschel thanks for the detailed explanation, it's very helpful for our understanding. SST uses the BP3 buffering strategy per step, so no surprises with your results. The only caveat is that the memory is reallocated at every step, which is discussed in #1731.
Makes sense. Sync mode is basically deferred + PerformPuts; the difference is that in deferred mode you batch a few Put calls before a single PerformPuts.
Is this helping your problem? The trade-off is that you'd be calling the filesystem more often, but it seems you have to, since you're memory-bound. Another option in BP-based engines is to set the buffer growth strategy; check InitialBufferSize in the docs. The only issue is that you'd have to come up with some sort of heuristic to set an appropriate value.
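For illustration (not from the thread), a sketch of the buffer-sizing knobs being referred to; the parameter names are the documented BP engine parameters, while the concrete values are placeholder guesses that would have to come from exactly such a heuristic:

```cpp
#include <adios2.h>

int main()
{
    adios2::ADIOS adios;
    adios2::IO io = adios.DeclareIO("IO");
    io.SetEngine("BP4");
    // Pre-allocate the serialization buffer so it does not have to grow
    // (and reallocate) during the run; tune to your expected step size.
    io.SetParameters({{"InitialBufferSize", "512Mb"},
                      {"BufferGrowthFactor", "1.5"}});

    adios2::Engine engine = io.Open("out.bp", adios2::Mode::Write);
    // ... Put / PerformPuts / Flush as discussed above ...
    engine.Close();
}
```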
Not sure if it helps the openPMD model, but the only things that must remain constant across adios2 steps for a variable are the name, type, ShapeID, and number of dimensions. Those are the variable invariants in adios2; as you know, attributes are only loosely coupled through the IO factory. Everything else can change, not just per step but even per block (a single call to Put): dimension values can change, variables might not exist at certain steps, the number of blocks per step can vary, etc. Hope this helps; let me know if you have more questions. Basically, adios2 provides a few options to control your I/O workflow, but it's subject to the physical limits of your data, metadata, and system resources (memory, network, file disks, etc.).
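A hedged sketch of that flexibility (not from the thread; names and sizes are illustrative), using the documented SetShape/SetSelection calls to change a variable's dimension values per step, and skipping the variable entirely in some steps:

```cpp
#include <adios2.h>
#include <cstddef>
#include <vector>

int main()
{
    adios2::ADIOS adios;
    adios2::IO io = adios.DeclareIO("IO");
    adios2::Engine engine = io.Open("varying.bp", adios2::Mode::Write);

    // Name, type, ShapeID, and number of dimensions stay fixed...
    auto var = io.DefineVariable<double>("v", {100}, {0}, {100});

    for (std::size_t step = 0; step < 5; ++step)
    {
        engine.BeginStep();
        // ...but the dimension values may change from step to step.
        const std::size_t n = 100 + step * 10;
        var.SetShape({n});
        var.SetSelection({{0}, {n}});
        std::vector<double> data(n, static_cast<double>(step));
        if (step % 2 == 0) // the variable need not appear in every step
        {
            engine.Put(var, data.data(), adios2::Mode::Sync);
        }
        engine.EndStep();
    }
    engine.Close();
}
```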
I think I should come back to report on this one. I used a little example program to check the behavior:

```cpp
#include <adios2.h>
#include <numeric>
#include <string>
#include <vector>
int main( int argsc, char ** argsv )
{
constexpr size_t length = 10000;
std::string engine_type = "bp4";
if( argsc > 1 )
{
engine_type = argsv[ 1 ];
}
adios2::ADIOS adios;
adios2::IO IO = adios.DeclareIO( "IO" );
IO.SetEngine( engine_type );
adios2::Engine engine = IO.Open( "no_steps.bp", adios2::Mode::Write );
using datatype = double;
std::vector< datatype > streamData( length );
std::iota( streamData.begin(), streamData.end(), 0. );
for( unsigned step = 0; step < 1000; ++step )
{
auto variable = IO.DefineVariable< datatype >(
"var" + std::to_string( step ),
{ length },
{ 0 },
{ length },
/* constantDims = */ true );
engine.Put( variable, streamData.data() );
// move to ADIOS buffer
engine.PerformPuts();
// move to file
engine.Flush();
}
engine.Close();
}
```

Memory consumption peaks at 1.4MB.
I don't know whether this is intended behavior, but since it is no longer relevant for our usage, I think I can close the issue anyway. I haven't tested any streaming engines, since we make the use of steps mandatory for streaming in openPMD anyway. Without calling …
An iteration schema that makes better use of ADIOS steps is currently WIP.
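For contrast, a hedged sketch of the same write loop expressed with ADIOS steps (this is just the plain step API, not the openPMD iteration schema mentioned above), where EndStep serializes and drains the step's data each iteration:

```cpp
#include <adios2.h>
#include <cstddef>
#include <numeric>
#include <vector>

int main()
{
    constexpr std::size_t length = 10000;
    adios2::ADIOS adios;
    adios2::IO io = adios.DeclareIO("IO");
    io.SetEngine("bp4");
    adios2::Engine engine = io.Open("with_steps.bp", adios2::Mode::Write);

    std::vector<double> streamData(length);
    std::iota(streamData.begin(), streamData.end(), 0.);
    auto variable =
        io.DefineVariable<double>("var", {length}, {0}, {length}, true);

    for (unsigned step = 0; step < 1000; ++step)
    {
        engine.BeginStep();
        engine.Put(variable, streamData.data());
        engine.EndStep(); // serializes and flushes this step's data
    }
    engine.Close();
}
```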
During performance evaluations of the ADIOS2 backend in the openPMD API, I noticed an unusually large heap memory consumption in non-streaming workflows (in this case the BP4 engine) and traced it back to memory not being freed from the marshalling buffer (BufferSTL). Since openPMD's iterations cannot be easily modeled using the ADIOS2 step concept, this backend only uses steps for streaming engines. For disk-based engines, we use Engine::PerformPuts and Engine::Flush instead. The documentation for the latter method suggests that data should not be present in ADIOS after this call. (?) The figure below shows the memory trace from a small example, writing 30 openPMD iterations from PIConGPU to disk with several flushes per iteration. This memory buildup is not visible when using ADIOS steps.
Did I understand the functionality of Engine::Flush correctly? In that case, calling it should free the buffer. If not, is there a suggested alternative that avoids building up heap memory in the described way without using ADIOS steps?