Segfault on some systems when creating compressed datasets with span-based Put API #2965
Comments
The error seems to be system-specific. I have now tried a smaller dataset from our CI and could reproduce the same error. On another system, using the same ADIOS2 version, I did not see the error.
I can confirm the above problem and can reproduce it on a third system, using the same dataset that @franzpoeschel and I tried on hemera @ HZDR. Compressed data, while purportedly written without a hitch, cannot be read back and takes up the same amount of disk space as the uncompressed data, indicating that no compression was performed, irrespective of the compression level specified. Uncompressed data can be written and read without issues.
Run-time environment
Prerequisites:
- Zlib: version 1.2.11
- BLOSC: version 1.15.0
- Build type: Release
- ADIOS 1 (for input data): version 1.13.1
- ADIOS 2 (for output data): version 2.7.1 (git commit 94c2e37)
- openPMD (the glue): git commit 79077f8f56033cb67b870fe947553f41ca147485
The BLOSC, ADIOS(2) and openPMD versions are identical on this system and on hemera @ HZDR (where I first encountered this problem).
Valgrind output
Reproducing example
I've now been able to reproduce the issue with a pure ADIOS2-based example. For this, I wrote a small program that compresses one of our example datasets. Steps for reproducing:
#include <adios2.h>
#include <stdexcept>
#include <string>
template< typename VarType >
void variableFromSourceToSink(
std::string const & varName,
adios2::IO readIO,
adios2::IO writeIO,
adios2::Engine readEngine,
adios2::Engine writeEngine,
adios2::Operator compress )
{
auto readVar = readIO.InquireVariable< VarType >( varName );
auto shape = readVar.Shape();
adios2::Dims start( shape.size(), 0 );
auto writeVar =
writeIO.DefineVariable< VarType >( varName, shape, start, shape );
writeVar.AddOperation( compress );
if( shape.size() == 0 )
{
throw std::runtime_error( "Scalar variables not supported" );
}
auto span = writeEngine.Put( writeVar );
readEngine.Get(
readVar,
span.data(),
adios2::Mode::Sync ); // we need sync access here since the span pointer
// might change due to reallocations
}
int main()
{
adios2::ADIOS entryPoint;
adios2::Operator compress = entryPoint.DefineOperator( "bzip2", "bzip2" );
adios2::IO readIO = entryPoint.DeclareIO( "read" );
adios2::IO writeIO = entryPoint.DeclareIO( "write" );
adios2::Engine readEngine =
readIO.Open( "example-3d-bp4.bp", adios2::Mode::Read );
// open the output through writeIO, the IO in which the output variables are defined
adios2::Engine writeEngine = writeIO.Open( "out.bp", adios2::Mode::Write );
while( readEngine.BeginStep() != adios2::StepStatus::EndOfStream )
{
writeEngine.BeginStep();
for( auto const & pair : readIO.AvailableVariables() )
{
std::string const & varName = pair.first;
auto const & info = pair.second;
auto const & typeLabel = info.at( "Type" );
#define SWITCH( type, label ) \
if( typeLabel == #label || typeLabel == #type ) \
{ \
variableFromSourceToSink< type >( \
varName, readIO, writeIO, readEngine, writeEngine, compress ); \
} \
else
ADIOS2_FOREACH_PRIMITVE_STDTYPE_2ARGS( SWITCH )
// else
{
throw std::runtime_error( "Type not found: " + typeLabel );
}
#undef SWITCH
}
readEngine.EndStep();
writeEngine.EndStep();
}
readEngine.Close();
writeEngine.Close();
return 0;
}
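The comment in the example about the span pointer changing due to reallocations can be illustrated with a toy model. This is only an assumption about how such a buffer might behave, not ADIOS2's implementation; ToyBuffer, ToySpan and fillFirstSpanLate are hypothetical names. A span that stores an offset and recomputes its address in data() stays usable after the buffer grows, whereas a raw pointer fetched earlier (for instance one handed to a deferred Get) may dangle once a later Put enlarges the buffer. Reading synchronously right after calling span.data() sidesteps that.

```cpp
#include <cstddef>
#include <vector>

// Toy model, NOT ADIOS2's implementation: an engine-owned buffer that grows
// (and may reallocate) every time space for another variable is reserved.
class ToyBuffer
{
public:
    // Reserve n elements at the end and return their offset in the buffer.
    std::size_t reserve(std::size_t n)
    {
        std::size_t const offset = m_data.size();
        m_data.resize(m_data.size() + n); // may move all existing elements
        return offset;
    }

    // Address of the element at the given offset, valid until the next reserve().
    double *at(std::size_t offset) { return m_data.data() + offset; }

private:
    std::vector<double> m_data;
};

// A span that remembers an offset and recomputes its address in data().
class ToySpan
{
public:
    ToySpan(ToyBuffer &buffer, std::size_t offset, std::size_t size)
        : m_buffer(buffer), m_offset(offset), m_size(size)
    {
    }

    double *data() const { return m_buffer.at(m_offset); }
    std::size_t size() const { return m_size; }

private:
    ToyBuffer &m_buffer;
    std::size_t m_offset;
    std::size_t m_size;
};

// Reserve a small span, then a large one (likely forcing a reallocation),
// and only afterwards fill the first span. Returns the first span's values.
std::vector<double> fillFirstSpanLate()
{
    ToyBuffer buffer;
    ToySpan first(buffer, buffer.reserve(3), 3);
    // A raw pointer saved at this point could dangle after the next reservation.
    ToySpan second(buffer, buffer.reserve(1000), 1000);
    for (std::size_t i = 0; i < first.size(); ++i)
    {
        first.data()[i] = static_cast<double>(i + 1);
    }
    second.data()[0] = -1.0; // writing the second span does not touch the first
    return std::vector<double>(first.data(), first.data() + first.size());
}
```

Calling fillFirstSpanLate() returns {1, 2, 3}, even though the second reservation has most likely moved the underlying storage.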
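The issue seems to be with span-based writing
The following example, which reads into a std::vector and uses the pointer-based Put instead of a span, does not trigger the issue:
#include <adios2.h>
#include <stdexcept>
#include <string>
#include <vector>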
template< typename VarType >
void variableFromSourceToSink(
std::string const & varName,
adios2::IO readIO,
adios2::IO writeIO,
adios2::Engine readEngine,
adios2::Engine writeEngine,
adios2::Operator compress )
{
auto readVar = readIO.InquireVariable< VarType >( varName );
auto shape = readVar.Shape();
adios2::Dims start( shape.size(), 0 );
auto writeVar =
writeIO.DefineVariable< VarType >( varName, shape, start, shape );
writeVar.AddOperation( compress );
if( shape.size() == 0 )
{
throw std::runtime_error( "Scalar variables not supported" );
}
std::vector< VarType > buffer;
readEngine.Get(
readVar,
buffer,
adios2::Mode::Sync );
writeEngine.Put( writeVar, buffer.data(), adios2::Mode::Sync );
}
int main()
{
adios2::ADIOS entryPoint;
adios2::Operator compress = entryPoint.DefineOperator( "bzip2", "bzip2" );
adios2::IO readIO = entryPoint.DeclareIO( "read" );
adios2::IO writeIO = entryPoint.DeclareIO( "write" );
adios2::Engine readEngine =
readIO.Open( "example-3d-bp4.bp", adios2::Mode::Read );
// open the output through writeIO, the IO in which the output variables are defined
adios2::Engine writeEngine = writeIO.Open( "out.bp", adios2::Mode::Write );
while( readEngine.BeginStep() != adios2::StepStatus::EndOfStream )
{
writeEngine.BeginStep();
for( auto const & pair : readIO.AvailableVariables() )
{
std::string const & varName = pair.first;
auto const & info = pair.second;
auto const & typeLabel = info.at( "Type" );
#define SWITCH( type, label ) \
if( typeLabel == #label || typeLabel == #type ) \
{ \
variableFromSourceToSink< type >( \
varName, readIO, writeIO, readEngine, writeEngine, compress ); \
} \
else
ADIOS2_FOREACH_PRIMITVE_STDTYPE_2ARGS( SWITCH )
// else
{
throw std::runtime_error( "Type not found: " + typeLabel );
}
#undef SWITCH
}
readEngine.EndStep();
writeEngine.EndStep();
}
readEngine.Close();
writeEngine.Close();
return 0;
}
@psychocoderHPC The way to work around the issue on the affected systems is to avoid the span-based API. openPMD exposes both variants, so it's up to applications to choose one.
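A minimal sketch of how an application might apply this workaround automatically, assuming it keeps track of which operators it attached to each variable. ToyVariable, PutPath and choosePutPath are hypothetical names, not part of the openPMD or ADIOS2 APIs. The span path is chosen only when the caller prefers it and no operator is attached; otherwise the application falls back to filling its own buffer and calling the pointer-based Put, as in the second example above.

```cpp
#include <string>
#include <vector>

// Hypothetical bookkeeping an application might keep per variable.
struct ToyVariable
{
    std::string name;
    std::vector<std::string> operations; // e.g. {"bzip2"}; empty means no operator
};

enum class PutPath
{
    Span, // engine-allocated buffer, filled in place by the application
    Copy  // application-owned buffer passed to the pointer-based Put
};

// Use the span path only if it is preferred AND no operator is attached,
// since span-based writes combined with operators are what triggers the bug.
PutPath choosePutPath(ToyVariable const &var, bool preferSpan)
{
    if (preferSpan && var.operations.empty())
    {
        return PutPath::Span;
    }
    return PutPath::Copy;
}
```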
@franzpoeschel FYI, span+compression was never supported in BP3/BP4; it's listed as a limitation in the docs for
So there will be no fix?
@franzpoeschel I agree, an exception should be thrown there. That's probably an oversight on my side; I'll try to do a PR. As for implementing it: since this is not a trivial change, I don't think @pnorbert and @eisenhauer are adding new features to BP3/BP4, only to BP5 (please correct me if I'm wrong). Hope it helps.
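The kind of up-front check suggested here can be sketched as follows. Where exactly such a check would live inside ADIOS2, and its wording, are assumptions; VariableInfo, checkSpanPutAllowed and spanPutAccepted are hypothetical names, and this is not the fix that was actually merged. The idea is simply to fail loudly at Put time instead of silently producing an unreadable file.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical description of a variable and the operators attached to it.
struct VariableInfo
{
    std::string name;
    std::vector<std::string> operations;
};

// Throw instead of silently writing data that cannot be read back.
void checkSpanPutAllowed(VariableInfo const &var)
{
    if (!var.operations.empty())
    {
        throw std::invalid_argument(
            "span-based Put is not supported for variable '" + var.name +
            "' with " + std::to_string(var.operations.size()) +
            " operator(s) attached; use the pointer-based Put instead");
    }
}

// Convenience wrapper: true if a span-based Put would be accepted.
bool spanPutAccepted(VariableInfo const &var)
{
    try
    {
        checkSpanPutAllowed(var);
        return true;
    }
    catch (std::invalid_argument const &)
    {
        return false;
    }
}
```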
No, the combination will not be available in BP5. Generally, putting the two together doesn't make much sense. The purpose of Span is to allocate space for to-be-generated data in ADIOS internal buffers and return the address of that space, so that a data copy can be avoided. But for that to work, the data size can't change (as it would with compression), and of course most compression operations don't operate in place, so they're essentially a copy anyway. (Also, BP5 tries to do zero-copy writes for large deferred Put()s, so Span is immaterial. We've implemented it for backwards compatibility, but it will generally not provide a performance gain.)
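The size argument can be made concrete with a toy example. A simple run-length encoder stands in for bzip2 or Blosc here (a deliberately simplified, swapped-in technique, not what ADIOS2 uses), and toyCompress and spanReservedBytes are hypothetical helpers. A span has to reserve the uncompressed size when Put() is called, before any data exists, while the compressed size depends on the data: highly repetitive input shrinks, incompressible input can even grow, and the encoder writes into a separate output buffer rather than in place.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy run-length encoder standing in for a real compressor such as bzip2.
// Output format: (count, value) byte pairs, with counts capped at 255.
std::vector<std::uint8_t> toyCompress(std::vector<std::uint8_t> const &in)
{
    std::vector<std::uint8_t> out; // separate output buffer: not in place
    std::size_t i = 0;
    while (i < in.size())
    {
        std::uint8_t const value = in[i];
        std::size_t run = 1;
        while (i + run < in.size() && in[i + run] == value && run < 255)
        {
            ++run;
        }
        out.push_back(static_cast<std::uint8_t>(run));
        out.push_back(value);
        i += run;
    }
    return out;
}

// A span reserves exactly the uncompressed size at Put() time,
// before the data (and hence its compressed size) is known.
std::size_t spanReservedBytes(std::size_t count, std::size_t elementSize)
{
    return count * elementSize;
}
```

For 1000 zero bytes the span would reserve 1000 bytes while the encoder produces 8; for the bytes {1, 2, 3} it produces 6, twice the input size.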
Ah, so the operators are applied inside ADIOS2 while performing Puts? Then I agree that it does not quite fit the workflow.
@franzpoeschel Operators (compression, really, in this case) can be heavy operations, so yes, they are applied in
Thanks for the clarification @williamfgc
Thanks for adding a clear error message now :)
Describe the bug
Writing a dataset with a compression operator seems to work without any error. The resulting datasets have two issues:
The location of the segfault can be pinpointed with valgrind:
The offending line is:
where, apparently, bufferIn is a null pointer. I could observe this behavior with Blosc compression as well as with Bzip2. When not compressing the data, reading works fine.
To Reproduce
~~I don't have a minimal reproducing example at hand. The error occurred when using openpmd-pipe for a post-hoc compression of a 77Gb dataset (per step). Each mesh had a size of ~4.23Gb and was written as a single chunk. I hope that you may already have an idea of what's going wrong from the above error output. If not, I can try out how reproducible the error is.~~
See the reproducing example in the comments.
Expected behavior
Writing a compressed file should alter the file size and result in a readable file.
Additional context
Following up