
Segfault on some systems when creating compressed datasets with span-based Put API #2965

Closed
franzpoeschel opened this issue Dec 2, 2021 · 13 comments


@franzpoeschel
Contributor

franzpoeschel commented Dec 2, 2021

Describe the bug
Writing a dataset with a compression operator seems to work without any error. The resulting datasets have two issues:

  1. They are apparently not compressed: their sizes match those of uncompressed datasets
  2. Reading fails with a segfault

The location of the segfault can be pinpointed with valgrind:

bash-4.2$ valgrind bpls compressed.bp/ -e '.*/B/x' -d
==8843== Memcheck, a memory error detector
==8843== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==8843== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==8843== Command: bpls compressed.bp/ -e .*/B/x -d
==8843== 
  float    /data/24000/fields/B/x                              {640, 3000, 592}
==8843== Warning: set address range perms: large range [0xd731040, 0x5132d040) (undefined)
==8843== Invalid read of size 1
==8843==    at 0x5A1D82A: adios2::core::compress::Decompress(char const*, unsigned long, char*) (CompressorFactory.cpp:196)
==8843==    by 0x599EC57: void adios2::format::BP4Deserializer::PostDataRead<float>(adios2::core::Variable<float>&, adios2::core::Variable<float>::BPInfo&, adios2::helper::SubStreamBoxInfo const&, bool, unsigned long) (BP4Deserializer.tcc:529)
==8843==    by 0x57BBF74: void adios2::core::engine::BP4Reader::ReadVariableBlocks<float>(adios2::core::Variable<float>&) (BP4Reader.tcc:123)
==8843==    by 0x57BF487: void adios2::core::engine::BP4Reader::GetSyncCommon<float>(adios2::core::Variable<float>&, float*) (BP4Reader.tcc:44)
==8843==    by 0x57B203B: adios2::core::engine::BP4Reader::DoGetSync(adios2::core::Variable<float>&, float*) (BP4Reader.cpp:789)
==8843==    by 0x558469F: void adios2::core::Engine::Get<float>(adios2::core::Variable<float>&, float*, adios2::Mode) (Engine.tcc:98)
==8843==    by 0x5584A0A: void adios2::core::Engine::Get<float>(adios2::core::Variable<float>&, std::vector<float, std::allocator<float> >&, adios2::Mode) (Engine.tcc:132)
==8843==    by 0x483916: int adios2::utils::readVar<float>(adios2::core::Engine*, adios2::core::IO*, adios2::core::Variable<float>*) (bpls.cpp:2104)
==8843==    by 0x4608A7: int adios2::utils::printVariableInfo<float>(adios2::core::Engine*, adios2::core::IO*, adios2::core::Variable<float>*) (bpls.cpp:1239)
==8843==    by 0x454CD4: adios2::utils::doList_vars(adios2::core::Engine*, adios2::core::IO*) (bpls.cpp:1009)
==8843==    by 0x455A4D: adios2::utils::doList(char const*) (bpls.cpp:1610)
==8843==    by 0x453869: adios2::utils::bplsMain(int, char**) (bpls.cpp:665)
==8843==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==8843== 

bpls:8843 terminated with signal 11 at PC=5a1d82a SP=1ffeff7720.  Backtrace:
/home/poesch58/pic_env/local/bin/../lib64/libadios2_core.so.2(_ZN6adios24core8compress10DecompressEPKcmPc+0x28)[0x5a1d82a]
/home/poesch58/pic_env/local/bin/../lib64/libadios2_core.so.2(_ZN6adios26format15BP4Deserializer12PostDataReadIfEEvRNS_4core8VariableIT_EERNS6_6BPInfoERKNS_6helper16SubStreamBoxInfoEbm+0x210)[0x599ec58]
/home/poesch58/pic_env/local/bin/../lib64/libadios2_core.so.2(_ZN6adios24core6engine9BP4Reader18ReadVariableBlocksIfEEvRNS0_8VariableIT_EE+0x687)[0x57bbf75]
/home/poesch58/pic_env/local/bin/../lib64/libadios2_core.so.2(_ZN6adios24core6engine9BP4Reader13GetSyncCommonIfEEvRNS0_8VariableIT_EEPS5_+0x8a)[0x57bf488]
/home/poesch58/pic_env/local/bin/../lib64/libadios2_core.so.2(_ZN6adios24core6engine9BP4Reader9DoGetSyncERNS0_8VariableIfEEPf+0x1c6)[0x57b203c]
/home/poesch58/pic_env/local/bin/../lib64/libadios2_core.so.2(_ZN6adios24core6Engine3GetIfEEvRNS0_8VariableIT_EEPS4_NS_4ModeE+0x172)[0x55846a0]
/home/poesch58/pic_env/local/bin/../lib64/libadios2_core.so.2(_ZN6adios24core6Engine3GetIfEEvRNS0_8VariableIT_EERSt6vectorIS4_SaIS4_EENS_4ModeE+0xa5)[0x5584a0b]
bpls[0x483917]
bpls[0x4608a8]
bpls[0x454cd5]
bpls[0x455a4e]
bpls[0x45386a]
bpls[0x4580b5]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x8cf1555]
bpls[0x4520a9]
==8843== 
==8843== HEAP SUMMARY:
==8843==     in use at exit: 1,144,043,077 bytes in 2,702 blocks
==8843==   total heap usage: 10,760 allocs, 8,058 frees, 1,144,626,530 bytes allocated
==8843== 
==8843== LEAK SUMMARY:
==8843==    definitely lost: 0 bytes in 0 blocks
==8843==    indirectly lost: 0 bytes in 0 blocks
==8843==      possibly lost: 0 bytes in 0 blocks
==8843==    still reachable: 1,144,043,077 bytes in 2,702 blocks
==8843==         suppressed: 0 bytes in 0 blocks
==8843== Rerun with --leak-check=full to see details of leaked memory
==8843== 
==8843== For counts of detected and suppressed errors, rerun with: -v
==8843== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

The offending line is:

std::memcpy(&compressorType, bufferIn, 1);

where apparently bufferIn is a null pointer.
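A defensive check before dereferencing the input buffer would turn this segfault into a diagnosable error. The sketch below is hypothetical: `ReadOperatorType` is an illustrative helper, not ADIOS2 code. It mirrors the `(bufferIn, size)` arguments visible in the valgrind trace of `adios2::core::compress::Decompress`, and only assumes, as the quoted `memcpy` suggests, that the first byte of an operated block encodes the compressor type.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Hypothetical helper (not ADIOS2 code): read the one-byte compressor type
// stored at the start of an operated block, as in the quoted memcpy.
// Instead of dereferencing a null or empty buffer, throw a clear error.
std::uint8_t ReadOperatorType(const char *bufferIn, std::size_t sizeIn)
{
    if (bufferIn == nullptr || sizeIn < 1)
    {
        throw std::runtime_error(
            "operated block has no payload (null or empty input buffer)");
    }
    std::uint8_t compressorType = 0;
    std::memcpy(&compressorType, bufferIn, 1);
    return compressorType;
}
```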

I could observe this behavior with Blosc compression as well as with Bzip2. When not compressing the data, reading works fine.

To Reproduce
~~I don't have a reproducing minimal example at hand. The error occurred when using openpmd-pipe for a post-hoc compression of a 77 GB dataset (per step). Each mesh had a size of ~4.23 GB and was written as a single chunk.

I hope that maybe you already have an idea of what's going wrong from the above error output. If not, I can try out how reproducible the error is.~~

See the comment below for a reproducing example.

Expected behavior
Writing a compressed file should alter the file size and result in a readable file.

Desktop:

  • OS/Platform: Hemera cluster at HZDR, CentOS Linux release 7.9.2009 (Core)
  • Build: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, g++ 7.3.0, current master branch of ADIOS2 (b39b9a4)

Additional context

Following up

@franzpoeschel
Contributor Author

The error seems to be system-specific. I have now tried a smaller dataset from our CI and could reproduce the same error. On another system, I did not see the error. (I used the same version of ADIOS2 on that system.)

@Anton-Le

Anton-Le commented Dec 5, 2021

I can confirm the above problem and can reproduce it on a third system, given the same dataset that @franzpoeschel and I tried on Hemera @ HZDR.

Compressed data, while seemingly written without a hitch, cannot be read back and takes up the same amount of disk space as the uncompressed data, indicating that no compression has been performed, irrespective of the compression level specified. Uncompressed data can be written and read without issues.
Compression of the dataset was requested by passing
"dataset": { "operators": [ { "type": "blosc", "parameters": { "clevel": "3", "doshuffle": "BLOSC_BITSHUFFLE" } } ] } as part of the option string to openpmd-pipe (ADIOS 2).

Run-time environment

  • OS: RHEL 7.6

Prerequisites:
Python 3.8.0 (without mpi4py)
GCC 9.3.0
OpenMPI 4.0.4

Zlib:

Version 1.2.11

BLOSC:

Version 1.15.0
Compiled via:
cmake -DCMAKE_INSTALL_PREFIX=$HCBASE/lib/BLOSC/1.15.0 -DPREFER_EXTERNAL_ZLIB=ON ../c-blosc

Build type: Release
Static and shared build.

ADIOS 1 (for input data)

Version 1.13.1
Compiled via:
CFLAGS="-fPIC" ../../adios-1.13.1/configure --enable-static --enable-shared --prefix=$ADIOS_ROOT --with-mpi=$MPI_ROOT --with-zlib=$ZLIB_ROOT --with-blosc=$BLOSC_ROOT

ADIOS 2 (for output data):

Version 2.7.1 (git commit 94c2e37)
Compiled using default flags. Libraries for HDF5, Blosc and Zlib were detected automatically. No IME.
Build type: Release

openPMD (the glue):

Git commit 79077f8f56033cb67b870fe947553f41ca147485
Build type: Release
HDF5, ADIOS1 and ADIOS2 autodetected via CMake.

BLOSC, ADIOS(2) and openPMD versions are identical on this system and on Hemera @ HZDR (where I first encountered this problem).
While the libraries were compiled with MPI support, the conversion was performed using only 1 process.

Valgrind output

bash-4.2$ valgrind bpls simDataCompressed.bp/ -e '.*/B/x' -d
==12919== Memcheck, a memory error detector
==12919== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==12919== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==12919== Command: bpls simDataCompressed.bp/ -e .*/B/x -d
==12919== 
  float     /data/24000/fields/B/x                              {640, 3000, 592}
==12919== Warning: set address range perms: large range [0x59fa2040, 0x168f92040) (undefined)
==12919== Invalid read of size 4
==12919==    at 0x52F6590: adios2::core::compress::CompressBlosc::Decompress(void const*, unsigned long, void*, unsigned long, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&) const (in $LIBDIR/ADIOS/2.7.1/lib64/libadios2_core.so.2.7.1)
==12919==    by 0x529B38A: adios2::format::BPBlosc::GetData(char const*, adios2::helper::BlockOperationInfo const&, char*) const (in $LIBDIR/ADIOS/2.7.1/lib64/libadios2_core.so.2.7.1)
==12919==    by 0x523C989: void adios2::format::BP4Deserializer::PostDataRead<float>(adios2::core::Variable<float>&, adios2::core::Variable<float>::Info&, adios2::helper::SubStreamBoxInfo const&, bool, unsigned long) (in $LIBDIR/ADIOS/2.7.1/lib64/libadios2_core.so.2.7.1)
==12919==    by 0x50AF707: void adios2::core::engine::BP4Reader::ReadVariableBlocks<float>(adios2::core::Variable<float>&) (in $LIBDIR/ADIOS/2.7.1/lib64/libadios2_core.so.2.7.1)
==12919==    by 0x50ABC68: adios2::core::engine::BP4Reader::DoGetSync(adios2::core::Variable<float>&, float*) (in $LIBDIR/ADIOS/2.7.1/lib64/libadios2_core.so.2.7.1)
==12919==    by 0x4FD6CF1: void adios2::core::Engine::Get<float>(adios2::core::Variable<float>&, float*, adios2::Mode) (in $LIBDIR/ADIOS/2.7.1/lib64/libadios2_core.so.2.7.1)
==12919==    by 0x4FE24AB: void adios2::core::Engine::Get<float>(adios2::core::Variable<float>&, std::vector<float, std::allocator<float> >&, adios2::Mode) (in $LIBDIR/ADIOS/2.7.1/lib64/libadios2_core.so.2.7.1)
==12919==    by 0x43C124: int adios2::utils::readVar<float>(adios2::core::Engine*, adios2::core::IO*, adios2::core::Variable<float>*) (in $LIBDIR/ADIOS/2.7.1/bin/bpls)
==12919==    by 0x462D34: int adios2::utils::printVariableInfo<float>(adios2::core::Engine*, adios2::core::IO*, adios2::core::Variable<float>*) (in $LIBDIR/ADIOS/2.7.1/bin/bpls)
==12919==    by 0x4188E1: adios2::utils::doList_vars(adios2::core::Engine*, adios2::core::IO*) (in $LIBDIR/ADIOS/2.7.1/bin/bpls)
==12919==    by 0x41932A: adios2::utils::doList(char const*) (in $LIBDIR/ADIOS/2.7.1/bin/bpls)
==12919==    by 0x419FE4: adios2::utils::bplsMain(int, char**) (in $LIBDIR/ADIOS/2.7.1/bin/bpls)
==12919==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==12919== 
==12919== 
==12919== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==12919==  Access not within mapped region at address 0x0
==12919==    at 0x52F6590: adios2::core::compress::CompressBlosc::Decompress(void const*, unsigned long, void*, unsigned long, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&) const (in $LIBDIR/ADIOS/2.7.1/lib64/libadios2_core.so.2.7.1)
==12919==    by 0x529B38A: adios2::format::BPBlosc::GetData(char const*, adios2::helper::BlockOperationInfo const&, char*) const (in $LIBDIR/ADIOS/2.7.1/lib64/libadios2_core.so.2.7.1)
==12919==    by 0x523C989: void adios2::format::BP4Deserializer::PostDataRead<float>(adios2::core::Variable<float>&, adios2::core::Variable<float>::Info&, adios2::helper::SubStreamBoxInfo const&, bool, unsigned long) (in $LIBDIR/ADIOS/2.7.1/lib64/libadios2_core.so.2.7.1)
==12919==    by 0x50AF707: void adios2::core::engine::BP4Reader::ReadVariableBlocks<float>(adios2::core::Variable<float>&) (in $LIBDIR/ADIOS/2.7.1/lib64/libadios2_core.so.2.7.1)
==12919==    by 0x50ABC68: adios2::core::engine::BP4Reader::DoGetSync(adios2::core::Variable<float>&, float*) (in $LIBDIR/ADIOS/2.7.1/lib64/libadios2_core.so.2.7.1)
==12919==    by 0x4FD6CF1: void adios2::core::Engine::Get<float>(adios2::core::Variable<float>&, float*, adios2::Mode) (in $LIBDIR/ADIOS/2.7.1/lib64/libadios2_core.so.2.7.1)
==12919==    by 0x4FE24AB: void adios2::core::Engine::Get<float>(adios2::core::Variable<float>&, std::vector<float, std::allocator<float> >&, adios2::Mode) (in $LIBDIR/ADIOS/2.7.1/lib64/libadios2_core.so.2.7.1)
==12919==    by 0x43C124: int adios2::utils::readVar<float>(adios2::core::Engine*, adios2::core::IO*, adios2::core::Variable<float>*) (in $LIBDIR/ADIOS/2.7.1/bin/bpls)
==12919==    by 0x462D34: int adios2::utils::printVariableInfo<float>(adios2::core::Engine*, adios2::core::IO*, adios2::core::Variable<float>*) (in $LIBDIR/ADIOS/2.7.1/bin/bpls)
==12919==    by 0x4188E1: adios2::utils::doList_vars(adios2::core::Engine*, adios2::core::IO*) (in $LIBDIR/ADIOS/2.7.1/bin/bpls)
==12919==    by 0x41932A: adios2::utils::doList(char const*) (in $LIBDIR/ADIOS/2.7.1/bin/bpls)
==12919==    by 0x419FE4: adios2::utils::bplsMain(int, char**) (in $LIBDIR/ADIOS/2.7.1/bin/bpls)
==12919==  If you believe this happened as a result of a stack
==12919==  overflow in your program's main thread (unlikely but
==12919==  possible), you can try to increase the size of the
==12919==  main thread stack using the --main-stacksize= flag.
==12919==  The main thread stack size used in this run was 16777216.
==12919== 
==12919== HEAP SUMMARY:
==12919==     in use at exit: 4,553,980,764 bytes in 2,851 blocks
==12919==   total heap usage: 10,727 allocs, 7,876 frees, 4,554,475,597 bytes allocated
==12919== 
==12919== LEAK SUMMARY:
==12919==    definitely lost: 0 bytes in 0 blocks
==12919==    indirectly lost: 0 bytes in 0 blocks
==12919==      possibly lost: 0 bytes in 0 blocks
==12919==    still reachable: 4,553,980,764 bytes in 2,851 blocks
==12919==         suppressed: 0 bytes in 0 blocks
==12919== Rerun with --leak-check=full to see details of leaked memory
==12919== 
==12919== For counts of detected and suppressed errors, rerun with: -v
==12919== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

@franzpoeschel
Contributor Author

Reproducing example

I've now been able to see the issue with a pure ADIOS2-based example code. For this, I wrote a small script to compress one of our example datasets.

Steps for reproducing:

  1. Download our ADIOS2 sample dataset, extract it and cd example-3d-bp4
  2. Compile the code below. Its purpose is to copy all variables from one BP file to another one (out.bp), while applying the bzip2 operator to each variable.
  3. Execute the compiled binary in the folder of the example dataset.
  4. bpls -d out.bp will then lead to the errors seen above on some systems. If the line with AddOperation is commented out, everything runs fine.
#include <adios2.h>
#include <stdexcept> // std::runtime_error
#include <string>

template< typename VarType >
void variableFromSourceToSink(
    std::string const & varName,
    adios2::IO readIO,
    adios2::IO writeIO,
    adios2::Engine readEngine,
    adios2::Engine writeEngine,
    adios2::Operator compress )
{
    auto readVar = readIO.InquireVariable< VarType >( varName );
    auto shape = readVar.Shape();
    adios2::Dims start( shape.size(), 0 );

    auto writeVar =
        writeIO.DefineVariable< VarType >( varName, shape, start, shape );
    writeVar.AddOperation( compress );

    if( shape.size() == 0 )
    {
        throw std::runtime_error( "Scalar variables not supported" );
    }
    auto span = writeEngine.Put( writeVar );
    readEngine.Get(
        readVar,
        span.data(),
        adios2::Mode::Sync ); // we need sync access here since the span pointer
                              // might change due to reallocations
}

int main()
{
    adios2::ADIOS entryPoint;
    adios2::Operator compress = entryPoint.DefineOperator( "bzip2", "bzip2" );

    adios2::IO readIO = entryPoint.DeclareIO( "read" );
    adios2::IO writeIO = entryPoint.DeclareIO( "write" );
    adios2::Engine readEngine =
        readIO.Open( "example-3d-bp4.bp", adios2::Mode::Read );
    adios2::Engine writeEngine = writeIO.Open( "out.bp", adios2::Mode::Write );
    while( readEngine.BeginStep() != adios2::StepStatus::EndOfStream )
    {
        writeEngine.BeginStep();
        for( auto const & pair : readIO.AvailableVariables() )
        {
            std::string const & varName = pair.first;
            auto const & info = pair.second;
            auto const & typeLabel = info.at( "Type" );
#define SWITCH( type, label )                                                  \
    if( typeLabel == #label || typeLabel == #type )                            \
    {                                                                          \
        variableFromSourceToSink< type >(                                      \
            varName, readIO, writeIO, readEngine, writeEngine, compress );     \
    }                                                                          \
    else
            ADIOS2_FOREACH_PRIMITVE_STDTYPE_2ARGS( SWITCH )
            // else
            {
                throw std::runtime_error( "Type not found: " + typeLabel );
            }
#undef SWITCH
        }
        readEngine.EndStep();
        writeEngine.EndStep();
    }
    readEngine.Close();
    writeEngine.Close();
    return 0;
}

@franzpoeschel
Contributor Author

The issue seems to be with span-based writing

The following example does not trigger the issue:

#include <adios2.h>
#include <stdexcept> // std::runtime_error
#include <string>
#include <vector>

template< typename VarType >
void variableFromSourceToSink(
    std::string const & varName,
    adios2::IO readIO,
    adios2::IO writeIO,
    adios2::Engine readEngine,
    adios2::Engine writeEngine,
    adios2::Operator compress )
{
    auto readVar = readIO.InquireVariable< VarType >( varName );
    auto shape = readVar.Shape();
    adios2::Dims start( shape.size(), 0 );

    auto writeVar =
        writeIO.DefineVariable< VarType >( varName, shape, start, shape );
    writeVar.AddOperation( compress );

    if( shape.size() == 0 )
    {
        throw std::runtime_error( "Scalar variables not supported" );
    }
    std::vector< VarType > buffer;
    readEngine.Get(
        readVar,
        buffer,
        adios2::Mode::Sync );
    writeEngine.Put( writeVar, buffer.data(), adios2::Mode::Sync );
}

int main()
{
    adios2::ADIOS entryPoint;
    adios2::Operator compress = entryPoint.DefineOperator( "bzip2", "bzip2" );

    adios2::IO readIO = entryPoint.DeclareIO( "read" );
    adios2::IO writeIO = entryPoint.DeclareIO( "write" );
    adios2::Engine readEngine =
        readIO.Open( "example-3d-bp4.bp", adios2::Mode::Read );
    adios2::Engine writeEngine = writeIO.Open( "out.bp", adios2::Mode::Write );
    while( readEngine.BeginStep() != adios2::StepStatus::EndOfStream )
    {
        writeEngine.BeginStep();
        for( auto const & pair : readIO.AvailableVariables() )
        {
            std::string const & varName = pair.first;
            auto const & info = pair.second;
            auto const & typeLabel = info.at( "Type" );
#define SWITCH( type, label )                                                  \
    if( typeLabel == #label || typeLabel == #type )                            \
    {                                                                          \
        variableFromSourceToSink< type >(                                      \
            varName, readIO, writeIO, readEngine, writeEngine, compress );     \
    }                                                                          \
    else
            ADIOS2_FOREACH_PRIMITVE_STDTYPE_2ARGS( SWITCH )
            // else
            {
                throw std::runtime_error( "Type not found: " + typeLabel );
            }
#undef SWITCH
        }
        readEngine.EndStep();
        writeEngine.EndStep();
    }
    readEngine.Close();
    writeEngine.Close();
    return 0;
}

@franzpoeschel changed the title from "Segfault when reading compressed datasets" to "Segfault on some systems when creating compressed datasets with span-based Put API" on Dec 6, 2021
@franzpoeschel
Contributor Author

@psychocoderHPC The way to work around the issue on the affected systems is to avoid the span-based API. openPMD exposes both variants, so it's up to applications to choose which one to use.
In principle, we could add an environment variable or similar to openPMD-api to enforce the fallback implementation that we have, but let's first see how difficult this bug turns out to be.
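The workaround can be expressed as a small decision function. This is a sketch only: `UseSpanBasedPut` and the environment variable `OPENPMD_HYPOTHETICAL_USE_SPAN` are hypothetical names, not part of openPMD-api or ADIOS2. It selects the span-based Put only when no operators are attached and the user has not disabled it; otherwise it falls back to the buffered path (Get into a `std::vector`, then Put), as in the non-failing example above.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical policy (illustrative names only): decide whether to use the
// span-based Put or the buffered fallback. `envValue` is the raw value of a
// hypothetical opt-out environment variable, or nullptr if it is unset.
bool UseSpanBasedPut(bool variableHasOperators, const char *envValue)
{
    // Span-based Put combined with operators triggers this bug: fall back.
    if (variableHasOperators)
    {
        return false;
    }
    // An explicit "0" or "false" disables the span-based API.
    if (envValue != nullptr)
    {
        const std::string value(envValue);
        if (value == "0" || value == "false")
        {
            return false;
        }
    }
    return true;
}

// Convenience wrapper reading the hypothetical environment variable.
bool UseSpanBasedPutFromEnv(bool variableHasOperators)
{
    return UseSpanBasedPut(variableHasOperators,
                           std::getenv("OPENPMD_HYPOTHETICAL_USE_SPAN"));
}
```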

@williamfgc
Contributor

@franzpoeschel FYI, span+compression was never supported in BP3/BP4; it's listed as a limitation in the docs for Put using Span. We didn't have the bandwidth, so it still has to be implemented. @pnorbert and @eisenhauer would know whether this will be available in BP5.

@franzpoeschel
Contributor Author

So there will be no fix?
If so, is it possible to issue a warning when such a situation occurs? This took us quite a while to debug.
Also, this means that we should add a workaround in openPMD as I described above.

@williamfgc
Contributor

@franzpoeschel I agree, an exception should be thrown. That's probably an oversight on my side; I'll try to do a PR. As for implementing it, I don't think @pnorbert and @eisenhauer are adding new features to BP3/BP4, only to BP5 (please correct me if I'm wrong), since this is not a trivial change. Hope it helps.
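A check of this kind would sit at the entry of the span-based Put. The sketch below uses a hypothetical `VariableInfo` type holding a list of operator names; it is not the actual change made in ADIOS2, only an illustration of failing early with an exception instead of silently writing an uncompressed, unreadable block.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a variable's metadata (not an ADIOS2 type).
struct VariableInfo
{
    std::string name;
    std::vector<std::string> operators; // e.g. {"bzip2"} after AddOperation
};

// Fail early: reject a span-based Put on a variable that has operators,
// rather than writing a block that cannot be read back.
void CheckSpanPutAllowed(const VariableInfo &variable)
{
    if (!variable.operators.empty())
    {
        throw std::invalid_argument(
            "span-based Put is not supported for variable " + variable.name +
            " because it has operators (e.g. compression) attached");
    }
}
```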

@eisenhauer
Member

No, the combination will not be available in BP5. Generally, putting the two together doesn't make much sense. The function of Span is to allocate space for to-be-generated data in ADIOS internal buffers, returning the address of that space, with the idea that we can avoid a data copy that way. But for that to work, the data size can't change (as it would with compression), and of course most compression operations don't operate in place, so they're essentially a copy anyway. (Also, BP5 tries to do zero-copy writes for large deferred Put()s, so Span is immaterial. We've implemented it for backwards compatibility, but it will generally not provide a gain in performance.)
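To make the size argument concrete, here is a toy example that uses run-length encoding as a stand-in for a real compressor (an assumption for illustration only; Blosc and bzip2 work differently). Two inputs of the same length compress to very different sizes, and the output goes to a separate buffer, so a writer that must hand out a fixed-size region before the data exists cannot know how much space the compressed block will need.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy run-length encoder: emits (count, value) byte pairs, with runs capped
// at 255. It stands in for a real compressor only to show that the output
// size depends on the data, not just on the input length.
std::vector<std::uint8_t> RunLengthEncode(const std::vector<std::uint8_t> &in)
{
    std::vector<std::uint8_t> out; // separate output buffer: not in place
    std::size_t i = 0;
    while (i < in.size())
    {
        const std::uint8_t value = in[i];
        std::size_t run = 1;
        while (i + run < in.size() && in[i + run] == value && run < 255)
        {
            ++run;
        }
        out.push_back(static_cast<std::uint8_t>(run));
        out.push_back(value);
        i += run;
    }
    return out;
}
```

For 1000 identical bytes this yields 8 bytes (four runs), while a 1000-byte ramp without repeats grows to 2000 bytes.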

@franzpoeschel
Contributor Author

Ah, so the operators are applied inside ADIOS2 while performing Puts? Then I agree that it does not quite fit the workflow.
It's also good news that the span API will not be necessary in BP5, because that makes it easier for us to decide to make the span API opt-in instead of opt-out.

@williamfgc
Contributor

@franzpoeschel Operators (compression, really, in this case) can be heavy operations, so yes, they are applied in PerformPuts/EndStep/Put in sync mode. The motivation for Span is memory-bound applications and composing a contiguous memory variable from non-contiguous memory (e.g. tables, arrays of structs). I think adding an Operator still makes sense, since the final Variable data in the stream (on disk or network) can be compressed at the expense of internally having temporary memory that is then copied to the original "span" buffer post-compression. It's just more operations to reduce the final data size, but I don't know when it's worth it.
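The approach described above (compress the filled span region into temporary memory, then copy the result back over the original location) could look roughly like the toy buffer below. Everything here is hypothetical: `ToyBuffer`, `ReserveSpan`, `CompressSpanInPlace` and `ToyCompress` are illustrative names, and a trivial run-length encoder stands in for a real operator. It shows the trade-off mentioned: extra temporary memory and a copy in exchange for a smaller final block.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy stand-in for a real operator: (count, value) run-length encoding.
std::vector<std::uint8_t> ToyCompress(const std::uint8_t *data, std::size_t n)
{
    std::vector<std::uint8_t> out;
    std::size_t i = 0;
    while (i < n)
    {
        std::size_t run = 1;
        while (i + run < n && data[i + run] == data[i] && run < 255)
        {
            ++run;
        }
        out.push_back(static_cast<std::uint8_t>(run));
        out.push_back(data[i]);
        i += run;
    }
    return out;
}

// Hypothetical serialization buffer that hands out span-like regions.
struct ToyBuffer
{
    std::vector<std::uint8_t> bytes;

    // Reserve n bytes at the end of the buffer and return their offset; the
    // caller fills them directly (the "span"). An offset is returned instead
    // of a pointer because later growth of `bytes` may reallocate.
    std::size_t ReserveSpan(std::size_t n)
    {
        const std::size_t offset = bytes.size();
        bytes.resize(offset + n);
        return offset;
    }

    // At flush time: compress the filled region into temporary memory, copy
    // the result back over the region and shrink the buffer. Only valid for
    // the most recently reserved span; a real format would also have to
    // record in its metadata whether the block was compressed.
    std::size_t CompressSpanInPlace(std::size_t offset, std::size_t n)
    {
        const std::vector<std::uint8_t> tmp =
            ToyCompress(bytes.data() + offset, n);
        if (tmp.size() >= n)
        {
            return n; // no gain: keep the data uncompressed
        }
        std::memcpy(bytes.data() + offset, tmp.data(), tmp.size());
        bytes.resize(offset + tmp.size());
        return tmp.size();
    }
};
```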

@franzpoeschel
Contributor Author

Thanks for the clarification, @williamfgc.
I'll leave the issue open for now since you said you wanted to prepare a PR for the warning.

@franzpoeschel
Contributor Author

franzpoeschel commented Jan 3, 2022

Thanks for adding a clear error message now :)
Fixed with #2981


4 participants