Tweak Remote class and test multi-threaded file remote access #3834

Merged
merged 4 commits into from
Oct 14, 2023
Changes from 3 commits
24 changes: 15 additions & 9 deletions source/adios2/toolkit/remote/Remote.cpp
@@ -50,16 +50,22 @@ void ReadResponseHandler(CManager cm, CMConnection conn, void *vevent, void *cli
     return;
 };

+CManagerSingleton &CManagerSingleton::Instance(RemoteCommon::Remote_evpath_state &ev_state)
+{
+    static CManagerSingleton self = [&ev_state] {
+        CManagerSingleton instance;
+        ev_state = instance.internalEvState;
+        return instance;
+    }();
+    ev_state = self.internalEvState;
@anagainaru (Contributor), Oct 12, 2023:

I think a thread might reach this line before the thread running the lambda has finished, in which case that thread would see an invalid internal ev state. If that thread is the one that ends up calling std::call_once, we will have a problem. Is there a reason why you want the CMregister_handler calls outside the singleton?

Member Author:

The state should be complete once we have registered the formats. We don't need the handlers to be registered until messages start arriving. (I had them outside of the singleton because they weren't visible inside it. I could have added the externs before it, but I don't think that should be necessary.) My understanding (which may be flawed) is that singleton creation is thread safe, so anyone who gets it will get it after its constructor has completed. So the formats should have been registered and internalEvState fully populated, and the ev_state in every Remote instance should be appropriately populated. One of them will register the handlers, and from what I read of the call_once docs, there will be no race conditions: if a thread invokes call_once, it only proceeds past it once someone has completed the call, even if some other thread did it. I don't think there are holes here, but I can't explain the failures. I will try to get it into a docker image to see if I can get more info.
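
The two guarantees relied on above can be sketched in isolation. This is a minimal illustration, not code from the PR: `DemoSingleton`, `InitOnce`, `RunDemo`, and the counters are hypothetical names, and the value 42 merely stands in for the registered formats. It assumes a C++11 or later compiler, where initialization of a function-local static is thread safe and `std::call_once` blocks concurrent callers until the call has completed.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for CManagerSingleton; counts constructor runs.
struct DemoSingleton
{
    static std::atomic<int> constructed;
    int state = 0;

    static DemoSingleton &Instance()
    {
        // Function-local static: concurrent callers wait until the
        // constructor has finished, so they never see partial state.
        static DemoSingleton self;
        return self;
    }

    DemoSingleton(const DemoSingleton &) = delete;
    DemoSingleton &operator=(const DemoSingleton &) = delete;

private:
    DemoSingleton()
    {
        state = 42; // stands in for format registration
        ++constructed;
    }
};
std::atomic<int> DemoSingleton::constructed{0};

// Counts how often the "register handlers" step runs.
std::atomic<int> handlerRegistrations{0};

// Same shape as Remote::InitCMData(): fetch the singleton, then run the
// handler registration exactly once across all threads.
int InitOnce()
{
    DemoSingleton &s = DemoSingleton::Instance();
    static std::once_flag flag;
    std::call_once(flag, [] { ++handlerRegistrations; });
    return s.state;
}

// Starts nThreads threads that all call InitOnce() and returns how many
// of them observed an incompletely initialized singleton.
int RunDemo(int nThreads)
{
    std::atomic<int> badStates{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < nThreads; ++i)
    {
        threads.emplace_back([&badStates] {
            if (InitOnce() != 42)
                ++badStates;
        });
    }
    for (auto &t : threads)
        t.join();
    assert(DemoSingleton::constructed == 1);
    assert(handlerRegistrations == 1);
    return badStates.load();
}
```

Calling `RunDemo(8)` from a test should return 0: every thread sees the fully constructed object, and the registration lambda runs once no matter which thread wins.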

Member Author:

Well, it wasn't the internal state thing. Instead, icc seems not to agree that this pattern produces a singleton. Specifically, the constructor gets called (creating the CManager), then the destructor gets called (closing the CManager), then we start registering handlers but get a segfault because the CManager has been closed and deallocated. I'm guessing that the other failure, from nvhpc, is from the same problem. The gcc-based compilers seem happy, but not the rest.

Contributor:

Yes, you are right: the internal state is guaranteed to be created by the time threads reach that line. Let me play with godbolt a bit to look at the assembly generated by icc. I am very surprised that only gcc likes this.
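
One possible reading of the constructor-then-destructor sequence, not confirmed in this thread: `return instance;` inside the lambda returns a named local, and eliding that copy (NRVO) is permitted but not required. A compiler that skips the elision copies `instance` into the result and then destroys `instance`, which for a class like this would close a handle the copy still refers to. The sketch below only counts special-member calls. `Owner`, `Counts`, and the two `Measure...` functions are hypothetical names, and it makes no claim about what icc or nvhpc actually do.

```cpp
#include <cassert>

// Records how often each special member function runs.
struct Counts
{
    int ctor = 0;
    int copy = 0;
    int dtor = 0;
};

// Hypothetical resource owner standing in for CManagerSingleton.
struct Owner
{
    Counts *counts;
    explicit Owner(Counts *c) : counts(c) { ++counts->ctor; }
    Owner(const Owner &other) : counts(other.counts) { ++counts->copy; }
    ~Owner() { ++counts->dtor; } // the real class calls CManager_close here
};

// Pattern from the diff: build a local inside a lambda, return it by value.
Counts MeasureLambdaPattern()
{
    Counts counts;
    Counts snapshot;
    {
        Owner self = [&counts]() -> Owner {
            Owner instance(&counts);
            return instance; // copy elision allowed here, not guaranteed
        }();
        snapshot = counts; // taken while self is still alive
    }
    return snapshot;
}

// Alternative: the constructor does all the setup and the object is
// created in place, so no copy or early destruction can occur.
Counts MeasureInPlace()
{
    Counts counts;
    Counts snapshot;
    {
        Owner self(&counts);
        snapshot = counts; // taken while self is still alive
    }
    return snapshot;
}
```

In the lambda pattern every copy that is not elided is matched by one destructor call before `self` is ever used, so `dtor == copy` holds on any compiler, with both equal to 0 only when all copies are elided. In-place construction always gives zero of each. Deleting the copy and move constructors would turn the lambda pattern into a compile error rather than a runtime surprise.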

+    return self;
+}
+
 void Remote::InitCMData()
 {
-    std::lock_guard<std::mutex> lockGuard(m_CMInitMutex);
-    bool first = true;
-    auto CM = CManagerSingleton::Instance(first);
-    ev_state.cm = CM->m_cm;
-    RegisterFormats(ev_state);
-    if (first)
-    {
-        CMfork_comm_thread(ev_state.cm);
+    (void)CManagerSingleton::Instance(ev_state);
+    static std::once_flag flag;
+    std::call_once(flag, [&]() {
         CMregister_handler(ev_state.OpenResponseFormat, (CMHandlerFunc)OpenResponseHandler,
                            &ev_state);
         CMregister_handler(ev_state.ReadResponseFormat, (CMHandlerFunc)ReadResponseHandler,
@@ -68,7 +74,7 @@ void Remote::InitCMData()
                            (CMHandlerFunc)OpenSimpleResponseHandler, &ev_state);
         CMregister_handler(ev_state.ReadResponseFormat, (CMHandlerFunc)ReadResponseHandler,
                            &ev_state);
-    }
+    });
 }

 void Remote::Open(const std::string hostname, const int32_t port, const std::string filename,
28 changes: 11 additions & 17 deletions source/adios2/toolkit/remote/Remote.h
@@ -63,32 +63,26 @@ class Remote
     bool m_Active = false;
 };

+#ifdef ADIOS2_HAVE_SST
 class CManagerSingleton
 {
 public:
-#ifdef ADIOS2_HAVE_SST
+    static CManagerSingleton &Instance(RemoteCommon::Remote_evpath_state &ev_state);
+
+private:
     CManager m_cm = NULL;
-#endif
-    static CManagerSingleton *Instance(bool &first)
+    RemoteCommon::Remote_evpath_state internalEvState;
+    CManagerSingleton()
     {
-        static CManagerSingleton *ptr = new CManagerSingleton();
-        static bool internal_first = true;
-        first = internal_first;
-        internal_first = false;
-        return ptr;
+        m_cm = CManager_create();
+        internalEvState.cm = m_cm;
+        RegisterFormats(internalEvState);
+        CMfork_comm_thread(internalEvState.cm);
     }

-protected:
-#ifdef ADIOS2_HAVE_SST
-    CManagerSingleton() { m_cm = CManager_create(); }
-
     ~CManagerSingleton() { CManager_close(m_cm); }
-#else
-    CManagerSingleton() {}
-
-    ~CManagerSingleton() {}
-#endif
 };
+#endif

 } // end namespace adios2

19 changes: 13 additions & 6 deletions testing/adios2/engine/bp/CMakeLists.txt
@@ -107,22 +107,29 @@ bp_gtest_add_tests_helper(LargeMetadata MPI_ALLOW)
 set(BP5LargeMeta "Engine.BP.BPLargeMetadata.BPWrite1D_LargeMetadata.BP5.Serial")

 if ((NOT WIN32) AND ADIOS2_HAVE_SST)
-# prototype for remote server testing
-# (we don't really use SST here, just EVPath, but ADIOS2_HAVE_SST is the most relevant conditional)
-macro(add_remote_tests_helper testname)
+# prototype for remote server testing
+# (we don't really use SST here, just EVPath, but ADIOS2_HAVE_SST is the most relevant conditional)
+
+macro(add_get_remote_tests_helper testname)
 add_test(NAME "Remote.BP${testname}.GetRemote" COMMAND Test.Engine.BP.${testname}.Serial bp5)
 set_tests_properties(Remote.BP${testname}.GetRemote PROPERTIES FIXTURES_REQUIRED Server ENVIRONMENT "DoRemote=1")
 endmacro()

+macro(add_file_remote_tests_helper testname)
+add_test(NAME "Remote.BP${testname}.FileRemote" COMMAND Test.Engine.BP.${testname}.Serial bp5)
+set_tests_properties(Remote.BP${testname}.FileRemote PROPERTIES FIXTURES_REQUIRED Server ENVIRONMENT "DoFileRemote=1")
+endmacro()
+
 add_test(NAME remoteServerSetup COMMAND remote_server -background)
 set_tests_properties(remoteServerSetup PROPERTIES FIXTURES_SETUP Server)

 add_test(NAME remoteServerCleanup COMMAND remote_server -kill_server)
 set_tests_properties(remoteServerCleanup PROPERTIES FIXTURES_CLEANUP Server)

-#add remote tests below this line
-add_remote_tests_helper(WriteReadADIOS2stdio)
-add_remote_tests_helper(WriteMemorySelectionRead)
+##### add remote tests below this line
+add_get_remote_tests_helper(WriteReadADIOS2stdio)
+add_get_remote_tests_helper(WriteMemorySelectionRead)
+add_file_remote_tests_helper(WriteMemorySelectionRead)
 endif()

 if(ADIOS2_HAVE_MPI)