Support FI_ADDR_GNIX enough for bootstrapping #7
Comments
What I have for FI_ADDR_GNIX for now is here.
A filled in struct of gni_ep_name would be returned from
The structure would contain the minimum needed to
The tricky thing is the cmd id. This will need to be obtained
Here's an idea I had for a nameservice for jobs launched
Couple questions on the name service stuff.
Good question, we could enhance ugni/kgni to have a scratchpad in the
I'm pretty sure that owing to the need to hide native slurm from
I have an account on tiger - still need to log in though - I thought tiger
2015-02-06 16:00 GMT-07:00 Sung-Eun Choi [email protected]:
fixed HPCX <=v1.9.7 support (#7)
Signed-off-by: Sannikov, Alexander <[email protected]>
Signed-off-by: Dmitry Gladkov <[email protected]>
Here is the deadlock scenario:

#0 0x00007fed3a439495 in pthread_spin_lock ()
#1 0x00007fed37ad7cfd in fastlock_acquire ()
#2 0x00007fed37ad80a4 in psmx2_lock ()
#3 0x00007fed37ad8361 in psmx2_am_trx_ctxt_handler_ext ()
#4 0x00007fed37b084e7 in psmx2_am_trx_ctxt_handler_0 ()
#5 0x00007fed373c08c5 in self_am_short_request ()
#6 0x00007fed3739bf83 in __psm2_am_request_short ()
#7 0x00007fed37ad84ee in psmx2_trx_ctxt_disconnect_peers ()

A lock has been held in psmx2_trx_ctxt_disconnect_peers before psm2_am_request_short is called. While making progress inside this function, the execution is redirected to the AM handler due to the arrival of an incoming disconnection request. The AM handler tries to acquire the same lock that has already been held and reaches a deadlock.

Fix by avoiding calling psm2_am_request_short while holding the lock.

Signed-off-by: Jianxin Xiong <[email protected]>
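
The fix described above follows a common pattern: copy what you need while holding the lock, release it, and only then make the call that can re-enter a handler. The sketch below is illustrative only and is not the actual psmx2 change. `struct trx_ctxt`, `am_handler`, `send_disconnect`, and `disconnect_peers` are hypothetical stand-ins, and a `pthread_mutex_t` replaces the provider's spin lock for simplicity.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_PEERS 8

/* Hypothetical stand-in for a transmit context; not the real psmx2 struct. */
struct trx_ctxt {
    pthread_mutex_t lock;
    int peers[MAX_PEERS];
    size_t peer_count;
    int requests_sent;   /* touched only by the sending thread in this sketch */
    int handler_runs;    /* protected by lock */
};

/* Stand-in for the AM handler: it acquires the same lock as the caller. */
static void am_handler(struct trx_ctxt *ctxt)
{
    pthread_mutex_lock(&ctxt->lock);
    ctxt->handler_runs++;
    pthread_mutex_unlock(&ctxt->lock);
}

/*
 * Stand-in for psm2_am_request_short: while making progress, the
 * request may run the handler on the calling thread.
 */
static void send_disconnect(struct trx_ctxt *ctxt, int peer)
{
    (void)peer;
    ctxt->requests_sent++;
    am_handler(ctxt);
}

/*
 * Snapshot the peer list under the lock, drop the lock, then send.
 * Calling send_disconnect() inside the locked region would make
 * am_handler() block on a lock this thread already holds.
 */
static void disconnect_peers(struct trx_ctxt *ctxt)
{
    int snapshot[MAX_PEERS];
    size_t n, i;

    pthread_mutex_lock(&ctxt->lock);
    n = ctxt->peer_count;
    for (i = 0; i < n; i++)
        snapshot[i] = ctxt->peers[i];
    ctxt->peer_count = 0;
    pthread_mutex_unlock(&ctxt->lock);

    for (i = 0; i < n; i++)
        send_disconnect(ctxt, snapshot[i]);
}
```

The trade-off is that the peer list can change between the snapshot and the sends, so a real implementation would need to tolerate peers that have already disconnected. How the actual commit handles that case is not stated here.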