We only add executors in RapidsShuffleHeartbeatManager but we never remove them.
We need to keep track of the last time we saw a heartbeat from each executor and, as more registrations arrive, check whether any executors are stale and exclude them from executor update messages. We don't need to tell already-registered executors about the loss of a peer, because UCX should be able to figure that out on its own.
We also need to update the logic that decides which executors are "new" for a peer. Currently it's based on registration order; I believe it will need to be time based.
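To make the proposal concrete, here is a rough sketch of the bookkeeping in Python. It is not the actual RapidsShuffleHeartbeatManager code: the `HeartbeatTracker` and `ExecutorEntry` names, the `stale_timeout_s` parameter, and the injectable `clock` are all made up for illustration. It assumes each executor carries a last-heartbeat timestamp, stale entries are pruned whenever a registration or heartbeat arrives, and "new" peers are those registered after the last update sent to the caller.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ExecutorEntry:
    executor_id: str
    registered_at: float     # when this executor registered
    last_heartbeat: float    # last time we heard from it
    last_update_sent: float  # cutoff for "new" peers on its next heartbeat


class HeartbeatTracker:
    """Hypothetical stand-in for the manager's executor bookkeeping."""

    def __init__(self, stale_timeout_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.stale_timeout_s = stale_timeout_s
        self.clock = clock
        self.executors: Dict[str, ExecutorEntry] = {}

    def _drop_stale(self, now: float) -> None:
        # Forget executors that have been silent longer than the timeout.
        # Peers are not notified; the assumption from the issue is that
        # UCX detects the lost peer on its own.
        stale = [eid for eid, e in self.executors.items()
                 if now - e.last_heartbeat > self.stale_timeout_s]
        for eid in stale:
            del self.executors[eid]

    def register(self, executor_id: str) -> List[str]:
        """Add an executor and return the live peers it should know about."""
        now = self.clock()
        self._drop_stale(now)
        peers = sorted(eid for eid in self.executors if eid != executor_id)
        self.executors[executor_id] = ExecutorEntry(executor_id, now, now, now)
        return peers

    def heartbeat(self, executor_id: str) -> List[str]:
        """Refresh an executor and return peers registered since its last update."""
        entry = self.executors.get(executor_id)
        if entry is None:
            # Unknown, or already pruned as stale: treat as a fresh registration.
            return self.register(executor_id)
        now = self.clock()
        entry.last_heartbeat = now  # refresh first so the caller is not pruned
        self._drop_stale(now)
        # Time-based selection instead of registration order. Equal timestamps
        # would need a tie-breaker (e.g. a sequence number) in real code.
        new_peers = sorted(eid for eid, e in self.executors.items()
                           if eid != executor_id
                           and e.registered_at > entry.last_update_sent)
        entry.last_update_sent = now
        return new_peers
```

With something like this, an executor that stops sending heartbeats simply falls out of the map the next time anyone registers or heartbeats, so it is never handed to a newly registered peer. Pruning on incoming messages rather than on a timer is a design choice of the sketch, matching the "as we get more registrations" wording above.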