Use-case:
Consider the case where nsmgr dies:
nsc --> nsmgr --> fwd --> nse
Steps:
1. nsc calls Request() on nsmgr.
2. Everything is fine; we get a connection.
3. nsmgr dies.
4. nsc starts healing and calls Close():
Close() : nsc --->x (died) nsmgr -- fwd -- nse
Close() won't reach nsmgr, so resource release is interrupted.
5. Heal calls Request():
heal Request() : nsc --> nsmgr(new) --> (interface name collision) fwd -- nse
Actual:
We get an interface name collision on the fwd1 side.
Expected:
Everything is fine; step 5 completes successfully.
Possible solutions
1. Removing collisions by duplicating
In this case we should:
Duplicate interfaces. If an interface with the same name already exists (for example the kernel interface on NSC, because we didn't remove it on Close() inside the forwarder), we can create an interface named name + some suffix.
Use a /30 mask in integration tests and use new IPs to check the new connection (after heal).
So here we treat the connection after healing as a completely new connection.
Estimation: most likely, we can do it before the release
2. Improve heal (can't be done before release)
In this case we can start thinking in these directions:
Call Close() after restart. For example, if nsmgr restarted, we can try to get all connections from the forwarders.
Revisit the approach from the previous implementation, where each element monitors the others (not just NSC).
Estimation: we can't do it before release
Description
Currently we don't release all resources when heal starts. We call Close() before the heal Request, but it doesn't go through all of the Path.
https://github.com/networkservicemesh/sdk/blob/main/pkg/networkservice/common/begin/event_factory.go#L98