Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Heal: release resources before healing #1166

Closed
glazychev-art opened this issue Nov 18, 2021 · 4 comments
Closed

Heal: release resources before healing #1166

glazychev-art opened this issue Nov 18, 2021 · 4 comments
Assignees
Labels
ASAP The issue that blocking SOW items or core use-cases

Comments

@glazychev-art
Copy link
Contributor

glazychev-art commented Nov 18, 2021

Description

Currently we don't release all resources when heal starts.
We call Close() before heal Request but it doesn't go through all of the Path.
https://github.com/networkservicemesh/sdk/blob/main/pkg/networkservice/common/begin/event_factory.go#L98

Use-case:

Let's take a look at the death of a nsmgr case:
nsc --> nsmgr --> fwd--> nse

Steps:

  1. nsc calls request on nsmgr
  2. All fine we got connection.
  3. nsmgr died.
  4. nsc starts healing, calls close:
    Close() : nsc --->x (died) nsmgr --fwd -- nse
    Close() won't reach nsmgr and resource releasing will be interrupted.
  5. Call Request() from heal:
    heal Request() : nsc --> nsmgr(new) --> (interface name collision)fwd -- nse

Actual:
got interface name collision on fwd1 side.

Expected:
all fine step 5 complete successful.

Possible solutions

1. Removing collisions by duplicating

In this case we should do:

  1. Interface duplicates. If we already have interface with the same name (for example kernel on NSC, because we didn't remove it on Close() inside forwarder) we can create interface named name + some suffix
  2. Use /30 mask in integration-tests and use new IPs to check new connection (after heal)
    So, we believe here that the connection after healing is a completely new connection.
    Estimation: most likely, we can do it before the release

2. Improve heal (can't be done before release)

In this case we can start thinking in directions

  1. Call Close() after restart. For example if nsmgr restarted, we can try to get all connections from forwarders.
  2. Think about the approach that was in the previous implementation - each element monitors the others (not just NSC)
    Estimation: we can't do it before release
@denis-tingaikin
Copy link
Member

@glazychev-art

as I can see:

option 1 can be done before release.
option 2 can't be done before release.

Is this correct?

@glazychev-art
Copy link
Contributor Author

@denis-tingaikin
Yes

@denis-tingaikin denis-tingaikin added the ASAP The issue that blocking SOW items or core use-cases label Nov 18, 2021
@edwarnicke
Copy link
Member

@glazychev-art Why doesn't the forwarder recognize the connection id and treat it as a refresh?

@denis-tingaikin denis-tingaikin moved this from Todo to In Progress in Release 1.1.0 Nov 19, 2021
@denis-tingaikin denis-tingaikin moved this from In Progress to Potentially fixed in Release 1.1.0 Nov 19, 2021
@denis-tingaikin denis-tingaikin moved this from Potentially fixed to Done in Release 1.1.0 Nov 22, 2021
@denis-tingaikin
Copy link
Member

Fixed by #1167

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ASAP The issue that blocking SOW items or core use-cases
Projects
Status: Done
Development

No branches or pull requests

3 participants