-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[qfix] Add refresh retries on failure #1081
[qfix] Add refresh retries on failure #1081
Conversation
Signed-off-by: Vladimir Popov <[email protected]>
cb85233
to
41ae4ca
Compare
continue | ||
} | ||
return | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why retry instead of letting the next refresh catch it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So... perhaps I'm simply confused... but this looks to me like it will call the refresh once, that will cancel the context (see line 72 above) and then it will return based on line 86 ... meaning... I don't see what this does for us at all. I can see some of the other fixes in the PR... but this just introduces complexity I don't think will ever be used...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah wait... I see.. the context isn't canceled if our call to the downstream gets an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK... this is clever... it will tick at the roughly 1/3 marks even if we get an error... so not overwhelming... but keeps trying.
} | ||
}(cancelCtx, afterCh) | ||
}() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please pass the variables to the go routine explicitly, rather than by closure when possible, it precludes a variety of potential issues and bugs.
@@ -74,17 +75,25 @@ func (t *refreshClient) Request(ctx context.Context, request *networkservice.Net | |||
store(ctx, metadata.IsClient(t), cancel) | |||
|
|||
eventFactory := begin.FromContext(ctx) | |||
timeClock := clock.FromContext(ctx) | |||
clockTime := clock.FromContext(ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do like 'clockTime' better than 'timeClock'
@Bolodya1997 Could you also remind why is this critical to have retries for refresh? |
continue | ||
} | ||
return | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah wait... I see.. the context isn't canceled if our call to the downstream gets an error.
continue | ||
} | ||
return | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK... this is clever... it will tick at the roughly 1/3 marks even if we get an error... so not overwhelming... but keeps trying.
…k@main PR link: networkservicemesh/sdk#1081 Commit: c7b15ba Author: Vladimir Popov Date: 2021-09-21 03:26:12 +0700 Message: - Add refresh retries on failure (#1081) Signed-off-by: NSMBot <[email protected]>
…k@main PR link: networkservicemesh/sdk#1081 Commit: c7b15ba Author: Vladimir Popov Date: 2021-09-21 03:26:12 +0700 Message: - Add refresh retries on failure (#1081) Signed-off-by: NSMBot <[email protected]>
…k@main PR link: networkservicemesh/sdk#1081 Commit: c7b15ba Author: Vladimir Popov Date: 2021-09-21 03:26:12 +0700 Message: - Add refresh retries on failure (#1081) Signed-off-by: NSMBot <[email protected]>
…k@main PR link: networkservicemesh/sdk#1081 Commit: c7b15ba Author: Vladimir Popov Date: 2021-09-21 03:26:12 +0700 Message: - Add refresh retries on failure (#1081) Signed-off-by: NSMBot <[email protected]>
…k@main PR link: networkservicemesh/sdk#1081 Commit: c7b15ba Author: Vladimir Popov Date: 2021-09-21 03:26:12 +0700 Message: - Add refresh retries on failure (#1081) Signed-off-by: NSMBot <[email protected]>
…k@main PR link: networkservicemesh/sdk#1081 Commit: c7b15ba Author: Vladimir Popov Date: 2021-09-21 03:26:12 +0700 Message: - Add refresh retries on failure (#1081) Signed-off-by: NSMBot <[email protected]>
…k@main PR link: networkservicemesh/sdk#1081 Commit: c7b15ba Author: Vladimir Popov Date: 2021-09-21 03:26:12 +0700 Message: - Add refresh retries on failure (#1081) Signed-off-by: NSMBot <[email protected]>
…k@main PR link: networkservicemesh/sdk#1081 Commit: c7b15ba Author: Vladimir Popov Date: 2021-09-21 03:26:12 +0700 Message: - Add refresh retries on failure (#1081) Signed-off-by: NSMBot <[email protected]>
…k@main PR link: networkservicemesh/sdk#1081 Commit: c7b15ba Author: Vladimir Popov Date: 2021-09-21 03:26:12 +0700 Message: - Add refresh retries on failure (#1081) Signed-off-by: NSMBot <[email protected]>
@Bolodya1997 One question... and maybe I'm just tired, but how does the Refresh terminate? Refresh is only for running in the real clients (not the passthroughs)... so the real clients never hit a 'timeout'... it feels like we need a termination clause in your loop that isn't just the cancel, something where we don't try past the point the client side runs out of runway on its own timeout... thoughts? |
It is terminated only on passthrough clients with timeout. Probably we can add some timeout on client so it will close the chain if it fails to send a Request during some time? |
Passthrough clients don't originate refreshed :) Only 'leaf' clients originate refresh :) |
@edwarnicke |
Description
Add refresh retries on failure.
How Has This Been Tested?
Types of changes