Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Leakage of context errors can tear down a service #77

Closed
erincandescent opened this issue Nov 6, 2024 · 2 comments
Closed

Leakage of context errors can tear down a service #77

erincandescent opened this issue Nov 6, 2024 · 2 comments

Comments

@erincandescent
Copy link

This is an unexpected behaviour which we discovered:

  • Service is doing an action with a timeout
  • That timeout happens and a function returns context.DeadlineExceeded
  • That got bubbled up to Serve()'s return value
  • The supervisor then treated that context.DeadlineExceeded as a service quit

I think the reasoning behind this is to handle graceful shutdown when a service stops because of a supervisor request; but the rest of the time it's surprising behaviour and a bit of a footgun.

I wonder if Suture should only treat those errors as service terminations if the service's context is canceled?

For our part we're probably going to work around this with some wrapper logic along the lines of

func (s *service) Serve(ctx context.Context) error {
	err := s.Service.Serve(ctx)
	if ctxErr := ctx.Err(); ctxErr != nil {
		return context.Cause(ctx)
	}
	if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
		// Mask context errors which didn't come from supervisor shutdown
		return fmt.Errorf("service failed with context error: %v", err)
	}
	return err
}
@thejerf
Copy link
Owner

thejerf commented Nov 6, 2024

Hmm, yes, that is an interesting one.

I think you're right that the semantics are such that we only care about cancellation or timeout if it's our cancellation or timeout... I'll have to think on that a bit but I think that some variant on checking the context for an error rather than assuming it is being returned is correct.

@thejerf
Copy link
Owner

thejerf commented Nov 29, 2024

Thank you for this report. I seem to have screwed up the issue connection in GitHub here, but this is now fixed in the latest commit. I believe your wrapper should no longer be necessary, but it also shouldn't hurt anything, for maximum upgrade flexibility.

@thejerf thejerf closed this as completed Nov 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants