Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ClientConn: fix Dial using grpc.WithTimeout() #2737

Merged
merged 2 commits into from
Apr 4, 2019

Conversation

daviddrysdale
Copy link
Contributor

Fixes #2736

Commit 955eb8a ("channelz: cleanup channel registration if Dial
fails (grpc#2733)") moved a defer block earlier in DialContext() to
ensure that cc.Close() was always called.  This defer block also
checks whether the ctx.Done() is true, and if so ensures the context
error is returned.

If the dial options include a timeout, the original context gets
replaced with a new context that has the timeout, and this gets
a catchall `defer cancel()` to go with it.

However, this cancel() now gets called before the cleanup defer
block, so when the latter runs the context is always already cancelled.

Fix by splitting the larger defer block into two parts:
 - The part that does cc.Close() stays near the beginning of the
   method.
 - The part that checks ctx.Done() returns to below the `defer cancel()`
   call, and so gets invoked before it.
@thelinuxfoundation
Copy link

Thank you for your pull request. Before we can look at your contribution, we need to ensure all contributors are covered by a Contributor License Agreement.

After the following items are addressed, please respond with a new comment here, and the automated system will re-verify.

Regards,
CLA GitHub bot

@daviddrysdale
Copy link
Contributor Author

CNCF account created as a Google employee.

r.InitialState(resolver.State{Addresses: []resolver.Address{lisAddr}})
client, err := Dial(r.Scheme()+":///test.server", WithInsecure(), WithTimeout(5*time.Second))
close(dialDone)
if err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you still working on this test? It doesn't seem to test the error behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Er, not really :-). I got it just far enough so it fails with the DialContext timeout, but I don't know what else would be worth testing after that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, now I see what it's testing.

Copy link
Contributor

@menghanl menghanl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix! LGTM.

r.InitialState(resolver.State{Addresses: []resolver.Address{lisAddr}})
client, err := Dial(r.Scheme()+":///test.server", WithInsecure(), WithTimeout(5*time.Second))
close(dialDone)
if err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, now I see what it's testing.

@menghanl menghanl merged commit d37bd82 into grpc:master Apr 4, 2019
@menghanl menghanl changed the title Fix DialContext when using a timeout ClientConn: fix Dial using grpc.WithTimeout() Apr 4, 2019
@daviddrysdale daviddrysdale deleted the deadline-dial branch April 4, 2019 17:54
@lock lock bot locked as resolved and limited conversation to collaborators Oct 1, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ClientConn: WithTimeout() causes Dial error to be always Canceled
4 participants