-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dedicated poolDialer logic for VTOrc, throttler #15562
Conversation
Signed-off-by: Manan Gupta <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Added some unit testing. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #15562 +/- ##
==========================================
+ Coverage 65.73% 65.77% +0.03%
==========================================
Files 1560 1561 +1
Lines 194595 194755 +160
==========================================
+ Hits 127921 128102 +181
+ Misses 66674 66653 -21 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's go for this one yeah, and see separately how we can fix up those other cases. I suspect we're probably good enough for a single multiplexed connection in the end anyway?
I believe so. Certainly for the throttler and for vtorc. For the rest, more experimentation needed. |
Signed-off-by: Shlomi Noach <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this implementation!
invalidator := func(err error) { | ||
client.mu.Lock() | ||
defer client.mu.Unlock() | ||
m[addr].cc.Close() | ||
delete(m, addr) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one question. We take in the error here, but don't check for the error to be a connection error, or even if it's nil. Won't this cause us to invalidate the connections after every call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we call invalidator only when there is an error, passing of error should be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch on not checking for err == nil
! Updated.
Re: connection error, we've already established last week that it's very difficult to distinguish between gRPC errors, TCP errors, vitess errors, MySQL errors etc., because the call stack is such that errors are wrapped up as SQLErr
including gRPC ones. We therefore invalidate on any error.
Mind you that in this PR we only create an invalidation function for FullStatus
and for CheckThrottler
, where SQL errors are highly unexpected, as opposed to ExecuteFetch*
where such errors are trivial.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, makes sense!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I further updated to reflect @harshit-gangal 's comment. Calling invalidator()
only on error, and now not even passing the error argument.
Signed-off-by: Shlomi Noach <[email protected]>
if dialPoolGroup != DialPoolGroupDefault { | ||
client.mu.Lock() | ||
defer client.mu.Unlock() | ||
if client.rpcDialPoolMap == nil { | ||
client.rpcDialPoolMap = make(map[DialPoolGroup]addrTmcMap) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: default and non-default looks like a different code path,
can this be changed to a switch condition and separate method calls for readability?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: default and non-default looks like a different code path,
I've now split into dialPool
(near identical the original function) and dialDedicatedPool
(new behavior). Can you please take another look?
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
@@ -1101,9 +1158,10 @@ func (client *Client) Backup(ctx context.Context, tablet *topodatapb.Tablet, req | |||
// and dialing the other tablet every time is not practical. | |||
func (client *Client) CheckThrottler(ctx context.Context, tablet *topodatapb.Tablet, req *tabletmanagerdatapb.CheckThrottlerRequest) (*tabletmanagerdatapb.CheckThrottlerResponse, error) { | |||
var c tabletmanagerservicepb.TabletManagerClient | |||
var invalidator invalidatorFunc | |||
var err error | |||
if poolDialer, ok := client.dialer.(poolDialer); ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder in what scenario is the dialer
not a poolDialer
?
if invalidator != nil { | ||
invalidator() | ||
} | ||
return nil, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this will not be nil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be nil
, if this condition fails:
if poolDialer, ok := client.dialer.(poolDialer); ok
Now, I'm not sure how it could possibly fail, but as golang goes, I created var invalidator invalidatorFunc
and it could be nil
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you like, I can pre-assign var invalidator
to an empty function? That way it will never be nil
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This kinda returns to this comment: #14979 (comment)
@@ -1120,6 +1178,9 @@ func (client *Client) CheckThrottler(ctx context.Context, tablet *topodatapb.Tab | |||
|
|||
response, err := c.CheckThrottler(ctx, req) | |||
if err != nil { | |||
if invalidator != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same.
Merging for now. I think we still need a refactor to the pooDialer code, and that's where we can address casting, nil, naming, etc. issues. |
This is a connection leak bugfix, and as such, should be backported to v19. |
… (#15567) Signed-off-by: Shlomi Noach <[email protected]> Co-authored-by: vitess-bot[bot] <108069721+vitess-bot[bot]@users.noreply.github.com> Co-authored-by: Shlomi Noach <[email protected]>
Description
Did I say #15560 was the final alternative? This is final 2.0. So #15560 ran into a couple problems, seemingly locking. I'm investing far too much time in this problem, so this PR, as per @harshit-gangal 's suggestion, keeps the existing pools completely untouched, and adds a new mechanism for the non-standard pool dialers (ie
vtorc
andthrottler
). These two non-standard dialers only have a singletmc
hence a single cached connection. For any error whatsoever received inFullStatus
or inCheckThrottler
, the connection gets invalidated (it's too aggressive, but better than existing situation).Related Issue(s)
poolDialer
#15563vtorc
and to throttler functions, no change in "concurrency" #15560, WIP: dedicated pool dialers tovtorc
and to throttler functions, removing "concurrency" #15559, WIP:TabletManagerClient
real pool dialer #15541, Fix for FullStatus gRPC connection pooling #15520Checklist
Deployment Notes