Overview of the Issue

This piece of code in tablet manager client causes connection leaks:

vitess/go/vt/vttablet/grpctmclient/client.go, lines 163 to 170 in 8833092:
	if poolDialer, ok := client.dialer.(poolDialer); ok {
		c, err = poolDialer.dialPool(ctx, tablet)
		if err != nil {
			return nil, err
		}
	}
The scenario is:

CheckThrottler is invoked on one of the replicas. CheckThrottler uses a poolDialer (vitess/go/vt/vttablet/grpctmclient/client.go, lines 1105 to 1110 in 8833092), which means it uses cached tablet manager client connections.
Say the replica tablet goes down (e.g. in a k8s cluster the pod goes down).
There is nothing to remove the cached client connection.
Say a new tablet comes up, potentially in a different shard/keyspace/cluster, with the same IP (as happens in k8s).
The cached connection now attempts to reconnect to the new tablet. This is the connection leak.
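For illustration, a simplified model of the caching behavior at play (illustrative only, not the actual vitess implementation; connCache and the address key are made-up names): the cache is keyed by the tablet's network address, so nothing ever notices that the tablet behind that address has been replaced.

// Illustrative model of an address-keyed connection cache with no eviction.
// A new tablet that reuses the same IP:port silently inherits the old entry.
package tmconncache

import (
	"sync"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

type connCache struct {
	mu    sync.Mutex
	conns map[string]*grpc.ClientConn // key: tablet "host:port"
}

func (c *connCache) get(addr string) (*grpc.ClientConn, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if conn, ok := c.conns[addr]; ok {
		// Reused unconditionally: the cache has no idea whether the tablet
		// that originally lived at addr is still the one answering there.
		return conn, nil
	}
	conn, err := grpc.Dial(addr, grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		return nil, err
	}
	c.conns[addr] = conn
	return conn, nil
}

Nothing in this sketch ever removes an entry, which is exactly the gap the scenario above hits.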
There needs to be a cleanup/invalidation mechanism where cached connections are dropped upon gRPC error (or by whatever other indication).
Some function calls can be expected to return error codes because they are invoked by users (ExecuteFetchAsApp). Others would only return error codes on an internal or network problem (CheckThrottler, FullStatus). These will likely benefit from separate handling logic.
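One possible shape for that invalidation, sketched on top of the connCache model above (shouldInvalidate and dropIfStale are hypothetical helpers, not existing vitess functions): inspect the gRPC status code after a failed call, and only drop the cached connection for transport-level codes, leaving application errors from user-invoked calls alone.

// Hypothetical invalidation helpers: drop a cached connection only when the
// error looks like a transport/availability failure, not an application error.
package tmconncache

import (
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// shouldInvalidate reports whether err suggests the tablet behind the cached
// connection is gone or unreachable, as opposed to an error the RPC itself
// legitimately returned.
func shouldInvalidate(err error) bool {
	switch status.Code(err) {
	case codes.Unavailable, codes.DeadlineExceeded:
		return true
	default:
		return false
	}
}

// dropIfStale closes and removes the cached connection for addr when the
// error indicates it should no longer be reused.
func (c *connCache) dropIfStale(addr string, err error) {
	if err == nil || !shouldInvalidate(err) {
		return
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	if conn, ok := c.conns[addr]; ok {
		conn.Close()
		delete(c.conns, addr)
	}
}

Exactly which codes should trigger invalidation is the open question here; Unavailable and DeadlineExceeded are just the obvious candidates.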
Side discussion
To be discussed in a spin-off issue: the current mechanism uses misleading terminology. concurrency and pool are inappropriate terms for the mechanism. While the cache holds, say, 8 cached connections to a tablet, these 8 connections are shared without limit among as many users as ask for them, with no concurrency limitation. A single connection can be held by, say, a dozen users, each invoking any queries without locking. This is fine, but it's not a "pool", and "concurrency" is incorrect.
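The sharing behavior described here boils down to something like the following sketch (illustrative only; sharedConns and pick are made-up names, and round-robin selection is an assumption): any number of callers can hold the same *grpc.ClientConn at once, which is safe, but there is no checkout/return or limiting step, so "pool" and "concurrency" overstate what the mechanism does.

// Illustrative sketch: N cached connections handed out round-robin to however
// many callers ask, with no semaphore and no "return to pool" step.
package sharedconns

import (
	"sync/atomic"

	"google.golang.org/grpc"
)

type sharedConns struct {
	conns []*grpc.ClientConn // e.g. 8 cached connections to one tablet
	next  atomic.Uint64
}

// pick returns one of the cached connections. Any number of callers may hold
// the same connection at the same time; nothing limits concurrency.
func (s *sharedConns) pick() *grpc.ClientConn {
	i := s.next.Add(1)
	return s.conns[i%uint64(len(s.conns))]
}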
Reproduction Steps
Binary Version
v19
Operating System and Environment details
-
Log Fragments
No response