Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ccl/sqlproxyccl: update connection tracker to track ServerAssignment instances #80446

Merged

Conversation

jaylim-crl
Copy link
Collaborator

@jaylim-crl jaylim-crl commented Apr 24, 2022

ccl/sqlproxyccl: introduce a new ServerAssignment struct

This commit introduces a new ServerAssignment struct that will be tracked by
the connection tracker through {register,unregister}Assignment. At the moment,
those functions do nothing. The plan is to update the connection tracker to
track these assignments instead of the forwarder instances. This helps in two
ways:

  1. We can now update the counts map the moment a SQL pod is assigned to an
    incoming request, rather than waiting until a forwarder has been created.
  2. TenantCache can be removed since we can easily maintain an up-to-date count
    map. (Previously, with the forwarder approach, we would need to keep that
    updated everytime we move the connection around. With this approach, once
    an assignment is handed out, the destination does not change. A transfer
    operation just requests for a new assignment.)

ccl/sqlproxyccl: update conn_tracker to track ServerAssignment instances

Previously, the connection tracker was tracking forwarder instances. This
commit updates that behavior to track ServerAssignment instances. At the same
time, the connection tracker is moved into the balancer, and is now owned by
it. The plan is to rename ConnTracker to tracker or assignmentTracker
to denote that this is internal to the balancer only. Finally, we remove the
destination field from rebalanceRequest, and clean up the tests.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@jaylim-crl jaylim-crl force-pushed the jay/220422-track-serverassignment branch 4 times, most recently from f27a612 to c1ff423 Compare April 24, 2022 21:45
@jaylim-crl jaylim-crl marked this pull request as ready for review April 25, 2022 12:27
@jaylim-crl jaylim-crl requested review from a team as code owners April 25, 2022 12:27
@jaylim-crl jaylim-crl requested review from jeffswenson and andy-kimball and removed request for a team April 25, 2022 12:27
Copy link
Collaborator

@jeffswenson jeffswenson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

pkg/ccl/sqlproxyccl/balancer/assignment.go Outdated Show resolved Hide resolved
Copy link
Collaborator Author

@jaylim-crl jaylim-crl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball and @jeffswenson)


pkg/ccl/sqlproxyccl/balancer/assignment.go line 55 at r5 (raw file):
I think we should stick to sync.Once for simplicity reasons. With sync.Once, it is clear what the code is doing, and we don't need to worry about multiple Close calls racing. Atomics would complicate the reasoning a little, and given that this is in the Close path, I don't think it's worth it, especially when this changes the semantics of .Close as well.

The downside is there is actually a subtle deadlock risk. If the Close callback somehow leads to calling Close again, then sync.Once will deadlock.

In terms of the subtle deadlock risk, I don't think we should worry about that now. We could add a note to closerFn to ensure that Close will never be called. And even if new code in the future results in a deadlock, the tests should catch this. I don't foresee us adding any more code to closerFn or unregisterAssignment.

This commit introduces a new ServerAssignment struct that will be tracked by
the connection tracker through {register,unregister}Assignment. At the moment,
those functions do nothing. The plan is to update the connection tracker to
track these assignments instead of the forwarder instances. This helps in two
ways:
1. We can now update the counts map the moment a SQL pod is assigned to an
   incoming request, rather than waiting until a forwarder has been created.
2. TenantCache can be removed since we can easily maintain an up-to-date count
   map. (Previously, with the forwarder approach, we would need to keep that
   updated everytime we move the connection around. With this approach, once
   an assignment is handed out, the destination does not change. A transfer
   operation just requests for a new assignment.)

Release note: None
Previously, the connection tracker was tracking forwarder instances. This
commit updates that behavior to track ServerAssignment instances. At the same
time, the connection tracker is moved into the balancer, and is now owned by
it. The plan is to rename `ConnTracker` to `tracker` or `assignmentTracker`
to denote that this is internal to the balancer only. Finally, we remove the
destination field from rebalanceRequest, and clean up the tests.

Release note: None
@jaylim-crl jaylim-crl force-pushed the jay/220422-track-serverassignment branch from c1ff423 to c89f5c9 Compare April 25, 2022 20:45
@jaylim-crl
Copy link
Collaborator Author

bors r=JeffSwenson

@craig
Copy link
Contributor

craig bot commented Apr 26, 2022

Build succeeded:

@craig craig bot merged commit de012eb into cockroachdb:master Apr 26, 2022
@jaylim-crl jaylim-crl deleted the jay/220422-track-serverassignment branch April 26, 2022 04:09
jaylim-crl added a commit to jaylim-crl/cockroach that referenced this pull request Jun 2, 2022
In cockroachdb#80446, we updated the code to ensure that the connection gets cleaned up
whenever BackendDial fails. With the code as-is, there is a possibility where
conn becomes nil whenever sslOverlay returned an error. This would result in
a panic whenever the defer callback gets executed to close the connection
object.

This commit folds all the functions (i.e. sslOverlay and relayStartupMsg) into
BackendDial to avoid this panic issue altogether. That way, there is no
possibility where the connection object can be nil.

Release note: None
jaylim-crl added a commit to jaylim-crl/cockroach that referenced this pull request Jun 2, 2022
In cockroachdb#80446, we updated the code to ensure that the connection gets cleaned up
whenever BackendDial fails. With the code as-is, there is a possibility where
conn becomes nil whenever sslOverlay returned an error. This would result in
a panic whenever the defer callback gets executed to close the connection
object.

This commit folds all the functions (i.e. sslOverlay and relayStartupMsg) into
BackendDial to avoid this panic issue altogether. That way, there is no
possibility where the connection object can be nil.

Release note: None
craig bot pushed a commit that referenced this pull request Jun 2, 2022
82320: ccl/sqlproxyccl: fix panic on nil conn when failing to dial SQL backend r=JeffSwenson a=jaylim-crl

In #80446, we updated the code to ensure that the connection gets cleaned up
whenever BackendDial fails. With the code as-is, there is a possibility where
conn becomes nil whenever sslOverlay returned an error. This would result in
a panic whenever the defer callback gets executed to close the connection
object.

This commit folds all the functions (i.e. sslOverlay and relayStartupMsg) into
BackendDial to avoid this panic issue altogether. That way, there is no
possibility where the connection object can be nil.

Release note: None

Co-authored-by: Jay <[email protected]>
blathers-crl bot pushed a commit that referenced this pull request Jun 2, 2022
In #80446, we updated the code to ensure that the connection gets cleaned up
whenever BackendDial fails. With the code as-is, there is a possibility where
conn becomes nil whenever sslOverlay returned an error. This would result in
a panic whenever the defer callback gets executed to close the connection
object.

This commit folds all the functions (i.e. sslOverlay and relayStartupMsg) into
BackendDial to avoid this panic issue altogether. That way, there is no
possibility where the connection object can be nil.

Release note: None
jaylim-crl added a commit to jaylim-crl/cockroach that referenced this pull request Aug 10, 2022
… init

Related to cockroachdb#80446.

In cockroachdb#80446, we updated the connection tracker to track server assignments
instead of forwarders. This also meant that there is a possibility where we
can start transferring the connection before we even resumed the forwarder
for the first time, breaking the TransferConnection invariant where the
processors must be resumed before being called.

This commit fixes that issue by introducing a new isInitialized flag to the
forwarder, which will only get set to true once run returns. Attempting to
transfer a connection with isInitialized=false will return an error. This
should fix flakes that we've been seeing on CI.

Release note: None

Release justification: sqlproxy bug fix. This ensures that we don't resume
the processors mid connection transfer, causing unexpected issues on the
client's end. Note that this situation is rare since it involves ensuring
timely behavior of forwarder.Run and forwarder.TransferConnection at the same
time.
jaylim-crl added a commit to jaylim-crl/cockroach that referenced this pull request Aug 11, 2022
… init

Related to cockroachdb#80446.

In cockroachdb#80446, we updated the connection tracker to track server assignments
instead of forwarders. This also meant that there is a possibility where we
can start transferring the connection before we even resumed the forwarder
for the first time, breaking the TransferConnection invariant where the
processors must be resumed before being called.

This commit fixes that issue by introducing a new isInitialized flag to the
forwarder, which will only get set to true once run returns. Attempting to
transfer a connection with isInitialized=false will return an error. This
should fix flakes that we've been seeing on CI.

Release note: None

Release justification: sqlproxy bug fix. This ensures that we don't resume
the processors mid connection transfer, causing unexpected issues on the
client's end. Note that this situation is rare since it involves ensuring
timely behavior of forwarder.Run and forwarder.TransferConnection at the same
time.
craig bot pushed a commit that referenced this pull request Aug 11, 2022
85769: sql, server: regulate access to remaining observability features r=Santamaura a=Santamaura

This change will control access to various observability
features based on system privileges including the following:
- admin ui databases/tables/schema endpoints requires admin or VIEWACTIVITY
- EXPERIMENTAL_AUDIT requires admin or MODIFYCLUSTERSETTING
- sql login requires not having NOSQLLOGIN or the equivalent
role option

Resolves: #83848, #83863, #83862

Release note (security update): Change requirements to access some
observability features. Databases/tables/schema endpoints for
admin ui require admin or VIEWACTIVITY. EXPERIMENTAL_AUDIT
requires admin or MODIFYCLUSTERSETTING. SQL login requires not
having NOSQLLOGIN or the equivalent role option.

85931: ccl/sqlproxyccl: ensure that connections cannot be transferred before initialization r=JeffSwenson a=jaylim-crl

Related to #80446.

In #80446, we updated the connection tracker to track server assignments
instead of forwarders. This also meant that there is a possibility where we
can start transferring the connection before we even resumed the forwarder
for the first time, breaking the TransferConnection invariant where the
processors must be resumed before being called.

This commit fixes that issue by introducing a new isInitialized flag to the
forwarder, which will only get set to true once run returns. Attempting to
transfer a connection with isInitialized=false will return an error. This
should fix flakes that we've been seeing on CI.

Release note: None

Release justification: sqlproxy bug fix. This ensures that we don't resume
the processors mid connection transfer, causing unexpected issues on the
client's end. Note that this situation is rare since it involves ensuring
timely behavior of forwarder.Run and forwarder.TransferConnection at the same
time.

Co-authored-by: Santamaura <[email protected]>
Co-authored-by: Jay <[email protected]>
jaylim-crl added a commit to jaylim-crl/cockroach that referenced this pull request Aug 11, 2022
… init

Related to cockroachdb#80446.

In cockroachdb#80446, we updated the connection tracker to track server assignments
instead of forwarders. This also meant that there is a possibility where we
can start transferring the connection before we even resumed the forwarder
for the first time, breaking the TransferConnection invariant where the
processors must be resumed before being called.

This commit fixes that issue by introducing a new isInitialized flag to the
forwarder, which will only get set to true once run returns. Attempting to
transfer a connection with isInitialized=false will return an error. This
should fix flakes that we've been seeing on CI.

Release note: None

Release justification: sqlproxy bug fix. This ensures that we don't resume
the processors mid connection transfer, causing unexpected issues on the
client's end. Note that this situation is rare since it involves ensuring
timely behavior of forwarder.Run and forwarder.TransferConnection at the same
time.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants