Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

balancer: automatically stop producers on subchannel state changes #7663

Merged
merged 6 commits into from
Sep 30, 2024

Conversation

dfawley
Copy link
Member

@dfawley dfawley commented Sep 23, 2024

This change tweaks the Producer API as follows:

  1. Any active Producers on a SubConn automatically have their close() method called on every state change.
  2. GetOrBuildProducer is documented as only to be used on a READY SubConn. Using it on a non-READY SubConn won't error, but isn't supported.

I believe this fixes any possible remaining races with the Producer interface. #7651 fixed the issue of producers making calls before a subchannel went READY, but it was still possible for listeners to be invoked with updates after a subchannel disconnected and went to IDLE. Because the channel now calls close() on the producer before delivering any state change update, and because close is synchronous (or should be, according to the documentation), it should be impossible for the listener to receive a call after the subchannel disconnects.

RELEASE NOTES:

  • orca (experimental): if using an ORCA listener, it must now be registered only on a READY SubConn, and the listener will automatically be stopped when the connection is lost.

@dfawley dfawley added this to the 1.68 Release milestone Sep 23, 2024
@dfawley dfawley requested a review from arjan-bal September 23, 2024 15:18

func (w *weightedSubConn) updateORCAListener(cfg *lbConfig) {
if w.stopORCAListener != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the subchannel now takes care of closing the producer, do you think it makes sense to simplify things here by getting rid of the stopORCAListener field?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I wanted to do that, but we still need to restart the listener when the config changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these tests no longer required?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no more races like these since we don't support starting a producer before READY.

I did think of an interesting test case, which I will add (if I can remember it again :P).

Copy link

codecov bot commented Sep 24, 2024

Codecov Report

Attention: Patch coverage is 97.36842% with 1 line in your changes missing coverage. Please review.

Project coverage is 81.94%. Comparing base (8ea3460) to head (a23eb96).
Report is 9 commits behind head on master.

Files with missing lines Patch % Lines
balancer/weightedroundrobin/balancer.go 87.50% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7663      +/-   ##
==========================================
+ Coverage   81.86%   81.94%   +0.08%     
==========================================
  Files         361      361              
  Lines       27821    27818       -3     
==========================================
+ Hits        22775    22796      +21     
+ Misses       3847     3837      -10     
+ Partials     1199     1185      -14     
Files with missing lines Coverage Δ
balancer/balancer.go 95.45% <ø> (ø)
balancer_wrapper.go 84.47% <100.00%> (-2.71%) ⬇️
clientconn.go 92.97% <100.00%> (-0.16%) ⬇️
orca/producer.go 96.42% <100.00%> (+0.22%) ⬆️
balancer/weightedroundrobin/balancer.go 80.58% <87.50%> (+0.07%) ⬆️

... and 28 files with indirect coverage changes

@dfawley dfawley added Type: Behavior Change Behavior changes not categorized as bugs and removed Type: Bug labels Sep 24, 2024
clientconn.go Outdated
Comment on lines 1505 to 1507
// getTransport waits until the addrconn is ready and returns the transport.
// If the context expires first, returns an appropriate status. If the
// addrConn is stopped first, returns an Unavailable status error.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment needs updating now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll do one better and just delete the function.

@easwars easwars assigned dfawley and unassigned easwars Sep 26, 2024
@dfawley dfawley removed their assignment Sep 27, 2024
@dfawley dfawley requested a review from arjan-bal September 27, 2024 20:21
Copy link
Contributor

@arjan-bal arjan-bal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@arjan-bal arjan-bal assigned dfawley and unassigned arjan-bal Sep 30, 2024
@dfawley dfawley merged commit ca4865d into grpc:master Sep 30, 2024
14 checks passed
@dfawley dfawley deleted the producer-change branch September 30, 2024 15:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Behavior Change Behavior changes not categorized as bugs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants