Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change PickFirstLeafLoadBalancer to only have 1 subchannel at a time #11520

Merged
merged 11 commits into from
Oct 3, 2024

Conversation

larry-safran
Copy link
Contributor

Hide behind environment variable GRPC_SERIALIZE_RETRIES

This is for testing with GMS core to see if they use the new PF with only 1 subchannel at a time trying to reconnect whether that eliminates their 6% data increase.

Copy link
Member

@ejona86 ejona86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sending what I have. I'll need to look at this more later.

@@ -458,7 +538,8 @@ private SubchannelData createNewSubchannel(SocketAddress addr, Attributes attrs)
}

private boolean isPassComplete() {
if (addressIndex.isValid() || subchannels.size() < addressIndex.size()) {
if ((!serializingRetries && addressIndex.isValid())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we want to ignore whether the index is valid?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we reset it for subchannel retry.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still nagging at me. I feel like either we never need this condition or we need to replace it with something else for serializingRetries. I agree that we need this function to return true even though we've reset the index (for ++numTf).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, there would never be a time where that was true but the subchannels.size() < addressIndex.size() was false. We do have the firstPass variable, so I used that where this is called to skip the comparisons on the repeated checks once we've seen the end.

@larry-safran larry-safran force-pushed the new_pf_prob branch 2 times, most recently from 289ca19 to df8e78c Compare September 24, 2024 19:30
…if environment variable GRPC_SERIALIZE_RETRIES == true.

Cache serializingRetries value so that it doesn't have to look up the flag every time.

Clear the correct task when READY in processSubchannelState and move the logic to cancelScheduledTasks

Cleanup based on PR review

remove unneeded checks for shutdown.
@larry-safran
Copy link
Contributor Author

Ready for re-review

Copy link
Member

@ejona86 ejona86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sending what I have. This seems much less bug-prone

Copy link
Member

@ejona86 ejona86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the first failure for an address, I don't see how we start a second connection to the same address (after the backoff).

…t is disabled.

Remove an extra index.increment in LeafLB
Fix spelling, remove unneeded additions
Copy link
Member

@ejona86 ejona86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good other than the nagging !serializingRetries && addressIndex.isValid()

@@ -458,7 +538,8 @@ private SubchannelData createNewSubchannel(SocketAddress addr, Attributes attrs)
}

private boolean isPassComplete() {
if (addressIndex.isValid() || subchannels.size() < addressIndex.size()) {
if ((!serializingRetries && addressIndex.isValid())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still nagging at me. I feel like either we never need this condition or we need to replace it with something else for serializingRetries. I agree that we need this function to return true even though we've reset the index (for ++numTf).

}
}

if (isPassComplete()) {
if (!firstPass || isPassComplete()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's at least add a comment that the !firstPass is an optimization.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just removed it. Looping on a boolean comparison hopefully won't take too long.

@larry-safran larry-safran merged commit 9bb06af into grpc:master Oct 3, 2024
15 checks passed
@larry-safran larry-safran deleted the new_pf_prob branch October 3, 2024 00:04
kannanjgithub pushed a commit to kannanjgithub/grpc-java that referenced this pull request Oct 23, 2024
…rpc#11520)

* Change PickFirstLeafLoadBalancer to only have 1 subchannel at a time if environment variable GRPC_SERIALIZE_RETRIES == true.

Cache serializingRetries value so that it doesn't have to look up the flag every time.

Clear the correct task when READY in processSubchannelState and move the logic to cancelScheduledTasks

Cleanup based on PR review

remove unneeded checks for shutdown.

* Fix previously broken tests

* Shutdown previous subchannel when run off end of index.

* Provide option to disable subchannel retries to let PFLeafLB take control of retries.

* InternalSubchannel internally goes to IDLE when sees TF when reconnect is disabled.
Remove an extra index.increment in LeafLB
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 1, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants