Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xds/federation: update xdsclient to support multi authority #5042

Merged

Conversation

menghanl
Copy link
Contributor

@menghanl menghanl commented Dec 6, 2021

  • xdsclient
    • added an authority struct
      • consists of controller (the xds stream) and an pubsub (cache for resources received from this authority)
    • an authority is created when the first watch is called on this authority, and deleted when the last watch is canceled
      • authorities are ref-counted
      • authorities are kept in a map[server-config]authority, note that key is the config, not authority name, so that two authoritie with different names but same config will share the xds stream and pubsub
      • authority is kept in a cache after deletion, and only actually deleted after a timeout, so it can be revived if a new watch is started
    • watch functions find (or build) the authority first, then call watch on this authority
      • they also handle ref/unref of the authority
    • dump (for CSDS) is updated to dump resources from all the authorities
  • tests
    • update handler is only available after the watch is started, so many tests need updating
    • make many LDS/RDS/CDS/EDS tests share code, and also added a federation version
  • resource handling (when unmarshalling receive xds resources) parses the resource name first, and then turns it back to a string, to canonicalize (especially the context parameters)

RELEASE NOTES: N/A

@menghanl
Copy link
Contributor Author

menghanl commented Dec 6, 2021

This PR has changes from #4991 and #5037

Many testing changes are caused by refactoring (moving code from TestLDS.../TestRDS.../TestCDS.../TestEDS... to another file and share code)

@menghanl menghanl added the Type: Feature New features or improvements in behavior label Dec 6, 2021
@menghanl menghanl added this to the 1.44 Release milestone Dec 6, 2021
@menghanl menghanl force-pushed the xds_client_federation_multi_controller branch from 9ab1734 to fd4a99f Compare December 8, 2021 18:56
- added an authority struct
  - consists of controller (the xds stream) and an pubsub (cache for resources received from this authority)

- an authority is created when the first watch is called on this authority, and deleted when the last watch is canceled
  - authorities are ref-counted
  - authorities are kept in a `map[server-config]authority`, note that key is the config, not authority name, so that two authoritie with different names but same config will share the xds stream and pubsub
  - authority is kept in a cache after deletion, and only actually deleted after a timeout, so it can be revived if a new watch is started

- watch functions find (or build) the authority first, then call watch on this authority
  - they also handle ref/unref of the authority

- dump (for CSDS) is updated to dump resources from all the authorities

- resource handling (when unmarshalling receive xds resources) parses the resource name first, and then turns it back to a string, to canonicalize (especially the context parameters)

- tests
  - update handler is only available after the watch is started, so many tests need updating
  - make many LDS/RDS/CDS/EDS tests share code, and also added a federation version
@menghanl menghanl force-pushed the xds_client_federation_multi_controller branch from fd4a99f to 026827e Compare December 8, 2021 20:09
@menghanl menghanl requested a review from easwars December 8, 2021 20:09
xds/internal/xdsclient/client.go Outdated Show resolved Hide resolved
xds/internal/xdsclient/client.go Outdated Show resolved Hide resolved
xds/internal/xdsclient/client.go Outdated Show resolved Hide resolved
xds/internal/xdsclient/dump.go Show resolved Hide resolved
xds/internal/xdsclient/loadreport.go Outdated Show resolved Hide resolved
@easwars easwars assigned menghanl and unassigned easwars Dec 8, 2021
Copy link
Contributor Author

@menghanl menghanl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review. PTAL.

xds/internal/xdsclient/client.go Outdated Show resolved Hide resolved
xds/internal/xdsclient/client.go Outdated Show resolved Hide resolved
xds/internal/xdsclient/dump.go Show resolved Hide resolved
xds/internal/xdsclient/loadreport.go Outdated Show resolved Hide resolved
xds/internal/xdsclient/client.go Outdated Show resolved Hide resolved
@menghanl menghanl assigned easwars and unassigned menghanl Dec 9, 2021
xds/internal/xdsclient/authority.go Outdated Show resolved Hide resolved
xds/internal/xdsclient/authority.go Outdated Show resolved Hide resolved
xds/internal/xdsclient/authority.go Outdated Show resolved Hide resolved
xds/internal/xdsclient/authority.go Outdated Show resolved Hide resolved
xds/internal/xdsclient/authority.go Show resolved Hide resolved
xds/internal/xdsclient/authority.go Show resolved Hide resolved
@easwars easwars assigned menghanl and unassigned easwars Dec 10, 2021
@menghanl menghanl assigned easwars and unassigned menghanl Dec 13, 2021
Copy link
Contributor

@easwars easwars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly OK to me. Just need to take another look at the tests.

xds/internal/xdsclient/authority.go Show resolved Hide resolved
xds/internal/xdsclient/authority.go Outdated Show resolved Hide resolved
xds/internal/xdsclient/authority.go Outdated Show resolved Hide resolved
xds/internal/xdsclient/loadreport.go Outdated Show resolved Hide resolved
xds/internal/xdsclient/watchers.go Outdated Show resolved Hide resolved
@easwars easwars assigned menghanl and unassigned easwars Dec 13, 2021
@menghanl menghanl assigned easwars and unassigned menghanl Dec 14, 2021
xds/internal/xdsclient/authority_test.go Show resolved Hide resolved
xds/internal/xdsclient/authority_test.go Outdated Show resolved Hide resolved
xds/internal/xdsclient/authority_test.go Outdated Show resolved Hide resolved
Comment on lines 77 to 94
// Clear the item in the watch channel, otherwise the next watch will block.
authority := xdsresource.ParseName(resourceName).Authority
var config *bootstrap.ServerConfig
if authority == "" {
config = client.config.XDSServer
} else {
authConfig, ok := client.config.Authorities[authority]
if !ok {
t.Fatalf("failed to find authority %q", authority)
}
config = authConfig.XDSServer
}
a := client.authorities[config.String()]
if a == nil {
t.Fatalf("authority for %q is not created", authority)
}
ctrlTemp := a.controller.(*testController)
ctrlTemp.addWatches[xdsresource.ClusterResource].ReceiveOrFail()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite follow why we need to do this in the test? When/why will there be an item in the watch channel already?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to clear the channel so the next watch on this controller doesn't block.

}
t.Cleanup(client.Close)

ctrl, ok, _ := watchAndFetchNewController(t, client, testCDSName, ctrlCh)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we not calling the returned cancel func here and in many other places?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because that cancel just sends an item to a cancel channel. So I skipped that if there's no check for that item.

xds/internal/xdsclient/authority_test.go Outdated Show resolved Hide resolved

// TestAuthorityRevive covers that the authorities in the idle cache is revived
// when a new watch is started on this authority.
func (s) TestAuthorityRevive(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this test be combined with the idleness test, because they are mostly doing the same thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But they are actually testing quite different things.
idleness is testing ref counting
This is testing getting items back from the timeout cache.

xds/internal/xdsclient/watchers_test.go Show resolved Hide resolved
xds/internal/xdsclient/watchers_test.go Outdated Show resolved Hide resolved
//
// Note that this function checks for the controller after watching. This is the
// new behavior after federation.
func testWatchSetup(ctx context.Context, t *testing.T, typ xdsresource.ResourceType, resourceName string, overrideWatchExpiryTimeout bool) (_ *clientImpl, _ *testController, _ pubsub.UpdateHandler, updateCh *testutils.Channel, cancelWatch func()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we are sacrificing test code readability for test code brevity. This function does a lot of things, and not all callers use all the return values. Do you think we can simplify things at the code of writing repetitive test code?

See: https://engdoc.corp.google.com/eng/doc/tott/episodes/598.md?cl=head

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Federation made this util function more complex, because now controller is only available after a watch.

But the code doesn't look that cleaner even after splitting... There are just too many steps.

@easwars easwars assigned menghanl and unassigned easwars Dec 14, 2021
@menghanl menghanl assigned easwars and unassigned menghanl Dec 15, 2021
@easwars easwars assigned menghanl and unassigned easwars Dec 21, 2021
@menghanl menghanl merged commit afded72 into grpc:master Jan 4, 2022
@menghanl menghanl deleted the xds_client_federation_multi_controller branch January 4, 2022 23:50
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 4, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Type: Feature New features or improvements in behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants