Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[delta-xds] do not populate resource_names_subscribe with existing resources on reconnection for "wildcard" XDSes #16063

Closed
stevenzzzz opened this issue Apr 19, 2021 · 14 comments · Fixed by #16153

Comments

@stevenzzzz
Copy link
Contributor

Title: Since LDS/CDS/SRDS are wildcard XDSes, we should not populate the resource_names_subscribe with existing resources.

Description:
On reconnection, DeltaXDS packs the already subscribed resources into resource_names_subscribe, which is used to mean "new subscription interests", this is confusing on the management server side since for "wildcard XDSes"( CDS/LDS/SRDS) it expects an empty resource list

https://github.com/envoyproxy/envoy/blob/main/source/common/config/delta_subscription_state.cc#L169-L182

We should also revisit this decision of "packing existing resources into resource_names_subscribe on reconnection" since
the "interests of these resources" is already expressed via initial_resource_versions.

@stevenzzzz stevenzzzz added bug triage Issue requires triage labels Apr 19, 2021
@stevenzzzz
Copy link
Contributor Author

@htuch
Copy link
Member

htuch commented Apr 19, 2021

I agree, let's skip the subscribe.

@stevenzzzz
Copy link
Contributor Author

@antoniovicente @ahedberg

@htuch
Copy link
Member

htuch commented Apr 19, 2021

The one thing to be wary of is whether anyone might be relying on this behavior today. Is Delta xDS at the point that we can't break backwards compat?

@stevenzzzz
Copy link
Contributor Author

I dont believe so.
I don't see the point of having the "already subscribed" resources in both lists, it should just work the same if we just use initial_resource_versions to include "resources that has versions", which tells management server the same info: please send updates of these resources to me.

@adisuissa
Copy link
Contributor

Should this be wrapped with a runtime configurable flag?

@antoniovicente
Copy link
Contributor

The one thing to be wary of is whether anyone might be relying on this behavior today. Is Delta xDS at the point that we can't break backwards compat?

Is the current implementation adding a subscription for both wildcard and specific resources on reconnect? Or only for specific resources?

It seems to me that if the intent is to subscribe for wildcard, envoy should send wildcard subscribe requests. I don't know how backwards compatibility factors into this.

@asraa asraa added area/xds and removed triage Issue requires triage labels Apr 20, 2021
@markdroth
Copy link
Contributor

FWIW, I agree that the client should populate resource_names_subscribe for the first message on a reestablished stream exactly the same way as if it were not a reestablished stream (i.e., as if it were the very first stream created after the client started up). If it's a wildcard subscription, resource_names_subscribe should be empty; if it's a non-wildcard subscription, resource_names_subscribe should be set to the list of names being subscribed to.

I think the value of initial_resource_versions is mostly orthogonal to resource_names_subscribe. It's used to tell the server what is already present in the client's cache, so that the server can (a) prevent resending resources that the client already has and (b) tell the client to remove a resource that it currently has that has subsequently been removed, but it should not be used to control what resources the client is actually subscribing to.

See also related discussion in #15857 (comment) about the possible need for wildcard and non-wildcard subscriptions to coexist in the same stream.

@stevenzzzz
Copy link
Contributor Author

I think we are on the same page here.

IIUC, resource_names_subscribe field is for "newly added not served " resources only.
On reconnection, for the management server to rebuild the status of the client, initial_resource_versions is used. I think it's reasonable to not send "already served" resources, which are already included in the "initial_resource_versions", via the "resource_names_subscribe" field, for any XDS.

@markdroth
Copy link
Contributor

I think it's reasonable to not send "already served" resources, which are already included in the "initial_resource_versions", via the "resource_names_subscribe" field, for any XDS.

I don't think we can do that. The entries in resource_names_subscribe are resource locators, each of which may match multiple resources (e.g., as a wildcard subscription in the current naming scheme, or via a glob collection or flexible context parameter matching in the new naming scheme). However, the entries in initial_resource_versions are resource names, each associated with a specific individual resource. The server cannot determine the set of resource locators that the client is subscribed to by looking at the set of specific resources in its cache.

I think the way to think about this is that the set of resource locators being subscribed to by the client is not something that survives a stream restart; it is purely local to each stream, so it must be completely reestablished via resource_names_subscribed when a new stream is created. The initial_resource_versions field is an optimization that allows the server to not resend individual resources that match the resource locators that the client is subscribed to after the stream is reestablished if those resources have not changed, but that optimization does not help the client inform the server what resource locators it is actually subscribing to in the first place.

@htuch
Copy link
Member

htuch commented Apr 25, 2021

I'm aligned with presenting resource_names_subscribe on a re-established stream in the same way its done on the initial stream; do we just need to clarify somewhere the assumption of initial_resources_version being only a hint?

@adisuissa
Copy link
Contributor

do we just need to clarify somewhere the assumption of initial_resources_version being only a hint?
I think it is described in the xds protocol and in the proto description of the field:

// Informs the server of the versions of the resources the xDS client knows of, to enable the
// client to continue the same logical xDS session even in the face of gRPC stream reconnection.
// It will not be populated: [1] in the very first stream of a session, since the client will
// not yet have any resources, [2] in any message after the first in a stream (for a given
// type_url), since the server will already be correctly tracking the client's state.
// (In ADS, the first message *of each type_url* of a reconnected stream populates this map.)
// The map's keys are names of xDS resources known to the xDS client.
// The map's values are opaque resource versions.
map<string, string> initial_resource_versions = 5;

@htuch
Copy link
Member

htuch commented Apr 26, 2021

Sure, clarifying would be useful.

@adisuissa
Copy link
Contributor

I've updated the docs in the PR(#16153).

rboyer added a commit to hashicorp/consul that referenced this issue Jan 25, 2022
When a wildcard xDS type (LDS/CDS/SRDS) reconnects from a delta xDS stream,
prior to envoy `1.19.0` it would populate the `ResourceNamesSubscribe` field
with the full list of currently subscribed items, instead of simply omitting it
to infer that it wanted everything (which is what wildcard mode means).

This upstream issue was filed in envoyproxy/envoy#16063 and fixed in
envoyproxy/envoy#16153 which went out in Envoy `1.19.0` and is fixed in later
versions (later refactored in envoyproxy/envoy#16855).

This PR conditionally forces LDS/CDS to be wildcard-only even when the
connected Envoy requests a non-wildcard subscription, but only does so on
versions prior to `1.19.0`, as we should not need to do this on later versions.

This fixes the failure case as described here: #11833 (comment)

Co-authored-by: Huan Wang <[email protected]>
hc-github-team-consul-core pushed a commit to hashicorp/consul that referenced this issue Jan 25, 2022
When a wildcard xDS type (LDS/CDS/SRDS) reconnects from a delta xDS stream,
prior to envoy `1.19.0` it would populate the `ResourceNamesSubscribe` field
with the full list of currently subscribed items, instead of simply omitting it
to infer that it wanted everything (which is what wildcard mode means).

This upstream issue was filed in envoyproxy/envoy#16063 and fixed in
envoyproxy/envoy#16153 which went out in Envoy `1.19.0` and is fixed in later
versions (later refactored in envoyproxy/envoy#16855).

This PR conditionally forces LDS/CDS to be wildcard-only even when the
connected Envoy requests a non-wildcard subscription, but only does so on
versions prior to `1.19.0`, as we should not need to do this on later versions.

This fixes the failure case as described here: #11833 (comment)

Co-authored-by: Huan Wang <[email protected]>
rboyer added a commit to hashicorp/consul that referenced this issue Jan 25, 2022
When a wildcard xDS type (LDS/CDS/SRDS) reconnects from a delta xDS stream,
prior to envoy `1.19.0` it would populate the `ResourceNamesSubscribe` field
with the full list of currently subscribed items, instead of simply omitting it
to infer that it wanted everything (which is what wildcard mode means).

This upstream issue was filed in envoyproxy/envoy#16063 and fixed in
envoyproxy/envoy#16153 which went out in Envoy `1.19.0` and is fixed in later
versions (later refactored in envoyproxy/envoy#16855).

This PR conditionally forces LDS/CDS to be wildcard-only even when the
connected Envoy requests a non-wildcard subscription, but only does so on
versions prior to `1.19.0`, as we should not need to do this on later versions.

This fixes the failure case as described here: #11833 (comment)

Co-authored-by: Huan Wang <[email protected]>
rboyer added a commit to hashicorp/consul that referenced this issue Jan 25, 2022
When a wildcard xDS type (LDS/CDS/SRDS) reconnects from a delta xDS stream,
prior to envoy `1.19.0` it would populate the `ResourceNamesSubscribe` field
with the full list of currently subscribed items, instead of simply omitting it
to infer that it wanted everything (which is what wildcard mode means).

This upstream issue was filed in envoyproxy/envoy#16063 and fixed in
envoyproxy/envoy#16153 which went out in Envoy `1.19.0` and is fixed in later
versions (later refactored in envoyproxy/envoy#16855).

This PR conditionally forces LDS/CDS to be wildcard-only even when the
connected Envoy requests a non-wildcard subscription, but only does so on
versions prior to `1.19.0`, as we should not need to do this on later versions.

This fixes the failure case as described here: #11833 (comment)

Co-authored-by: Huan Wang <[email protected]>

Co-authored-by: Huan Wang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants