-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for explicit wildcard resource #16855
Add support for explicit wildcard resource #16855
Conversation
Signed-off-by: Krzesimir Nowak <[email protected]>
Signed-off-by: Krzesimir Nowak <[email protected]>
docs/root/api-docs/xds_protocol.rst
Outdated
that once a stream has entered wildcard mode for a given resource type, there is no way to change | ||
the stream out of wildcard mode; resource names specified in any subsequent request on the stream | ||
will be ignored. | ||
types, there is also a "wildcard" mode, which comes in two equivalent flavors. First flavor, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per my last comment in #15857, I don't think we should be describing this as two separate modes. Instead, I think we should change this doc to explain that subscribing to *
means a wildcard subscription, and that if the first request on a stream for a given resource type specifies an empty set of subscriptions, that's a syntactic alias for subscribe to *
and nothing else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please have a look at the rewording. Hopefully it's closer to what you wanted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite. I've sent you #16861 with what I'm proposing.
Signed-off-by: Krzesimir Nowak <[email protected]>
Specify that sending an empty initial request as means for opting in into the wildcard subscription is a special case of sending the wildcard resource name. Signed-off-by: Krzesimir Nowak <[email protected]>
feda61b
to
d78ec97
Compare
Signed-off-by: Krzesimir Nowak <[email protected]>
@@ -53,6 +58,31 @@ void DeltaSubscriptionState::updateSubscriptionInterest( | |||
names_added_.erase(r); | |||
names_removed_.insert(r); | |||
} | |||
switch (mode_) { | |||
case WildcardMode::Implicit: | |||
if (names_removed_.find("*") != names_removed_.end()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realised this implementation deviates from how wildcards are currently implemented: a subscription without any resource names is a wildcard one. A wildcard subscription stops being one when a resource name is added to it. If a subscription has the last resource name removed from it, it becomes a wildcard subscription (see https://github.com/envoyproxy/envoy/blob/main/source/common/config/watch_map.cc#L23 and https://github.com/envoyproxy/envoy/blob/main/source/common/config/watch_map.cc#L50).
Another issue is that we now have wildcard-state related logic in two places; I need to think about this...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another difference is that currently a wildcard subscription always stays a wildcard one: added or removed resources do not affect subscriptions' behaviour after markStreamFresh()
has been called, and there's no difference with a regular subscription at other times.
After thinking about this, I think the way the current wildcard mode handles stream refresh might have a problem: it skips over all resource names that have been added, but doesn't remove corresponding resource states. Depending on how the server handles stream refresh (for example the server might reset its state and only send resources it considers necessary, ignoring client subscriptions), we might end up with different susbcription state on the client and the server.
I don't have a strong opinion about this, but I think a wildcard subscription should always stay one (i.e one cannot downgrade it to a regular subscription). If so, and to avoid the problem of an inconsistent state, I think a subscription in "new" wildcard mode should send all its state on stream refresh (alternatively, it should reset its state to the state at the time of creation).
WDYT? @markdroth?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current spec (without my changes in #16861) says the following:
For Listener and Cluster resource types, there is also a “wildcard” mode, which is triggered when the initial request on the stream for that resource type contains no resource names. In this case, the server should use site-specific business logic to determine the full set of resources that the client is interested in, typically based on the client’s node identification. Note that once a stream has entered wildcard mode for a given resource type, there is no way to change the stream out of wildcard mode; resource names specified in any subsequent request on the stream will be ignored.
And later:
Note that for Listener and Cluster resource types where the stream is in “wildcard” mode (see How the client specifies what resources to return for details), the set of resources being subscribed to is determined by the server instead of the client, so there is no mechanism for the client to unsubscribe from resources.
In other words, currently, wildcard mode for a given resource type is determined by the very first request for that resource type on the stream, and the stream stays in that mode forever -- it is not possible to later change a stream from wildcard mode to non-wildcard mode or vice-versa. If that first request has resource_names
unset, then the stream is in wildcard mode for that resource type, and the resource_names
field is ignored in all subsequent requests for that resource type. However, if the first request has resource_names
set, then the stream is in non-wildcard mode for that resource type, and if there is later a request for that resource type with resource_names
unset, that means to unsubscribe from all resources of that type; it does not mean to enter wildcard mode.
One way to think about the change we're making here is that we want to allow exiting wildcard mode after the initial request, but it will still not be possible to enter wildcard mode. (Entering wildcard mode later would break existing gRPC clients, which will send a request with resource_names
unset to indicate unsubscribing to all resources of that type.)
Although I think a better way to think about it is what I described in #16861: Instead of wildcard being a mode for the stream as a whole, it's now going to be a special type of subscription (*
), and the existing syntax of leaving resource_names
unset will just be a syntactic alias for that special subscription.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One note about the wildcard mode:
One way to think about the change we're making here is that we want to allow exiting wildcard mode after the initial request, but it will still not be possible to enter wildcard mode. (Entering wildcard mode later would break existing gRPC clients, which will send a request with
resource_names
unset to indicate unsubscribing to all resources of that type.)
I actually implemented also reentering into the wildcard subscription in this PR, but the reentering can only be done by sending "*". I imagined the current workflow in general terms (or in terms of SotW) to be like (think of the graphs below as deterministic finite state automata that start at "stream init"):
And for incremental (delta) wildcard workflow would look like this:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Meh, there are typos like "unsubscribe to" where it should be "unsubscribe from", but hopefully those graphs help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I'd very much like an explicit wildcard mode.
when reconnection comes into play, there is a chance that non-wildcard stream could become wildcard:
- client asks for A, B, C, it's marked as non-wildcard stream
- client removed A, B, C, network error happens
- client reconnect, not the server would think it's a wildcard stream and begin to send everything to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code takes care of this scenario, what happens here at 3 is "client reconnects, but does not send any initial request, so the server shouldn't be sending anything back".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stevenzzzz As @krnowak mentioned, that case should never happen, because if the client is no longer subscribed to any resources when it reconnects, it will not send any request for that resource type on the reconnected stream.
I don't see any value for an explicit wildcard mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@krnowak It might be a good idea to change the PR title to reflect what it actually does. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, changed the PR title and the commit message.
Signed-off-by: Krzesimir Nowak <[email protected]>
@htuch: yes can do. |
Hi @htuch, sorry for late reply. I'm currently swamped with other work, so I didn't find time to rebase it nor to attend the community meeting yesterday. But the good news are that I was told by my employer that I can pick up this work and #15857 in July, so in one week. I think this PR is ready for review - I'll need to rebase this PR to current main branch, which means dropping whatever changes I have made in the xds_protocol.rst document, because Mark already did a better job at wording than me. But the code should rather not change. So thanks in advance for the reviews! |
I like @markdroth's idea of not using a state machine for handling of wildcard subscriptions (see the thread starting somewhere here: #16855 (comment)). I think it will be easier to maintain and test in the long run. |
Signed-off-by: Krzesimir Nowak <[email protected]>
Signed-off-by: Krzesimir Nowak <[email protected]>
Signed-off-by: Krzesimir Nowak <[email protected]>
This is to avoid putting "*" all over the place, but instead have a readable name. Signed-off-by: Krzesimir Nowak <[email protected]>
Signed-off-by: Krzesimir Nowak <[email protected]>
I updated the PR. So now the wildcard request will always send I would also like to get an advice where to put the @markdroth: One XDS protocol specification question though: let's assume I subscribed to |
I don't think that's what we want here. As per previous discussion, existing management servers detect wildcard requests by the
This problem is not actually unique to the wildcard subscription, and I don't think it's really new. The same would occur if the client was originally subscribed to As I mentioned above (#16855 (comment)), there are two fields in the request:
Note that all of the above assumes that we're talking about the incremental protocol variant. In SotW, there is no equivalent of |
Retrying Azure Pipelines: |
@dmitri-d Gentle Nudge |
@krnowak can you merge main again since this has been a while and I'll do the final merge? Thanks. |
This can be reverted too, when old DSS goes away. Signed-off-by: Krzesimir Nowak <[email protected]>
@htuch: Thanks, done. I needed to add another small commit to fix redis integration tests (something new came with the merge). |
@krnowak one more merge please :) Thanks. |
I don't understand the tsan failure (something about 1 tests failing remotely?). Will try a retest. |
/retest |
Retrying Azure Pipelines: |
The integration test got twice as much test cases to run, because it runs the previous configuration with either new or old DSS. This resulted in breaking the timeout limit for the test. Signed-off-by: Krzesimir Nowak <[email protected]>
So the tsan failure was about RedisClusterRemoval test from redis integration test timing out. It's probably because of my merge fix. Probably, because the tests passed earlier. The fix doubled the number of testcases from 8 to 16 (so they are being run against old and new DSS, because RedisIntegrationTest is basically an alias for AdsIntegrationTest, which tests things against new and old DSS). I suppose the typical fix for test timeouts is sharding, so I tried that, let's see how it goes. |
@htuch: Merge done. Seems like sharding did the trick. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks so much @krnowak, this has been epic and paves the way for on-demand CDS, which will be a huge deal.
Awesome to see this finally land! I'm really looking forward to the on-demand CDS functionality! |
Thanks for merging it. :) I opened the new PR for ODCDS, since I could not reopen the old one. It's here: #18723. |
When a wildcard xDS type (LDS/CDS/SRDS) reconnects from a delta xDS stream, prior to envoy `1.19.0` it would populate the `ResourceNamesSubscribe` field with the full list of currently subscribed items, instead of simply omitting it to infer that it wanted everything (which is what wildcard mode means). This upstream issue was filed in envoyproxy/envoy#16063 and fixed in envoyproxy/envoy#16153 which went out in Envoy `1.19.0` and is fixed in later versions (later refactored in envoyproxy/envoy#16855). This PR conditionally forces LDS/CDS to be wildcard-only even when the connected Envoy requests a non-wildcard subscription, but only does so on versions prior to `1.19.0`, as we should not need to do this on later versions. This fixes the failure case as described here: #11833 (comment) Co-authored-by: Huan Wang <[email protected]>
When a wildcard xDS type (LDS/CDS/SRDS) reconnects from a delta xDS stream, prior to envoy `1.19.0` it would populate the `ResourceNamesSubscribe` field with the full list of currently subscribed items, instead of simply omitting it to infer that it wanted everything (which is what wildcard mode means). This upstream issue was filed in envoyproxy/envoy#16063 and fixed in envoyproxy/envoy#16153 which went out in Envoy `1.19.0` and is fixed in later versions (later refactored in envoyproxy/envoy#16855). This PR conditionally forces LDS/CDS to be wildcard-only even when the connected Envoy requests a non-wildcard subscription, but only does so on versions prior to `1.19.0`, as we should not need to do this on later versions. This fixes the failure case as described here: #11833 (comment) Co-authored-by: Huan Wang <[email protected]>
When a wildcard xDS type (LDS/CDS/SRDS) reconnects from a delta xDS stream, prior to envoy `1.19.0` it would populate the `ResourceNamesSubscribe` field with the full list of currently subscribed items, instead of simply omitting it to infer that it wanted everything (which is what wildcard mode means). This upstream issue was filed in envoyproxy/envoy#16063 and fixed in envoyproxy/envoy#16153 which went out in Envoy `1.19.0` and is fixed in later versions (later refactored in envoyproxy/envoy#16855). This PR conditionally forces LDS/CDS to be wildcard-only even when the connected Envoy requests a non-wildcard subscription, but only does so on versions prior to `1.19.0`, as we should not need to do this on later versions. This fixes the failure case as described here: #11833 (comment) Co-authored-by: Huan Wang <[email protected]>
When a wildcard xDS type (LDS/CDS/SRDS) reconnects from a delta xDS stream, prior to envoy `1.19.0` it would populate the `ResourceNamesSubscribe` field with the full list of currently subscribed items, instead of simply omitting it to infer that it wanted everything (which is what wildcard mode means). This upstream issue was filed in envoyproxy/envoy#16063 and fixed in envoyproxy/envoy#16153 which went out in Envoy `1.19.0` and is fixed in later versions (later refactored in envoyproxy/envoy#16855). This PR conditionally forces LDS/CDS to be wildcard-only even when the connected Envoy requests a non-wildcard subscription, but only does so on versions prior to `1.19.0`, as we should not need to do this on later versions. This fixes the failure case as described here: #11833 (comment) Co-authored-by: Huan Wang <[email protected]> Co-authored-by: Huan Wang <[email protected]>
Commit Message:
This enables a client to send a special resource named
*
to a server as a way to express an interest in a wildcard subscription. That way the wildcard subscription can coexist with subscriptions to other resources on the same stream. The wildcard resource, like any other resource, can be subscribed to and unsubscribed from at any time. The legacy way of enabling wildcard subscription (by sending an empty initial message) is still supported, but otherwise behaves just like a subscription to*
. This behavior is gated behind the enabled-by-default "envoy.restart_features.explicit_wildcard_resource" runtime guard.Additional Description:
N/A
Risk Level:
High, changes in wildcard mode.
Testing:
As a part of #15523. Also done internally.
Docs Changes:
Updated version history.
Release Notes:
*
is reserved. See :ref:wildcard mode description <xds_protocol_resource_hints>
and :ref:unsubscribing from resources <xds_protocol_unsubscribing>
for details. This behavior is gated behind theenvoy.restart_features.explicit_wildcard_resource
runtime guard.Platform Specific Features:
N/A
Runtime guard:
envoy.restart_features.explicit_wildcard_resource
enabled by default.[Optional Fixes #Issue]
[Optional Deprecated:]
[Optional API Considerations:]