-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
wasm: removed automatical route refreshment and add a foreign function to clear the route cache #36671
wasm: removed automatical route refreshment and add a foreign function to clear the route cache #36671
Conversation
Signed-off-by: wangbaiping/wbpcode <[email protected]>
…isable-automatic-route-refreshment
Signed-off-by: wangbaiping/wbpcode <[email protected]>
Signed-off-by: wangbaiping/wbpcode <[email protected]>
Signed-off-by: wangbaiping/wbpcode <[email protected]>
@@ -450,6 +463,9 @@ class Context : public proxy_wasm::ContextBase, | |||
// Filter state prototype declaration. | |||
absl::flat_hash_map<std::string, Filters::Common::Expr::CelStatePrototypeConstPtr> | |||
state_prototypes_; | |||
|
|||
const bool no_automatic_route_refresh_{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing the default behavior will break existing Wasm users (which is "beta" in Istio). I think we need to change the xDS API instead of the runtime key. Runtime key assumes the default behavior will eventually change but we can't be certain in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do don't want break exist plugins, but I cannot find better chance to bring this back to the expected way. At least for now, the wasm still not production ready.
I also doubt if it is good choice to add new API. I think the API should be part of the wasm plugin configuration, not for the host configuration.
There are also some possible compromised way:
- Add a runtime key (not runtime flag) to control the global default value.
- Try to bump the ABI version and only change the behaviour in the new ABI version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per offline discussion, a long-living runtime key is probably better to avoid a churn in the spec. Not refreshing routes with a custom host call is a safer option, until we get better guidance from Proxy-Wasm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do don't want break exist plugins, but I cannot find better chance to bring this back to the expected way.
Don't change the default, and instead add xDS configuration option to disable automatic refresh.
I don't understand why you insisting on changing the default behavior and breaking existing plugins in the process.
At least for now, the wasm still not production ready.
That's a bit of a stretch. People have been using Proxy-Wasm in Envoy in production for the last 4+ years.
I also doubt if it is good choice to add new API. I think the API should be part of the wasm plugin configuration, not for the host configuration.
It absolutely should be a new xDS API:
- This must be configurable on a per plugin basis (e.g. you might want to allow some plugins to clear route cache, but disable it for others).
- As mentioned number of times already, since this is a security feature, hosts must be able to control that.
Per offline discussion, a long-living runtime key is probably better to avoid a churn in the spec. Not refreshing routes with a custom host call is a safer option, until we get better guidance from Proxy-Wasm
To be honest, this is mostly Envoy internal change. However, the upcoming ABI v0.3 will add features (proxy-wasm/spec#55) and ability to list custom functions (proxy-wasm/spec#71), both of which could be used to let plugin know about changed behavior and addition of the custom hostcall.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a bit of a stretch. People have been using Proxy-Wasm in Envoy in production for the last 4+ years.
I am not very sure. May Proxy-Wasm self was production ready for a while. But in the Envoy, basically this is common understanding in our inner discussion (and it is still alpha). I think our recent target is to make it production ready in the Envoy. We just fixed a very very very basic bug #36411. So I persaonlly think may it's never be widely/heavily used until now.
Don't change the default, and instead add xDS configuration option to disable automatic refresh.
I don't understand why you insisting on changing the default behavior and breaking existing plugins in the process.
Because implicitly behavior will bring lots complexity in production where multiple different plugins may be used. It require the developers know all the details or it may result in unexpected result.
So I think the correct way should be: 1) no implicitly refresh, 2) provide explict host call to do it, 3) provide additional control by the xDS if needed (mostly for security, although I think now it not a big problem, but this may be necessary in future once the wasm is widely used).
And considering current situation of wasm support of Envoy, I think this is the last timepoint to change the behavior and bring it to the expected way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, I also don't want to break exist plugin. So, I think we could provide different behavior for different versions' plugins.
For example, only disable the automotical refreh for plugins with ABI version >= 0.3.0. Then all exist plugins won't be effected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not very sure. May Proxy-Wasm self was production ready for a while. But in the Envoy, basically this is common understanding in our inner discussion (and it is still alpha). I think our recent target is to make it production ready in the Envoy. [...] So I persaonlly think may it's never be widely/heavily used until now.
AFAIK, it's "alpha" only because of the missing docs.
There are many people using it in production, but if you know better, then that's fine.
We just fixed a very very very basic bug #36411.
I've just looked at that bug and I'm a bit puzzled by it.
There were already tests for buffered and unbuffered responses, so it seems that Envoy behaves differently depending on HTTP version / codec? I'm not sure if that's desirable behavior. Also, did you check when this issue first appeared? Is it possible that it was a regression due to change from nghttp2 to oghttp2? Maybe that would explain why existing users didn't run into it yet?
Of course, I also don't want to break exist plugin. So, I think we could provide different behavior for different versions' plugins.
For example, only disable the automotical refreh for plugins with ABI version >= 0.3.0. Then all exist plugins won't be effected.
Yes, that would be more reasonable. However, I still think this should be configurable in xDS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There were already tests for buffered and unbuffered responses, so it seems that Envoy behaves differently depending on HTTP version / codec? I'm not sure if that's desirable behavior. Also, did you check when this issue first appeared? Is it possible that it was a regression due to change from nghttp2 to oghttp2? Maybe that would explain why existing users didn't run into it yet?
Envoy never promise that the last chunk will be empty. Even for the HTTP/1.1, the last chunk may be modified by filters and contains data.
So, it actually is a bug. But may be the lib transferation changes the possibility of the bug occurs? Not sure.
And if you look at that fix more carefully, you may found it also fixed another simple bug: not set the response buffer correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that would be more reasonable. However, I still think this should be configurable in xDS.
I think we should make sure the default behavior match most cases and additional xds should only be used to provide additional control. And the ABI version also provide a way let's us to change the default behavior and without break exist plugins.
Anyway, it's hard to let everyone happy. Different people has different views.
But seems this one could be an acceptable approach for our both.
"clear_route_cache", | ||
[](WasmBase&, std::string_view, const std::function<void*(size_t size)>&) -> WasmResult { | ||
auto context = static_cast<Context*>(proxy_wasm::current_context_); | ||
context->clearRouteCache(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Is this safe to call at any point in execution? This is untrusted code calling out, we might need to harden it if it's not already done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember I did check in the clearRouteCache self. Will re-check it again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The clearRouteCache will be forbided after the response headers is processed. But I agree we should harden it continuously to ensure the interface that's provided to extension is robust enough.
For example, only allow the head filter to clear the route cache. I will do this with another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think from the separate discussion in proxy-wasm-spec, what we need is a control knob on the filter xDS with several options:
- Clear route cache on every header operation (past default).
- No clear route cache at all (future default).
- Permit clear route cache external function (to disallow untrusted extensions from causing harm by clearing routes).
- Automatic clear route after callback (to mimic ext_proc, not needed for your use case but it will be easier to handle dynamic metadata routing, for example).
We can split this into several booleans as well (e.g. 3 - could be a separate knob).
@PiotrSikora - does that look right to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now, this PR removed the automatic route refresh and provide a manual way. This should be good enough as a default way to handle the route cache.
New options (3 and 4) could be treated as independent requirement of Envoy-self and be implemented in the future.
I think a basic rule is listening the feedbacks from acutal practices and responding them quickly
, so, if some users have actual related requirements, I am happy to do it. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kyessenov yeah, that sounds good, although I'm not sure I understand the difference between 1 and 4.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kyessenov yeah, that sounds good, although I'm not sure I understand the difference between 1 and 4.
The difference is that in which case we need additional xDS control. In the new implementation, the automical refresh should be enabled by an explict option.
seems I didn't got it correctly. 😞
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PiotrSikora The difference is that 1) clears route only on header operations but 4) should clear routes on any dynamic metadata or other non-HTTP attribute update
Signed-off-by: wangbaiping/wbpcode <[email protected]>
/retest |
/wait |
Signed-off-by: wangbaiping/wbpcode <[email protected]>
/retest |
changelogs/current.yaml
Outdated
The route cache will not be cleared by default if the wasm extension modified the request headers. | ||
But it could be done explicitly by call the ``clear_route_cache`` foreign function. | ||
This behavior change could be disabled temporarily by setting the runtime flag | ||
``envoy_reloadable_features_wasm_no_automatic_route_refresh`` to false. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, ext_proc has configuration named disable_clear_route_cache
, so it might be a good idea to align with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It basically should be replaced by route_cache_action
. And the default behavior is only clear the route cache when the external plugin require it.
// The default behavior is to clear the route cache only when the
// :ref:`clear_route_cache <envoy_v3_api_field_service.ext_proc.v3.CommonResponse.clear_route_cache>`
// field is set in an external processor response.
DEFAULT = 0;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wbpcode Can you introduce this to the xDS spec then instead of the runtime flag. Personally, I think it's OK to change the default to be "do nothing", and allow control planes to set it to "ON_HEADER_OPERATION" to preserve the existing semantics. Istio and other consumers can then control the behavior via xDS which is easier for upgrades.
Would that answer your concerns @PiotrSikora ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kyessenov Fine. But let me do it step by step. I will remove runtime flag and only keep the abi version check in this PR. Considering abi 0.3.0 is not ready now, so, this change won't break anything.
And at next one or two PR I will add the new API and let the api shepherds to review the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kyessenov yes, that's much better, thank you!
Having said that, if we assume that control planes handle the upgrade logic, then I still don't see the benefit of breaking the current behavior and using DO_NOTHING
instead of ON_HEADER_OPERATION
as the default.
@@ -450,6 +463,9 @@ class Context : public proxy_wasm::ContextBase, | |||
// Filter state prototype declaration. | |||
absl::flat_hash_map<std::string, Filters::Common::Expr::CelStatePrototypeConstPtr> | |||
state_prototypes_; | |||
|
|||
const bool no_automatic_route_refresh_{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do don't want break exist plugins, but I cannot find better chance to bring this back to the expected way.
Don't change the default, and instead add xDS configuration option to disable automatic refresh.
I don't understand why you insisting on changing the default behavior and breaking existing plugins in the process.
At least for now, the wasm still not production ready.
That's a bit of a stretch. People have been using Proxy-Wasm in Envoy in production for the last 4+ years.
I also doubt if it is good choice to add new API. I think the API should be part of the wasm plugin configuration, not for the host configuration.
It absolutely should be a new xDS API:
- This must be configurable on a per plugin basis (e.g. you might want to allow some plugins to clear route cache, but disable it for others).
- As mentioned number of times already, since this is a security feature, hosts must be able to control that.
Per offline discussion, a long-living runtime key is probably better to avoid a churn in the spec. Not refreshing routes with a custom host call is a safer option, until we get better guidance from Proxy-Wasm
To be honest, this is mostly Envoy internal change. However, the upcoming ABI v0.3 will add features (proxy-wasm/spec#55) and ability to list custom functions (proxy-wasm/spec#71), both of which could be used to let plugin know about changed behavior and addition of the custom hostcall.
"clear_route_cache", | ||
[](WasmBase&, std::string_view, const std::function<void*(size_t size)>&) -> WasmResult { | ||
auto context = static_cast<Context*>(proxy_wasm::current_context_); | ||
context->clearRouteCache(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kyessenov yeah, that sounds good, although I'm not sure I understand the difference between 1 and 4.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM to change defaults to safer and allow extensibility in an Envoy-specific foreign function. We should follow-up with a generic "clear route cache" spec in xDS. @mpwarres
Signed-off-by: wangbaiping/wbpcode <[email protected]>
…isable-automatic-route-refreshment
changelogs/current.yaml
Outdated
The route cache will not be cleared by default if the wasm extension modified the request headers. | ||
But it could be done explicitly by call the ``clear_route_cache`` foreign function. | ||
This behavior change could be disabled temporarily by setting the runtime flag | ||
``envoy_reloadable_features_wasm_no_automatic_route_refresh`` to false. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wbpcode Can you introduce this to the xDS spec then instead of the runtime flag. Personally, I think it's OK to change the default to be "do nothing", and allow control planes to set it to "ON_HEADER_OPERATION" to preserve the existing semantics. Istio and other consumers can then control the behavior via xDS which is easier for upgrades.
Would that answer your concerns @PiotrSikora ?
Signed-off-by: wangbaiping(wbpcode) <[email protected]>
…isable-automatic-route-refreshment
Signed-off-by: wangbaiping(wbpcode) <[email protected]>
/retest |
cc @kyessenov I will add the xDS options in next PR. This should be OK becasue we won't cut a release recently anyway. 🤣 (BTW: I think the new HTTP-specific options to the xDS configuration may bring much more effect to our code base than we think.) |
/retest |
if (wasm != nullptr) { | ||
abi_version_ = wasm->abi_version_; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is available via wasm()->abiVersion()
in the callbacks, so you shouldn't need to retain it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you are completely right at the production code view. But this is helpful for our tests now to allow us to mock different abi versions.
Our tests also need to be improved to avoid this, but, I want to put this to a refactoring only PR, for our tests/Plugin class/Wasm class etc.
(Much things need to do... :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(optional) WDYT about instead defining a virtual Context::getAbiVersion() method whose default impl calls wasm()->abiVersion(), but can be overridden by TestContext?
The minor quibble I have with adding an abi_version_ field here is that now the information is stored in multiple places, as opposed to one authoritative source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(optional) WDYT about instead defining a virtual Context::getAbiVersion() method whose default impl calls wasm()->abiVersion(), but can be overridden by TestContext?
The minor quibble I have with adding an abi_version_ field here is that now the information is stored in multiple places, as opposed to one authoritative source.
SGTM. But I want to did some refactoring to these code anyway, will defer this to future PR.
changelogs/current.yaml
Outdated
The route cache will not be cleared by default if the wasm extension modified the request headers. | ||
But it could be done explicitly by call the ``clear_route_cache`` foreign function. | ||
This behavior change could be disabled temporarily by setting the runtime flag | ||
``envoy_reloadable_features_wasm_no_automatic_route_refresh`` to false. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kyessenov yes, that's much better, thank you!
Having said that, if we assume that control planes handle the upgrade logic, then I still don't see the benefit of breaking the current behavior and using DO_NOTHING
instead of ON_HEADER_OPERATION
as the default.
friendly ping @mpwarres :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
if (wasm != nullptr) { | ||
abi_version_ = wasm->abi_version_; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(optional) WDYT about instead defining a virtual Context::getAbiVersion() method whose default impl calls wasm()->abiVersion(), but can be overridden by TestContext?
The minor quibble I have with adding an abi_version_ field here is that now the information is stored in multiple places, as opposed to one authoritative source.
Commit Message: wasm: removed automatical route refreshment and add a foreign function to clear the route cache
Additional Description:
Here are the reasons to do this change:
false
) to control the behavior.Risk Level: low.
Testing: n/a.
Docs Changes: n/a.
Release Notes: n/a.
Platform Specific Features: n/a.