Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wasm: removed automatical route refreshment and add a foreign function to clear the route cache #36671

Merged

Conversation

wbpcode
Copy link
Member

@wbpcode wbpcode commented Oct 17, 2024

Commit Message: wasm: removed automatical route refreshment and add a foreign function to clear the route cache

Additional Description:

Here are the reasons to do this change:

  1. It make it's impossible to only mofify the host or path or some headers but don't effect the route result.
  2. Basically, refreshing route is not encouraged. Even the filters (ext_authz, jwt_authz, ext_proc, etc) who acutally need or want to clear the route cache will also provide a flag (and basically default to false) to control the behavior.
  3. Refreshing route have big effect to the traffic and should only be done explicitly (code or configurations), we should never change it implicitly.

Risk Level: low.
Testing: n/a.
Docs Changes: n/a.
Release Notes: n/a.
Platform Specific Features: n/a.

Signed-off-by: wangbaiping/wbpcode <[email protected]>
Signed-off-by: wangbaiping/wbpcode <[email protected]>
@wbpcode wbpcode requested a review from kyessenov as a code owner October 17, 2024 11:36
Copy link

CC @envoyproxy/runtime-guard-changes: FYI only for changes made to (source/common/runtime/runtime_features.cc).

🐱

Caused by: #36671 was opened by wbpcode.

see: more, trace.

Signed-off-by: wangbaiping/wbpcode <[email protected]>
Signed-off-by: wangbaiping/wbpcode <[email protected]>
@@ -450,6 +463,9 @@ class Context : public proxy_wasm::ContextBase,
// Filter state prototype declaration.
absl::flat_hash_map<std::string, Filters::Common::Expr::CelStatePrototypeConstPtr>
state_prototypes_;

const bool no_automatic_route_refresh_{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the default behavior will break existing Wasm users (which is "beta" in Istio). I think we need to change the xDS API instead of the runtime key. Runtime key assumes the default behavior will eventually change but we can't be certain in this case.

Copy link
Member Author

@wbpcode wbpcode Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do don't want break exist plugins, but I cannot find better chance to bring this back to the expected way. At least for now, the wasm still not production ready.

I also doubt if it is good choice to add new API. I think the API should be part of the wasm plugin configuration, not for the host configuration.

There are also some possible compromised way:

  1. Add a runtime key (not runtime flag) to control the global default value.
  2. Try to bump the ABI version and only change the behaviour in the new ABI version.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per offline discussion, a long-living runtime key is probably better to avoid a churn in the spec. Not refreshing routes with a custom host call is a safer option, until we get better guidance from Proxy-Wasm

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do don't want break exist plugins, but I cannot find better chance to bring this back to the expected way.

Don't change the default, and instead add xDS configuration option to disable automatic refresh.

I don't understand why you insisting on changing the default behavior and breaking existing plugins in the process.

At least for now, the wasm still not production ready.

That's a bit of a stretch. People have been using Proxy-Wasm in Envoy in production for the last 4+ years.

I also doubt if it is good choice to add new API. I think the API should be part of the wasm plugin configuration, not for the host configuration.

It absolutely should be a new xDS API:

  1. This must be configurable on a per plugin basis (e.g. you might want to allow some plugins to clear route cache, but disable it for others).
  2. As mentioned number of times already, since this is a security feature, hosts must be able to control that.

Per offline discussion, a long-living runtime key is probably better to avoid a churn in the spec. Not refreshing routes with a custom host call is a safer option, until we get better guidance from Proxy-Wasm

To be honest, this is mostly Envoy internal change. However, the upcoming ABI v0.3 will add features (proxy-wasm/spec#55) and ability to list custom functions (proxy-wasm/spec#71), both of which could be used to let plugin know about changed behavior and addition of the custom hostcall.

Copy link
Member Author

@wbpcode wbpcode Oct 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a bit of a stretch. People have been using Proxy-Wasm in Envoy in production for the last 4+ years.

I am not very sure. May Proxy-Wasm self was production ready for a while. But in the Envoy, basically this is common understanding in our inner discussion (and it is still alpha). I think our recent target is to make it production ready in the Envoy. We just fixed a very very very basic bug #36411. So I persaonlly think may it's never be widely/heavily used until now.

Don't change the default, and instead add xDS configuration option to disable automatic refresh.
I don't understand why you insisting on changing the default behavior and breaking existing plugins in the process.

Because implicitly behavior will bring lots complexity in production where multiple different plugins may be used. It require the developers know all the details or it may result in unexpected result.

So I think the correct way should be: 1) no implicitly refresh, 2) provide explict host call to do it, 3) provide additional control by the xDS if needed (mostly for security, although I think now it not a big problem, but this may be necessary in future once the wasm is widely used).

And considering current situation of wasm support of Envoy, I think this is the last timepoint to change the behavior and bring it to the expected way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, I also don't want to break exist plugin. So, I think we could provide different behavior for different versions' plugins.

For example, only disable the automotical refreh for plugins with ABI version >= 0.3.0. Then all exist plugins won't be effected.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not very sure. May Proxy-Wasm self was production ready for a while. But in the Envoy, basically this is common understanding in our inner discussion (and it is still alpha). I think our recent target is to make it production ready in the Envoy. [...] So I persaonlly think may it's never be widely/heavily used until now.

AFAIK, it's "alpha" only because of the missing docs.

There are many people using it in production, but if you know better, then that's fine.

We just fixed a very very very basic bug #36411.

I've just looked at that bug and I'm a bit puzzled by it.

There were already tests for buffered and unbuffered responses, so it seems that Envoy behaves differently depending on HTTP version / codec? I'm not sure if that's desirable behavior. Also, did you check when this issue first appeared? Is it possible that it was a regression due to change from nghttp2 to oghttp2? Maybe that would explain why existing users didn't run into it yet?

Of course, I also don't want to break exist plugin. So, I think we could provide different behavior for different versions' plugins.

For example, only disable the automotical refreh for plugins with ABI version >= 0.3.0. Then all exist plugins won't be effected.

Yes, that would be more reasonable. However, I still think this should be configurable in xDS.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were already tests for buffered and unbuffered responses, so it seems that Envoy behaves differently depending on HTTP version / codec? I'm not sure if that's desirable behavior. Also, did you check when this issue first appeared? Is it possible that it was a regression due to change from nghttp2 to oghttp2? Maybe that would explain why existing users didn't run into it yet?

Envoy never promise that the last chunk will be empty. Even for the HTTP/1.1, the last chunk may be modified by filters and contains data.

So, it actually is a bug. But may be the lib transferation changes the possibility of the bug occurs? Not sure.
And if you look at that fix more carefully, you may found it also fixed another simple bug: not set the response buffer correctly.

Copy link
Member Author

@wbpcode wbpcode Oct 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that would be more reasonable. However, I still think this should be configurable in xDS.

I think we should make sure the default behavior match most cases and additional xds should only be used to provide additional control. And the ABI version also provide a way let's us to change the default behavior and without break exist plugins.

Anyway, it's hard to let everyone happy. Different people has different views.

But seems this one could be an acceptable approach for our both.

"clear_route_cache",
[](WasmBase&, std::string_view, const std::function<void*(size_t size)>&) -> WasmResult {
auto context = static_cast<Context*>(proxy_wasm::current_context_);
context->clearRouteCache();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Is this safe to call at any point in execution? This is untrusted code calling out, we might need to harden it if it's not already done.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember I did check in the clearRouteCache self. Will re-check it again.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The clearRouteCache will be forbided after the response headers is processed. But I agree we should harden it continuously to ensure the interface that's provided to extension is robust enough.

For example, only allow the head filter to clear the route cache. I will do this with another PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think from the separate discussion in proxy-wasm-spec, what we need is a control knob on the filter xDS with several options:

  1. Clear route cache on every header operation (past default).
  2. No clear route cache at all (future default).
  3. Permit clear route cache external function (to disallow untrusted extensions from causing harm by clearing routes).
  4. Automatic clear route after callback (to mimic ext_proc, not needed for your use case but it will be easier to handle dynamic metadata routing, for example).

We can split this into several booleans as well (e.g. 3 - could be a separate knob).
@PiotrSikora - does that look right to you?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, this PR removed the automatic route refresh and provide a manual way. This should be good enough as a default way to handle the route cache.

New options (3 and 4) could be treated as independent requirement of Envoy-self and be implemented in the future.

I think a basic rule is listening the feedbacks from acutal practices and responding them quickly, so, if some users have actual related requirements, I am happy to do it. :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kyessenov yeah, that sounds good, although I'm not sure I understand the difference between 1 and 4.

Copy link
Member Author

@wbpcode wbpcode Oct 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kyessenov yeah, that sounds good, although I'm not sure I understand the difference between 1 and 4.

The difference is that in which case we need additional xDS control. In the new implementation, the automical refresh should be enabled by an explict option.

seems I didn't got it correctly. 😞

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PiotrSikora The difference is that 1) clears route only on header operations but 4) should clear routes on any dynamic metadata or other non-HTTP attribute update

Signed-off-by: wangbaiping/wbpcode <[email protected]>
@wbpcode
Copy link
Member Author

wbpcode commented Oct 18, 2024

/retest

@kyessenov
Copy link
Contributor

/wait

Signed-off-by: wangbaiping/wbpcode <[email protected]>
@wbpcode
Copy link
Member Author

wbpcode commented Oct 19, 2024

/retest

The route cache will not be cleared by default if the wasm extension modified the request headers.
But it could be done explicitly by call the ``clear_route_cache`` foreign function.
This behavior change could be disabled temporarily by setting the runtime flag
``envoy_reloadable_features_wasm_no_automatic_route_refresh`` to false.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, ext_proc has configuration named disable_clear_route_cache, so it might be a good idea to align with that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It basically should be replaced by route_cache_action. And the default behavior is only clear the route cache when the external plugin require it.

    // The default behavior is to clear the route cache only when the
    // :ref:`clear_route_cache <envoy_v3_api_field_service.ext_proc.v3.CommonResponse.clear_route_cache>`
    // field is set in an external processor response.
    DEFAULT = 0;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wbpcode Can you introduce this to the xDS spec then instead of the runtime flag. Personally, I think it's OK to change the default to be "do nothing", and allow control planes to set it to "ON_HEADER_OPERATION" to preserve the existing semantics. Istio and other consumers can then control the behavior via xDS which is easier for upgrades.

Would that answer your concerns @PiotrSikora ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kyessenov Fine. But let me do it step by step. I will remove runtime flag and only keep the abi version check in this PR. Considering abi 0.3.0 is not ready now, so, this change won't break anything.

And at next one or two PR I will add the new API and let the api shepherds to review the API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kyessenov yes, that's much better, thank you!

Having said that, if we assume that control planes handle the upgrade logic, then I still don't see the benefit of breaking the current behavior and using DO_NOTHING instead of ON_HEADER_OPERATION as the default.

@@ -450,6 +463,9 @@ class Context : public proxy_wasm::ContextBase,
// Filter state prototype declaration.
absl::flat_hash_map<std::string, Filters::Common::Expr::CelStatePrototypeConstPtr>
state_prototypes_;

const bool no_automatic_route_refresh_{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do don't want break exist plugins, but I cannot find better chance to bring this back to the expected way.

Don't change the default, and instead add xDS configuration option to disable automatic refresh.

I don't understand why you insisting on changing the default behavior and breaking existing plugins in the process.

At least for now, the wasm still not production ready.

That's a bit of a stretch. People have been using Proxy-Wasm in Envoy in production for the last 4+ years.

I also doubt if it is good choice to add new API. I think the API should be part of the wasm plugin configuration, not for the host configuration.

It absolutely should be a new xDS API:

  1. This must be configurable on a per plugin basis (e.g. you might want to allow some plugins to clear route cache, but disable it for others).
  2. As mentioned number of times already, since this is a security feature, hosts must be able to control that.

Per offline discussion, a long-living runtime key is probably better to avoid a churn in the spec. Not refreshing routes with a custom host call is a safer option, until we get better guidance from Proxy-Wasm

To be honest, this is mostly Envoy internal change. However, the upcoming ABI v0.3 will add features (proxy-wasm/spec#55) and ability to list custom functions (proxy-wasm/spec#71), both of which could be used to let plugin know about changed behavior and addition of the custom hostcall.

"clear_route_cache",
[](WasmBase&, std::string_view, const std::function<void*(size_t size)>&) -> WasmResult {
auto context = static_cast<Context*>(proxy_wasm::current_context_);
context->clearRouteCache();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kyessenov yeah, that sounds good, although I'm not sure I understand the difference between 1 and 4.

kyessenov
kyessenov previously approved these changes Oct 21, 2024
Copy link
Contributor

@kyessenov kyessenov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM to change defaults to safer and allow extensibility in an Envoy-specific foreign function. We should follow-up with a generic "clear route cache" spec in xDS. @mpwarres

The route cache will not be cleared by default if the wasm extension modified the request headers.
But it could be done explicitly by call the ``clear_route_cache`` foreign function.
This behavior change could be disabled temporarily by setting the runtime flag
``envoy_reloadable_features_wasm_no_automatic_route_refresh`` to false.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wbpcode Can you introduce this to the xDS spec then instead of the runtime flag. Personally, I think it's OK to change the default to be "do nothing", and allow control planes to set it to "ON_HEADER_OPERATION" to preserve the existing semantics. Istio and other consumers can then control the behavior via xDS which is easier for upgrades.

Would that answer your concerns @PiotrSikora ?

Signed-off-by: wangbaiping(wbpcode) <[email protected]>
Signed-off-by: wangbaiping(wbpcode) <[email protected]>
@wbpcode
Copy link
Member Author

wbpcode commented Oct 23, 2024

/retest

@wbpcode
Copy link
Member Author

wbpcode commented Oct 23, 2024

cc @kyessenov I will add the xDS options in next PR. This should be OK becasue we won't cut a release recently anyway. 🤣 (BTW: I think the new HTTP-specific options to the xDS configuration may bring much more effect to our code base than we think.)

@wbpcode
Copy link
Member Author

wbpcode commented Oct 23, 2024

/retest

Comment on lines +157 to +159
if (wasm != nullptr) {
abi_version_ = wasm->abi_version_;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is available via wasm()->abiVersion() in the callbacks, so you shouldn't need to retain it.

Copy link
Member Author

@wbpcode wbpcode Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you are completely right at the production code view. But this is helpful for our tests now to allow us to mock different abi versions.

Our tests also need to be improved to avoid this, but, I want to put this to a refactoring only PR, for our tests/Plugin class/Wasm class etc.

(Much things need to do... :(

Copy link
Contributor

@mpwarres mpwarres Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(optional) WDYT about instead defining a virtual Context::getAbiVersion() method whose default impl calls wasm()->abiVersion(), but can be overridden by TestContext?

The minor quibble I have with adding an abi_version_ field here is that now the information is stored in multiple places, as opposed to one authoritative source.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(optional) WDYT about instead defining a virtual Context::getAbiVersion() method whose default impl calls wasm()->abiVersion(), but can be overridden by TestContext?

The minor quibble I have with adding an abi_version_ field here is that now the information is stored in multiple places, as opposed to one authoritative source.

SGTM. But I want to did some refactoring to these code anyway, will defer this to future PR.

The route cache will not be cleared by default if the wasm extension modified the request headers.
But it could be done explicitly by call the ``clear_route_cache`` foreign function.
This behavior change could be disabled temporarily by setting the runtime flag
``envoy_reloadable_features_wasm_no_automatic_route_refresh`` to false.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kyessenov yes, that's much better, thank you!

Having said that, if we assume that control planes handle the upgrade logic, then I still don't see the benefit of breaking the current behavior and using DO_NOTHING instead of ON_HEADER_OPERATION as the default.

@wbpcode
Copy link
Member Author

wbpcode commented Oct 29, 2024

friendly ping @mpwarres :)

Copy link
Contributor

@mpwarres mpwarres left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Comment on lines +157 to +159
if (wasm != nullptr) {
abi_version_ = wasm->abi_version_;
}
Copy link
Contributor

@mpwarres mpwarres Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(optional) WDYT about instead defining a virtual Context::getAbiVersion() method whose default impl calls wasm()->abiVersion(), but can be overridden by TestContext?

The minor quibble I have with adding an abi_version_ field here is that now the information is stored in multiple places, as opposed to one authoritative source.

@wbpcode wbpcode merged commit e483dbb into envoyproxy:main Oct 30, 2024
20 checks passed
@wbpcode wbpcode deleted the dev-disable-automatic-route-refreshment branch October 30, 2024 06:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants