Wasm VM corruption on plugin config update #125

Closed
bianpengyuan opened this issue Feb 5, 2021 · 7 comments · Fixed by envoyproxy/envoy#15016

Comments

@bianpengyuan
Contributor

context: istio/istio#29843

When a plugin is reconfigured, the VM can get stuck in a bad state:

2021-02-04T22:06:29.003338Z	trace	envoy wasm	[host->vm] proxy_on_request_headers(6, 10, 1)
2021-02-04T22:06:29.003479Z	error	envoy wasm	Function: proxy_on_request_headers failed: Uncaught RuntimeError: function signature mismatch

This happens randomly to some VMs in an Envoy process, so it looks like the VM is corrupted by the config update.

@bianpengyuan
Contributor Author

cc @PiotrSikora @mathetake

@PiotrSikora
Member

Could you provide any steps to reproduce?

@bianpengyuan
Contributor Author

bianpengyuan commented Feb 8, 2021

Steps to reproduce locally:

  1. Clone https://github.com/bianpengyuan/proxy and check out the webassembly/debug-vm-corruption branch, which has an integration test that reproduces this.
  2. Run make build_envoy to build the Envoy binary for testing.
  3. Run the test: go test -v -run ^TestConfigUpdate$ istio.io/proxy/test/envoye2e/stats_plugin. The test brings up a proxy, configures it with a basic auth Wasm filter, updates the plugin config, then sleeps for 300s.
  4. While the test is running, curl the test Envoy from another terminal: for i in {1..300}; do curl localhost:23923/; done. It can take many requests before an empty response is returned, which is what happens on VM corruption. In the test terminal you should see a log line saying: Function: proxy_on_request_headers failed: Uncaught RuntimeError: function signature mismatch

@PiotrSikora
Member

Thanks, I can replicate it locally.

@PiotrSikora
Member

It looks like the new HTTP request is dispatched on RootContext, for some reason.

Good execution:

[http] [external/envoy/source/common/http/conn_manager_impl.cc:883] [C145][S10210102865694369098] request headers complete (end_stream=true):
[http] [external/envoy/source/common/http/filter_manager.cc:774] [C145][S10210102865694369098] request end stream
[wasm] [external/envoy/source/extensions/common/wasm/wasm_vm.cc:40] [host->vm] proxy_on_context_create(15, 1)
[wasm] [external/envoy/source/extensions/common/wasm/wasm_vm.cc:40] [host<-vm] proxy_on_context_create return: void
[wasm] [external/envoy/source/extensions/common/wasm/wasm_vm.cc:40] [host->vm] proxy_on_request_headers(15, 7, 1)

Bad execution:

[http] [external/envoy/source/common/http/conn_manager_impl.cc:883] [C151][S10175944935117729199] request headers complete (end_stream=true):
[http] [external/envoy/source/common/http/filter_manager.cc:774] [C151][S10175944935117729199] request end stream
[wasm] [external/envoy/source/extensions/common/wasm/wasm_vm.cc:40] [host->vm] proxy_on_context_create(17, 0)
[wasm] [external/envoy/source/extensions/common/wasm/wasm_vm.cc:40] [vm->host] env.proxy_get_property(10976, 14, 5257888, 5257884)
[wasm] [external/envoy/source/extensions/common/wasm/wasm_vm.cc:40] [vm<-host] env.proxy_get_property return: 0
[wasm] [external/envoy/source/extensions/common/wasm/wasm_vm.cc:40] [host<-vm] proxy_on_context_create return: void
[wasm] [external/envoy/source/extensions/common/wasm/wasm_vm.cc:40] [host->vm] proxy_on_request_headers(17, 7, 1)
[wasm] [external/envoy/source/extensions/common/wasm/wasm_vm.cc:39] Function: proxy_on_request_headers failed: Uncaught RuntimeError: function signature mismatch
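
The difference between the two traces is the second argument of proxy_on_context_create: 1 in the good run, 0 in the bad one. The toy C++ model below sketches why that could matter. It assumes the usual proxy-wasm convention that a parent context id of 0 means "create a root context", and it is not the SDK's or Envoy's actual code. ToyGuest, ContextKind, canHandleRequestHeaders and replayLogs are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical toy model of guest-side bookkeeping; not the proxy-wasm SDK.
enum class ContextKind { Root, Stream };

class ToyGuest {
 public:
  // Mirrors the argument order seen in the logs: (context_id, parent_context_id).
  // Assumption: parent_context_id == 0 requests a root context.
  void onContextCreate(uint32_t context_id, uint32_t parent_context_id) {
    contexts_[context_id] =
        parent_context_id == 0 ? ContextKind::Root : ContextKind::Stream;
  }

  // Returns true only when the id refers to a known stream context,
  // i.e. a context that is expected to receive request header callbacks.
  bool canHandleRequestHeaders(uint32_t context_id) const {
    auto it = contexts_.find(context_id);
    return it != contexts_.end() && it->second == ContextKind::Stream;
  }

 private:
  std::unordered_map<uint32_t, ContextKind> contexts_;
};

// Replays the context ids from the two traces above.
inline void replayLogs() {
  ToyGuest guest;
  guest.onContextCreate(1, 0);   // root context from the initial config
  guest.onContextCreate(15, 1);  // good run: stream context under root 1
  assert(guest.canHandleRequestHeaders(15));
  guest.onContextCreate(17, 0);  // bad run: parent 0, so 17 becomes a root
  assert(!guest.canHandleRequestHeaders(17));
}
```

In this model, context 17 is registered as a root context, so a later proxy_on_request_headers(17, ...) call targets the wrong kind of object. How the real guest turns that into "function signature mismatch" is not shown here.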

btw: the port is 20923 in my env, not 23923.

@PiotrSikora
Member

PiotrSikora commented Feb 10, 2021

The issue is that even though WasmVMs are thread-local and each context is thread-local, the cached root_context_id in the Envoy integration isn't thread-local, and it was pointing to a context that existed on only one of the threads.

This worked fine with the old code that updated the root context in place, since the cached root_context_id always pointed to root_context_id=1, which was a valid context id on all threads. With the change to create a new root context for each config generation, the root_context_id became unique to each thread, and the cached version was correct on only one thread.
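
A rough illustration of this explanation, as a single-threaded toy model in C++. It is a sketch under stated assumptions, not Envoy's implementation: ToyVm, Scenario and simulateConfigUpdate are made-up names, each ToyVm stands in for one worker thread's VM, and the "last writer wins" update of the shared id is an assumption about how a non-thread-local cache would behave.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for one worker thread's Wasm VM.
struct ToyVm {
  uint32_t next_context_id = 1;
  std::unordered_set<uint32_t> root_contexts;

  // Creates a root context for a config generation and returns its id.
  uint32_t createRootContext() {
    uint32_t id = next_context_id++;
    root_contexts.insert(id);
    return id;
  }
  // Stream contexts consume ids too, so counters drift apart across threads.
  uint32_t createStreamContext() { return next_context_id++; }

  bool hasRootContext(uint32_t id) const { return root_contexts.count(id) != 0; }
};

struct Scenario {
  std::vector<ToyVm> vms;
  std::vector<uint32_t> per_thread_root;  // what a thread-local cache would hold
  uint32_t shared_cached_root = 0;        // one value shared by all threads
};

inline Scenario simulateConfigUpdate() {
  Scenario s;
  s.vms.resize(2);
  // Initial config: every VM allocates id 1 for its root context.
  for (auto& vm : s.vms) s.shared_cached_root = vm.createRootContext();
  // The threads serve different numbers of requests before the update.
  for (int i = 0; i < 3; ++i) s.vms[0].createStreamContext();
  s.vms[1].createStreamContext();
  // Config update: each VM creates a new root context with its own next id.
  for (auto& vm : s.vms) {
    uint32_t id = vm.createRootContext();
    s.per_thread_root.push_back(id);
    s.shared_cached_root = id;  // assumption: last writer wins
  }
  return s;
}
```

In this scenario the two VMs end up with new root context ids 5 and 3. The shared value is 3, which is a root context only in the second VM, while a per-thread value stays valid in each VM.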

I'm working on a fix for this.

PiotrSikora added a commit to PiotrSikora/envoy that referenced this issue Feb 10, 2021
lizan pushed a commit to envoyproxy/envoy that referenced this issue Feb 17, 2021
lizan pushed a commit to envoyproxy/envoy that referenced this issue Feb 19, 2021
PiotrSikora added a commit to PiotrSikora/proxy that referenced this issue Feb 19, 2021
istio-testing pushed a commit to istio/proxy that referenced this issue Feb 19, 2021
rexengineering pushed a commit to rexengineering/istio-envoy that referenced this issue Oct 15, 2021