Wasm binary served from kuadrant operator workload #686

Closed

@eguzki eguzki commented Jun 3, 2024

What

part of #325
Follow-up work after discarding #593

The rate limiting wasm binary is embedded in the kuadrant operator at build time. The kuadrant operator serves the wasm binary on port 8082 (configurable) at the endpoint /kuadrant-wasm-shim. The operator's build process requires the wasm binary to be available locally; the SHA256 checksum of the binary is then computed and stored internally in a Golang variable. To make the wasm binary available locally at build time, there is a new makefile target that fetches the specified version of the wasm binary from the GitHub release assets.

make wasm-shim WASM_SHIM_VERSION=vX.Z.Y
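As a rough illustration of the serving side described above, here is a minimal Go sketch. It is not the operator's code: the function names `newWasmHandler` and `get` are hypothetical, the payload is a placeholder rather than a real module, and the checksum is computed at runtime here, whereas the PR computes it at build time.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newWasmHandler serves the given bytes at /kuadrant-wasm-shim and returns
// the handler together with the hex-encoded SHA256 checksum of those bytes.
// (Hypothetical helper, for illustration only.)
func newWasmHandler(binary []byte) (http.Handler, string) {
	sum := sha256.Sum256(binary)
	mux := http.NewServeMux()
	mux.HandleFunc("/kuadrant-wasm-shim", func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("Content-Type", "application/wasm")
		_, _ = w.Write(binary)
	})
	return mux, hex.EncodeToString(sum[:])
}

// get performs an in-memory GET against the handler, so the sketch can be
// exercised without binding a real port.
func get(h http.Handler, path string) (int, []byte) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.Bytes()
}

func main() {
	// Placeholder payload: the 8-byte Wasm header, not a real module.
	binary := []byte("\x00asm\x01\x00\x00\x00")
	handler, checksum := newWasmHandler(binary)

	code, body := get(handler, "/kuadrant-wasm-shim")
	fmt.Println(code, bytes.Equal(body, binary), checksum)
	// A long-running process would instead call something like:
	//   http.ListenAndServe(":8082", handler)
}
```

Returning the checksum next to the handler keeps the value advertised to the gateway tied to the exact bytes being served.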

The kuadrant operator exposes the wasm binary by deploying a new Kubernetes service called kuadrant-operator-controller-manager-wasm-shim-service. It looks like this (some fields were removed for simplicity):

apiVersion: v1
kind: Service
metadata:
  labels:
    app: kuadrant
    control-plane: controller-manager
  name: kuadrant-operator-controller-manager-wasm-shim-service
  namespace: kuadrant-system
spec:
  ports:
  - name: wasm-shim
    port: 8082
    protocol: TCP
    targetPort: wasm-shim
  selector:
    app: kuadrant
    control-plane: controller-manager
  sessionAffinity: None
  type: ClusterIP

The wasm module integrates with the gateway in the data plane via
the Wasm Network filter.
The source code of the compiled Wasm binaries is hosted at
Kuadrant's Wasm-Shim project.

Currently, at runtime, the Istio control plane downloads an OCI wasm image, usually from an image repository outside the cluster, such as quay.io. This opens a door for injecting malicious code.

This architecture enables so-called offline or disconnected installs,
which allow having the entire cluster disconnected from the internet,
at least regarding the Wasm module.

A disconnected install is a full feature in its own right, and engineering has not tested it yet.

How

Istio

The following sequence diagram shows the workflow when Envoy is managed by Istio

sequenceDiagram
    autonumber
    box transparent Kubernetes cluster
    participant K as Kuadrant Operator
    participant I as Istio WasmPlugin
    participant E as Envoy
    end
    K->>I: http://kuadrant-operator-address:8082, sha256
    I->>K: Fetch Wasm binary, verify sha256 checksum
    I->>E: Push Wasm binary
    I->>E: Setup Wasm filter
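Step 2 of the diagram (fetch the binary, then verify its sha256 checksum) can be sketched in Go as follows. This only illustrates the idea and is not Istio's implementation; `fetchAndVerify` and `serveOnce` are hypothetical names, and the payload is a placeholder.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetchAndVerify downloads url and returns the body only if its SHA256
// checksum equals wantHex (lowercase hex).
func fetchAndVerify(url, wantHex string) ([]byte, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	sum := sha256.Sum256(body)
	if got := hex.EncodeToString(sum[:]); got != wantHex {
		return nil, fmt.Errorf("sha256 mismatch: got %s, want %s", got, wantHex)
	}
	return body, nil
}

// serveOnce starts a throwaway local server for the payload and returns its
// URL, the payload's checksum, and a function that stops the server.
func serveOnce(payload []byte) (string, string, func()) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		_, _ = w.Write(payload)
	}))
	sum := sha256.Sum256(payload)
	return srv.URL, hex.EncodeToString(sum[:]), srv.Close
}

func main() {
	url, checksum, stop := serveOnce([]byte("placeholder wasm bytes"))
	defer stop()

	_, err := fetchAndVerify(url, checksum)
	fmt.Println("matching checksum, error:", err)
	_, err = fetchAndVerify(url, "deadbeef")
	fmt.Println("wrong checksum rejected:", err != nil)
}
```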

Verification Steps

  • Set up the environment:
make local-setup
  • Request an instance of Kuadrant:
kubectl -n kuadrant-system apply -f - <<EOF
apiVersion: kuadrant.io/v1beta1
kind: Kuadrant
metadata:
  name: kuadrant
spec: {}
EOF
  • Deploy toystore
kubectl apply -f examples/toystore/toystore.yaml

Create a HTTPRoute to route traffic to the service via Istio Ingress Gateway:

kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: toystore
spec:
  parentRefs:
  - name: istio-ingressgateway
    namespace: istio-system
  hostnames:
  - api.toystore.com
  rules:
  - matches:
    - method: GET
      path:
        type: PathPrefix
        value: "/toys"
    backendRefs:
    - name: toystore
      port: 80
  - matches: # it has to be a separate HTTPRouteRule so we do not rate limit other endpoints
    - method: POST
      path:
        type: Exact
        value: "/toys"
    backendRefs:
    - name: toystore
      port: 80
EOF

Export the gateway hostname and port:

export INGRESS_HOST=$(kubectl get gtw istio-ingressgateway -n istio-system -o jsonpath='{.status.addresses[0].value}')
export INGRESS_PORT=$(kubectl get gtw istio-ingressgateway -n istio-system -o jsonpath='{.spec.listeners[?(@.name=="http")].port}')
export GATEWAY_URL=$INGRESS_HOST:$INGRESS_PORT

Verify the route works:

curl -H 'Host: api.toystore.com' http://$GATEWAY_URL/toys -i
# HTTP/1.1 200 OK
  • Enforce rate limiting on requests to the Toy Store API
kubectl apply -f - <<EOF
apiVersion: kuadrant.io/v1beta2
kind: RateLimitPolicy
metadata:
  name: toystore
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: toystore
  limits:
    "create-toy":
      rates:
      - limit: 5
        duration: 10
        unit: second
      routeSelectors:
      - matches: # selects the 2nd HTTPRouteRule of the targeted route
        - method: POST
          path:
            type: Exact
            value: "/toys"
EOF
  • Check that the wasm plugin has been created and that it contains the URL of the kuadrant operator's local service, at port 8082 and endpoint /kuadrant-wasm-shim
kubectl get wasmplugin kuadrant-istio-ingressgateway -n istio-system -o jsonpath="{.spec.url}"

It should return

http://kuadrant-operator-controller-manager-wasm-shim-service.kuadrant-system.svc.cluster.local:8082/kuadrant-wasm-shim

Note that the url follows the kubernetes format http://<service-name>.<namespace>.svc.cluster.local:<port>/<endpoint>.
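The URL format above can be reproduced with a small Go helper. This is a sketch, not the operator's code: `serviceURL` is a hypothetical name, and it assumes the default `cluster.local` cluster domain.

```go
package main

import (
	"fmt"
	"strings"
)

// serviceURL builds the in-cluster URL of a Service endpoint following
// http://<service-name>.<namespace>.svc.cluster.local:<port>/<endpoint>.
// It assumes the default "cluster.local" cluster domain.
func serviceURL(service, namespace string, port int, endpoint string) string {
	return fmt.Sprintf("http://%s.%s.svc.cluster.local:%d/%s",
		service, namespace, port, strings.TrimPrefix(endpoint, "/"))
}

func main() {
	fmt.Println(serviceURL(
		"kuadrant-operator-controller-manager-wasm-shim-service",
		"kuadrant-system",
		8082,
		"/kuadrant-wasm-shim",
	))
}
```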

  • Check that the wasm plugin contains the sha256 checksum of the Wasm binary served by the operator
kubectl get wasmplugin kuadrant-istio-ingressgateway -n istio-system -o jsonpath="{.spec.sha256}"

It should return a sha256 value (yours may differ)

12879567faee3d2625a4998f6a4e622ded01163133bdef2b539bfa62f921cdd8

The sha256 checksum value should match the one shown in the operator logs

kubectl logs deployment/kuadrant-operator-controller-manager -n kuadrant-system | grep sha256

which gives

2024-06-12T08:54:18Z	INFO	kuadrant-operator	wasm-shim	{"sha256": "12879567faee3d2625a4998f6a4e622ded01163133bdef2b539bfa62f921cdd8"}
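The manual comparison between the WasmPlugin field and the operator log can also be scripted. The Go sketch below assumes the log line ends with a JSON object containing a `sha256` key, as in the sample line above; `sha256FromLogLine` is a hypothetical helper name.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// sha256FromLogLine extracts the "sha256" field from the JSON object that
// ends a log line such as the one printed by the operator above.
func sha256FromLogLine(line string) (string, error) {
	i := strings.Index(line, "{")
	if i < 0 {
		return "", fmt.Errorf("no JSON payload in log line")
	}
	var fields struct {
		SHA256 string `json:"sha256"`
	}
	if err := json.Unmarshal([]byte(line[i:]), &fields); err != nil {
		return "", err
	}
	if fields.SHA256 == "" {
		return "", fmt.Errorf("no sha256 field in log line")
	}
	return fields.SHA256, nil
}

func main() {
	// Log line and WasmPlugin value copied from the verification steps above.
	line := "2024-06-12T08:54:18Z\tINFO\tkuadrant-operator\twasm-shim\t" +
		`{"sha256": "12879567faee3d2625a4998f6a4e622ded01163133bdef2b539bfa62f921cdd8"}`
	fromPlugin := "12879567faee3d2625a4998f6a4e622ded01163133bdef2b539bfa62f921cdd8"

	fromLog, err := sha256FromLogLine(line)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("checksums match:", fromLog == fromPlugin)
}
```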
  • Send requests in a loop, one per second (about 5 out of every 10 should be allowed)
while :; do curl --write-out '%{http_code}\n' --silent --output /dev/null -H 'Host: api.toystore.com' http://$GATEWAY_URL/toys -X POST | grep -E --color "\b(429)\b|$"; sleep 1; done
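To see why roughly 5 out of every 10 requests succeed, here is a simplified fixed-window model of the "5 requests per 10 seconds" limit, written in Go. It is an assumption-laden sketch: the real rate limiting service may align its windows differently, and `fixedWindow` and `simulate` are hypothetical names.

```go
package main

import "fmt"

// fixedWindow is a simplified fixed-window counter: at most limit requests
// are admitted in each window of windowSecs seconds.
type fixedWindow struct {
	limit      int // requests admitted per window
	windowSecs int // window length in seconds
	window     int // index of the current window
	used       int // requests admitted in the current window
}

// allow reports whether a request arriving at nowSecs is admitted.
func (f *fixedWindow) allow(nowSecs int) bool {
	if w := nowSecs / f.windowSecs; w != f.window {
		f.window, f.used = w, 0 // a new window starts: reset the counter
	}
	if f.used >= f.limit {
		return false
	}
	f.used++
	return true
}

// simulate sends one request per second and returns the HTTP status code
// each request would get (200 when admitted, 429 when rate limited).
func simulate(limit, windowSecs, requests int) []int {
	f := &fixedWindow{limit: limit, windowSecs: windowSecs, window: -1}
	codes := make([]int, requests)
	for t := range codes {
		if f.allow(t) {
			codes[t] = 200
		} else {
			codes[t] = 429
		}
	}
	return codes
}

func main() {
	for t, code := range simulate(5, 10, 20) {
		fmt.Printf("t=%ds %d\n", t, code)
	}
}
```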
  • Verify an upgrade of the Wasm binary version
    There is no need to stop the HTTP client. It should not be affected, and rate limiting should keep working throughout.

First, delete the cached wasm binary

rm kuadrant-wasm-shim

Undeploy the running kuadrant operator

kubectl delete deployment kuadrant-operator-controller-manager -n kuadrant-system

Download the new wasm version v0.4.0-alpha.4

make wasm-shim WASM_SHIM_VERSION=v0.4.0-alpha.4

It should report the new sha256 checksum

Downloading [email protected] from https://api.github.com/repos/Kuadrant/wasm-shim/releases/assets/173313660
sha256sum /home/eguzki/git/kuadrant/kuadrant-operator/kuadrant-wasm-shim
e0c43b4759a86d97461377bf55c71d4a6366f709a245420e34ab12928a3e101e  /home/eguzki/git/kuadrant/kuadrant-operator/kuadrant-wasm-shim

The next command will build a new operator image with the new wasm binary and deploy it

make local-deploy
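The `sha256sum` step shown in the download output above has a straightforward Go equivalent that streams the file instead of loading it into memory. This is a sketch only; `fileSHA256` and `writeTemp` are hypothetical helpers, and the demo hashes a small temporary file rather than the real binary.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

// fileSHA256 returns the hex-encoded SHA256 checksum of the file at path,
// streaming its contents through the hash.
func fileSHA256(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// writeTemp writes data to a new temporary file and returns its path.
func writeTemp(data []byte) (string, error) {
	f, err := os.CreateTemp("", "wasm-shim-*")
	if err != nil {
		return "", err
	}
	defer f.Close()
	if _, err := f.Write(data); err != nil {
		return "", err
	}
	return f.Name(), nil
}

func main() {
	// Stand-in content; point fileSHA256 at ./kuadrant-wasm-shim to hash
	// the downloaded binary instead.
	path, err := writeTemp([]byte("abc"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer os.Remove(path)

	sum, err := fileSHA256(path)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s  %s\n", sum, path)
}
```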
  • Check that the wasm plugin has been updated (reconciled by the new operator) and that it contains the new sha256 checksum of the Wasm binary served by the operator
kubectl get wasmplugin kuadrant-istio-ingressgateway -n istio-system -o jsonpath="{.spec.sha256}"

It should return a sha256 value (yours may differ)

e0c43b4759a86d97461377bf55c71d4a6366f709a245420e34ab12928a3e101e

The sha256 checksum value should match the one shown in the operator logs

kubectl logs deployment/kuadrant-operator-controller-manager -n kuadrant-system | grep sha256

which gives

2024-06-12T09:02:39Z	INFO	kuadrant-operator	wasm-shim	{"sha256": "e0c43b4759a86d97461377bf55c71d4a6366f709a245420e34ab12928a3e101e"}

Note that before the upgrade, the sha256 was 12879567faee3d2625a4998f6a4e622ded01163133bdef2b539bfa62f921cdd8

Note that the HTTP client kept being rate limited throughout and was not affected by the upgrade.

eguzki added the kind/enhancement (New feature or request) label on Jun 3, 2024
eguzki changed the title from "wasm service" to "Wasm binary served from kuadrant operator workload" on Jun 3, 2024
@eguzki eguzki mentioned this pull request Jun 3, 2024
11 tasks
@@ -23,10 +23,6 @@ on:
description: DNS Operator bundle version
default: latest
type: string
wasmShimVersion:
Contributor Author

heads up @didierofrivia

The wasm shim is no longer part of the operator bundle. Instead, it is added as part of the kuadrant operator image build process.

codecov bot commented Jun 4, 2024

Codecov Report

Attention: Patch coverage is 73.91304% with 6 lines in your changes missing coverage. Please review.

Project coverage is 82.92%. Comparing base (ece13e8) to head (1168e54).
Report is 120 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #686      +/-   ##
==========================================
+ Coverage   80.20%   82.92%   +2.72%     
==========================================
  Files          64       77      +13     
  Lines        4492     5776    +1284     
==========================================
+ Hits         3603     4790    +1187     
- Misses        600      653      +53     
- Partials      289      333      +44     
| Flag | Coverage Δ |
| --- | --- |
| bare-k8s-integration | 4.56% <0.00%> (?) |
| controllers-integration | 72.58% <73.91%> (?) |
| gatewayapi-integration | 11.11% <0.00%> (?) |
| integration | ? |
| istio-integration | 56.23% <73.91%> (?) |
| unit | 32.42% <0.00%> (+2.39%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Components | Coverage Δ |
| --- | --- |
| api/v1beta1 (u) | 71.42% <ø> (ø) |
| api/v1beta2 (u) | 93.58% <100.00%> (+2.16%) ⬆️ |
| pkg/common (u) | 88.13% <ø> (-0.70%) ⬇️ |
| pkg/istio (u) | 72.39% <ø> (-1.53%) ⬇️ |
| pkg/log (u) | 94.73% <ø> (ø) |
| pkg/reconcilers (u) | ∅ <ø> (∅) |
| pkg/rlptools (u) | 82.53% <ø> (+3.08%) ⬆️ |
| controllers (i) | 82.18% <80.49%> (+5.38%) ⬆️ |

| Files | Coverage Δ |
| --- | --- |
| ...llers/rate_limiting_istio_wasmplugin_controller.go | 81.09% <100.00%> (ø) |
| pkg/rlptools/wasm/server_utils.go | 100.00% <100.00%> (ø) |
| pkg/rlptools/wasm/utils.go | 86.66% <ø> (ø) |
| pkg/istio/mutators.go | 41.66% <0.00%> (ø) |

... and 32 files with indirect coverage changes

@eguzki eguzki marked this pull request as ready for review June 4, 2024 16:12
@eguzki eguzki requested a review from a team as a code owner June 4, 2024 16:12
Makefile Outdated
@@ -305,15 +295,26 @@ test-unit: clean-cov generate fmt vet ## Run Unit tests.

##@ Build

WASM_SHIM = $(PROJECT_PATH)/kuadrant-ratelimit-wasm
Contributor

Can we use a more generic name for this, that perhaps continues to work if we, say, add something like ext authz to the functions performed by the component as well?

Suggested change
WASM_SHIM = $(PROJECT_PATH)/kuadrant-ratelimit-wasm
WASM_SHIM = $(PROJECT_PATH)/kuadrant-wasm-shim

Member

If we are planning to have just one wasm shim to do everything, @guicassolato's suggestion makes sense... however, if the name is meant for rate limiting purposes only, it's OK.

Member

I'd go for the more generic name too... unless we need to discriminate one day, which I doubt, this is the most portable option.

Contributor

I have no idea what's better between one wasm shim per function (RL, auth) or a single one for all. I can see pros and cons for both, although today I'd probably be more inclined to a single one, I think.

Contributor Author

My initial idea was that if ext authz were done by wasm, the binary would be called kuadrant-authz-wasm and would be a different wasm binary. The distinction comes from the fact that they essentially speak different languages: rate limiting uses RLS, while ext auth uses the external authorization gRPC protocol. So the configuration of each wasm module would, I presume, be different. Not to mention potentially different release streams.

But if you prefer to go for a generic name for now, I am happy with it. It can always be changed in the future.

Contributor

I agree that RLS and ext_authz are indeed what make the two wasm-shims (or two functions of a single wasm-shim) different from one another.

On the other hand, other than that, arguably the two wasm-shims are practically identical:

  • both need to decide whether a request matches;
  • both need to evaluate well-known attributes;
  • both perform a grpc call to a service, basically saying "should I let traffic go through? decide based on this payload.";
  • both expect a boolean response, maybe with some metadata that typically become HTTP headers.

Moreover, if the auth layer needs to propagate data to the RL layer, with one wasm-shim, that can be done "over-the-wire", while with two, one wasm-shim needs to inject Envoy Dynamic Metadata so the other one can retrieve it.
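The shared flow argued for above (match the request, call a service, act on a boolean answer plus optional headers) can be pictured with a small Go sketch. It is purely illustrative: every type and function name here is hypothetical, the services are fakes rather than real RLS or ext_authz calls, and it says nothing about how the actual wasm-shim is implemented.

```go
package main

import "fmt"

// request is a minimal stand-in for the attributes a shim would inspect.
type request struct {
	Method string
	Path   string
}

// decider is a hypothetical abstraction over the external service call
// (RLS for rate limiting, ext_authz for auth): it answers allow or deny,
// optionally with metadata to surface as HTTP headers.
type decider interface {
	Decide(r request) (allow bool, headers map[string]string)
}

// static is a fake decider that always returns the same answer.
type static struct {
	allow   bool
	headers map[string]string
}

func (s static) Decide(request) (bool, map[string]string) { return s.allow, s.headers }

// rule pairs a match predicate with the service to consult on a match.
type rule struct {
	matches func(request) bool
	service decider
}

// evaluate walks the rules in order, consults the service of every matching
// rule, merges the returned headers, and stops at the first denial.
func evaluate(rules []rule, r request) (bool, map[string]string) {
	headers := map[string]string{}
	for _, rl := range rules {
		if !rl.matches(r) {
			continue
		}
		allow, h := rl.service.Decide(r)
		for k, v := range h {
			headers[k] = v
		}
		if !allow {
			return false, headers
		}
	}
	return true, headers
}

// exampleRules wires an always-allow "auth" step before a "rate limit"
// step that denies POST /toys.
func exampleRules() []rule {
	return []rule{
		{
			matches: func(request) bool { return true },
			service: static{allow: true, headers: map[string]string{"x-user": "alice"}},
		},
		{
			matches: func(r request) bool { return r.Method == "POST" && r.Path == "/toys" },
			service: static{allow: false},
		},
	}
}

func main() {
	for _, r := range []request{{"GET", "/toys"}, {"POST", "/toys"}} {
		allow, headers := evaluate(exampleRules(), r)
		fmt.Println(r.Method, r.Path, allow, headers)
	}
}
```

In this model, only the `decider` implementations would differ between the rate limiting and auth cases, which is the point being made in favour of a single shim.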

Contributor Author

As a design pattern, I always pick "one module does one thing", for multiple reasons. But this is a discussion we do not need to have now.

Contributor

> one module does one thing

Not wrong. Neither should we repeat ourselves, another design principle 😜

Maybe this is one of those cases where following all the "rules" we have in life becomes impractical, like "The early bird catches the worm" but also "Good things come to those who wait."

.gitignore Outdated
@@ -31,5 +31,7 @@ tmp
/catalog/kuadrant-operator-catalog.Dockerfile
/coverage/

/kuadrant-ratelimit-wasm
Member

maybe call it kuadrant-ratelimit-shim ?

Contributor Author

it will be kuadrant-wasm-shim

Dockerfile Outdated
@@ -16,12 +25,16 @@ COPY controllers/ controllers/
COPY pkg/ pkg/

# Build
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -a -o manager main.go
RUN WASM_SHIM_SHA256=$(cat /opt/kuadrant/wasm-shim/kuadrant-ratelimit-wasm.sha256) \
Member

In the case we follow what Gui suggested, this var makes sense... if not, probably something like RATELIMIT_SHIM_SHA256 would make more sense

Contributor Author

keeping as WASM_SHIM_SHA256
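One common way to get a build-time value such as WASM_SHIM_SHA256 into a Go variable is the linker's `-X` flag. The sketch below assumes that mechanism; the variable name `WasmShimSHA256`, its package path, and the `validChecksum` helper are hypothetical and may differ from what this PR does.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// WasmShimSHA256 is meant to be overridden at build time, for example:
//
//	go build -ldflags "-X main.WasmShimSHA256=$WASM_SHIM_SHA256" -o manager main.go
//
// The variable name and package path are assumptions made for this sketch.
var WasmShimSHA256 = "unknown"

// validChecksum reports whether s looks like a hex-encoded SHA256 digest
// (32 bytes, that is 64 hex characters).
func validChecksum(s string) bool {
	b, err := hex.DecodeString(s)
	return err == nil && len(b) == 32
}

func main() {
	if !validChecksum(WasmShimSHA256) {
		fmt.Println("wasm shim checksum was not injected at build time")
		return
	}
	fmt.Println("wasm shim sha256:", WasmShimSHA256)
}
```

Validating the injected value at startup catches a build that silently skipped the flag.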

}

.PHONY: wasm-shim
wasm-shim: $(WASM_SHIM) ## Download opm locally if necessary.
Member

@didierofrivia didierofrivia Jun 10, 2024

## Download the wasm-shim locally if necessary

Contributor Author

eguzki commented Jun 12, 2024

Applied s/kuadrant-ratelimit-wasm/kuadrant-wasm-shim/g (not literally) and updated verification steps.

ready for a new review

resources:
- ../default
- ../dependencies
patchesStrategicMerge:
- manager_debug_mode.yaml
Contributor

Are we defaulting to debug on make deploy now?

Contributor Author

yes! we can no longer run make run to run locally (an actual process in your local machine) which was configured with debug level to see all details.

So now the only dev/testing deployment option we (developers) have is "make deploy", and I decided to deploy with debug to have all available logs.

This does not affect OLM deployments, for which LOG_LEVEL defaults to "INFO" and LOG_MODE to production.

Contributor

@guicassolato guicassolato Jun 12, 2024

Hmmm. IDK. I think debug is OK maybe for local-deploy, but deploy is a legit target that blindly applies the deploy manifests to the kubectl context. It seems risky to enable debug by default. If this eventually ends up propagated to internal components config such as auth, it would leak sensitive data to the logs by default.

Maybe comment out this patch and let devs decide when to enable it?

Member

> yes! we can no longer run make run to run locally (an actual process in your local machine) which was configured with debug level to see all details.

Are we OK with this? I would pretty much always run controllers locally when developing, and a development environment where we are forced to re-deploy an image is not ideal.

Contributor Author

Envoy is configured to download some file from the operator. If envoy runs in kubernetes and the operator in your local machine.... what can we do to make that download happen?

I was also heavily using make run. But the decision to serve the binary from the operator makes it very hard to keep it that way.

Contributor Author

I am happy to keep it if you want, but it does not work for all use cases.

Contributor Author

@eguzki eguzki Jun 12, 2024

> Hmmm. IDK. I think debug is OK maybe for local-deploy, but deploy is a legit target that blindly applies the deploy manifests to the kubectl context. It seems risky to enable debug by default. If this eventually ends up propagated to internal components config such as auth, it would leak sensitive data to the logs by default.

I do not think make deploy will ever be used outside development. Anyway, it is faster to update than to discuss. I made the changes: make deploy does not change, and make local-deploy patches the deployment to set up debug/development mode.
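The log settings discussed in this thread can be pictured as environment variables with production-safe defaults that a development deployment overrides. The Go sketch below assumes LOG_LEVEL and LOG_MODE are read from the environment; `envOrDefault` is a hypothetical helper, and the default values simply mirror the ones mentioned in this thread.

```go
package main

import (
	"fmt"
	"os"
)

// envOrDefault returns the value of the environment variable key, or def
// when the variable is unset or empty.
func envOrDefault(key, def string) string {
	if v, ok := os.LookupEnv(key); ok && v != "" {
		return v
	}
	return def
}

func main() {
	// Defaults as described in the thread; a development deployment would
	// override them, for example by patching the Deployment's env section.
	level := envOrDefault("LOG_LEVEL", "info")
	mode := envOrDefault("LOG_MODE", "production")
	fmt.Printf("log level=%s mode=%s\n", level, mode)
}
```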

@eguzki eguzki marked this pull request as draft June 13, 2024 15:56
Contributor Author

eguzki commented Jun 13, 2024

Back to draft. Offline, the following enhancements were requested:

  • The wasm binary download process should add a sha256 checksum check.
  • The operator should push the wasm binary into Envoy (using the provider API) instead of having it pulled from a URL.

Contributor Author

eguzki commented Jul 1, 2024

There is currently no easy way to push the wasm binary into the Envoy container using the Istio API or the Envoy Gateway API.

Istio approach

When using the Istio Wasm OCI image API and providing an image URL, Istio's proxy (aka istio-agent, an xDS proxy living in the same container as Envoy) fetches the image, validates the sha256 checksum, and then caches it locally. The Envoy configuration will look like:

config:
  name: istio-system.kuadrant-istio-ingressgateway
  vmConfig:
    runtime: envoy.wasm.runtime.v8
    code:
      local:
        filename: /var/lib/istio/data/81221938ebcbc4550eb35f72aac65d0939d89335832b5a022226fb2526806e9e/676b5f025bb67d0993fa37dc6b4de18ca515938a5d30e7451ba745c3098e485c.wasm
    configuration: {}

Envoy Gateway approach

The recently merged envoyproxy/gateway#3564 adds the Wasm OCI feature, described as:

EG to download Wasm images from remote registries and serve them to the Envoy fleet via 
a local HTTP server inside EG running on 18002.

Kuadrant's available options

Therefore, the available options for kuadrant to distribute the wasm binary:
