-
Notifications
You must be signed in to change notification settings - Fork 738
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add support for Envoy managed by Crossover #386
Conversation
4b5c872
to
1f2f3eb
Compare
Assumes `envoy:smi` as the mesh provider name as I've successfully tested the progressive delivery for Envoy + Crossover with it. This enhances Flagger to translate it to the metrics provider name of `envoy` for deployment targets, or `envoy:service` for service targets. The `envoy` metrics provider is equivalent to `appmesh`, as both relies on the same set of standard metrics exposed by Envoy itself. The `envoy:service` is almost the same as the `envoy` provider, but removing the condition on pod name, as we only need to filter on the backing service name = envoy_cluster_name. We don't consider other Envoy xDS implementations that uses anything that is different to original servicen ames as `envoy_cluster_name`, for now. Ref fluxcd#385
1f2f3eb
to
a828524
Compare
Can you please add a section to the PR and explain how was this tested? The manual testing should cover the tutorial. You could create a Helm repository for crossover with GitHub pages or use the Flagger one by adding the crossover chart to the charts directory. The chart could have a dependency on the Envoy stable one. |
url: http://flagger-loadtester.test/ | ||
timeout: 5s | ||
metadata: | ||
cmd: "hey -z 1m -q 10 -c 2 http://podinfo-canary.test:9898/" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should point to the Envoy's ClusterIP otherwise there will be no metrics data. See NGINX and Gloo examples.
Generate HTTP 500 errors: | ||
|
||
```bash | ||
hey -z 1m -c 5 -q 5 http://podinfo-canary.test:9898/status/500 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should point to the Envoy's ClusterIP
Generate latency: | ||
|
||
```bash | ||
watch -n 1 curl http://podinfo-canary.test:9898/delay/1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should point to the Envoy's ClusterIP
pkg/metrics/envoy.go
Outdated
@@ -20,7 +20,7 @@ var envoyQueries = map[string]string{ | |||
rate( | |||
envoy_cluster_upstream_rq{ | |||
kubernetes_namespace="{{ .Namespace }}", | |||
kubernetes_pod_name=~"{{ .Name }}-[0-9a-zA-Z]+(-[0-9a-zA-Z]+)" | |||
envoy_cluster_name=~"{{ .Name }}-canary" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This breaks App Mesh, the envoy_cluster_name doesn't have this format. Please create a crossover metrics observer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've discussed with stefan about this in Slack and the fix has been pushed.
url: http://flagger-loadtester.test/ | ||
timeout: 5s | ||
metadata: | ||
cmd: "hey -z 1m -q 10 -c 2 http://envoy.test:10000/" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see no host header in here, did you test it with multiple TrafficSplit objects? I don't see how this will hit the right service
To not break AppMesh integration.
Updated the pull request description to match the current state and the scope of this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Awesome work! Thanks @mumoshu
This is the first set of changes to address #385, including:
How this was tested
I've literally run every step described in the guide almost as-is. The only difference is that I've used my own flagger built from this branch with:
With docker builds:
This is how you can see traffic actually shifting from canary to primary in the dashboard:
Future works
My plan is to submit a separate pull request for finishing AppMesh with Service target kind with some documentation, as it seemed to take much more time for me.
Also, I'm going to improve the experience around installing Envoy with Crossover in a separate pull request. It seems messy now. Probably I'll publish an umbrella chart so that one can install it without git and a huge values.yaml file.Done.