Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add support for Envoy managed by Crossover #386

Merged
merged 13 commits into from
Dec 18, 2019

Conversation

mumoshu
Copy link
Contributor

@mumoshu mumoshu commented Nov 30, 2019

This is the first set of changes to address #385, including:

  • Metrics providers for Deployment and Service behind Envoy managed by Crossover
  • Metrics provider for Service behind Envoy/Crossover
  • A guide to use Envoy/Crossover with Deployment kind
  • Rename from the former Envoy observer to AppMesh observer

How this was tested

I've literally run every step described in the guide almost as-is. The only difference is that I've used my own flagger built from this branch with:

helm3 upgrade -i flagger flagger/flagger --namespace test \
  --set prometheus.install=true --set meshProvider=smi:envoy \
  --set image.tag=test-vXXX,image.repository=mumoshu/flagger

With docker builds:

export TAG=test-vXXX; \
  make build && \
  docker tag weaveworks/flagger:$TAG mumoshu/flagger:$TAG && \
  docker push mumoshu/flagger:$TAG && \
  helm3 upgrade -i flagger flagger/flagger --namespace test \
  --set prometheus.install=true --set meshProvider=smi:envoy \
  --set image.tag=$TAG,image.repository=mumoshu/flagger

This is how you can see traffic actually shifting from canary to primary in the dashboard:

image

Future works

My plan is to submit a separate pull request for finishing AppMesh with Service target kind with some documentation, as it seemed to take much more time for me.

Also, I'm going to improve the experience around installing Envoy with Crossover in a separate pull request. It seems messy now. Probably I'll publish an umbrella chart so that one can install it without git and a huge values.yaml file. Done.

@mumoshu mumoshu force-pushed the envoy-canary-analysis branch 2 times, most recently from 4b5c872 to 1f2f3eb Compare November 30, 2019 03:51
Assumes `envoy:smi` as the mesh provider name as I've successfully tested the progressive delivery for Envoy + Crossover with it.

This enhances Flagger to translate it to the metrics provider name of `envoy` for deployment targets, or `envoy:service` for service targets.

The `envoy` metrics provider is equivalent to `appmesh`, as both relies on the same set of standard metrics exposed by Envoy itself.

The `envoy:service` is almost the same as the `envoy` provider, but removing the condition on pod name, as we only need to filter on the backing service name = envoy_cluster_name. We don't consider other Envoy xDS implementations that uses anything that is different to original servicen ames as `envoy_cluster_name`, for now.

Ref fluxcd#385
@mumoshu mumoshu force-pushed the envoy-canary-analysis branch from 1f2f3eb to a828524 Compare November 30, 2019 04:03
@stefanprodan
Copy link
Member

Can you please add a section to the PR and explain how was this tested? The manual testing should cover the tutorial.

You could create a Helm repository for crossover with GitHub pages or use the Flagger one by adding the crossover chart to the charts directory. The chart could have a dependency on the Envoy stable one.

url: http://flagger-loadtester.test/
timeout: 5s
metadata:
cmd: "hey -z 1m -q 10 -c 2 http://podinfo-canary.test:9898/"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should point to the Envoy's ClusterIP otherwise there will be no metrics data. See NGINX and Gloo examples.

Generate HTTP 500 errors:

```bash
hey -z 1m -c 5 -q 5 http://podinfo-canary.test:9898/status/500
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should point to the Envoy's ClusterIP

Generate latency:

```bash
watch -n 1 curl http://podinfo-canary.test:9898/delay/1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should point to the Envoy's ClusterIP

@@ -20,7 +20,7 @@ var envoyQueries = map[string]string{
rate(
envoy_cluster_upstream_rq{
kubernetes_namespace="{{ .Namespace }}",
kubernetes_pod_name=~"{{ .Name }}-[0-9a-zA-Z]+(-[0-9a-zA-Z]+)"
envoy_cluster_name=~"{{ .Name }}-canary"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks App Mesh, the envoy_cluster_name doesn't have this format. Please create a crossover metrics observer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've discussed with stefan about this in Slack and the fix has been pushed.

url: http://flagger-loadtester.test/
timeout: 5s
metadata:
cmd: "hey -z 1m -q 10 -c 2 http://envoy.test:10000/"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see no host header in here, did you test it with multiple TrafficSplit objects? I don't see how this will hit the right service

@mumoshu
Copy link
Contributor Author

mumoshu commented Dec 18, 2019

Updated the pull request description to match the current state and the scope of this.

Copy link
Member

@stefanprodan stefanprodan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Awesome work! Thanks @mumoshu

@stefanprodan stefanprodan merged commit 968d67a into fluxcd:master Dec 18, 2019
@stefanprodan stefanprodan changed the title feat: Support for canary analysis on deployments and services behind Envoy feat: Add support for Envoy managed by Crossover Jan 6, 2020
@stefanprodan stefanprodan mentioned this pull request Jan 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants