Fix NGINX promql and custom metrics checks #174

Merged
merged 4 commits into from
May 9, 2019
17 changes: 13 additions & 4 deletions artifacts/nginx/canary.yaml
@@ -43,11 +43,20 @@ spec:
       # percentage (0-100)
       threshold: 99
       interval: 1m
-    - name: request-duration
-      # maximum avg req duration
-      # milliseconds
-      threshold: 500
+    - name: "latency"
+      threshold: 0.5
       interval: 1m
+      query: |
+        histogram_quantile(0.99,
+          sum(
+            rate(
+              http_request_duration_seconds_bucket{
+                kubernetes_namespace="test",
+                kubernetes_pod_name=~"podinfo-[0-9a-zA-Z]+(-[0-9a-zA-Z]+)"
+              }[1m]
+            )
+          ) by (le)
+        )
     # external checks (optional)
     webhooks:
     - name: load-test
68 changes: 67 additions & 1 deletion docs/gitbook/usage/nginx-progressive-delivery.md
@@ -14,7 +14,9 @@ Install NGINX with Helm:
helm upgrade -i nginx-ingress stable/nginx-ingress \
--namespace ingress-nginx \
--set controller.stats.enabled=true \
- --set controller.metrics.enabled=true
+ --set controller.metrics.enabled=true \
+ --set controller.podAnnotations."prometheus\.io/scrape"=true \
+ --set controller.podAnnotations."prometheus\.io/port"=10254
```

Install Flagger and the Prometheus add-on in the same namespace as NGINX:
@@ -276,6 +278,70 @@ Events:
Warning Synced 1m flagger Canary failed! Scaling down podinfo.test
```

### Custom metrics

The canary analysis can be extended with Prometheus queries.

The demo app is instrumented with Prometheus so you can create a custom check that will use the HTTP request duration
histogram to validate the canary.

Edit the canary analysis and add the following metric:

```yaml
  canaryAnalysis:
    metrics:
    - name: "latency"
      threshold: 0.5
      interval: 1m
      query: |
        histogram_quantile(0.99,
          sum(
            rate(
              http_request_duration_seconds_bucket{
                kubernetes_namespace="test",
                kubernetes_pod_name=~"podinfo-[0-9a-zA-Z]+(-[0-9a-zA-Z]+)"
              }[1m]
            )
          ) by (le)
        )
```

The threshold is set to 500ms (0.5 seconds), so if the 99th percentile of the request duration over the last minute
goes over half a second, the analysis will fail and the canary will not be promoted.

Trigger a canary deployment by updating the container image:

```bash
kubectl -n test set image deployment/podinfo \
podinfod=quay.io/stefanprodan/podinfo:1.4.3
```

Generate high response latency:

```bash
watch curl http://app.example.com/delay/2
```

Watch Flagger logs:

```
kubectl -n ingress-nginx logs deployment/flagger -f | jq .msg

Starting canary deployment for podinfo.test
Advance podinfo.test canary weight 5
Advance podinfo.test canary weight 10
Advance podinfo.test canary weight 15
Halt podinfo.test advancement latency 1.20 > 0.5
Halt podinfo.test advancement latency 1.45 > 0.5
Halt podinfo.test advancement latency 1.60 > 0.5
Halt podinfo.test advancement latency 1.69 > 0.5
Halt podinfo.test advancement latency 1.70 > 0.5
Rolling back podinfo.test failed checks threshold reached 5
Canary failed! Scaling down podinfo.test
```

If you have Slack configured, Flagger will send a notification with the reason why the canary failed.

### A/B Testing

Besides weighted routing, Flagger can be configured to route traffic to the canary based on HTTP match conditions.
8 changes: 4 additions & 4 deletions pkg/metrics/nginx.go
@@ -9,13 +9,13 @@ import (

const nginxSuccessRateQuery = `
sum(rate(
- nginx_ingress_controller_requests{kubernetes_namespace="{{ .Namespace }}",
+ nginx_ingress_controller_requests{namespace="{{ .Namespace }}",
ingress="{{ .Name }}",
status!~"5.*"}
[{{ .Interval }}]))
/
sum(rate(
- nginx_ingress_controller_requests{kubernetes_namespace="{{ .Namespace }}",
+ nginx_ingress_controller_requests{namespace="{{ .Namespace }}",
ingress="{{ .Name }}"}
[{{ .Interval }}]))
* 100
@@ -68,10 +68,10 @@ func (c *Observer) GetNginxSuccessRate(name string, namespace string, metric str

const nginxRequestDurationQuery = `
sum(rate(
- nginx_ingress_controller_ingress_upstream_latency_seconds_sum{kubernetes_namespace="{{ .Namespace }}",
+ nginx_ingress_controller_ingress_upstream_latency_seconds_sum{namespace="{{ .Namespace }}",
ingress="{{ .Name }}"}[{{ .Interval }}]))
/
- sum(rate(nginx_ingress_controller_ingress_upstream_latency_seconds_count{kubernetes_namespace="{{ .Namespace }}",
+ sum(rate(nginx_ingress_controller_ingress_upstream_latency_seconds_count{namespace="{{ .Namespace }}",
ingress="{{ .Name }}"}[{{ .Interval }}])) * 1000
`

4 changes: 2 additions & 2 deletions pkg/metrics/nginx_test.go
@@ -20,7 +20,7 @@ func Test_NginxSuccessRateQueryRender(t *testing.T) {
t.Fatal(err)
}

- expected := `sum(rate(nginx_ingress_controller_requests{kubernetes_namespace="nginx",ingress="podinfo",status!~"5.*"}[1m])) / sum(rate(nginx_ingress_controller_requests{kubernetes_namespace="nginx",ingress="podinfo"}[1m])) * 100`
+ expected := `sum(rate(nginx_ingress_controller_requests{namespace="nginx",ingress="podinfo",status!~"5.*"}[1m])) / sum(rate(nginx_ingress_controller_requests{namespace="nginx",ingress="podinfo"}[1m])) * 100`

if query != expected {
t.Errorf("\nGot %s \nWanted %s", query, expected)
@@ -43,7 +43,7 @@ func Test_NginxRequestDurationQueryRender(t *testing.T) {
t.Fatal(err)
}

- expected := `sum(rate(nginx_ingress_controller_ingress_upstream_latency_seconds_sum{kubernetes_namespace="nginx",ingress="podinfo"}[1m])) /sum(rate(nginx_ingress_controller_ingress_upstream_latency_seconds_count{kubernetes_namespace="nginx",ingress="podinfo"}[1m])) * 1000`
+ expected := `sum(rate(nginx_ingress_controller_ingress_upstream_latency_seconds_sum{namespace="nginx",ingress="podinfo"}[1m])) /sum(rate(nginx_ingress_controller_ingress_upstream_latency_seconds_count{namespace="nginx",ingress="podinfo"}[1m])) * 1000`

if query != expected {
t.Errorf("\nGot %s \nWanted %s", query, expected)
4 changes: 3 additions & 1 deletion pkg/metrics/observer.go
@@ -99,7 +99,9 @@ func (c *Observer) GetScalar(query string) (float64, error) {
query = strings.Replace(query, " ", "", -1)

var value *float64
- result, err := c.queryMetric(query)
+
+ querySt := url.QueryEscape(query)
+ result, err := c.queryMetric(querySt)
if err != nil {
return 0, err
}