From e0b397078caaba702ae5ea87881d98da1a686e89 Mon Sep 17 00:00:00 2001 From: Benjamin Jee Date: Fri, 7 Jun 2024 11:40:41 -0700 Subject: [PATCH] Add results for reconfig test --- tests/reconfig/results/1.3.0/1.3.0.md | 110 ++++++++++++++++++++++++++ tests/reconfig/setup.md | 6 +- 2 files changed, 113 insertions(+), 3 deletions(-) create mode 100644 tests/reconfig/results/1.3.0/1.3.0.md diff --git a/tests/reconfig/results/1.3.0/1.3.0.md b/tests/reconfig/results/1.3.0/1.3.0.md new file mode 100644 index 0000000000..4ccb9f38f2 --- /dev/null +++ b/tests/reconfig/results/1.3.0/1.3.0.md @@ -0,0 +1,110 @@ +# Reconfiguration testing Results + + +- [Reconfiguration testing Results](#reconfiguration-testing-results) + - [Summary](#summary) + - [Test environment](#test-environment) + - [Results Tables](#results-tables) + - [NGINX Reloads and Time to Ready](#nginx-reloads-and-time-to-ready) + - [Event Batch Processing](#event-batch-processing) + - [NumResources to Total Resources](#numresources-to-total-resources) + - [Observations](#observations) + - [Future Improvements](#future-improvements) + + +## Summary + +- Due to fix https://github.com/nginxinc/nginx-gateway-fabric/issues/1107, time to ready, reload time, and event batch processing + time increased for all 150 resource tests. +- For all 30 resource tests, results were mostly consistent to prior results. + +## Test environment + +GKE cluster: + +- Node count: 3 +- Instance Type: e2-medium +- k8s version: 1.28.9-gke.1000000 +- Zone: us-central1-c +- Total vCPUs: 6 +- Total RAM: 12GB +- Max pods per node: 110 + +NGF deployment: + +- NGF version: edge - git commit 7c9bf23ed89861c9ce7b725f2c1686f4c24ef2f9 +- NGINX OSS Version: 1.27.0 +- NGINX Plus Version: R32 + +## Results Tables + +### NGINX Reloads and Time to Ready + +#### OSS + +| Test number | NumResources | TimeToReadyTotal (s) | TimeToReadyAvgSingle (s) | NGINX reloads | NGINX reload avg time (ms) | <= 500ms | <= 1000ms | +|-------------|--------------|----------------------|--------------------------|---------------|----------------------------|----------|-----------| +| 1 | 30 | 2 | <1 | 2 | 190 | 100% | 100% | +| 1 | 150 | 2 | <1 | 2 | 542 | 50% | 100% | +| 2 | 30 | 37 | <1 | 94 | 169 | 100% | 100% | +| 2 | 150 | 204 | <1 | 387 | 326 | 88% | 100% | +| 3 | 30 | <1 | <1 | 94 | 129 | 100% | 100% | +| 3 | 150 | <1 | <1 | 454 | 130 | 100% | 100% | + +#### Plus + +| Test number | NumResources | TimeToReadyTotal (s) | TimeToReadyAvgSingle (s) | NGINX reloads | NGINX reload avg time (ms) | <= 500ms | <= 1000ms | +|-------------|--------------|----------------------|--------------------------|---------------|----------------------------|----------|-----------| +| 1 | 30 | 1 | <1 | 2 | 220.5 | 100% | 100% | +| 1 | 150 | 1.5 | <1 | 2 | 528.5 | 50% | 100% | +| 2 | 30 | 41 | <1 | 94 | 176.8 | 100% | 100% | +| 2 | 150 | 199 | <1 | 391 | 320.56 | 94.1% | 100% | +| 3 | 30 | <1 | <1 | 94 | 128.5 | 100% | 100% | +| 3 | 150 | <1 | <1 | 454 | 129.2 | 100% | 100% | + +### Event Batch Processing + +#### OSS + +| Test number | NumResources | Event Batch Total | Event Batch Processing avg time (ms) | <= 500ms | <= 1000ms | <= 5000ms | <= 10000ms | <= 30000ms | +|-------------|--------------|-------------------|--------------------------------------|----------|-----------|-----------|------------|------------| +| 1 | 30 | 5 | 726.6 | 80% | 80% | 100% | 100% | 100% | +| 1 | 150 | 5 | 4457 | 40% | 80% | 80% | 80% | 100% | +| 2 | 30 | 371 | 59.5 | 99.7% | 100% | 100% | 100% | 100% | +| 2 | 150 | 1742 | 93.5 | 92.9% | 99.99% | 100% | 100% | 100% | +| 3 | 30 | 370 | 43.9 | 99.85% | 99.85% | 100% | 100% | 100% | +| 3 | 150 | 1810 | 44.8 | 99.99% | 99.99% | 99.99% | 100% | 100% | + +#### Plus + +| Test number | NumResources | Event Batch Total | Event Batch Processing avg time (ms) | <= 500ms | <= 1000ms | <= 5000ms | <= 10000ms | <= 30000ms | +|-------------|--------------|-------------------|--------------------------------------|----------|-----------|-----------|------------|--------------| +| 1 | 30 | 6 | 84 | 100% | 100% | 100% | 100% | 100% | +| 1 | 150 | 5 | 4544.3 | 40% | 80% | 80% | 80% | 100% | +| 2 | 30 | 370 | 59.1 | 100% | 100% | 100% | 100% | 100% | +| 2 | 150 | 1747 | 93.2 | 94.1% | 99.99% | 100% | 100% | 100% | +| 3 | 30 | 370 | 41.33 | 99.99% | 99.99% | 100% | 100% | 100% | +| 3 | 150 | 1809 | 44.88 | 99.99% | 99.99% | 99.99% | 99.99% | 100% | + +## NumResources to Total Resources + +| NumResources | Gateways | Secrets | ReferenceGrants | Namespaces | application Pods | application Services | HTTPRoutes | Total Resources | +|--------------|----------|---------|-----------------|------------|------------------|----------------------|------------|-----------------| +| x | 1 | 1 | 1 | x+1 | 2x | 2x | 3x | | +| 30 | 1 | 1 | 1 | 31 | 60 | 60 | 90 | 244 | +| 150 | 1 | 1 | 1 | 151 | 300 | 300 | 450 | 1204 | + +## Observations + +1. Reload time and time to ready have increased in 150 resource tests. This is probably due, in part, to the fix of https://github.com/nginxinc/nginx-gateway-fabric/issues/1107 causing the prior + test to only attach 2x of the HTTPRoutes while this test attaches all of them. In the 30 resource tests, results were mostly consistent to prior results. + +2. Event batch processing time increased notably in the 150 resource tests, probably for the same reason mentioned in observation #1. + In the 30 resource tests, results were mostly consistent to prior results. + +3. No errors in the logs. + + +## Future Improvements + +None. diff --git a/tests/reconfig/setup.md b/tests/reconfig/setup.md index 46a9c6f716..8729115544 100644 --- a/tests/reconfig/setup.md +++ b/tests/reconfig/setup.md @@ -43,7 +43,7 @@ The following cluster will be sufficient: ```console helm install my-release oci://ghcr.io/nginxinc/charts/nginx-gateway-fabric --version 0.0.0-edge \ - --create-namespace --wait -n nginx-gateway --set nginxGateway.config.logging.level=debug + --create-namespace --wait -n nginx-gateway --set nginxGateway.productTelemetry.enable=false ``` 4. Run tests: @@ -67,7 +67,7 @@ The following cluster will be sufficient: Note: You can expose metrics by running the below snippet and then navigating to `127.0.0.1:9113/metrics`: ```console - GW_POD=$(k get pods -n nginx-gateway | sed -n '2s/^\([^[:space:]]*\).*$/\1/p') + GW_POD=$(kubectl get pods -n nginx-gateway | sed -n '2s/^\([^[:space:]]*\).*$/\1/p') kubectl port-forward $GW_POD -n nginx-gateway 9113:9113 & ``` @@ -105,7 +105,7 @@ The following cluster will be sufficient: 2. Run the provided script with the required number of resources, e.g. `cd scripts && bash create-resources-routes-last.sh 30`. The script will deploy backend apps and services, wait 60 seconds for them to be ready, and deploy 1 Gateway, 1 Secret, 1 RefGrant, and HTTPRoutes at the same time. - 3. Measure TimeToReadyTotal as the time it takes from NGF receiving the first HTTPRoute resource update -> final + 3. Measure TimeToReadyTotal as the time it takes from NGF receiving the first HTTPRoute resource update (logs will say "reconciling") -> final config written and NGINX reloaded. Measure the other results as described in steps 6-7 of the [Setup](#setup) section. ### Test 3: Start NGF, create many resources attached to a Gateway, deploy the Gateway