Split Performance Tests into their Own Sub-Command #1111

learnitall · 2022-09-26T23:12:07Z

Proposal / RFE

Is your feature request related to a problem?

When running performance tests through cilium connectivity test --perf, all of the deployments required for the non-perf connectivity checks are created and validated, giving some extra bloat and execution time:

❯ ./cilium connectivity test --perf
ℹ️  Monitor aggregation detected, will skip some flow validation steps
⚠️  Each zone only has a single node - could impact the performance test results
--> (Needed for perf)⌛ [kind-kind] Waiting for deployments [perf-client] to become ready...
--> (Needed for perf)⌛ [kind-kind] Waiting for deployments [perf-server] to become ready...
⌛ [kind-kind] Waiting for CiliumEndpoint for pod cilium-test/client-668b56fb44-swmr9 to appear...
⌛ [kind-kind] Waiting for CiliumEndpoint for pod cilium-test/client2-bf46d8f5d-sb49v to appear...
⌛ [kind-kind] Waiting for pod cilium-test/client-668b56fb44-swmr9 to reach DNS server on cilium-test/echo-same-node-5557788d8-pvtr5 pod...
⌛ [kind-kind] Waiting for pod cilium-test/client2-bf46d8f5d-sb49v to reach DNS server on cilium-test/echo-same-node-5557788d8-pvtr5 pod...
⌛ [kind-kind] Waiting for pod cilium-test/client-668b56fb44-swmr9 to reach DNS server on cilium-test/echo-other-node-7dc896756-jsgr2 pod...
⌛ [kind-kind] Waiting for pod cilium-test/client2-bf46d8f5d-sb49v to reach DNS server on cilium-test/echo-other-node-7dc896756-jsgr2 pod...
⌛ [kind-kind] Waiting for pod cilium-test/client-668b56fb44-swmr9 to reach default/kubernetes service...
⌛ [kind-kind] Waiting for pod cilium-test/client2-bf46d8f5d-sb49v to reach default/kubernetes service...
--> (Needed for perf)⌛ [kind-kind] Waiting for CiliumEndpoint for pod cilium-test/perf-client-6b76f6645-jvnk4 to appear...
--> (Needed for perf)⌛ [kind-kind] Waiting for CiliumEndpoint for pod cilium-test/perf-client-other-node-55c54fb756-qhnlv to appear...
--> (Needed for perf)⌛ [kind-kind] Waiting for CiliumEndpoint for pod cilium-test/perf-server-74dc77c7-2xbns to appear...
⌛ [kind-kind] Waiting for CiliumEndpoint for pod cilium-test/echo-other-node-7dc896756-jsgr2 to appear...
⌛ [kind-kind] Waiting for CiliumEndpoint for pod cilium-test/echo-same-node-5557788d8-pvtr5 to appear...
⌛ [kind-kind] Waiting for Service cilium-test/echo-other-node to become ready...
⌛ [kind-kind] Waiting for Service cilium-test/echo-same-node to become ready...
⌛ [kind-kind] Waiting for NodePort 172.18.0.5:30760 (cilium-test/echo-other-node) to become ready...
⌛ [kind-kind] Waiting for NodePort 172.18.0.5:30547 (cilium-test/echo-same-node) to become ready...
⌛ [kind-kind] Waiting for NodePort 172.18.0.2:30760 (cilium-test/echo-other-node) to become ready...
⌛ [kind-kind] Waiting for NodePort 172.18.0.2:30547 (cilium-test/echo-same-node) to become ready...
⌛ [kind-kind] Waiting for NodePort 172.18.0.3:30760 (cilium-test/echo-other-node) to become ready...
⌛ [kind-kind] Waiting for NodePort 172.18.0.3:30547 (cilium-test/echo-same-node) to become ready...
⌛ [kind-kind] Waiting for NodePort 172.18.0.4:30547 (cilium-test/echo-same-node) to become ready...
⌛ [kind-kind] Waiting for NodePort 172.18.0.4:30760 (cilium-test/echo-other-node) to become ready...
ℹ️  Skipping IPCache check
🔭 Enabling Hubble telescope...
⚠️  Unable to contact Hubble Relay, disabling Hubble telescope and flow validation: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp [::1]:4245: connect: connection refused"
ℹ️  Expose Relay locally with:
   cilium hubble enable
   cilium hubble port-forward&
ℹ️  Cilium version: 1.12.2
🏃 Running tests...

[=] Test [network-perf]

Describe the solution you'd like

Splitting the performance tests into their own sub-command will help to alleviate this issue and others that may arise as performance tests are expanded upon. I understand that a performance test is technically a connectivity test, but there are two main reasons why I think this is worth doing:

Isolate performance test related options from connectivity test options. Performance test options are "namespaced" already anyways, prefixed with a --perf-. As the suite expands and is developed, more of these options may need to be added, such as for adding new test types or configuring the output format of test results. Having a split between performance options and non-performance related options in the output for a connectivity test --help might make things difficult to read in the CLI. Additionally, not all of the non-perf options are implemented for the performance tests when intuitively they can still apply. For instance, the --multi-cluster is a no-op.
Help with code organization. If we wanted to fix the issue where non-performance resources are created when toggling the performance tests, then when we're deploying resources in connectivity.ConnectivityTest.deploy we could check if performance tests are enabled in an if statement towards the top of the function, and then add a return statement at the end of the branch to prevent other resources from being created:

// deploy ensures the test Namespace, Services and Deployments are running on the cluster.
func (ct *ConnectivityTest) deploy(ctx context.Context) error {
    if ct.params.Perf {
        // deploy perf stuff
        return nil    
    }
}

(This if statement already exists, it's just in the middle of the function and doesn't have a return statement.)

But then we need to add one of these if statements at the top of connectivity.ConnectivityTest.validateDeployment since this function will check for the resources we skipped deploying earlier in deploy and fail the run:

// validateDeployment checks if the Deployments we created have the expected Pods in them.
func (ct *ConnectivityTest) validateDeployment(ctx context.Context) error {
    if ct.params.Perf {
        // validate perf stuff
        return nil
    }
}

This pattern is also repeated in connectivity.Run, which kicks off performance tests if they are enabled and returns when they are done:

func Run(ctx context.Context, ct *check.ConnectivityTest) error {
    // ---
    // some setup tasks
    // ---

    if ct.Params().Perf {
        // run perf stuff
        return nil
    }
}

All of this essentially splits our core functions for connectivity test in half: one part of performance and one part for the connectivity tests. If we need to split our functions up anyways, then I think it would make more sense to just split out the performance-related stuff into its own sub-command. This will help a lot with readability, and preventing and finding bugs. For instance, for some reason I get a timeout error when running performance tests on a fresh cluster. I have to invoke the command twice to get it going:

❯ kubectl delete ns cilium-test
namespace "cilium-test" deleted
❯ ./cilium connectivity test --perf
...
connectivity test failed: timeout reached waiting lookup for localhost from pod cilium-test/client2-bf46d8f5d-r22q8 to server on pod cilium-test/echo-same-node-5557788d8-vbrkz to succeed
❯ ./cilium connectivity test --perf  # now it works
...
ℹ️  Cilium version: 1.12.2
🏃 Running tests...

This doesn't happen for the connectivity tests themselves:

❯ kubectl delete ns cilium-test
namespace "cilium-test" deleted
❯ ./cilium connectivity test
...
ℹ️  Cilium version: 1.12.2
🏃 Running tests...

(I'm on kind version 0.15.0, using image kindest/node:v1.24.0, cilium-cli d9371d3, cilium v1.12.1, cluster has three workers and one control-plane node.)

I'd suggest we create a new folder under the root called benchmark and add a benchmark sub-command:

benchmark/
internal/cli/cmd/
    benchmark.go  <--- $ cilium benchmark

I think benchmark is a better choice than perf, as it's a bit more precise and may avoid confusion with profiling.

I'd love to take up this task, just want to see what folks think and get a thumbs up. Thanks! 🎉

CC: @christarazi

The text was updated successfully, but these errors were encountered:

Before, perf test was part of `connectivity test` subcommand. Now, perf test has separate `connectivity perf` subcommand. We allow to run both host and pod network perf test in a single command. We allow to test host-to-pod type of traffic too. Perf test now respects nodeSelector and allows to run for nodes in two different zones. Always force deploy test pods for perf test to respect nodeSelector. You can export perf test results in json format, which is compatible with perfdash. Fixes: #1111 Fixes: #2114 Signed-off-by: Marcel Zieba <[email protected]>

learnitall added the kind/feature New feature or request label Sep 26, 2022

learnitall mentioned this issue Sep 29, 2022

test/connectivity: Add early return when deploying perf resources #1115

Merged

marseel mentioned this issue Dec 8, 2023

Move perf test command to separate subcommand #2168

Merged

nebril closed this as completed in #2168 Dec 15, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Split Performance Tests into their Own Sub-Command #1111

Split Performance Tests into their Own Sub-Command #1111

learnitall commented Sep 26, 2022

Split Performance Tests into their Own Sub-Command #1111

Split Performance Tests into their Own Sub-Command #1111

Comments

learnitall commented Sep 26, 2022

Proposal / RFE