
CFP: Cilium CLI connectivity tests speedup. #15

Merged

Conversation

viktor-kurchenko (Contributor)

No description provided.

@christarazi (Member) left a comment


Proposal sounds good to me. As mentioned offline by Andre, it would be good to see a POC of how this would work. Namely, the aspect I have concerns about is that many of the connectivity tests configure the cluster in a specific way that may conflict with other test runs (such as policies, etc.). It would be good to understand how you propose to approach that problem.

@viktor-kurchenko (Contributor, Author)

> Proposal sounds good to me. As mentioned offline by Andre, it would be good to see a POC of how this would work. Namely, the aspect I have concerns about is that many of the connectivity tests configure the cluster in a specific way that may conflict with other test runs (such as policies, etc.). It would be good to understand how you propose to approach that problem.

Thank you, @christarazi!

Yeah, it should be challenging, but I want to try it.
Do you know what else, besides cluster-wide network policies, can interfere across namespaces?

@christarazi (Member)

In general, any policy, whether it's a CNP or a CCNP, can interfere, especially if the workloads it selects are also selected by other policies. It seems like one approach could be to completely separate workloads via namespaces for each "group" of connectivity tests. This way the policies applied only have an effect within the namespaces they belong to, so namespaces would be the separation barrier that allows parallelism.
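As a rough illustration of that idea (the group names, test names, and namespace scheme below are hypothetical, not the current cilium-cli layout), each group of tests could be pinned to its own namespace so that namespaced policies from one group cannot select another group's workloads:

```go
package main

import "fmt"

// Hypothetical grouping: these group and test names are illustrative
// only and do not correspond to the actual cilium-cli test suite.
var groups = map[string][]string{
	"no-policies":   {"pod-to-pod", "pod-to-service"},
	"client-egress": {"client-egress-to-echo"},
}

// namespaceFor derives a dedicated namespace per test group, so a
// namespaced policy applied by one group only selects workloads
// deployed into that group's namespace.
func namespaceFor(prefix, group string) string {
	return fmt.Sprintf("%s-%s", prefix, group)
}

func main() {
	for group, tests := range groups {
		fmt.Printf("%s runs in namespace %s: %v\n", group, namespaceFor("cilium-test", group), tests)
	}
}
```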

@fgiloux left a comment


It would be useful to people like me if the document included a high-level list of the strategies that have already been explored and why they fall short.


* Group connectivity tests into independent sets that can be run concurrently.
* Run each independent test set concurrently (all together or in batches).
* Collect test results and display them periodically.

It looks like a different subject to me and it may make sense to have the CFP focused on test parallelisation.

Contributor Author


> It looks like a different subject to me and it may make sense to have the CFP focused on test parallelisation.

Maybe, but if some tests run in parallel, how will it look from the user's perspective?
The output might be unreadable, wouldn't it?


I am not sure I follow. I have looked at a simple example: the curl test should be free of side effects; I am guessing the same holds for flow validation, and for metrics validation I don't know.
Currently, tests run sequentially in the order they were registered. Each test runs in a separate goroutine, but the loop reads from a channel, populated at the completion of a test case, before going on to the next one:
https://github.com/cilium/cilium-cli/blob/main/connectivity/check/context.go#L402-L433
If we move the channel read outside of the loop, we can still collect the results and present them in an ordered way. Is that what you mean? Or am I not getting your point?
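A minimal Go sketch of that pattern, assuming a simplified runner (the TestResult type and runAll function are illustrative stand-ins, not the real connectivity/check types): all tests are started first, the completion channel is drained afterwards, and results are printed back in registration order.

```go
package main

import "fmt"

// TestResult is an illustrative stand-in for whatever the real runner
// sends on its completion channel; the actual types live in
// connectivity/check.
type TestResult struct {
	Name string
	Err  error
}

// runAll starts every test before draining the completion channel,
// instead of reading the channel inside the loop after each test.
func runAll(tests []string, run func(string) error) []TestResult {
	ch := make(chan TestResult, len(tests))
	for _, name := range tests {
		go func(name string) {
			ch <- TestResult{Name: name, Err: run(name)}
		}(name)
	}

	// Collect completions in whatever order they arrive.
	byName := make(map[string]TestResult, len(tests))
	for range tests {
		r := <-ch
		byName[r.Name] = r
	}

	// Present results in registration order, even though execution
	// was concurrent.
	ordered := make([]TestResult, 0, len(tests))
	for _, name := range tests {
		ordered = append(ordered, byName[name])
	}
	return ordered
}

func main() {
	results := runAll([]string{"client-to-client", "pod-to-service"}, func(name string) error {
		fmt.Println("running", name)
		return nil
	})
	for _, r := range results {
		fmt.Printf("%s -> %v\n", r.Name, r.Err)
	}
}
```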

Contributor Author


Yes, that is what I meant.

## Goals

* Group connectivity tests into independent sets that can be run concurrently.
* Run each independent test set concurrently (all together or in batches).

Grouping into test sets that can be run concurrently may be a difficult exercise:

* How do you know whether your test impacts a test from a different set?
* When adding a new test, you need to understand all the tests in the other sets to know whether yours can run in parallel with them.

Possible alternative approaches:

* Define a few configuration flavours, something like what is used in conformance e2e. Is this approach used consistently in other workflows, and are there areas for improvement?
* Test filtering depending on the configuration, which is done in the Cilium CLI.
* Group tests by code area. As part of CI it may be acceptable to run a subset of the test suite for a localised change.
* Mark tests that are destructive and can't be run in parallel, e.g. Cilium/cluster update, uninstall, failure simulation, etc. (sketched below).
* Do we have overlap between tests/workflows that we could reduce?

Non-destructive tests that share the same configuration flavour may then be run in parallel on the same cluster. Tests for code areas that have not been touched by a PR can be trimmed from the CI run.
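A possible shape for the destructive-test marking mentioned in the list above; the testCase type and its Destructive flag are hypothetical, not existing cilium-cli API:

```go
package main

import "fmt"

// testCase is hypothetical: the Destructive flag does not exist in
// cilium-cli today and only illustrates the proposed split.
type testCase struct {
	Name        string
	Destructive bool // e.g. upgrade, uninstall, failure simulation
}

// partition separates tests that may run in parallel from tests that
// must run alone, sequentially, after the parallel phase.
func partition(all []testCase) (parallel, sequential []testCase) {
	for _, t := range all {
		if t.Destructive {
			sequential = append(sequential, t)
		} else {
			parallel = append(parallel, t)
		}
	}
	return parallel, sequential
}

func main() {
	suite := []testCase{
		{Name: "pod-to-pod"},
		{Name: "pod-to-service"},
		{Name: "cilium-uninstall", Destructive: true},
	}
	p, s := partition(suite)
	fmt.Printf("parallel: %d, sequential: %d\n", len(p), len(s))
}
```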

@viktor-kurchenko (Contributor, Author)

> It would be useful to people like me if the document included a high-level list of the strategies that have already been explored and why they fall short.

Yeah, it would be useful for me as well.

@viktor-kurchenko (Contributor, Author)

I was thinking about the PoC plan and realized that the Cilium CLI can already be used to run multiple tests in parallel (at least for testing).
The --test-namespace and --test parameters were used to validate the idea.

I've selected 46 tests (from the EKS CNI conformance test workflow) and used the attached bash script to run them in parallel batches.
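The attached script itself isn't reproduced here, but this is roughly the shape of such a run, driven from Go instead of bash (the --test-namespace and --test flags are the ones mentioned above; the namespace prefix and test name patterns are placeholders):

```go
package main

import (
	"fmt"
	"os/exec"
	"sync"
)

// runBatch starts one `cilium connectivity test` invocation per entry,
// each against its own namespace, and waits for all of them to finish.
// The --test-namespace and --test flags are the ones mentioned above;
// the namespace prefix and test patterns are placeholders.
func runBatch(batch []string) {
	var wg sync.WaitGroup
	for i, pattern := range batch {
		wg.Add(1)
		go func(i int, pattern string) {
			defer wg.Done()
			ns := fmt.Sprintf("cilium-test-%d", i)
			cmd := exec.Command("cilium", "connectivity", "test",
				"--test-namespace", ns, "--test", pattern)
			out, err := cmd.CombinedOutput()
			fmt.Printf("--- %s (namespace %s, err: %v) ---\n%s", pattern, ns, err, out)
		}(i, pattern)
	}
	wg.Wait()
}

func main() {
	// Illustrative batch; the PoC selected 46 tests from the EKS workflow.
	runBatch([]string{"no-policies", "client-egress", "to-entities-world"})
}
```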

You can find results in the table: https://docs.google.com/spreadsheets/d/1csmszEtlohqPpgMV8N_aJUI4yCoW8mPke46k-rG-Uec/edit?usp=sharing

Conclusions:

* Cilium CLI has no tests that use CiliumClusterwideNetworkPolicy yet.
* At least the 46 selected tests can be run in parallel with no interference!
* In practice, it won't be possible to run every test in its own namespace because of the number of pods and IP allocations that would require.
* Ideally, the CLI should create and verify all the required test namespaces/deployments only once, up front.

Further steps (order might be different):

1. Rename the --test-namespace parameter to --test-namespace-prefix.
2. Implement a new parameter, --test-parallel-runs, with a default value of 1.
3. Move the test namespace/deployment creation and verification logic ahead of the test run function.
4. Implement logic to group tests into batches of --test-parallel-runs size (see the sketch after this list).
5. Think about how to collect and display output (considering GH runners that might have different behavior than a local terminal).
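A minimal sketch of the batching logic from step 4; the batches helper and the way the batches would later be executed are illustrative assumptions, with n standing in for the proposed --test-parallel-runs value:

```go
package main

import "fmt"

// batches splits the registered test names into groups of at most n,
// where n stands in for the proposed --test-parallel-runs value; each
// group would then be run concurrently, one batch after another.
func batches(tests []string, n int) [][]string {
	if n < 1 {
		n = 1
	}
	var out [][]string
	for len(tests) > 0 {
		end := n
		if end > len(tests) {
			end = len(tests)
		}
		out = append(out, tests[:end])
		tests = tests[end:]
	}
	return out
}

func main() {
	fmt.Println(batches([]string{"a", "b", "c", "d", "e"}, 2))
	// Output: [[a b] [c d] [e]]
}
```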

Also, I was thinking about implementing this as a new CLI command (e.g. cilium tests ..., maybe even hidden).
That way, for some time, we can have both the old and new approaches with shared test sources and will be able to test and compare them without any impact.

CC: @aanm @christarazi @fgiloux @brlbil @michi-covalent.

@christarazi (Member)

That sounds good to me.

Just one thing on:

> Cilium CLI has no tests that use CiliumClusterwideNetworkPolicy yet.

I imagine #16 will get merged soon and we'll very likely have tests with CCNP, so it is something we'll need to consider in this proposal.

@brlbil commented Feb 3, 2024

Sounds great!

> Think about how to collect and display output (considering GH runners that might have different behavior than a local terminal).

One thing might be tricky: printing test logs correctly, given that the tests would run concurrently. Also, JUnit collection should be considered.
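One common way to keep concurrent logs readable, not necessarily what the CLI will end up doing: give each test its own buffer and flush it as a single block when the test finishes, so output from different tests never interleaves line by line. A minimal sketch with placeholder test names:

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// Each test writes to its own buffer, which is flushed as one block
// when the test completes, so concurrent tests never interleave their
// individual log lines. Test names here are placeholders.
func main() {
	var mu sync.Mutex // serializes flushes to stdout
	var wg sync.WaitGroup
	for _, name := range []string{"test-a", "test-b"} {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			var buf bytes.Buffer
			fmt.Fprintf(&buf, "[%s] step 1\n", name)
			fmt.Fprintf(&buf, "[%s] step 2\n", name)
			mu.Lock()
			defer mu.Unlock()
			fmt.Print(buf.String())
		}(name)
	}
	wg.Wait()
}
```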

@viktor-kurchenko (Contributor, Author)

> One thing might be tricky: printing test logs correctly, given that the tests would run concurrently. Also, JUnit collection should be considered.

Thanks!
I've already tested this: cilium/images/conn-tests-concurrent-output.gif

@xmulligan (Member)

@viktor-kurchenko we just added statuses for CFPs. Where do you think this one currently falls? https://github.com/cilium/design-cfps#status

@viktor-kurchenko (Contributor, Author)

> @viktor-kurchenko we just added statuses for CFPs. Where do you think this one currently falls? https://github.com/cilium/design-cfps#status

@xmulligan I think the status should be: Released cilium/cilium-cli 0.16

@michi-covalent merged commit b869f50 into cilium:main on Aug 10, 2024
@xmulligan mentioned this pull request on Aug 14, 2024