CFP: Cilium CLI connectivity tests speedup. #15
Conversation
Signed-off-by: viktor-kurchenko <[email protected]>
Proposal sounds good to me. As mentioned offline by Andre, it would be good to see a POC of how this would work. Namely, the aspect I have concerns about is that many of the connectivity tests configure the cluster in a specific way that may conflict with other test runs (such as policies, etc.). It would be good to understand how you propose to approach that problem.
Thank you, @christarazi! Yeah, it will be challenging, but I want to try it.
In general, any policy, whether it's a CNP or a CCNP, can interfere, especially if the workloads it selects are common amongst other policies. One approach could be to completely separate workloads via namespaces for each "group" of connectivity tests. This way the applied policies only have an effect within their own namespaces, so namespaces would be the separation barrier that allows parallelism.
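To make the namespace-as-barrier idea concrete, here is a minimal, hypothetical sketch (not cilium-cli code) that gives every test group its own namespace using client-go; the `createGroupNamespace` helper and the `cilium-test-<group>` naming are assumptions for illustration. Note that namespacing confines CNPs but not cluster-wide CCNPs, a caveat raised later in the thread.

```go
package groups

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// createGroupNamespace creates a dedicated namespace for one group of
// connectivity tests, so a namespaced policy (CNP) applied by that group
// only selects workloads inside it. It does not confine cluster-wide
// policies (CCNP).
func createGroupNamespace(ctx context.Context, client kubernetes.Interface, group string) (string, error) {
	name := fmt.Sprintf("cilium-test-%s", group) // hypothetical naming scheme
	ns := &corev1.Namespace{ObjectMeta: metav1.ObjectMeta{Name: name}}
	if _, err := client.CoreV1().Namespaces().Create(ctx, ns, metav1.CreateOptions{}); err != nil {
		return "", err
	}
	return name, nil
}
```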
It would be useful to people like me if the document included, at a high level, a list of the strategies that have already been explored and why they fall short.
> * Group connectivity tests into independent sets that can be run concurrently.
> * Run each independent test set concurrently (all together or in batches).
> * Collect test results and display them periodically.
It looks like a different subject to me and it may make sense to have the CFP focused on test parallelisation.
> It looks like a different subject to me and it may make sense to have the CFP focused on test parallelisation.

Maybe, but if some tests run in parallel, how will it look from the user's perspective? The output might be unreadable, wouldn't it?
I am not sure I follow. I have looked at a simple example. The curl test should be free of side effects; for the flow validation I am guessing so, and for the metrics validation I don't know.
Currently, tests are run sequentially in the order they were registered. Each test runs in a separate goroutine, but the loop reads from a channel that is populated at the completion of a test case before going on with the next one:
https://github.com/cilium/cilium-cli/blob/main/connectivity/check/context.go#L402-L433
If we move the channel read outside of the loop, we can still collect the results and present them in an ordered way. Is that what you mean? Or am I not getting your point?
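To illustrate the pattern (this is only a sketch, not the code at the link above; `Test`, `Result`, and `runAll` are placeholder names): launch every test's goroutine first, give each one its own buffered result channel, and read the channels back in registration order outside the launch loop, so execution overlaps but reporting stays deterministic.

```go
package sketch

// Test and Result stand in for the real connectivity-test types.
type Test struct{ Name string }
type Result struct {
	Test string
	Err  error
}

// runAll launches every registered test concurrently; the channel reads
// happen outside the launch loop, so results are still collected and
// reported in registration order.
func runAll(tests []Test, run func(Test) error) []Result {
	results := make([]chan Result, len(tests))
	for i, t := range tests {
		results[i] = make(chan Result, 1) // buffered so the sender never blocks
		go func(i int, t Test) {
			results[i] <- Result{Test: t.Name, Err: run(t)}
		}(i, t)
	}
	ordered := make([]Result, 0, len(tests))
	for _, ch := range results {
		ordered = append(ordered, <-ch) // blocks only until that test finishes
	}
	return ordered
}
```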
Yes, that is what I meant.
> ## Goals
>
> * Group connectivity tests into independent sets that can be run concurrently.
> * Run each independent test set concurrently (all together or in batches).
Grouping tests into sets that can be run concurrently may be a difficult exercise:
- how do you know whether your test impacts a test from a different set?
- when adding a new test, you need to understand all the tests from the other sets to know whether your test can run in parallel with them
Possible alternative approaches:
- define a few configuration flavours, something like what is used in conformance e2e. Is this approach used consistently in other workflows, and are there areas for improvement?
- test filtering depending on the configuration, which is done in the Cilium CLI.
- group tests by code area. As part of CI it may be acceptable to run a subset of the test suite for a localised change.
- mark tests that are destructive and can't be run in parallel, e.g. Cilium or cluster upgrade, uninstall, failure simulation, etc. (see the sketch below)
- do we have overlap between tests / workflows that we could reduce?
Non-destructive tests that share the same configuration flavour may then be run in parallel on the same cluster. Tests for code areas that have not been touched by a PR can be trimmed from the CI run.
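A rough sketch of the destructive-test marking idea (placeholder types; the `Destructive` flag is a hypothetical marker, not an existing cilium-cli field): non-destructive tests fan out concurrently, destructive ones run sequentially once the parallel phase has drained.

```go
package sketch

import "sync"

// ConnTest is a placeholder for a registered connectivity test; Destructive
// is the hypothetical marker for tests that must not run in parallel
// (upgrades, uninstalls, failure simulation, ...).
type ConnTest struct {
	Name        string
	Destructive bool
	Run         func() error
}

// runSuite runs non-destructive tests in parallel, then runs the
// destructive ones one at a time.
func runSuite(tests []ConnTest) {
	var destructive []ConnTest
	var wg sync.WaitGroup
	for _, t := range tests {
		if t.Destructive {
			destructive = append(destructive, t)
			continue
		}
		wg.Add(1)
		go func(t ConnTest) {
			defer wg.Done()
			_ = t.Run() // error collection elided in this sketch
		}(t)
	}
	wg.Wait()
	for _, t := range destructive {
		_ = t.Run()
	}
}
```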
Yeah, it would be useful for me as well.
I was thinking about the PoC plan and realized that the Cilium CLI can already be used to run multiple tests in parallel (at least for experimentation). I selected 46 tests (from the EKS CNI conformance test workflow) and used the attached bash script to run them in batches/parallel. You can find the results in this table: https://docs.google.com/spreadsheets/d/1csmszEtlohqPpgMV8N_aJUI4yCoW8mPke46k-rG-Uec/edit?usp=sharing Conclusions:
Further steps (order might be different):
Also, I was thinking about implementing this as a new CLI command (e.g.: …).
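For reference, something in the spirit of that batch script could also be sketched in Go (this is not the attached script; the test names and batch size are made up, and in practice each invocation would likely need its own test namespace to avoid the conflicts discussed above): spawn one `cilium connectivity test --test <name>` process per test and cap concurrency with a semaphore.

```go
package main

import (
	"fmt"
	"os/exec"
	"sync"
)

func main() {
	// Illustrative subset of test names and an arbitrary batch size.
	tests := []string{"no-policies", "client-egress", "echo-ingress"}
	batch := 4

	sem := make(chan struct{}, batch) // caps concurrent CLI invocations
	var wg sync.WaitGroup
	for _, name := range tests {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			sem <- struct{}{}
			defer func() { <-sem }()
			out, err := exec.Command("cilium", "connectivity", "test", "--test", name).CombinedOutput()
			fmt.Printf("=== %s (err=%v) ===\n%s\n", name, err, out)
		}(name)
	}
	wg.Wait()
}
```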
That sounds good to me. Just one thing: soon I imagine #16 will get merged and we'll very likely have tests with CCNP, so that is something we'll need to consider in this proposal.
Sounds great.
One thing might be tricky: printing test logs correctly, given that the tests would run concurrently. Also, JUnit report collection should be considered.
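On the log readability point, one common pattern (again just a sketch with assumed names) is to buffer each test's output separately and flush a finished test's log as one block, rather than interleaving lines from concurrent tests; a JUnit report could be assembled from the same per-test records.

```go
package sketch

import (
	"bytes"
	"fmt"
	"sync"
)

// bufferedLogger collects each test's output separately so that lines from
// concurrent tests are never interleaved; a test's log is printed as one
// block when it completes.
type bufferedLogger struct {
	mu   sync.Mutex
	bufs map[string]*bytes.Buffer
}

func newBufferedLogger() *bufferedLogger {
	return &bufferedLogger{bufs: make(map[string]*bytes.Buffer)}
}

// Logf appends one formatted line to the named test's buffer.
func (l *bufferedLogger) Logf(test, format string, args ...any) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.bufs[test] == nil {
		l.bufs[test] = &bytes.Buffer{}
	}
	fmt.Fprintf(l.bufs[test], format+"\n", args...)
}

// Flush prints the complete log of one finished test and drops its buffer.
func (l *bufferedLogger) Flush(test string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if b := l.bufs[test]; b != nil {
		fmt.Print(b.String())
		delete(l.bufs, test)
	}
}
```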
Thanks!
@viktor-kurchenko we just added statuses for CFPs. Where do you think this one currently falls? https://github.com/cilium/design-cfps#status
@xmulligan I think the status should be: