Break down gradle check task to smaller maintainable verification tasks to improve flaky test failures #1975
Taken from #3210. To reduce flaky tests, we can run the verification tasks in parallel. The tests can run on either Jenkins or GHA (GHA only if the run time there is lower, given that GitHub provides an instance with 2 x86_64 CPU cores and 7 GB of RAM). The current tasks are
As a POC, running the BWC tests independently on GHA (a smaller instance) improved the run time and produced slightly fewer timeout failures. Integration tests are currently tightly coupled in the codebase, so separating them out will be a large amount of work. |
Just had a discussion with @peterzhuamazon @prudhvigodithi @zelinh about the feasibility of keeping the existing gradle check on GitHub (to preserve the existing developer experience) but running multiple jobs in different Docker containers. Each job would run a mutually exclusive subset of the verification tasks that gradle check runs today. Today, gradle check runs a set of verification tasks, say A, B, C, ..., Z. The proposal is to have new tasks, say T1, ..., Tn, where each task runs a subset of the available verification tasks. |
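The proposal above can be sketched as a partition of the task list into disjoint groups. This is only an illustration: the letters A..Z are the placeholder task names from the comment, and the function name `partition_tasks`, the round-robin assignment, and the group count of 4 are assumptions, not anything decided in this thread.

```python
from typing import List


def partition_tasks(tasks: List[str], n_groups: int) -> List[List[str]]:
    """Split tasks into n_groups disjoint groups using round-robin assignment."""
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    groups: List[List[str]] = [[] for _ in range(n_groups)]
    for index, task in enumerate(tasks):
        groups[index % n_groups].append(task)
    return groups


# Placeholder names standing in for the verification tasks A..Z.
tasks = [chr(code) for code in range(ord("A"), ord("Z") + 1)]

# Hypothetical choice of four jobs (T1..T4).
groups = partition_tasks(tasks, 4)

# Every task lands in exactly one group, so coverage matches the full list.
flattened = [task for group in groups for task in group]
assert sorted(flattened) == tasks
assert len(flattened) == len(set(flattened))
```

Round-robin keeps the groups the same size but ignores how long each task takes; a real split would more likely balance on measured task duration.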
@dreamer-89 Correction: |
Triggered a gradle check scan to get insights on existing tests and tasks. https://scans.gradle.com/s/ietqu6zf4ha3m Distribution of tasks in
|
Ideally, the test-related tasks in gradle check (which bring in flakiness) should be split by the type of resource each task provisions, such as a single-node cluster, a multi-node cluster, or a container, so that different tasks have logical resource separation. The other option is to split by individual project (qa, plugins, server, modules, etc.), which makes the split intuitive. |
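The second option, splitting by project, could look roughly like the sketch below. It groups Gradle-style task paths by their first path segment. The paths in `example_paths` are illustrative only and are not taken from the build scan; `group_by_project` and the `"root"` bucket are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_project(task_paths: List[str]) -> Dict[str, List[str]]:
    """Group task paths such as ':server:test' by their top-level project."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for path in task_paths:
        parts = [part for part in path.split(":") if part]
        # A path with a single segment (':check') belongs to the root project.
        top_level = parts[0] if len(parts) > 1 else "root"
        grouped[top_level].append(path)
    return dict(grouped)


# Illustrative task paths (assumed, not read from the actual build).
example_paths = [
    ":server:test",
    ":server:internalClusterTest",
    ":modules:reindex:test",
    ":plugins:repository-s3:test",
    ":qa:mixed-cluster:test",
    ":check",
]

for project, paths in group_by_project(example_paths).items():
    print(project, paths)
```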
From a quick analysis of existing flaky test issues, 80%-90% of the tests fall in
Next steps
Will run the check and internalClusterTest tasks separately and compare the build failure rate against the case where check runs everything. If resource contention in the combined run is the actual issue, we should see fewer failures with the split. OS: ubuntu |
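A minimal sketch of the repeated-run experiment described above: run one command a fixed number of times and count the non-zero exits. The helper name `count_failures` is hypothetical, and the commented Gradle invocation is an assumed way of calling the task, not the exact command used for the runs in this issue.

```python
import subprocess
import sys
from typing import List


def count_failures(command: List[str], runs: int) -> int:
    """Run command `runs` times and return how many runs exited non-zero."""
    failures = 0
    for _ in range(runs):
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures += 1
    return failures


# Assumed real-world usage (takes a long time):
#   count_failures(["./gradlew", "internalClusterTest"], 50)
# Quick self-check with a command that always fails:
always_fails = [sys.executable, "-c", "import sys; sys.exit(1)"]
print(count_failures(always_fails, 3))
```

Counting exit codes only tells you that a run failed, not why; separating timeouts from assertion failures would need the test reports as well.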
Ran internalClusterTest separately 50 times; the task failed 23 times, of which ~20 failures were due solely to
|
Ran internalClusterTest task again after muting
|
Ran complete gradle check but got only 3 failures.
|
Separating out internalClusterTest from gradle check did not help much. On the contrary, the entire gradle check failed only three times in 50 consecutive runs. Not sure whether the lower failure count can be attributed to the beefier instance type. |
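To put the counts reported above on a common scale, the failure rates work out as follows. The counts (23 of 50 and 3 of 50) come from the comments above; the helper name `failure_rate` is hypothetical. Note that ~20 of the 23 failures were attributed to a single cause, so the first figure overstates general flakiness.

```python
def failure_rate(failures: int, runs: int) -> float:
    """Return the fraction of runs that failed."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    return failures / runs


split_rate = failure_rate(23, 50)  # internalClusterTest run on its own
full_rate = failure_rate(3, 50)    # complete gradle check

print(f"internalClusterTest alone: {split_rate:.0%}")  # 46%
print(f"complete gradle check:     {full_rate:.0%}")   # 6%
```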
Running |
Please let me know if I need to switch from c5.18xlarge to c5.24xlarge. Thanks. |
c5.24xlarge
Has only 3 failures out of 50 runs
c5.18xlarge
Job has been running for more than a day. The instance/test is having issues that cause frequent test suite timeouts (20 mins), and the overall job runs for > 1 hr.
|
Seems like I can start the next step by switching to c5.24xlarge. |
Coming from opensearch-project/opensearch-build#1572, @dreamer-89 I see you are assigned to this. Are you planning on picking this up? |
@saratvemulapalli: Thanks for sharing this. The prerequisite for splitting gradle check was to reduce flaky test failures (apologies, the issue title doesn't mention this). Perhaps we can track opensearch-project/opensearch-build#1572 in a separate issue? The next step on this issue is to perform one more round of experiments to compare flaky test failures in |
Thanks for more information. @dreamer-89
|
@dreamer-89 can you share the meta issue as discussed above for this? |
Apologies @minalsha @saratvemulapalli for the delay on this. Created a meta issue for gradle improvements including gradle check split. #4053 |
@dreamer-89 can we close this in lieu of meta issue #4053 ? |
Thank you @minalsha for the question. Yes, we can close this issue in favour of #4053. Also, this issue tracked the improvement in gradle check failures under different task splits and did not focus on splitting the check task itself. From the exercises I performed, breaking down gradle check into smaller subtasks does not reduce test failures. |
Today, each pull request starts a gradle check verification task, which runs unit tests and integration tests along with other verification tasks. The gradle check run itself consistently takes a long time (~30 min on c5.18xlarge instances). This time increases substantially when smaller machines are used for tests, which is the reason for moving away from GHA for checks. The check also fails quite often, for a variety of reasons. Keeping the existing gradle check as a single verification task is not sustainable, so it needs to be divided into smaller logical task groups.
Specifically below work is needed:
Exit Criteria:
./gradlew check
with the same existing test coverage.
Describe alternatives you've considered
#2496
References: