
Break down gradle check task to smaller maintainable verification tasks to improve flaky test failures #1975

Closed
dreamer-89 opened this issue Jan 26, 2022 · 21 comments
Labels
Build Libraries & Interfaces enhancement Enhancement or improvement to existing feature or request

Comments

@dreamer-89
Member

dreamer-89 commented Jan 26, 2022

Today, upon a pull request, a gradle check verification task is started which runs unit and integration tests along with other verification tasks. The gradle check run itself consistently takes a lot of time (~30 min on c5.18xlarge instances). This time increases significantly when smaller machines are used for tests, which is the reason for moving away from GHA for checks. The check also fails quite often for a variety of reasons. Keeping the existing gradle check as a single verification task is not sustainable, and it thus needs to be divided into smaller logical task groups.

Specifically, the following work is needed:

  • Break down gradle check into logical task groups (see the sketch after this list). This is needed because:
    • Individual runs take a lot of time
    • Failures are difficult for new developers to debug
    • Failures are not possible to reproduce on developer machines
  • Move the individual task groups into GitHub Actions, targeting <1 hr runtime for each task. There is ongoing work in this direction, but due to the complexity and runtime hardware needs of the gradle check run, non-GHA options are also being considered.
  • Evaluate macOS runners, which provide better compute (3-core 14 GB macOS vs 2-core 7 GB Ubuntu) virtual machines for resource-intensive tests.
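
As a rough illustration, here is a minimal sketch of what separate CI jobs could invoke; the task names are taken from the project's verification tasks, but the grouping itself is only an assumption for illustration:

# Hypothetical split of gradle check into independent CI jobs (grouping is illustrative)
./gradlew precommit              # job 1: static checks, no test clusters needed
./gradlew test                   # job 2: unit tests
./gradlew internalClusterTest    # job 3: multi-node internal cluster tests
./gradlew bwcTest                # job 4: backwards-compatibility tests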

Exit Criteria:

  1. The final set of jobs should take less time than the existing gradle check.
  2. The final set of jobs should provide the same or better test coverage.
  3. It should still be possible to run ./gradlew check with the same test coverage as today.

Describe alternatives you've considered
#2496

References:

  1. https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources
  2. https://github.com/opensearch-project/opensearch-infra/issues/136
@anasalkouz anasalkouz added enhancement Enhancement or improvement to existing feature or request Build Libraries & Interfaces and removed untriaged labels Feb 8, 2022
@dreamer-89 dreamer-89 changed the title Break down gradle check task to smaller maintainable verification tasks running on GHA Break down gradle check task to smaller maintainable verification tasks May 16, 2022
@dreamer-89
Member Author

Taken from #3210

To reduce flaky test failures, we can run the verification tasks in parallel. The tests can run on either Jenkins or GHA (the latter only if the runtime stays low there, since GitHub-hosted runners provide 2 x86_64 CPU cores and 7 GB of RAM). The current tasks are:

Verification tasks
------------------
branchConsistency - Ensures this branch is internally consistent. For example, that versions constants match released versions.
bwcTest - Runs backwards compatibility tests.
check - Runs all checks.
distroTest.deb - Runs deb tests within vagrant
distroTest.deb-arm64 - Runs deb-arm64 tests within vagrant
distroTest.deb-no-jdk - Runs deb-no-jdk tests within vagrant
distroTest.docker-arm64 - Runs docker-arm64 tests within vagrant
distroTest.linux-archive - Runs linux-archive tests within vagrant
distroTest.linux-archive-arm64 - Runs linux-archive-arm64 tests within vagrant
distroTest.linux-archive-no-jdk - Runs linux-archive-no-jdk tests within vagrant
distroTest.rpm - Runs rpm tests within vagrant
distroTest.rpm-arm64 - Runs rpm-arm64 tests within vagrant
distroTest.rpm-no-jdk - Runs rpm-no-jdk tests within vagrant
distroTest.windows-archive - Runs windows-archive tests within vagrant
distroTest.windows-archive-arm64 - Runs windows-archive-arm64 tests within vagrant
distroTest.windows-archive-no-jdk - Runs windows-archive-no-jdk tests within vagrant
distroUpgradeTest.v2.0.0.deb - Runs v2.0.0.deb tests within vagrant
distroUpgradeTest.v2.0.0.deb-arm64 - Runs v2.0.0.deb-arm64 tests within vagrant
distroUpgradeTest.v2.0.0.rpm - Runs v2.0.0.rpm tests within vagrant
distroUpgradeTest.v2.0.0.rpm-arm64 - Runs v2.0.0.rpm-arm64 tests within vagrant
distroUpgradeTest.v2.1.0.deb - Runs v2.1.0.deb tests within vagrant
distroUpgradeTest.v2.1.0.deb-arm64 - Runs v2.1.0.deb-arm64 tests within vagrant
distroUpgradeTest.v2.1.0.rpm - Runs v2.1.0.rpm tests within vagrant
distroUpgradeTest.v2.1.0.rpm-arm64 - Runs v2.1.0.rpm-arm64 tests within vagrant
integTest - Runs rest tests against an opensearch cluster.
internalClusterTest
jacocoTestCoverageVerification - Verifies code coverage metrics based on specified rules for the test task.
jacocoTestReport - Generates code coverage report for the test task.
javaRestTest - Runs the REST tests against an external cluster
precommit - Runs all non-test checks
run - Runs opensearch in the foreground
spotlessApply - Applies code formatting steps to sourcecode in-place.
spotlessCheck - Checks that sourcecode satisfies formatting steps.
spotlessDiagnose
spotlessJava
spotlessJavaApply
spotlessJavaCheck
spotlessJavaDiagnose
test - Runs the test suite.
testAggregateTestReport - Generates aggregated test report.
yamlRestTest - Runs the REST tests against an external cluster
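
For reference, a listing like the one above can typically be regenerated with Gradle's built-in task report, filtered to the verification group (the --group option assumes a reasonably recent Gradle version):

# Print the verification-group tasks that gradle check aggregates
./gradlew tasks --group verification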

As a POC, running the BWC tests independently on GHA (a smaller instance) showed an improvement in runtime and slightly fewer timeout failures.
Sample run with BWC test: https://github.com/owaiskazi19/OpenSearch/runs/4705950671?check_suite_focus=true
Sample run without BWC test: https://github.com/owaiskazi19/OpenSearch/runs/4807896738?check_suite_focus=true

Integration tests are currently tightly coupled in the codebase, so separating them out will be a significant amount of work.
Ref: https://rnorth.org/posts/faster-parallel-github-builds/

@dreamer-89
Member Author

Just had a discussion with @peterzhuamazon @prudhvigodithi @zelinh around the feasibility of keeping the existing gradle check on GitHub (to preserve the existing developer experience) while running multiple jobs in different docker containers. Each job would run a mutually exclusive subset of the verification tasks that gradle check runs today.

Today, gradle check runs a set of verification tasks; say A, B, C, ..., Z. The proposal is to have new tasks, say T1, ..., Tn, where each task runs a subset of the available verification tasks. A rough sketch of the container-per-group idea follows.
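
A minimal sketch, assuming a stock Gradle image and illustrative task groups (the image tag and the groupings are placeholders, not the agreed design; in practice each container would also need its own checkout and Gradle cache):

# Run mutually exclusive verification-task subsets in separate containers within one run
docker run --rm -v "$PWD":/workspace -w /workspace gradle:jdk17 ./gradlew precommit &
docker run --rm -v "$PWD":/workspace -w /workspace gradle:jdk17 ./gradlew test &
docker run --rm -v "$PWD":/workspace -w /workspace gradle:jdk17 ./gradlew ':server:internalClusterTest' &
wait    # wait for all containers; individual exit codes would still need to be collected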

@peterzhuamazon
Member

@dreamer-89 Correction:
it does not need to be multiple workflows, but multiple containers within each run.

@dreamer-89
Member Author

dreamer-89 commented Jun 22, 2022

Triggered a gradle check scan to get insights on existing tests and tasks.

https://scans.gradle.com/s/ietqu6zf4ha3m

Distribution of tasks in gradlew check

Type             Count           Time
All tasks        3710            4h 10m 17.634s
Tasks avoided    156 (4.2%)      15.150s
Tasks executed   2266 (61.1%)    4h 7m 56.431s
Lifecycle        789             1m 3.180s
No source        340             35.915s
Skipped          159             26.958s
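
For reference, a build scan like the one linked above can typically be produced by adding Gradle's --scan flag to the invocation (publishing requires accepting the build scan terms of service):

# Produce a build scan for the check task and publish it to scans.gradle.com
./gradlew check --scan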

@dreamer-89
Member Author

Ideally, the test-related tasks in gradle check (which bring in the flakiness) should be split based on the type of resources a task provisions (single-node cluster, multi-node cluster, container, etc.) so that different tasks have a logical resource separation. The other option is to split based on individual projects (qa, plugins, server, modules, etc.), which makes the split intuitive (a rough sketch of that option follows).
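
A minimal two-way sketch of the project-based option, reusing the task-exclusion pattern used elsewhere in this thread; a real split would likely need more groups:

# Hypothetical project-based split: server in one job, everything else in another
./gradlew ':server:check'             # core server project only
./gradlew check -x ':server:check'    # qa, modules, plugins, libs, etc.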

@dreamer-89
Member Author

dreamer-89 commented Jun 26, 2022

From a quick analysis of the existing flaky test issues, 80%-90% of the failing tests fall in the :server:internalClusterTest bucket, probably due to resource contention. This was identified previously here and here. This means that separating out the internalClusterTest task should reduce gradle check time drastically, and internalClusterTest itself should also fail less often.

Next steps

Will run check and the internalClusterTest task separately and compare the build failure rate against runs where check runs everything. If resource contention in the combined run is the actual issue, then we should see fewer failures with the split.
  1. ./gradlew check -x ':server:internalClusterTest'
  2. ./gradlew ':server:internalClusterTest'

OS: ubuntu
Instance: c5.24xlarge

@dreamer-89
Member Author

dreamer-89 commented Jun 26, 2022

Ran internalClusterTest separately 50 times; the task failed 23 times, of which ~20 failures were due to testHighWatermarkNotExceeded alone. This test is notoriously flaky and has been muted. It needs a deeper dive.

ubuntu@ip-172-31-41-165:~/OpenSearch$ for num in {0..50}; do printf "Iteration $num\n\n\n" >> /tmp/june25/temp_$num.txt; ./gradlew ':server:internalClusterTest' >> /tmp/june25/temp_$num.txt 2>&1; sleep 10     ; done;
...
...

@dreamer-89
Member Author

dreamer-89 commented Jun 27, 2022

Ran the internalClusterTest task again after muting testHighWatermarkNotExceeded. This time the build failed only thrice.

ubuntu@ip-172-31-41-165:~$ grep "BUILD SUCCESSFUL" /tmp/june26/temp_* | wc -l
48
ubuntu@ip-172-31-41-165:~$ grep "BUILD FAILED" /tmp/june26/temp_* | wc -l
3

@dreamer-89
Member Author

Ran the complete gradle check, but got only 3 failures.

for num in {0..50}; do printf "Iteration $num\n\n\n" >> /tmp/june27/temp_check_$num.txt; ./gradlew check  >> /tmp/june27/temp_check_$num.txt 2>&1; sleep 10     ; done;
...
ubuntu@ip-172-31-41-165:~$ grep "^BUILD SUCCESSFUL" /tmp/june27/temp_check_* | wc -l
48
ubuntu@ip-172-31-41-165:~$ grep "^BUILD FAILED" /tmp/june27/temp_check_* | wc -l
3

@dreamer-89
Member Author

dreamer-89 commented Jun 28, 2022

Separating out internalClusterTest from gradle check did not help much. On the contrary, the entire gradle check failed only thrice when run 50 times in a row. Not sure whether the lower failure count can be attributed to the beefier c5.24xlarge instance type used for the experiment.

@dreamer-89
Member Author

Running ./gradlew check again on both c5.24xlarge & c5.18xlarge to verify whether instance type is linked to the flakiness.

@peterzhuamazon
Member

Running ./gradlew check again on both c5.24xlarge & c5.18xlarge to verify whether instance type is linked to the flakiness.

Please let me know if I need to switch from c5.18xlarge to c5.24xlarge.

Thanks.

@dreamer-89
Member Author

c5.24xlarge

Has only 3 failures out of 50 runs

ubuntu@ip-172-31-41-165:~$ grep "^BUILD FAILED" /tmp/june28/temp_check_* | wc -l
3
ubuntu@ip-172-31-41-165:~$ grep "^BUILD SUCCESSFUL" /tmp/june28/temp_check_* | wc -l
48

c5.18xlarge

The job has been running for more than a day. The instance/tests are having issues that cause frequent test suite timeouts (20 min), with the overall job running for > 1 hr. ps -ef shows a bunch of fixture processes still active even after terminating the script that runs gradle check. Rebooting the instance and retrying.

> Task :qa:full-cluster-restart:v2.0.2#oldClusterTest
org.opensearch.upgrades.FullClusterRestartIT > classMethod FAILED
    java.lang.Exception: Suite timeout exceeded (>= 1200000 msec).
        at __randomizedtesting.SeedInfo.seed([F0D8CAF88E6F7EAD]:0)
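
For the leftover processes, a minimal cleanup sketch (the "fixture" process-name pattern is an assumption based on the observation above; the matches should be verified before killing anything):

# List test-fixture processes that survived the aborted gradle check run
ps -ef | grep '[f]ixture'
# Terminate them before retrying (verify the matches first)
pkill -f 'fixture' || true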

@peterzhuamazon
Member


Seems like I can start the next step by switching to c5.24xlarge.

@saratvemulapalli
Member

Coming from opensearch-project/opensearch-build#1572, @dreamer-89 I see you are assigned to this. Are you planning on picking this up?

@dreamer-89
Member Author

dreamer-89 commented Jul 12, 2022

opensearch-project/opensearch-build#1572

@saratvemulapalli: Thanks for sharing this. The prerequisite for splitting gradle check was to first reduce flaky test failures (apologies, the issue title doesn't mention this). Perhaps we can track opensearch-project/opensearch-build#1572 in a separate issue?

The next step on this issue is to perform one more round of experiments comparing flaky test failures in check vs. the split (check minus internalClusterTest, plus internalClusterTest run separately) on machines with different hardware capacity, as sketched below.
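
A minimal sketch of how that round could be scripted, reusing the loop pattern from the earlier experiments (output paths and iteration count are illustrative):

# Run the split variant repeatedly and count failed gradle invocations,
# for comparison against the full `./gradlew check` numbers gathered earlier
mkdir -p /tmp/split
for num in {1..50}; do
  ./gradlew check -x ':server:internalClusterTest' > /tmp/split/check_$num.txt 2>&1
  ./gradlew ':server:internalClusterTest'          > /tmp/split/ict_$num.txt 2>&1
  sleep 10
done
grep "^BUILD FAILED" /tmp/split/*.txt | wc -l    # count of failed gradle invocations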

@saratvemulapalli
Member

Thanks for the additional information, @dreamer-89.
Maybe let's organize these (if we haven't already) and link them back to one meta issue:

  1. Take care of flaky tests
  2. Proposal to split gradle check into multiple smaller tasks
  3. Add support for new tests in CI (starting with packaging tests)

@minalsha
Contributor

@dreamer-89 can you share the meta issue discussed above?

@dreamer-89
Member Author

Apologies @minalsha @saratvemulapalli for the delay on this. Created a meta issue for gradle improvements, including the gradle check split: #4053

@dreamer-89 dreamer-89 changed the title Break down gradle check task to smaller maintainable verification tasks Break down gradle check task to smaller maintainable verification tasks to improve flaky test failures Aug 11, 2022
@minalsha
Contributor

@dreamer-89 can we close this in lieu of meta issue #4053 ?

@dreamer-89
Member Author

@dreamer-89 can we close this in lieu of meta issue #4053 ?

Thank you @minalsha for the question. Yes, we can close this issue in favour of #4053. Note that this issue tracked improvements to gradle check failure rates under different task splits, rather than the splitting of the check task itself. From the exercises I performed, breaking down gradle check into smaller sub-tasks did not improve the test failure rate.
