Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding test-retry plugin #456

Merged
merged 6 commits into from
Mar 31, 2022
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 16 additions & 1 deletion build.gradle
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,7 @@ plugins {
id 'nebula.ospackage' version "8.3.0" apply false
id "com.diffplug.gradle.spotless" version "3.26.1"
id 'java-library'
id 'org.gradle.test-retry' version '1.3.1'
}

tasks.withType(JavaCompile) {
Expand Down Expand Up @@ -148,8 +149,15 @@ def _numNodes = findProperty('numNodes') as Integer ?: 1

def opensearch_tmp_dir = rootProject.file('build/private/opensearch_tmp').absoluteFile
opensearch_tmp_dir.mkdirs()

boolean isCiServer = System.getenv().containsKey("CI")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you come up with this value? Will this only work in github CI runners, or will it work in Jenkins hosts too? Ideally it's run in both so that we don't get surprises on test failures during infra builds.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This value was recommended by https://github.com/gradle/test-retry-gradle-plugin documentation and gradle documentation recommends this line when running something only in CI. I tested it and it works on github actions, I am not sure about jenkins. I saw this also recommended on the opensearch-core PR if (BuildParams.isCi() == true) but wasn't used in the end. I was thinking maybe its okay if we don't retry on jenkins as that is run on a larger instance and I didn't AD backend was as flaky there?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. I think at least adding for github CI is helpful because of how it's usually run on lower-provisioned hosts compared to a local machine or Jenkins like you mentioned. And if it's still flaky, it will be exposed in those places.

test {
retry {
if (isCiServer) {
failOnPassedAfterRetry = false
maxRetries = 6
maxFailures = 10
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't used test retry, just asking: if maxRetries is 6, the max failure should't exceed 7? Why set maxFailures = 10?

Copy link
Member

@ohltyler ohltyler Mar 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the plugin details, maxFailures is about how many tests failed per run. Assuming the number is a lot, it is likely that there's other issues causing the tests to fail (e.g. cluster isn't coming up), besides flaky tests

}
}
include '**/*Tests.class'
systemProperty 'tests.security.manager', 'false'
}
Expand All @@ -162,6 +170,13 @@ task integTest(type: RestIntegTestTask) {
tasks.named("check").configure { dependsOn(integTest) }

integTest {
retry {
if (isCiServer) {
failOnPassedAfterRetry = false
maxRetries = 6
maxFailures = 10
}
}
dependsOn "bundlePlugin"
systemProperty 'tests.security.manager', 'false'
systemProperty 'java.io.tmpdir', opensearch_tmp_dir.absolutePath
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -427,6 +427,7 @@ private void waitAllSyncheticDataIngested(int expectedSize, String datasetName,
request = new Request("POST", String.format(Locale.ROOT, "/%s/_refresh", datasetName));
client.performRequest(request);
}
Thread.sleep(1_000);
} while (maxWaitCycles-- >= 0);
}

Expand Down Expand Up @@ -592,26 +593,14 @@ private void indexTrainData(String datasetName, List<JsonObject> data, int train
throw new RuntimeException(e);
}
});
Thread.sleep(1_000);
Thread.sleep(3_000);
}

public void testRestartHCADDetector() throws Exception {
// TODO: this test case will run for a much longer time and timeout with security enabled
if (!isHttps()) {
int maxRetries = 3;
int i = 0;
for (; i < maxRetries; i++) {
try {
disableResourceNotFoundFaultTolerence();
verifyRestart("synthetic", 1, 8);
break;
} catch (Throwable throwable) {
LOG.info("Retry restart test case", throwable);
cleanUpCluster();
wipeAllODFEIndices();
}
}
assertTrue("failed all retries", i < maxRetries);
disableResourceNotFoundFaultTolerence();
verifyRestart("synthetic", 1, 8);
ohltyler marked this conversation as resolved.
Show resolved Hide resolved
}
}

Expand Down