Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows build and test support for 1.3 #2291

Merged
merged 17 commits into from
Dec 6, 2022

Conversation

peternied
Copy link
Member

@peternied peternied commented Dec 1, 2022

Description

Updating the github action to support build/test for windows. Note; this change will not include specific backporting. I know this is not desired, but there are some funky changes in our source history that are hard to resolve whereas checkout out specific files is much easier.

Details list of changes

Issues Resolved

Testing

Its all in the CI system!

Check List

  • New functionality includes testing
  • New functionality has been documented
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@peternied peternied self-assigned this Dec 1, 2022
.github/workflows/ci.yml Show resolved Hide resolved
strategy:
fail-fast: false
matrix:
jdk: [8, 11, 14]
jdk: [8, 11]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JDK14 support? Temurin doesn't have an image for it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scrawfor99 Could you look into this? My current thinking is that we will merge it after this PR goes in - let me know what you run into

@peternied
Copy link
Member Author

Here is a quick report on the test failures, looks like some backports were missed, I'll include those changes directly in this pull request

Test Failures
JDK11 JDK8
Linux Windows Linux Windows
TestRun 1
HeapBasedRateTrackerTest. maxTwoTriesTest 1
TenantInfoActionTest. testTenantInfoAPI[1] 1
WebhookAuditLogTest. postGetHttpTest 1
HeapBasedRateTrackerTest. expiryTest 1
AuditApiActionTest. testApi[0] 1
DlsTest. testDlsWithMinDocCountZeroAggregations 1
SlowIntegrationTests. testCustomInterclusterRequestEvaluator 1
HttpIntegrationTests. testTenantInfo 2
BasicAuditlogTest. testRestMethod 2 3
TenantInfoActionTest. testTenantInfoAPI[0] 2 2 1 1
WebhookAuditLogTest. httpsTestPemDefault 7 1
WebhookAuditLogTest. httpsTestPemEndpoint 9 1
IntegrationTests. testSpecialUsernames 6 6
UtilTests. testEnvReplace 6 6
OpenSSLTest. testEnsureOpenSSLAvailability 6 6 6 6

Add CI for Windows and MacOS platforms

Due to the increase in the number of platforms, I've separated the newer integration tests into their own workflow.  Until retries have been enabled they will automatically pass - but still run and report logs.  As soon as we have full confidence we will allow them to start blocking pull requests from merging.  opensearch-project#2184

Switch the gradle commands to be platform agnostic via the `gradle/gradle-build-action@v2`, dropping the 'clean' step to the build which allows us to reuse the gradle cache - if we see any problems pulling in more recent snapshots we can disable this setting quickly.

Found and fixed an issued with config value replacement via environment variables, long story short Windows and MacOS allow for many more characters that are used in the unix environment variable landscape.  Added new tests to cover these interesting scenarios as well.

Found an encoding issue with user names from config files, still unclear of the source of this issue, be it test setup specific or a problem in the broader OpenSearch ecosystem, disabling the `testSpecialUsernames` until we can dive deeper. opensearch-project#2194

Disabled the HeapBasedRateTrackerTests - it was depending on system timing and was very brittle if the system under test experienced any undo load, created follow up issue opensearch-project#2193

Fixed a test issue in testDlsWithMinDocCountZeroAggregations where there was a random chance for a test failure, easier to find intermittent tests when they are run so often.

OpenSSL has open questions - while it is supported for our Linux JDK11 builds, it seems like a stopgap measure.  I've disabled the tests on windows environment while we determine if we should support OpenSSL at all on Windows JDK11.  opensearch-project#2195

Signed-off-by: Peter Nied <[email protected]>
Adds spotbugs [1] to detect internalization before they are added to the codebase, also fixed several encoding bugs that impact windows users.

[1] https://spotbugs.readthedocs.io/en/stable/index.html

Closes opensearch-project#2194

Signed-off-by: Peter Nied <[email protected]>
@peternied
Copy link
Member Author

Fixed those bottom 3 tests, they were all windows platform issues, the others might be random failuers. Next step will be looking into the WebhookAuditLogTest failures which seem to be new

@peternied
Copy link
Member Author

@scrawfor99 I know you have a couple PRs out for a couple of backports, you might want to rebase off of this pull request while testing or maybe I should pull some of these changes into this change - what do you think?

@stephen-crawford
Copy link
Contributor

stephen-crawford commented Dec 2, 2022

I dumped the CI backport because I saw @cwperks already had one. I rebased the script changes onto your branch and PR'd against it.

@peternied
Copy link
Member Author

Looking much cleaner, Going to dig in on the WebhookAuditLogTest failures

Test Failures
JDK11 JDK8
Linux Windows Linux Windows
TestRun 2
HttpIntegrationTests. testTenantInfo 1
SecurityRestFilterTest. checkWhitelistedApisAreAccessible 1
BasicAuditlogTest. testRestMethod 2
SlowIntegrationTests. testDelayInSecurityIndexInitialization 6
WebhookAuditLogTest. httpsTestPemEndpoint 1 8
WebhookAuditLogTest. httpsTestPemDefault 1 8
WebhookAuditLogTest. postGetHttpTest 9 10

* Add signal/wait model for TestAuditlogImpl

I have been tracking test failures with testRestMethod very often
showing failures.  My theory is that the execution environment can
impact the order of operations sometimes causing the audit log not to
contain messages before it is checked.

Adding a new method `doThenWaitForMessages(...)` this ensures the
log queue is fresh, the triggering action completes, and the expected
number of messages were received.  There is a second long time window
that allows for the messages to be flushed, this is likely more than
enough - if the messages are received the count down latch immediately
continues execution so the tests will not wait if they are ready to
proceed.

While this new method is much more reliable not all tests were
encountering such issues, so I've keep the original convention.  This
can be migrated in one-offs or all at once if we see more troublesome
behavior.  The previous methods/fields are depreciated to push future
tests to follow the new pattern.

Modifications to the rest helper not to throw exceptions were needed to
keep the Runnable declaration clean and small.

Signed-off-by: Peter Nied <[email protected]>
@peternied
Copy link
Member Author

Audit log tests are flaky - pulling in targeted fixes for those test...

@peternied peternied marked this pull request as ready for review December 2, 2022 21:43
@peternied peternied requested a review from a team December 2, 2022 21:43
Signed-off-by: Peter Nied <[email protected]>
Copy link
Member Author

@peternied peternied left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added callouts from when non-trivial changes were made. Stylecop, Checkstyle, and license checks were applied from where changes were pulled, sorry I know that it is messy to review.

@@ -164,6 +165,7 @@ publishing {

tasks.withType(JavaCompile) {
options.encoding = 'UTF-8'
options.warnings = false
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New: We are never going to fix the warnings emitted during compile in 1.3 and they were slowing me down

}
HttpResponse response = rh.executeGetRequest("_search?pretty", encodeBasicHeader("admin", "admin"));
assertThat(response.getStatusCode(), equalTo(HttpStatus.SC_OK));
}, 7);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New: 1.3 still uses the transport client, those interacts are not logged, so this number was altered from being 14 -> 7 because the 'read' actions were not included in the audit log (which is expected).

Also updated the verification on 261 to only expect the 'write' message types

}
final HttpResponse response = rh.executeGetRequest("_search?pretty", encodeBasicHeader("admin", "admin"));
Assert.assertEquals(HttpStatus.SC_OK, response.getStatusCode());
}, 18);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New: Way more audit messages this time because transport messages are included, there were 14 new messages, 7 transport writes, 7 transport reads, and the original 4 expected messages via REST

Ended up filtering out the transport messages on line 302 to keep the rest of the function identical

@@ -199,23 +199,7 @@ protected void initialize(ClusterInfo info, Settings initTransportClientSettings
Assert.assertEquals(info.numNodes, cur.getNodes().size());

SearchResponse sr = tc.search(new SearchRequest(".opendistro_security")).actionGet();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note; this is inline with main - all of these other requested slowed down how long startup took and these assertions aren't that useful since the plugin just throws exceptions if it couldn't create the security index.

cwperks
cwperks previously approved these changes Dec 5, 2022
Copy link
Member

@cwperks cwperks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! 🎸

@@ -222,15 +224,16 @@ public void noServerRunningHttpTest() throws Exception {
public void postGetHttpTest() throws Exception {
TestHttpHandler handler = new TestHttpHandler();

int port = findFreePort();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙌

* Add jdk 14 to CI

Signed-off-by: Stephen Crawford <[email protected]>
Signed-off-by: Stephen Crawford <[email protected]>
@peternied
Copy link
Member Author

The latest test failures were from SSLTest.testTransportClientSSLExternalContext which could have been from a network hiccup... tests are rerunning with JDK14 🤞

@peternied peternied merged commit 8e1adfa into opensearch-project:1.3 Dec 6, 2022
@peternied peternied deleted the 1.3-windows branch July 19, 2023 15:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants