Windows Gradle check fails on 2.x. #5716

Closed
2 tasks done
mch2 opened this issue Jan 5, 2023 · 18 comments
Labels
bug Something isn't working Build Libraries & Interfaces Priority-High >test-failure Test failure from CI, local build, etc. windows

Comments

@mch2
Member

mch2 commented Jan 5, 2023

Nightly Gradle checks have been consistently failing on Windows for the 2.x branch. Example: https://build.ci.opensearch.org/job/gradle-check/8810/#showFailuresLink. Expected behavior: the check should pass.

@mch2 mch2 added bug Something isn't working untriaged labels Jan 5, 2023
@saratvemulapalli
Member

Thanks @mch2 for opening this up. I'll take a look.
Trying to look at the latest runs:

https://build.ci.opensearch.org/job/gradle-check/8899/
https://build.ci.opensearch.org/job/gradle-check/8897/

@saratvemulapalli
Member

Opened up an issue to get notified on failures: https://github.com/opensearch-project/OpenSearch/issues/5736

@mch2
Member Author

mch2 commented Jan 6, 2023

I've re-run these three and they all pass on Windows 10.

[org.opensearch.search.backpressure.SearchBackpressureIT.testSearchShardTaskCancellationWithHighCpu](https://build.ci.opensearch.org/job/gradle-check/8810/testReport/junit/org.opensearch.search.backpressure/SearchBackpressureIT/testSearchShardTaskCancellationWithHighCpu/)
 [org.opensearch.search.backpressure.SearchBackpressureIT.testSearchShardTaskCancellationWithHighHeapUsage](https://build.ci.opensearch.org/job/gradle-check/8810/testReport/junit/org.opensearch.search.backpressure/SearchBackpressureIT/testSearchShardTaskCancellationWithHighHeapUsage/)
[org.opensearch.action.admin.cluster.stats.ClusterStatsIT.testValuesSmokeScreen](https://build.ci.opensearch.org/job/gradle-check/8810/testReport/junit/org.opensearch.action.admin.cluster.stats/ClusterStatsIT/testValuesSmokeScreen/)

@minalsha minalsha removed the untriaged label Jan 6, 2023
@minalsha
Contributor

minalsha commented Jan 9, 2023

@saratvemulapalli
Member

Update: the YAML tests run fine locally. It looks like the infrastructure runs into problems when we run the whole suite together.

@Poojita-Raj
Contributor

Poojita-Raj commented Jan 10, 2023

Update: the following tests pass locally:
org.opensearch.cluster.coordination.AwarenessAttributeDecommissionIT.testNodesRemovedAfterZoneDecommission_ClusterManagerNotInToBeDecommissionedZone
org.opensearch.cluster.node.DiscoveryNodeRoleIT.testExplicitlyHasAdditionalRoles
org.opensearch.cluster.node.DiscoveryNodeRoleIT.testDefaultHasAdditionalRole

@saratvemulapalli saratvemulapalli added the v2.5.0 'Issues and PRs related to version v2.5.0' label Jan 10, 2023
@vibrantvarun
Member

vibrantvarun commented Jan 10, 2023

All the tests pass on my local machine, so I was not able to reproduce the failures.

@mch2
Member Author

mch2 commented Jan 11, 2023

I've split apart check and run the following against 2.5 locally:

  1. yamlRestTest - passes. (this occasionally fails when run with check because the suite times out after 30m. We can bump the timeout here or run it by itself)
  2. :server:internalClusterTest - has ~2 failures per run, usually different tests. When running the failed tests individually they pass.
  3. internalClusterTest -x :server:internalClusterTest - passes. This is all the other internalClusterTests not in server project.
  4. :server:test without internalClusterTests - passes
  5. check without :server:test, internalClusterTest, yamlRestTest - passes (this includes bwc & everything else)

So :server:internalClusterTest is the issue right now. On my last run the two tests failing were:
ClusterStatsIT.testValuesSmokeScreen
IndexStatsIT.testFilterCacheStats

I have also seen this test fail occasionally because it can't clean up - TranslogTransferManagerTests:

java.io.IOException: Could not remove the following files (in the order of attempts):
   C:\Users\Administrator\OpenSearch\server\build\testrun\test\temp\org.opensearch.index.translog.transfer.TranslogTransferManagerTests_E68AEF178E965295-001\translog-6958809543753753008-001.ckp: java.nio.file.AccessDeniedException: C:\Users\Administrator\OpenSearch\server\build\testrun\test\temp\org.opensearch.index.translog.transfer.TranslogTransferManagerTests_E68AEF178E965295-001\translog-6958809543753753008-001.ckp
   C:\Users\Administrator\OpenSearch\server\build\testrun\test\temp\org.opensearch.index.translog.transfer.TranslogTransferManagerTests_E68AEF178E965295-001\translog-6958809543753753008-001.tlog: java.nio.file.AccessDeniedException: C:\Users\Administrator\OpenSearch\server\build\testrun\test\temp\org.opensearch.index.translog.transfer.TranslogTransferManagerTests_E68AEF178E965295-001\translog-6958809543753753008-001.tlog
   C:\Users\Administrator\OpenSearch\server\build\testrun\test\temp\org.opensearch.index.translog.transfer.TranslogTransferManagerTests_E68AEF178E965295-001\translog-6958809543753753008-002.tlog: java.nio.file.AccessDeniedException: C:\Users\Administrator\OpenSearch\server\build\testrun\test\temp\org.opensearch.index.translog.transfer.TranslogTransferManagerTests_E68AEF178E965295-001\translog-6958809543753753008-002.tlog
   C:\Users\Administrator\OpenSearch\server\build\testrun\test\temp\org.opensearch.index.translog.transfer.TranslogTransferManagerTests_E68AEF178E965295-001\translog-6958809543753753009-001.ckp: java.nio.file.AccessDeniedException: C:\Users\Administrator\OpenSearch\server\build\testrun\test\temp\org.opensearch.index.translog.transfer.TranslogTransferManagerTests_E68AEF178E965295-001\translog-6958809543753753009-001.ckp
   C:\Users\Administrator\OpenSearch\server\build\testrun\test\temp\org.opensearch.index.translog.transfer.TranslogTransferManagerTests_E68AEF178E965295-001\translog-6958809543753753009-001.tlog: java.nio.file.AccessDeniedException: C:\Users\Administrator\OpenSearch\server\build\testrun\test\temp\org.opensearch.index.translog.transfer.TranslogTransferManagerTests_E68AEF178E965295-001\translog-6958809543753753009-001.tlog
   C:\Users\Administrator\OpenSearch\server\build\testrun\test\temp\org.opensearch.index.translog.transfer.TranslogTransferManagerTests_E68AEF178E965295-001\translog-6958809543753753009-002.tlog: java.nio.file.AccessDeniedException: C:\Users\Administrator\OpenSearch\server\build\testrun\test\temp\org.opensearch.index.translog.transfer.TranslogTransferManagerTests_E68AEF178E965295-001\translog-6958809543753753009-002.tlog
   C:\Users\Administrator\OpenSearch\server\build\testrun\test\temp\org.opensearch.index.translog.transfer.TranslogTransferManagerTests_E68AEF178E965295-001: java.nio.file.DirectoryNotEmptyException: C:\Users\Administrator\OpenSearch\server\build\testrun\test\temp\org.opensearch.index.translog.transfer.TranslogTransferManagerTests_E68AEF178E965295-001
    at __randomizedtesting.SeedInfo.seed([E68AEF178E965295]:0)
    at org.apache.lucene.util.IOUtils.rm(IOUtils.java:341)
    at org.apache.lucene.tests.util.TestRuleTemporaryFilesCleanup.afterAlways(TestRuleTemporaryFilesCleanup.java:209)
    at com.carrotsearch.randomizedtesting.rules.TestRuleAdapter$1.afterAlways(TestRuleAdapter.java:31)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:43)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
    at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
    at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
    at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
    at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
    at org.junit.rules.RunRules.evaluate(RunRules.java:20)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
    at java.base/java.lang.Thread.run(Thread.java:833)
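
For context on the cleanup failure above: on Windows, deleting a file can fail while a handle to it is still open, whereas the same delete usually succeeds on Linux or macOS. Whether that is the cause here is not confirmed in this thread. The sketch below only illustrates the general pattern of closing every handle before cleanup runs. The class name `TempFileCleanupSketch`, the helper `writeAndClose`, and the file name are hypothetical and are not taken from the OpenSearch code base.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TempFileCleanupSketch {

    // Hypothetical helper: writes a small payload and closes the channel
    // before returning, so no handle to the file outlives this method.
    static Path writeAndClose(Path dir, String name) throws IOException {
        Path file = dir.resolve(name);
        try (FileChannel channel = FileChannel.open(
                file, StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            channel.write(ByteBuffer.wrap("checkpoint".getBytes(StandardCharsets.UTF_8)));
        } // try-with-resources closes the channel here, even on exceptions
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("translog-sketch");
        Path file = writeAndClose(dir, "translog-sketch-001.ckp");

        // All handles are closed, so both deletes are expected to succeed.
        Files.delete(file);
        Files.delete(dir);
        System.out.println("temp dir still exists: " + Files.exists(dir));
    }
}
```

If a test opens files outside a try-with-resources block, the equivalent step is to close them explicitly in the test's teardown, before the framework's temporary-file cleanup runs.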

@minalsha
Contributor

@mch2 Thank you for sharing the details above. What are the next steps after this analysis to get notified of failing tests and to reach a green CI?

@mch2
Member Author

mch2 commented Jan 24, 2023

I am no longer seeing the failures originally cut with this ticket.
These are the test failures I see on manual runs for main and 2.x:

There are also next steps on the CI setup side:
opensearch-project/opensearch-build-libraries#113 - check running OOM.
opensearch-project/opensearch-build-libraries#114

@minalsha
Contributor

minalsha commented Jan 30, 2023

Hi @mch2, quick question: are these tests failing only on 2.x? Can you confirm whether they pass on 1.x?

Hi @ryanbogan, could you please give a status update on the issues below, as mentioned in #5716 (comment):

  1. Test failure ReindexClientYamlTestSuiteIT #5930
  2. [BUG] TranslogTransferManagerTests consistently fail on Windows CI. #5923

@mch2
Member Author

mch2 commented Jan 30, 2023

The reindex test exists on the 1.x line - https://github.com/opensearch-project/OpenSearch/blob/1.3/modules/reindex/src/yamlRestTest/resources/rest-api-spec/test/reindex/90_remote.yml#L384

The TranslogTransferManagerTests are not in 1.x.

@kotwanikunal
Member

@mch2 I am assigning this issue to you. Please re-assign if you aren't looking into it anymore.

@kotwanikunal kotwanikunal assigned mch2 and unassigned kotwanikunal Feb 2, 2023
@minalsha
Contributor

minalsha commented Feb 3, 2023

@kotwanikunal are these tests failing in recent nightly gradle checks for 2.x?

@kotwanikunal
Member

@minalsha I was assigned this issue when I was the release manager for 2.5 but @mch2 was looking into it. I just reassigned it to Marc for any further updates.
Let me check if there are any changes/updates.

@mch2
Member Author

mch2 commented Feb 3, 2023

I've updated the issue description with the remaining tests that are still failing. We can close this issue once those are resolved. We may encounter more failures on the path to green, once CI issues are resolved - opensearch-project/opensearch-build-libraries#113, but we can cut a new issue at that point.

@minalsha minalsha assigned vibrantvarun and unassigned mch2 Feb 3, 2023
@minalsha minalsha added Build Libraries & Interfaces and removed v2.5.0 'Issues and PRs related to version v2.5.0' labels Feb 3, 2023
@stephen-crawford
Contributor

stephen-crawford commented Jun 13, 2023

Hi @saratvemulapalli and @minalsha,

Is this issue all set to be closed?

@kotwanikunal
Member

I think so. Closing it out. @mch2 Please re-open if needed.

7 participants