Separate multi-node integ tests to their own job #458

Merged Jan 26, 2024 (1 commit)

@dbwiddis (Member) commented Jan 26, 2024

Description

Multi-node integration tests were set to run on the same server on which the integration tests had just run. On cluster startup, the nodes were reading the previous cluster state from the (dropped) single node and using the same stored indices.

Additionally ML Commons recently made the default true for a feature that auto-deploys ML models after a node drop and recovery, so these recoveries were occurring.

While there is no guarantee that separating this out will fix the flaky macOS tests, I've seen enough hints in the logs to suggest that leftover bits from previous tests are slowing startup and potentially contributing to test failures.
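
As a rough illustration of the split (a sketch only, not the actual diff in this PR): the job names, runner matrix, action versions, Java version, and the `-PnumNodes=3` Gradle property below are assumptions made for illustration. The idea is that on GitHub-hosted runners each job starts on a fresh runner instance, so a multi-node cluster started in its own job cannot pick up cluster state or indices left behind by the single-node run.

```yaml
# Hypothetical excerpt of .github/workflows/CI.yml; all names and values are illustrative.
jobs:
  integ-test:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: 17
      # Single-node integration tests
      - run: ./gradlew integTest

  integ-multinode-test:
    # A separate job runs on a clean runner, with no leftover data directory
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: 17
      # Assumed property for requesting a multi-node test cluster
      - run: ./gradlew integTest -PnumNodes=3
```

Running the multi-node tests as a later step of the same job would reuse the same workspace, which is the situation described above.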

Issues Resolved

Might resolve flaky tests. At the least, it will cut down on the debugging noise.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

codecov bot commented Jan 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (ad199e3) 71.88% compared to head (cd14863) 71.88%.

Additional details and impacted files
@@            Coverage Diff            @@
##               main     #458   +/-   ##
=========================================
  Coverage     71.88%   71.88%           
  Complexity      620      620           
=========================================
  Files            78       78           
  Lines          3126     3126           
  Branches        236      236           
=========================================
  Hits           2247     2247           
  Misses          772      772           
  Partials        107      107           


@owaiskazi19 (Member) left a comment

Thanks for digging into this @dbwiddis. Great RCA.

Two review threads on .github/workflows/CI.yml (resolved)
@joshpalis (Member) commented

> Additionally ML Commons recently made the default true for a feature that auto-deploys ML models after a node drop and recovery, so these recoveries were occurring.

Interesting, I had thought we were preventing the auto-redeployment of ML models via the cluster settings here during setup.

@dbwiddis (Member, Author) commented

> Interesting, I had thought we were preventing the auto-redeployment of ML models via the cluster settings here during setup.

Maybe that was a recent change that I missed when spelunking old logs.

But recovering indices and cluster state was happening and I'm pretty sure it impacted the timing of node startup.

@dbwiddis dbwiddis merged commit e2fdc10 into opensearch-project:main Jan 26, 2024
38 checks passed
@dbwiddis dbwiddis deleted the multi-node-integ branch January 26, 2024 17:58
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jan 26, 2024
Signed-off-by: Daniel Widdis <[email protected]>
(cherry picked from commit e2fdc10)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
dbwiddis pushed a commit that referenced this pull request Jan 26, 2024
Separate multi-node integ tests to their own job (#458)


(cherry picked from commit e2fdc10)

Signed-off-by: Daniel Widdis <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Joshua Palis <[email protected]>
Labels: backport 2.x (backport PRs to 2.x branch), skip-changelog
3 participants