Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes flaky continuous transforms and shrink tests #340

Merged
merged 2 commits into from
Apr 21, 2022

Conversation

downsrob
Copy link
Contributor

@downsrob downsrob commented Apr 20, 2022

Signed-off-by: Clay Downs [email protected]

Issue #, if available:
NA

Description of changes:

  • Fixes two continuous transforms flaky tests which were not waiting until the metadata was fully updated
  • Fixes some of the shrink ITs flakiness. These tests would sometimes fail on multi node setups because of the job scheduler index being moved between nodes. When the index was moved, all jobs would be descheduled so the force start window could be missed.
  • Shrink ITs were failing on remote tests against the release docker image as the docker images did not have enough free space to shrink an index and stay below the watermark thresholds. To fix this, now the shrink ITs set the watermark settings to be just a few bytes before each test.
  • When fixing the shrink ITs I noticed that the settings object passed in from the step context did not contain the watermark levels, so the default was always used. This was fixed by building a new settings object from the cluster settings to get these settings directly from the cluster state.
  • Clears indices and data streams after transforms and rollups test suites finish. Previously if the TransformRunnerITs or RollupRunnerIts were run twice in a row against the same remote cluster, the indices from the first run would stick around, messing up the second run.

CheckList:

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Clay Downs <[email protected]>
@downsrob downsrob marked this pull request as ready for review April 20, 2022 23:31
@downsrob downsrob requested review from a team, bowenlan-amzn, dbbaughe and thalurur April 20, 2022 23:31
@downsrob downsrob changed the title fixes flaky tests Fixes flaky continuous transforms and shrink tests Apr 20, 2022
@codecov-commenter
Copy link

Codecov Report

Merging #340 (827428b) into main (3af62f5) will decrease coverage by 0.11%.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##               main     #340      +/-   ##
============================================
- Coverage     75.16%   75.04%   -0.12%     
+ Complexity     2144     2135       -9     
============================================
  Files           262      262              
  Lines         12434    12444      +10     
  Branches       1966     1966              
============================================
- Hits           9346     9339       -7     
- Misses         2022     2032      +10     
- Partials       1066     1073       +7     
Impacted Files Coverage Δ
...atemanagement/step/shrink/AttemptMoveShardsStep.kt 55.55% <100.00%> (ø)
...exstatemanagement/step/shrink/AttemptShrinkStep.kt 50.50% <100.00%> (+9.09%) ⬆️
...xmanagement/indexstatemanagement/util/StepUtils.kt 80.64% <100.00%> (+4.74%) ⬆️
...statemanagement/model/destination/CustomWebhook.kt 67.14% <0.00%> (-28.58%) ⬇️
...nt/indexstatemanagement/model/destination/Slack.kt 55.55% <0.00%> (-22.23%) ⬇️
...arch/indexmanagement/rollup/RollupSearchService.kt 57.40% <0.00%> (-7.41%) ⬇️
...dexmanagement/transform/model/TransformMetadata.kt 89.32% <0.00%> (-2.92%) ⬇️
...anagement/indexstatemanagement/model/Transition.kt 63.01% <0.00%> (-2.74%) ⬇️
...earch/indexmanagement/transform/model/Transform.kt 85.47% <0.00%> (-0.43%) ⬇️
...nt/indexstatemanagement/ManagedIndexCoordinator.kt 75.84% <0.00%> (ø)
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3af62f5...827428b. Read the comment docs.

Comment on lines +155 to +156
fun getDiskSettings(clusterSettings: ClusterSettings): Settings {
return Settings.builder().put(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of this (creating this almost empty Settings instance instead of passing the one that presumably has all the settings)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering the reason to do this is previously the settings passed in to DiskThresholdSettings are not right? The right way is to extract from clusterSettings?

Copy link
Contributor Author

@downsrob downsrob Apr 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I previously used the settings object from the step context, as I had the same assumption that it would have all of the settings. Actually, the settings object only contains the settings initially passed into the runner, it does not seem to consume updates from the cluster settings automatically. To get up to date settings, we need to extract them from the clusterSettings. I wonder why we have this settings object in the step context at all, as these out of date settings aren't very useful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I agree we need to remove settings from step context and only use clusterSettings to get update-to-date setting values.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, maybe @thalurur has some input on it or it might have just been a miss that we were passing in a snapshot of the settings.

@downsrob downsrob merged commit 74dac63 into opensearch-project:main Apr 21, 2022
@downsrob downsrob deleted the flaky-test-fixes branch April 21, 2022 16:45
opensearch-trigger-bot bot pushed a commit that referenced this pull request Apr 21, 2022
* fixes flaky tests

Signed-off-by: Clay Downs <[email protected]>

* Deletes data stream manually

Signed-off-by: Clay Downs <[email protected]>
(cherry picked from commit 74dac63)
downsrob added a commit that referenced this pull request Apr 21, 2022
* fixes flaky tests

Signed-off-by: Clay Downs <[email protected]>

* Deletes data stream manually

Signed-off-by: Clay Downs <[email protected]>
(cherry picked from commit 74dac63)

Co-authored-by: Clay Downs <[email protected]>
downsrob added a commit to downsrob/index-management that referenced this pull request Apr 29, 2022
…t#340)

* fixes flaky tests

Signed-off-by: Clay Downs <[email protected]>

* Deletes data stream manually

Signed-off-by: Clay Downs <[email protected]>
wuychn pushed a commit to ochprince/index-management that referenced this pull request Mar 16, 2023
…t#340)

* fixes flaky tests

Signed-off-by: Clay Downs <[email protected]>

* Deletes data stream manually

Signed-off-by: Clay Downs <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants