Safer Wait for Snapshot Success in ClusterPrivilegeTests #40943
Conversation
* The snapshot state returned by the API might become SUCCESS before it's fully removed from the cluster state.
* We should fix this race in the transport API, but it's not trivial and will be part of the upcoming big round of refactoring the repository interaction; this added check fixes the test for now.
* closes elastic#38030
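For illustration, here is a hedged sketch of the race from a test's perspective, assuming the usual integration-test context (`client()` from `ESIntegTestCase`) and placeholder repository/snapshot names:

```java
// The snapshot status API may already report SUCCESS for the snapshot ...
SnapshotsStatusResponse statusResponse = client().admin().cluster()
        .prepareSnapshotStatus("test-repo").setSnapshots("test-snap").get();
SnapshotsInProgress.State apiState = statusResponse.getSnapshots().get(0).getState();
// apiState can be SUCCESS here ...

// ... while the cluster state can still contain a SnapshotsInProgress entry for it,
// because the snapshot has not been fully finalized and removed yet:
SnapshotsInProgress inProgress = client().admin().cluster()
        .state(new ClusterStateRequest()).get().getState().custom(SnapshotsInProgress.TYPE);
// i.e. inProgress may still be non-null with a non-empty entries() list, which is
// exactly the window the extra check added in this PR waits out.
```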
Pinging @elastic/es-security
Jenkins run elasticsearch-ci/2
Jenkins run elasticsearch-ci/1
```java
SnapshotsInProgress response =
    client().admin().cluster().state(new ClusterStateRequest()).get().getState().custom(SnapshotsInProgress.TYPE);
assertThat(response.entries(), Matchers.empty());
});
```
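The snippet above looks like the body of a polling assertion; a minimal reconstruction of how it could be wrapped, assuming the usual `assertBusy` helper from `ESTestCase` (the helper method name is made up):

```java
// Hypothetical helper: poll until the SnapshotsInProgress custom has no entries left,
// i.e. the snapshot has been fully removed from the cluster state.
private void awaitSnapshotRemovedFromClusterState() throws Exception {
    assertBusy(() -> {
        SnapshotsInProgress response = client().admin().cluster()
                .state(new ClusterStateRequest()).get().getState().custom(SnapshotsInProgress.TYPE);
        assertThat(response.entries(), Matchers.empty());
    });
}
```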
@original-brownbear you've got my rubber stamp LGTM, but shouldn't we fix the `prepareSnapshotStatus(repo).setSnapshots(snapshot)` API to not return `SUCCESS` if there is still a `SnapshotsInProgress.TYPE` entry?
LGTM, but I think the snapshot status API needs a fix so that it returns success consistently with respect to the `SnapshotsInProgress.TYPE` metadata entry. If, for some obscure reason, this is not possible, I believe a comment in the main code (similar to the one here) is required. If/when the change in main lands, I would remove this extra condition here.
@albertzaharovits thanks for taking a look :) You know ... I just realized these tests run against a single node, so we shouldn't have any races here with the state like that. I'm double-checking the solution, sorry for the noise!
…-handle-snapshot-service
This reverts commit b3871f3.
…-handle-snapshot-service
Ok, this is a mess :) I was able to fix the API (at least for the single node case) via f7c4afb. We have two APIs for getting the current list of snapshots, …
@original-brownbear If I understand it correctly, every … I would defer to others to lean on the API change. If you don't mind, I'll shirk away from the reviewers group. But, can't the test(s) use …?
Sure no worries, thanks for the input so far!
Yea, but I figured we're seeing an actual bug in the API here, so why not fix it. We don't have other code dependent on this API it seems and it's just a 3 line change to get rid of a mean inconsistency here :)
I don't think we can really do anything on the client side here. To the client, the reason for …
Reusing the same enum for snapshots that are in progress and those that have completed has created this odd behavior where users can't distinguish between an in-progress snapshot that's in its finalization step and one where finalization was successfully completed. The same applies to the FAILED state as well. Even worse, we reuse the same enum across shard snapshot status and global snapshot status, which has caused even more confusion. I think it's a good time now to start separating these instead of adding a workaround. We should fix the test first without making changes to ES so that we can get test coverage back. Waiting for a successful snapshot can be done by having the test first check whether …
…-handle-snapshot-service
…om:original-brownbear/elasticsearch into more-aggressive-cs-handle-snapshot-service
@ywelsch alright, no hack it is :) I just reverted to my initial check here and simply made the test verify that the snapshot is gone from the CS like we do in other tests. Can you take another look?
@ywelsch ping :)
* Added a separate enum for the state of each shard; it was really confusing that we used the same enum for the state of the snapshot overall and the state of each individual shard
* relates elastic#40943 (comment)
* Moved the contents of the class around a little so fields, constructors and nested classes/enums aren't all over the place, especially now that we have yet another nested enum here
* Shortened some obvious spots in the equals method and saved a few lines via `computeIfAbsent` to make up for adding 50 new lines to this class
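As a rough illustration of the separation described in this commit (the enum values and layout below are assumptions for the sketch, not taken from the actual change):

```java
// Hypothetical sketch: the overall snapshot state and the per-shard state become two
// distinct enums, so a shard-level SUCCESS can no longer be read as "the snapshot as a
// whole has been finalized".
public class SnapshotsInProgressSketch {

    /** State of the snapshot as a whole, as tracked in the cluster state. */
    enum State {
        INIT, STARTED, SUCCESS, FAILED, ABORTED
    }

    /** State of an individual shard's snapshot within an entry. */
    enum ShardState {
        INIT, STARTED, SUCCESS, FAILED, ABORTED, MISSING, WAITING
    }
}
```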
LGTM
thanks @ywelsch!
* master: (176 commits)
  Avoid unnecessary persistence of retention leases (elastic#42299)
  [ML][TEST] Fix limits in AutodetectMemoryLimitIT (elastic#42279)
  [ML Data Frame] Persist and restore checkpoint and position (elastic#41942)
  mute failing filerealm hash caching tests (elastic#42304)
  Safer Wait for Snapshot Success in ClusterPrivilegeTests (elastic#40943)
  Remove 7.0.2 (elastic#42282)
  Revert "Remove 7.0.2 (elastic#42282)"
  [DOCS] Copied note on slicing support to Slicing section. Closes 26114 (elastic#40426)
  Remove 7.0.2 (elastic#42282)
  Mute all ml_datafeed_crud rolling upgrade tests
  Move the FIPS configuration back to the build plugin (elastic#41989)
  Remove stray back tick that's messing up table format (elastic#41705)
  Add missing comma in code section (elastic#41678)
  add 7.1.1 and 6.8.1 versions (elastic#42253)
  Use spearate testkit dir for each run (elastic#42013)
  Add experimental and warnings to vector functions (elastic#42205)
  Fix version in tests since elastic#41906 was merged
  Bump version in BWC check after backport
  Prevent in-place downgrades and invalid upgrades (elastic#41731)
  Mute date_histo interval bwc test
  ...
* Added separate enum for the state of each shard; it was really confusing that we used the same enum for the state of the snapshot overall and the state of each individual shard
* relates #40943 (comment)
* Shortened some obvious spots in the equals method and saved a few lines via `computeIfAbsent` to make up for adding 50 new lines to this class
* master: (82 commits)
  Fix off-by-one error in an index shard test
  Cleanup Redundant BlobStoreFormat Class (elastic#42195)
  remove backcompat handling of 6.2.x versions (elastic#42044)
  Mute testDelayedOperationsBeforeAndAfterRelocated
  Execute actions under permit in primary mode only (elastic#42241)
  Mute another transforms_stats yaml test
  Deprecate support for chained multi-fields. (elastic#41926)
  Mute transforms_stats yaml test
  Make unwrapCorrupt Check Suppressed Ex. (elastic#41889)
  Remove Dead Code from Azure Repo Plugin (elastic#42178)
  Reorganize Painless doc structure (elastic#42303)
  Avoid unnecessary persistence of retention leases (elastic#42299)
  [ML][TEST] Fix limits in AutodetectMemoryLimitIT (elastic#42279)
  [ML Data Frame] Persist and restore checkpoint and position (elastic#41942)
  mute failing filerealm hash caching tests (elastic#42304)
  Safer Wait for Snapshot Success in ClusterPrivilegeTests (elastic#40943)
  Remove 7.0.2 (elastic#42282)
  Revert "Remove 7.0.2 (elastic#42282)"
  [DOCS] Copied note on slicing support to Slicing section. Closes 26114 (elastic#40426)
  Remove 7.0.2 (elastic#42282)
  ...
…ic#41940)
* Added separate enum for the state of each shard; it was really confusing that we used the same enum for the state of the snapshot overall and the state of each individual shard
* relates elastic#40943 (comment)
* Shortened some obvious spots in the equals method and saved a few lines via `computeIfAbsent` to make up for adding 50 new lines to this class
* Safer Wait for Snapshot Success in ClusterPrivilegeTests
* The snapshot state returned by the API might become SUCCESS before it's fully removed from the cluster state.
* We should fix this race in the transport API, but it's not trivial and will be part of the upcoming big round of refactoring the repository interaction; this added check fixes the test for now
* closes elastic#38030

* Safer Wait for Snapshot Success in ClusterPrivilegeTests
* The snapshot state returned by the API might become SUCCESS before it's fully removed from the cluster state.
* We should fix this race in the transport API, but it's not trivial and will be part of the upcoming big round of refactoring the repository interaction; this added check fixes the test for now
* closes elastic#38030
…ic#41940)
* Added separate enum for the state of each shard; it was really confusing that we used the same enum for the state of the snapshot overall and the state of each individual shard
* relates elastic#40943 (comment)
* Shortened some obvious spots in the equals method and saved a few lines via `computeIfAbsent` to make up for adding 50 new lines to this class

…2575)
* Safer Wait for Snapshot Success in ClusterPrivilegeTests
* The snapshot state returned by the API might become SUCCESS before it's fully removed from the cluster state.
* We should fix this race in the transport API, but it's not trivial and will be part of the upcoming big round of refactoring the repository interaction; this added check fixes the test for now
* closes #38030

… (#42573)
* Added separate enum for the state of each shard; it was really confusing that we used the same enum for the state of the snapshot overall and the state of each individual shard
* relates #40943 (comment)
* Shortened some obvious spots in the equals method and saved a few lines via `computeIfAbsent` to make up for adding 50 new lines to this class
`SUCCESS` means that there is still a snapshot ongoing (it must still be finalized in the repository). Hence this change translates the `SUCCESS` state in the CS to in-progress on the transport layer.
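A hedged sketch of the kind of translation described here; the method below is hypothetical and only illustrates the idea, not the actual change in f7c4afb:

```java
// Hypothetical mapping applied when building the transport-layer status response from a
// cluster-state entry: as long as the entry is still present, even a SUCCESS entry is
// reported as still running, because finalization in the repository has not finished.
static SnapshotsInProgress.State stateForStatusResponse(SnapshotsInProgress.Entry entry) {
    SnapshotsInProgress.State clusterStateState = entry.state();
    if (clusterStateState == SnapshotsInProgress.State.SUCCESS) {
        // Entry still in the cluster state => finalization pending => report as STARTED.
        return SnapshotsInProgress.State.STARTED;
    }
    return clusterStateState;
}
```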