Get repository metadata from the cluster state doesn't throw an exception if a repo is missing #92914
Hi @gmarouli, I've created a changelog YAML for you.
@elasticmachine run elasticsearch-ci/part-1
Pinging @elastic/es-data-management (Team:Data Management)
Since this is currently mainly touching the snapshots side of things, I think someone on the distributed team should take a look (they own snapshots). I do think though that we should be defensive on the SLM side and elide the missing repos from the fetching action, we can use the same check we use in |
@elasticmachine update branch |
I see your point. For me it's not that clear which one is best. I think it depends how we define A] Redefine it to B] The I think it depends on how we want to define the scope of |
I think option A (as implemented here) is the right approach. When we extended the get-snapshots action to work across repositories we designed it to report failures on a repository-by-repository basis. "Does not exist" is just another per-repository failure, but it's one that was missed in the implementation because it happens much earlier in the process than the other kinds of failure. |
LGTM (one nit)
server/src/internalClusterTest/java/org/elasticsearch/snapshots/GetSnapshotsIT.java
@elasticmachine update branch |
@dakrone Given that the |
Thanks for following up on this Mary, I'm happy to leave this on the snapshot side. To make sure we don't hit the bug again (perhaps in a different place), could you add an integration test that has two SLM policies, where the repository for one of the policies doesn't exist, then run retention and ensure that snapshots are deleted from the repository that does exist? I don't mind whether we add it here or in a subsequent PR (up to you).
Theoretically, it should be an easy test to write (famous last words). If that is indeed the case I would prefer to wrap it up here ;). I will give it a try. |
LGTM, I left one really minor comment, thanks for adding the test Mary!
// Run retention every second
ClusterUpdateSettingsRequest req = new ClusterUpdateSettingsRequest();
req.persistentSettings(Settings.builder().put(LifecycleSettings.SLM_RETENTION_SCHEDULE, "*/1 * * * * ?"));
try (XContentBuilder builder = jsonBuilder()) {
    req.toXContent(builder, ToXContent.EMPTY_PARAMS);
    Request r = new Request("PUT", "/_cluster/settings");
    r.setJsonEntity(Strings.toString(builder));
    Response updateSettingsResp = client().performRequest(r);
    assertAcked(updateSettingsResp);
}
Rather than run retention every second, you could manually invoke the retention (we have the execute retention API)
I had doubts about this, but on second thought it will be better and more efficient. I will change it :)
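For reference, the suggestion above amounts to replacing the one-second schedule with a single call to the SLM execute-retention endpoint (`POST /_slm/_execute_retention`). The following is a minimal, self-contained sketch that only builds such a request with the JDK's `java.net.http` classes and does not send it. The class name, the helper name, and the `http://localhost:9200` address are assumptions made for illustration; inside the test itself the same call would presumably go through the `Request` / `client().performRequest(...)` pattern used in the snippet under review.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class ExecuteRetentionRequestSketch {

    /**
     * Builds (but does not send) a POST to the SLM execute-retention endpoint.
     * The base URL is supplied by the caller; nothing here talks to a cluster.
     */
    public static HttpRequest executeRetention(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_slm/_execute_retention"))
            .POST(HttpRequest.BodyPublishers.noBody())
            .build();
    }

    public static void main(String[] args) {
        // Assumed local cluster address, for illustration only.
        HttpRequest request = executeRetention("http://localhost:9200");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Triggering retention once, on demand, keeps the test deterministic and avoids leaving an aggressive cron schedule in the cluster settings.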
💔 Backport failed
You can use sqren/backport to manually backport by running |
…tion if a repo is missing (elastic#92914) Change the implementation of `TransportGetRepositoriesAction#getRepositories` to report the found and missing repositories instead of throwing an exception when a repository is missing. When the get-snapshots action was extended to work across repositories, it was designed to report failures on a repository-by-repository basis. A missing repository is just another per-repository failure, so we report it but it doesn't interrupt the normal flow of the method. (cherry picked from commit 9cd0b38) # Conflicts: # server/src/main/java/org/elasticsearch/action/admin/cluster/snapshots/get/TransportGetSnapshotsAction.java
…tion if a repo is missing (#92914) (#93106) Change the implementation of `TransportGetRepositoriesAction#getRepositories` to report the found and missing repositories instead of throwing an exception when a repository is missing. When the get-snapshots action was extended to work across repositories, it was designed to report failures on a repository-by-repository basis. A missing repository is just another per-repository failure, so we report it but it doesn't interrupt the normal flow of the method. (cherry picked from commit 9cd0b38) # Conflicts: # server/src/main/java/org/elasticsearch/action/admin/cluster/snapshots/get/TransportGetSnapshotsAction.java
💚 All backports created successfully
Questions? Please refer to the Backport tool documentation
…tion if a repo is missing (elastic#92914) Change the implementation of `TransportGetRepositoriesAction#getRepositories` to report the found and missing repositories instead of throwing an exception when a repository is missing. When the get-snapshots action was extended to work across repositories, it was designed to report failures on a repository-by-repository basis. A missing repository is just another per-repository failure, so we report it but it doesn't interrupt the normal flow of the method. (cherry picked from commit 9cd0b38) # Conflicts: # server/src/main/java/org/elasticsearch/action/admin/cluster/repositories/get/TransportGetRepositoriesAction.java # server/src/main/java/org/elasticsearch/action/admin/cluster/snapshots/get/TransportGetSnapshotsAction.java # x-pack/plugin/ilm/src/main/java/org/elasticsearch/xpack/slm/SnapshotRetentionTask.java
…tion if a repo is missing (#92914) (#93108) Change the implementation of `TransportGetRepositoriesAction#getRepositories` to report the found and missing repositories instead of throwing an exception when a repository is missing. When the get-snapshots action was extended to work across repositories, it was designed to report failures on a repository-by-repository basis. A missing repository is just another per-repository failure, so we report it but it doesn't interrupt the normal flow of the method. (cherry picked from commit 9cd0b38) # Conflicts: # server/src/main/java/org/elasticsearch/action/admin/cluster/repositories/get/TransportGetRepositoriesAction.java # server/src/main/java/org/elasticsearch/action/admin/cluster/snapshots/get/TransportGetSnapshotsAction.java # x-pack/plugin/ilm/src/main/java/org/elasticsearch/xpack/slm/SnapshotRetentionTask.java
Expected situation
When retrieving snapshots by giving the names of multiple repositories, if a repository is missing we would like to retrieve the snapshots of the repositories that were found and report the other repositories as failures.
The bug
The bug was observed while executing SLM policies, one of which referred to a missing repository. See #92849.
How we fixed it
In this PR we propose to change `org.elasticsearch.action.admin.cluster.repositories.get.TransportGetRepositoriesAction#getRepositories` to return both the found repositories and the missing ones, instead of throwing an exception when it encounters a missing repository.
Based on this change we adjusted the logic in the following places:
`GetRepositoriesAction` now mentions all the missing repositories in the exception, instead of only one.
`SnapshotRetentionTask` now receives the snapshots from the repositories that can be retrieved and logs the missing repositories at debug level.
Fixes: #92849
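The change described above can be illustrated with a small standalone sketch. This is not the Elasticsearch implementation: the class `RepositoryLookupSketch`, the `Result` type, and both method names are hypothetical, and the lookup only handles exact names (no wildcard patterns). It shows the general shape of the idea: the lookup reports found and missing repositories side by side, and each caller decides whether missing ones are fatal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class RepositoryLookupSketch {

    /** Hypothetical result type: names that resolved and names that did not. */
    public static final class Result {
        public final List<String> found;
        public final List<String> missing;

        public Result(List<String> found, List<String> missing) {
            this.found = List.copyOf(found);
            this.missing = List.copyOf(missing);
        }

        public boolean hasMissing() {
            return !missing.isEmpty();
        }
    }

    /** Splits the requested names into found and missing instead of throwing on the first miss. */
    public static Result lookup(Set<String> registered, List<String> requested) {
        List<String> found = new ArrayList<>();
        List<String> missing = new ArrayList<>();
        for (String name : requested) {
            (registered.contains(name) ? found : missing).add(name);
        }
        return new Result(found, missing);
    }

    /** A strict caller: fails once, naming every missing repository rather than only the first. */
    public static List<String> lookupOrThrow(Set<String> registered, List<String> requested) {
        Result result = lookup(registered, requested);
        if (result.hasMissing()) {
            throw new IllegalArgumentException("missing repositories: " + String.join(", ", result.missing));
        }
        return result.found;
    }

    public static void main(String[] args) {
        // A lenient caller: keeps working with what was found and only reports the rest.
        Result r = lookup(Set.of("repo-a", "repo-b"), List.of("repo-a", "repo-x", "repo-y"));
        System.out.println("found=" + r.found + " missing=" + r.missing);
    }
}
```

In this sketch the strict caller plays the role of the get-repositories path (one exception listing all missing names), while the lenient caller mirrors the retention task, which continues with the repositories it could resolve.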