[BUG] NullPointerException in delete_by_query when index is deleted #8418
Comments
Looks like a legit bug. Do you think you could reproduce this in a YAML REST test?
@dblock from briefly looking at the YAML test framework, it seems the requests all run sequentially? To repro this, the index removal would need to run concurrently with the … (Note that even if the steps are run concurrently, I wouldn't be surprised if there's not enough data to make the …
I don't love the idea that the test may sometimes pass and sometimes fail. Maybe these YAML tests simply can't express this scenario and we're trying too hard?
I agree; reproducing the race condition is probably more work than it's worth. There's probably a unit of logic here that needs to handle the index-no-longer-exists case more gracefully. Maybe this is …
@blampe I really don't know this code. If you want to keep digging, I would focus on a test that reproduces this problem 100% of the time; it can be pretty low level. If it's very hard to test, then maybe that's telling us something. Anyway, I am spitballing.
It should be possible to create a deterministic reproduction, but it will likely require a test that injects latches or something else that isn't possible at the YAML test level. It should be possible as an integration test, though (sorry I can't be more specific without diving a lot deeper into this issue).
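The latch idea can be illustrated with a hypothetical in-memory model. `ToyCluster`, `delete_by_query`, and `run_race` are illustrative names, not OpenSearch classes, and a real reproduction would most likely be a Java integration test against OpenSearch internals. In this sketch the query thread expands the wildcard, trips a latch, and then blocks until the test has deleted `foo-2`, so the bad interleaving happens on every run rather than by chance. The assumed `graceful` flag shows one way the index-no-longer-exists case could be skipped instead of surfacing as an error.

```python
import threading


class ToyCluster:
    """Hypothetical stand-in for cluster metadata; not OpenSearch code."""

    def __init__(self, indices):
        self.indices = dict(indices)  # index name -> list of document ids
        self.lock = threading.Lock()

    def resolve(self, pattern):
        # Expand a trailing-wildcard pattern such as "foo*" into index names.
        prefix = pattern.rstrip("*")
        with self.lock:
            return [name for name in self.indices if name.startswith(prefix)]

    def delete_index(self, name):
        with self.lock:
            self.indices.pop(name, None)

    def delete_docs(self, name, graceful):
        with self.lock:
            docs = self.indices.get(name)
            if docs is None:
                if graceful:
                    return 0  # index vanished after resolution: skip it
                # Stands in for the NullPointerException / HTTP 500.
                raise RuntimeError(f"index [{name}] disappeared")
            count = len(docs)
            docs.clear()
            return count


def delete_by_query(cluster, pattern, graceful, resolved, resume):
    names = cluster.resolve(pattern)  # phase 1: resolve the wildcard
    resolved.set()                    # latch: signal that phase 1 is done
    resume.wait()                     # latch: block until the test continues
    return sum(cluster.delete_docs(n, graceful) for n in names)  # phase 2


def run_race(graceful):
    cluster = ToyCluster({"foo-1": [1, 2], "foo-2": [3]})
    resolved, resume = threading.Event(), threading.Event()
    outcome = {}

    def worker():
        try:
            outcome["deleted"] = delete_by_query(
                cluster, "foo*", graceful, resolved, resume)
        except RuntimeError as exc:
            outcome["error"] = str(exc)

    thread = threading.Thread(target=worker)
    thread.start()
    resolved.wait()                # wildcard has been expanded to foo-1, foo-2
    cluster.delete_index("foo-2")  # delete an index inside the race window
    resume.set()
    thread.join()
    return outcome
```

With `graceful=False`, `run_race` returns the error for `foo-2` every time; with `graceful=True`, it deletes the two documents in `foo-1` and reports success. The two `threading.Event` objects play the role of the latches: they pin the deletion between the resolution and execution phases instead of relying on timing.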
opensearch-project/index-management#855 might address the underlying issue. I imagine a library dependency needs to be bumped to pull in that fix, but I'm not sure how to do that. (If this is using index-management 2.x, the fix is being backported in opensearch-project/index-management#871.)
Describe the bug
`delete_by_query` with a wildcard pattern can return a 500 if an underlying index is deleted while the query is running.
I've seen two stack traces which seem to reflect the different phases of the `delete_by_query` that can encounter the race condition:
To Reproduce
This has been reproducible in our production setup, which consists of a steady stream of `delete_by_query` operations and occasional alias updates (concurrent with the deletes). The fact that there's an alias involved may or may not be relevant; I haven't attempted to reproduce the issue with a plain index delete operation.
High-level steps to reproduce the behavior:
1. Create indices `foo-1` and `foo-2`.
2. Create an alias `foo` containing `foo-1` and `foo-2`.
3. Concurrently:
   a. Perform a `delete_by_query` on `foo*` (we use `slices=auto` and `conflicts=proceed`).
   b. Update the `foo` alias with `{"actions": [{"remove_index": {"index": "foo-2"}}]}`
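Step 3 can be sketched as a small script that fires both requests at once using only the Python standard library. Several details here are assumptions not given in the report: the cluster URL (`BASE_URL`, an unsecured local node), the `match_all` query body, and the helper names `build_delete_by_query`, `build_alias_update`, `post`, and `race_once`. It also assumes steps 1 and 2 have already been done. In the aliases API, the `remove_index` action deletes the index itself rather than only dropping it from the alias, which is how this step ends up deleting `foo-2` mid-query.

```python
import json
import threading
import urllib.error
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local test cluster, no auth/TLS


def build_delete_by_query(pattern="foo*"):
    # slices=auto and conflicts=proceed follow the report;
    # the match_all query is an assumption (the issue does not show the query).
    url = f"{BASE_URL}/{pattern}/_delete_by_query?slices=auto&conflicts=proceed"
    return url, {"query": {"match_all": {}}}


def build_alias_update(index="foo-2"):
    # Body from step 3b; remove_index deletes the named index.
    return f"{BASE_URL}/_aliases", {"actions": [{"remove_index": {"index": index}}]}


def post(url, body):
    """POST a JSON body and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses are raised as HTTPError


def race_once():
    """Send steps 3a and 3b concurrently and return both status codes."""
    results = {}

    def run(name, built_request):
        results[name] = post(*built_request)

    threads = [
        threading.Thread(target=run, args=("delete_by_query", build_delete_by_query())),
        threading.Thread(target=run, args=("alias_update", build_alias_update())),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

As the comments note, the window is timing-dependent, so a single `race_once()` call may well miss it. A harness built on this sketch would recreate `foo-2`, add it back to `foo`, and retry in a loop, watching for a 500 from the `delete_by_query` request.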
Expected behavior
I would expect at least a non-500 status code. Perhaps a 404 in the case where `ignore_unavailable` is `false` and a 200 in the case where `ignore_unavailable` is `true`.
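For a reproduction harness, the expectation above could be encoded as a check on the returned status code. `classify_status` and its labels are hypothetical; the function treats any 5xx as the bug, the suggested 404/200 as preferred, and any other non-500 code as acceptable, since the report only asks for "at least a non-500 status code".

```python
def classify_status(status: int, ignore_unavailable: bool) -> str:
    """Hypothetical helper: grade a delete_by_query status against the expectation."""
    if status >= 500:
        return "bug"  # the reported behavior: a server error
    # Suggested codes: 200 with ignore_unavailable=true, 404 with false.
    preferred = 200 if ignore_unavailable else 404
    return "preferred" if status == preferred else "acceptable"
```

For example, `classify_status(500, True)` yields `"bug"`, while `classify_status(404, False)` yields `"preferred"`.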
Plugins
Host/Environment
Additional context
Setting `ignore_unavailable` seems to have no impact on the behavior.