Fix Concurrent Index Auto Create Failing Bulk Requests #82541
Conversation
Batching these requests introduced a bug where auto-create requests for system indices would fail: system indices are always auto-created, so if multiple equal requests are batched together they throw resource-already-exists, even though the index doesn't yet exist in the cluster state but only in the intermediary task executor state. This leads to bulk requests ignoring the exception (thinking that the index already exists) in their auto-create callback when the index doesn't yet exist. Fixed by deduplicating these requests for now; I added a TODO to do this a little more cleanly down the road, but the fix is somewhat urgent as the bug breaks ML integ tests.
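For readers skimming the fix, here is a minimal sketch of the deduplication idea, using hypothetical stand-in types rather than the real Elasticsearch classes: equal requests within one batch resolve to a single index creation, and only the listener notification fans out, so duplicates can no longer trip over resource-already-exists.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in types, not the real Elasticsearch classes.
record CreateIndexRequest(String index) {}

interface CreateListener {
    void onResponse(String indexName);
    void onFailure(Exception e);
}

record CreateTask(CreateIndexRequest request, CreateListener listener) {}

final class AutoCreateBatchSketch {
    // Deduplicate equal requests within one batch: each distinct request is
    // applied once, and every duplicate task receives the same result instead
    // of tripping over resource-already-exists.
    static void executeBatch(List<CreateTask> tasks) {
        Map<CreateIndexRequest, List<CreateListener>> deduplicated = new HashMap<>(tasks.size());
        for (CreateTask task : tasks) {
            deduplicated.computeIfAbsent(task.request(), r -> new ArrayList<>()).add(task.listener());
        }
        deduplicated.forEach((request, listeners) -> {
            try {
                String indexName = createIndex(request); // one cluster-state change per distinct request
                listeners.forEach(l -> l.onResponse(indexName));
            } catch (Exception e) {
                listeners.forEach(l -> l.onFailure(e));
            }
        });
    }

    private static String createIndex(CreateIndexRequest request) {
        return request.index(); // placeholder for the real index creation
    }
}
```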
Pinging @elastic/es-data-management (Team:Data Management)
        builder.success(task);
    } catch (Exception e) {
        builder.failure(task, e);
    }
}
state = allocationService.reroute(state, "auto-create");
if (state != currentState) {
Small optimization: if the full batch doesn't do anything, there's no need to reroute.
Could we move this to the clusterStatePublished event and pass it off to ClusterService#rerouteService for batching with other pending reroutes too?
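A rough sketch of the shape that suggestion implies, with simplified stand-in interfaces (the real RerouteService and publication-event APIs take more parameters):

```java
import java.util.function.Consumer;

// Simplified stand-in; the real Elasticsearch RerouteService coalesces
// concurrent reroute requests into a single reroute round.
interface RerouteService {
    void reroute(String reason, Consumer<Exception> onFailure);
}

final class AutoCreateRerouteSketch {
    private final RerouteService rerouteService;

    AutoCreateRerouteSketch(RerouteService rerouteService) {
        this.rerouteService = rerouteService;
    }

    // Rather than calling allocationService.reroute(...) while computing the
    // new state, defer the reroute until the state has been published; the
    // reroute service can then batch it with other pending reroutes.
    void clusterStatePublished() {
        rerouteService.reroute("auto-create", e -> {
            // a failed deferred reroute would be logged rather than failing the bulk request
        });
    }
}
```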
I'm not sure about that. This adds one more CS update if no reroutes are pending and saves a reroute every now and then. A reroute is pretty fast these days though. With all the fixes we added recently I'm not sure it's worth it in 8.1.
Hmm, it's still not trivially fast in a large cluster, so I think we'd still prefer batching. Yes, it's one more publication in an otherwise-quiet cluster, but that's not really where the problems arise.
ClusterState state = currentState;
for (AckedClusterStateUpdateTask task : tasks) {
    final Map<CreateIndexRequest, CreateIndexTask> successfulRequests = new HashMap<>(tasks.size());
NIT: this and line 109 are good candidates to use var to simplify declaration
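For illustration, the change the NIT is suggesting; the element types move into the initializer, so nothing is lost:

```java
// before: explicit declaration
final Map<CreateIndexRequest, CreateIndexTask> successfulRequests = new HashMap<>(tasks.size());

// after, with var (Java 10+): the type is spelled out once, on the right-hand side
final var successfulRequests = new HashMap<CreateIndexRequest, CreateIndexTask>(tasks.size());
```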
Looks ok but I left some thoughts and suggestions.
private final class CreateIndexTask extends AckedClusterStateUpdateTask {

    final CreateIndexRequest request;
    final AtomicReference<String> indexNameRef;
Does this need to be an AtomicReference or could it just be a volatile field?
Annoyingly enough, yes: with the weird way the listener is constructed here, it currently seems like we need this.
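One plausible reading of that constraint (a guess at the real code, not taken from the PR): if the listener has to be handed to the superclass constructor, it is built before `this` exists, so it cannot read an instance field, volatile or otherwise; it can only capture a pre-built mutable cell such as an AtomicReference.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a base task that takes its listener up front.
abstract class BaseAckedTask {
    BaseAckedTask(Runnable onAcked) { /* stores and later invokes the listener */ }
}

final class CreateIndexTaskSketch extends BaseAckedTask {
    final AtomicReference<String> indexNameRef;

    private CreateIndexTaskSketch(AtomicReference<String> indexNameRef) {
        // The lambda is created before 'this' exists, so it cannot touch a
        // volatile instance field; it can only capture the constructor argument.
        super(() -> System.out.println("created: " + indexNameRef.get()));
        this.indexNameRef = indexNameRef;
    }

    static CreateIndexTaskSketch create() {
        return new CreateIndexTaskSketch(new AtomicReference<>());
    }
}
```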
request.masterNodeTimeout(),
request.timeout(),
false

private final class CreateIndexTask extends AckedClusterStateUpdateTask {
Are we getting much benefit from the extends AckedClusterStateUpdateTask here? I think if we dropped that and just implemented AckedClusterStateTaskListener directly we wouldn't need to construct the listener up front and could probably solve the TODO about deduplicating the listeners.
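A rough sketch of the proposed direction, again with a simplified stand-in for AckedClusterStateTaskListener rather than the real interface: with the task itself acting as the listener, nothing has to be constructed up front, and the index name can live in a plain volatile field.

```java
// Simplified stand-in for the real AckedClusterStateTaskListener interface.
interface AckedTaskListener {
    void onAllNodesAcked();
    void onFailure(Exception e);
}

// The task is its own listener: no listener object is built ahead of time,
// so the AtomicReference indirection above is no longer needed.
final class CreateIndexTaskSketch2 implements AckedTaskListener {
    private final String requestedIndex;
    private volatile String resolvedIndexName; // a plain volatile field suffices now

    CreateIndexTaskSketch2(String requestedIndex) {
        this.requestedIndex = requestedIndex;
    }

    void execute() {
        resolvedIndexName = requestedIndex; // placeholder for real name resolution
    }

    @Override
    public void onAllNodesAcked() {
        System.out.println("acked creation of " + resolvedIndexName);
    }

    @Override
    public void onFailure(Exception e) {
        System.out.println("failed to create " + requestedIndex + ": " + e);
    }
}
```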
++ I agree, the main motivation of this PR is to unblock some broken tests quickly (also in ML QA). Are we good doing that kind of refactoring in a follow-up? :)
Sure, maybe just leave a TODO here tho.
TODO added!
LGTM
Thanks David + Ievgen!
Non-issue as this hasn't been released yet.
relates #82159