Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make reindexing managed by a persistent task #43382

Merged
merged 66 commits into from
Jul 18, 2019

Conversation

Tim-Brooks
Copy link
Contributor

This is related to #42612. Currently the reindexing transport action
creates a task on the local coordinator node. Unfortunately this is not
resilient to coordinator node failures. This commit adds a new action
that creates a reindexing job as a persistent task.

@Tim-Brooks Tim-Brooks added >enhancement v8.0.0 :Distributed Indexing/Reindex Issues relating to reindex that are not caused by issues further down labels Jun 19, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@Tim-Brooks Tim-Brooks changed the title Persistent reindex Make reindexing managed by a persistent task Jun 19, 2019
@Tim-Brooks Tim-Brooks changed the base branch from master to reindex_v2 June 19, 2019 16:38
@Tim-Brooks
Copy link
Contributor Author

This PR still needs a few cleanups. I will assign reviewers and comment when it is ready.

@Tim-Brooks
Copy link
Contributor Author

@henningandersen @ywelsch I have adjusted this PR for some change in the direction of returning the task-id from the allocated persistent task. I have not adjusted it to work with rethrottling yet. But it appears that the other test are passing.

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

This approach looks good. Apart from minor comments I think it is ready to merge to feature branch. Feel free to defer some of them to follow-ups if that is easier.

* task can't totally validate until it starts but this is better than
* nothing.
*/
ActionRequestValidationException validationException = internal.getReindexRequest().validate();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can go outside the if now.

}
}

public static class Response extends AcknowledgedResponse {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems we either return acknowledged=true or an exception. Could this just be an ActionResponse then?


import java.util.List;

public class TransportGetReindexJobTaskAction extends TransportTasksAction<ReindexTask, GetReindexJobTaskAction.Request,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks unused now?

rethrottle(logger, clusterService.localNode().getId(), client, bulkByScrollTask, request.getRequestsPerSecond(), listener);
} else if (task instanceof ReindexTask) {
BulkByScrollTask childTask = null;
for (Task task1 : taskManager.getTasks().values()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

task1 rename to childCandidate?

return jobException;
}

public TaskId getTaskId() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename to getEphemeralTaskId(), it could make call-site code a little easier to read (at least I mixed it up with the persistent task id).

@@ -196,7 +196,7 @@ public XContentBuilder toXContent(XContentBuilder builder, Params params) throws
}
builder.field("running_time_in_nanos", runningTimeNanos);
builder.field("cancellable", cancellable);
if (parentTaskId.isSet()) {
if (parentTaskId.isSet() && parentTaskId.getNodeId().equals("cluster") == false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let us add a todo here, to ensure we revisit this. I think this means that cluster: parent ids no longer appear in the _tasks output, which could harm diagnostics/troubleshooting.

@Tim-Brooks
Copy link
Contributor Author

@ywelsch - This is probably ready for another look. I have address Henning's comments. The primary issue currently is due to a task submitting another task, there are some races with the child tasks being available for a follow-up rethrottle action. I have currently resolved this for testing purposes by a spin loop waiting for all the tasks to be available in Rethrottle action.

I think the fix is to extract the BulkByScrollTask and reindex logic out to a different place that allows the entire setup to be synchronous with the persistent task starting (so a single ephemeral task). However, this work will continue to make this PR larger and extend the scope. I guess I just want to know if we should include that in this PR, or add that as a meta issue (it would probably be the next task I would work on, but it would be in a separate PR).

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left a few more minor comments, and one item (rethrottle) that we probably need to discuss

@Tim-Brooks Tim-Brooks requested a review from ywelsch July 17, 2019 16:57
@Tim-Brooks
Copy link
Contributor Author

@ywelsch I made the changes.

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Tim-Brooks Tim-Brooks merged commit 480c545 into elastic:reindex_v2 Jul 18, 2019
@Tim-Brooks Tim-Brooks deleted the persistent_reindex branch April 30, 2020 18:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/Reindex Issues relating to reindex that are not caused by issues further down >enhancement v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants