Implement CCR bootstrap from remote #35975

Closed
31 of 32 tasks
Tim-Brooks opened this issue Nov 28, 2018 · 6 comments
Labels
:Distributed Indexing/CCR Issues around the Cross Cluster State Replication features >enhancement Meta Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.

Comments

@Tim-Brooks
Contributor

Tim-Brooks commented Nov 28, 2018

CCR Bootstrap from Remote

Pre-feature freeze

Post-feature freeze

@Tim-Brooks Tim-Brooks added Meta v7.0.0 :Distributed Indexing/CCR Issues around the Cross Cluster State Replication features v6.6.0 labels Nov 28, 2018
@Tim-Brooks Tim-Brooks self-assigned this Nov 28, 2018
@Tim-Brooks Tim-Brooks changed the title Implement CCR bootstrap from Remote Implement CCR bootstrap from remote Nov 28, 2018
@cjcenizal
Contributor

@tbrooks8 @sebelga and I were wondering if the API could provide any information about the bootstrapping process, which we could display in the UI? For example, whether or not bootstrapping is in progress, how many documents have been replicated, how many remain, and whether there have been any errors.

Tim-Brooks added a commit that referenced this issue Dec 7, 2018
This is related to #35975. It implements a basic restore functionality
for the CcrRepository. When the restore process is kicked off, it
configures the new index as expected for a follower index. This means
that the index has a different uuid, the version is not incremented, and
the Ccr metadata is installed.

When the restore shard method is called, an empty shard is initialized.
Tim-Brooks added a commit to Tim-Brooks/elasticsearch that referenced this issue Dec 12, 2018
This is related to elastic#35975. It implements a basic restore functionality
for the CcrRepository. When the restore process is kicked off, it
configures the new index as expected for a follower index. This means
that the index has a different uuid, the version is not incremented, and
the Ccr metadata is installed.

When the restore shard method is called, an empty shard is initialized.
Tim-Brooks added a commit that referenced this issue Dec 12, 2018
This is related to #35975. It implements a basic restore functionality
for the CcrRepository. When the restore process is kicked off, it
configures the new index as expected for a follower index. This means
that the index has a different uuid, the version is not incremented, and
the Ccr metadata is installed.

When the restore shard method is called, an empty shard is initialized.
@Tim-Brooks Tim-Brooks added v6.7.0 and removed v6.6.0 labels Dec 18, 2018
@jen-huang

@tbrooks8 Do you have any information regarding CJ's questions ☝️ about what kind of information we could display in the UI? We are trying to figure this out for the 6.7 UI work, so any preliminary info/docs about new APIs or changes to existing ones would be appreciated. Thank you!

@Tim-Brooks
Contributor Author

My work does not currently include any new external APIs. The recovery from remote is implemented as a normal recovery (through a repository). There are pre-existing APIs for recoveries: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-recovery.html.

If we need new APIs for information, that work still needs to be scoped out and those discussions should include people like @ywelsch and @jasontedor. My work primarily involves transferring segment files from the leader to the follower through our existing recovery infrastructure.
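
For the UI question above, here is a minimal sketch of how a client might turn a parsed response from the existing recovery API into per-shard progress. The field names used here (`shards`, `id`, `type`, `stage`, `index.size.total_in_bytes`, `index.size.recovered_in_bytes`) reflect my understanding of the documented response and should be checked against the page linked above. `summarize_recovery` and the sample payload are hypothetical. Because the recovery transfers segment files, the sketch reports bytes rather than document counts.

```python
from typing import Any, Dict, List


def summarize_recovery(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten a parsed recovery response into one progress row per shard."""
    rows = []
    for index_name, index_info in response.items():
        for shard in index_info.get("shards", []):
            size = shard.get("index", {}).get("size", {})
            total = size.get("total_in_bytes", 0)
            recovered = size.get("recovered_in_bytes", 0)
            rows.append({
                "index": index_name,
                "shard": shard.get("id"),
                "type": shard.get("type"),
                "stage": shard.get("stage"),
                # Assumption: a shard whose stage is not DONE is still recovering.
                "in_progress": shard.get("stage") != "DONE",
                "bytes_recovered": recovered,
                "bytes_remaining": max(total - recovered, 0),
            })
    return rows


# Hand-written sample shaped like a recovery response; not real cluster output.
sample = {
    "follower-index": {
        "shards": [
            {
                "id": 0,
                "type": "SNAPSHOT",
                "stage": "INDEX",
                "index": {"size": {"total_in_bytes": 1000, "recovered_in_bytes": 250}},
            }
        ]
    }
}

for row in summarize_recovery(sample):
    print(row)
```

Errors and document-level counts are not covered by this sketch; as noted above, exposing those would need separately scoped API work.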

@sebelga
Contributor

sebelga commented Dec 20, 2018

@tbrooks8 thanks for clarifying. This sounds like something to add in the detail panel of Index Management under the Summary tab. What do you think @bmcconaghy @yaronp68 ?

Tim-Brooks added a commit that referenced this issue Dec 20, 2018
This is related to #35975. When the shard restore process is complete,
the index mappings need to be updated to ensure that the data in the
restored files is compatible with the follower mappings. This commit
implements a mapping update as the final step in a shard restore.
Tim-Brooks added a commit to Tim-Brooks/elasticsearch that referenced this issue Dec 20, 2018
This is related to elastic#35975. When the shard restore process is complete,
the index mappings need to be updated to ensure that the data in the
restored files is compatible with the follower mappings. This commit
implements a mapping update as the final step in a shard restore.
Tim-Brooks added a commit that referenced this issue Dec 21, 2018
This is related to #35975. When the shard restore process is complete,
the index mappings need to be updated to ensure that the data in the
restored files is compatible with the follower mappings. This commit
implements a mapping update as the final step in a shard restore.
Tim-Brooks added a commit that referenced this issue Jan 14, 2019
This is related to #35975. It implements a file based restore in the
CcrRepository. The restore transfers files from the leader cluster
to the follower cluster. It does not implement any advanced resiliency
features at the moment. Any request failure will end the restore.
Tim-Brooks added a commit to Tim-Brooks/elasticsearch that referenced this issue Jan 14, 2019
This is related to elastic#35975. This commit adds timeout functionality to
the local session on a leader node. When a session is started, a timeout
is scheduled using a repeatable runnable. If the session is not accessed
in between two runs the session is closed. When the session is closed,
the repeating task is cancelled.
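
The idle-timeout idea in this commit message can be sketched as follows. This is an illustration, not the Elasticsearch implementation: `Session` and `IdleCheck` are hypothetical names, the scheduler is replaced by manual calls to `run()`, and treating session creation as an access is an assumption.

```python
class Session:
    """Hypothetical stand-in for a restore session held on a leader node."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id
        self.closed = False
        # Assumption: opening a session counts as an access.
        self.accessed_since_last_run = True

    def access(self) -> None:
        if self.closed:
            raise RuntimeError(f"session {self.session_id} is closed")
        self.accessed_since_last_run = True

    def close(self) -> None:
        self.closed = True


class IdleCheck:
    """Repeating task for one session; a scheduler would call run() at a fixed interval."""

    def __init__(self, session: Session) -> None:
        self.session = session
        self.cancelled = False

    def run(self) -> None:
        if self.cancelled:
            return
        if self.session.accessed_since_last_run:
            # Accessed since the previous run: reset the flag and keep going.
            self.session.accessed_since_last_run = False
        else:
            # No access between two runs: close the session and stop repeating.
            self.session.close()
            self.cancelled = True


session = Session("restore-1")
check = IdleCheck(session)
check.run()        # first run: the session was just opened, so it stays open
session.access()   # activity before the second run
check.run()        # second run: still open
check.run()        # third run: no access since the second run, so it closes
print(session.closed, check.cancelled)
```

With this scheme a session stays open for between one and two intervals after its last access, depending on where in the interval that access fell.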
Tim-Brooks added a commit to Tim-Brooks/elasticsearch that referenced this issue Jan 15, 2019
This is related to elastic#35975. This commit implements rate limiting on the
follower side using the `RateLimitingInputStream`.
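
As a rough illustration of rate limiting on the reading side, the sketch below wraps a binary stream and pauses after each read so that average throughput stays under a configured limit. It is a generic sketch, not the `RateLimitingInputStream` code from the commit. `RateLimitedReader` and its parameters are hypothetical, and the clock and sleep functions are injectable only to make the behaviour easy to test.

```python
import io
import time
from typing import BinaryIO, Callable


class RateLimitedReader:
    """Wraps a binary stream and pauses after reads to cap average throughput."""

    def __init__(
        self,
        stream: BinaryIO,
        bytes_per_second: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if bytes_per_second <= 0:
            raise ValueError("bytes_per_second must be positive")
        self._stream = stream
        self._rate = bytes_per_second
        self._clock = clock
        self._sleep = sleep
        self._start = clock()
        self._total = 0

    def read(self, size: int = -1) -> bytes:
        data = self._stream.read(size)
        self._total += len(data)
        # Earliest time at which this many bytes may have been read at the limit.
        allowed_at = self._start + self._total / self._rate
        delay = allowed_at - self._clock()
        if delay > 0:
            self._sleep(delay)
        return data


# Reading 512 bytes at a 1 MB/s limit pauses for roughly half a millisecond.
reader = RateLimitedReader(io.BytesIO(b"x" * 1024), bytes_per_second=1_000_000)
chunk = reader.read(512)
print(len(chunk))
```

Pausing on the consumer side means the sender never needs to know the limit, which matches the commit's choice of applying it on the follower.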
martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Jan 17, 2019
…hard

* Changed the shard changes api to include a special metadata in the exception being thrown
to indicate that the ops are no longer there.
* Changed ShardFollowNodeTask to handle this exception with special metadata
  and mark a shard as fallen behind its leader shard. The shard follow task
  will then abort its ongoing replication.

The code that does the restore from ccr repository still needs to be added.
This change should make that change a bit easier.

Relates to elastic#35975
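
The two changes described in this commit message can be sketched as a pattern: the fetch call raises an exception carrying a metadata marker, and the follow task checks for that marker to decide whether to flag the shard as fallen behind and stop. All names below (`ShardChangesError`, `FollowTask`, `OPS_MISSING_KEY`) are hypothetical; the actual metadata key and class structure are not given in this thread.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical metadata key; the real key name is not given in the thread.
OPS_MISSING_KEY = "example.requested_ops_missing"


class ShardChangesError(Exception):
    """Error from the (simulated) shard changes call, with optional metadata."""

    def __init__(self, message: str, metadata: Optional[Dict[str, Any]] = None) -> None:
        super().__init__(message)
        self.metadata = dict(metadata or {})


class FollowTask:
    """Polls a leader for operations and stops if it has fallen too far behind."""

    def __init__(self, fetch: Callable[[int], List[int]]) -> None:
        self._fetch = fetch
        self.fallen_behind = False
        self.stopped = False
        self.applied: List[int] = []

    def poll(self, from_seq_no: int) -> None:
        if self.stopped:
            return
        try:
            ops = self._fetch(from_seq_no)
        except ShardChangesError as e:
            if OPS_MISSING_KEY in e.metadata:
                # The leader no longer has the requested operations:
                # mark the shard as fallen behind and abort replication.
                self.fallen_behind = True
                self.stopped = True
                return
            raise  # Any other failure is not handled in this sketch.
        self.applied.extend(ops)


def fetch(from_seq_no: int) -> List[int]:
    # Pretend the leader no longer retains operations below sequence number 10.
    if from_seq_no < 10:
        raise ShardChangesError(
            "operations no longer available", {OPS_MISSING_KEY: [from_seq_no, 10]}
        )
    return [from_seq_no, from_seq_no + 1]


task = FollowTask(fetch)
task.poll(10)   # succeeds and applies two operations
task.poll(5)    # raises with the marker, so the task flags itself and stops
print(task.applied, task.fallen_behind, task.stopped)
```

As the commit message notes, the restore from the CCR repository that would follow this state is a separate piece of work and is not modelled here.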
@jasontedor jasontedor added v8.0.0 and removed v7.0.0 labels Feb 6, 2019
Tim-Brooks added a commit to Tim-Brooks/elasticsearch that referenced this issue Feb 8, 2019
This is related to elastic#35975. We do not want a slow master to fail a
recovery from remote process due to a slow put mappings call. This
commit increases the master node timeout on this call to 30 mins.
Tim-Brooks added a commit that referenced this issue Feb 8, 2019
This is related to #35975. We do not want a slow master to fail a
recovery from remote process due to a slow put mappings call. This
commit increases the master node timeout on this call to 30 mins.
dnhatn added a commit to dnhatn/elasticsearch that referenced this issue Feb 17, 2019
dnhatn added a commit that referenced this issue Feb 18, 2019
We simulate remote recovery in ShardFollowTaskReplicationTests 
by bootstrapping the follower with the safe commit of the leader.

Relates #35975
Tim-Brooks added a commit that referenced this issue Mar 5, 2019
This is related to #35975. It adds documentation on the remote recovery
process. Additionally, it adds documentation about the various settings
that can impact the process.
@jakelandis jakelandis added v7.3.0 and removed v7.2.0 labels Jun 17, 2019
@rjernst rjernst added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label May 4, 2020
@ywelsch
Contributor

ywelsch commented Jul 23, 2020

@tbrooks8 can you check what remains to be done here so that we can close this issue?

@Tim-Brooks
Contributor Author

Closing as all of the relevant tasks have been completed.

10 participants