Implement CCR bootstrap from remote #35975
@tbrooks8 @sebelga and I were wondering if the API could provide any information about the bootstrapping process, which we could display in the UI? For example, whether or not bootstrapping is in progress, how many documents have been replicated, how many remain, and whether there have been any errors.
This is related to #35975. It implements a basic restore functionality for the CcrRepository. When the restore process is kicked off, it configures the new index as expected for a follower index. This means that the index has a different uuid, the version is not incremented, and the Ccr metadata is installed. When the restore shard method is called, an empty shard is initialized.
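The follower-side configuration described in this commit (new UUID, unchanged version, CCR metadata installed) can be sketched as a pure function over a simplified metadata type. This is a hypothetical illustration, not the `CcrRepository` code: `IndexMeta`, `FollowerMetaSketch`, the `"ccr"` custom-metadata key, and its entry names are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical, simplified stand-in for index metadata (not the Elasticsearch class).
record IndexMeta(String name, String uuid, long version, Map<String, Map<String, String>> custom) {}

final class FollowerMetaSketch {
    // Hypothetical key under which CCR metadata is stored in this sketch.
    static final String CCR_KEY = "ccr";

    // Derive follower metadata from the leader's metadata when the restore starts:
    // a fresh UUID, the leader's version left as-is, and CCR metadata installed.
    static IndexMeta toFollowerMeta(IndexMeta leader, String followerName, String leaderCluster) {
        Map<String, Map<String, String>> custom = new HashMap<>(leader.custom());
        custom.put(CCR_KEY, Map.of(
                "leader_cluster", leaderCluster,
                "leader_index_name", leader.name(),
                "leader_index_uuid", leader.uuid()));
        return new IndexMeta(
                followerName,
                UUID.randomUUID().toString(), // the follower gets its own UUID
                leader.version(),             // the version is not incremented
                Map.copyOf(custom));
    }
}
```

Keeping the leader's UUID inside the CCR metadata, rather than reusing it as the follower's own UUID, lets the follower remember which leader index it was bootstrapped from while remaining a distinct index.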
@tbrooks8 Do you have any information regarding CJ's questions ☝️ about what kind of information we could display in the UI? We are trying to figure this out for the 6.7 UI work, so any preliminary info or docs about new APIs, or changes to existing ones, would be appreciated. Thank you!
My work does not currently include any new external APIs. The recovery from remote is implemented as a normal recovery (through a repository). There are pre-existing APIs for recoveries: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-recovery.html. If we need new APIs for information, that work still needs to be scoped out, and those discussions should include people like @ywelsch and @jasontedor. My work primarily involves transferring segment files from the leader to the follower through our existing recovery infrastructure.
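Because bootstrapping runs as an ordinary recovery, progress for a UI could come from the indices recovery API linked above. The sketch below rests on assumptions: it builds a `GET /<index>/_recovery?active_only=true` URI and turns byte counts of the kind that API reports under `index.size` (`total_in_bytes`, `recovered_in_bytes`) into a percentage. `RecoveryProgressSketch` is hypothetical and JSON parsing is left out. Note that for a file-based recovery these figures are bytes and files, not document counts.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class RecoveryProgressSketch {
    // URI for the indices recovery API, limited to recoveries that are still running.
    static URI recoveryUri(String baseUrl, String index) {
        String encoded = URLEncoder.encode(index, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/" + encoded + "/_recovery?active_only=true");
    }

    // Share of bytes recovered so far, from 0 to 100.
    static double percentRecovered(long totalBytes, long recoveredBytes) {
        if (totalBytes <= 0) {
            return 100.0; // nothing to copy: treat as complete
        }
        return 100.0 * recoveredBytes / totalBytes;
    }
}
```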
@tbrooks8 thanks for clarifying. This sounds like something to add in the detail panel of Index Management under the Summary tab. What do you think @bmcconaghy @yaronp68?
This is related to #35975. When the shard restore process is complete, the index mappings need to be updated to ensure that the data in the restored files is compatible with the follower mappings. This commit implements a mapping update as the final step in a shard restore.
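The ordering in this commit (copy every file, then update the mapping) can be sketched as below. The hook interfaces `FileCopier` and `MappingUpdater` and the class `ShardRestoreSketch` are hypothetical stand-ins for the transport calls a real restore would make.

```java
import java.util.List;

final class ShardRestoreSketch {
    // Hypothetical hooks; a real restore would call into the transport layer here.
    interface FileCopier { void copy(String fileName) throws Exception; }
    interface MappingUpdater { void updateFollowerMapping() throws Exception; }

    private final FileCopier copier;
    private final MappingUpdater mappingUpdater;

    ShardRestoreSketch(FileCopier copier, MappingUpdater mappingUpdater) {
        this.copier = copier;
        this.mappingUpdater = mappingUpdater;
    }

    // Copy all leader files first; only then bring the follower mapping up to date,
    // so the mapping covers whatever fields the restored segments contain.
    void restoreShard(List<String> leaderFiles) throws Exception {
        for (String file : leaderFiles) {
            copier.copy(file);
        }
        mappingUpdater.updateFollowerMapping(); // final step of the shard restore
    }
}
```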
This is related to #35975. It implements a file based restore in the CcrRepository. The restore transfers files from the leader cluster to the follower cluster. It does not implement any advanced resiliency features at the moment. Any request failure will end the restore.
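A minimal sketch of a file-based transfer with no resiliency, assuming files are streamed in fixed-size chunks (the checklist's `ccr.indices.recovery.chunk_size` setting suggests chunking; the chunk size in the test is arbitrary). `ChunkedCopySketch` is hypothetical. Any `IOException` propagates out of `restore`, so a single failed request ends the whole restore, matching the behavior described in this commit.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Map;

final class ChunkedCopySketch {
    // Copy one file from the leader stream to the follower stream in chunks.
    // Returns the number of chunks written. Failures are rethrown, never retried.
    static int copyFile(InputStream leader, OutputStream follower, int chunkSize) throws IOException {
        byte[] buffer = new byte[chunkSize];
        int chunks = 0;
        int read;
        // readNBytes returns 0 (not -1) once the end of the stream is reached.
        while ((read = leader.readNBytes(buffer, 0, chunkSize)) > 0) {
            follower.write(buffer, 0, read);
            chunks++;
        }
        return chunks;
    }

    // Restore every file; the first failure aborts the restore as a whole.
    static void restore(Map<String, InputStream> leaderFiles,
                        Map<String, ByteArrayOutputStream> followerFiles,
                        int chunkSize) throws IOException {
        for (Map.Entry<String, InputStream> e : leaderFiles.entrySet()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            copyFile(e.getValue(), out, chunkSize);
            followerFiles.put(e.getKey(), out);
        }
    }
}
```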
This is related to elastic#35975. This commit adds timeout functionality to the local session on a leader node. When a session is started, a timeout is scheduled using a repeatable runnable. If the session is not accessed between two runs, the session is closed. When the session is closed, the repeating task is cancelled.
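The idle-timeout scheme in this commit can be sketched with a `ScheduledExecutorService`: a repeating check closes the session if nothing touched it since the previous run, and closing cancels the check. `RestoreSessionSketch` and its methods are hypothetical; `checkIdle()` is package-visible so the logic can be exercised without waiting on the timer.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class RestoreSessionSketch {
    // Starts as true so a freshly opened session survives until at least the second check.
    private final AtomicBoolean accessedSinceLastCheck = new AtomicBoolean(true);
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private volatile ScheduledFuture<?> timeoutTask;

    // Schedule the repeating idle check; each run looks at activity since the previous run.
    void start(ScheduledExecutorService scheduler, long intervalMillis) {
        timeoutTask = scheduler.scheduleAtFixedRate(
                this::checkIdle, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    // Called on every request that uses the session (for example, fetching a file chunk).
    void markAccessed() {
        accessedSinceLastCheck.set(true);
    }

    // Close the session if it was not accessed since the last run; otherwise clear the flag.
    void checkIdle() {
        if (!accessedSinceLastCheck.getAndSet(false)) {
            close();
        }
    }

    // Close exactly once and cancel the repeating timeout task.
    void close() {
        if (closed.compareAndSet(false, true)) {
            ScheduledFuture<?> task = timeoutTask;
            if (task != null) {
                task.cancel(false);
            }
        }
    }

    boolean isClosed() {
        return closed.get();
    }
}
```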
This is related to elastic#35975. This commit implements rate limiting on the follower side using the `RateLimitingInputStream`.
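The idea behind a rate-limiting stream is to pause reads whenever the bytes consumed so far run ahead of the configured rate. The sketch below is a simplified stand-in, not Elasticsearch's `RateLimitingInputStream` or Lucene's `RateLimiter`: `ThrottledInputStreamSketch` is hypothetical, and the clock and sleep function are injected so the pacing can be checked without real waiting.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;

final class ThrottledInputStreamSketch extends FilterInputStream {
    private final double bytesPerNano;
    private final LongSupplier nanoClock; // e.g. System::nanoTime
    private final LongConsumer sleeper;   // pauses for the given number of nanoseconds
    private final long startNanos;
    private long bytesRead;

    ThrottledInputStreamSketch(InputStream in, long bytesPerSecond,
                               LongSupplier nanoClock, LongConsumer sleeper) {
        super(in);
        this.bytesPerNano = bytesPerSecond / 1e9;
        this.nanoClock = nanoClock;
        this.sleeper = sleeper;
        this.startNanos = nanoClock.getAsLong();
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) {
            throttle(1);
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            throttle(n);
        }
        return n;
    }

    // Pause until elapsed time is at least (total bytes read / rate).
    private void throttle(int n) {
        bytesRead += n;
        long targetNanos = (long) (bytesRead / bytesPerNano);
        long elapsed = nanoClock.getAsLong() - startNanos;
        if (elapsed < targetNanos) {
            sleeper.accept(targetNanos - elapsed);
        }
    }
}
```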
…hard

* Changed the shard changes API to include special metadata in the exception being thrown, to indicate that the ops are no longer there.
* Changed ShardFollowNodeTask to handle this exception with special metadata and mark the shard as having fallen behind its leader shard. The shard follow task will then abort its ongoing replication.

The code that does the restore from the CCR repository still needs to be added. This change should make that work a bit easier. Relates to elastic#35975
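The detection path in this commit message (an error carries a marker saying the operations are gone, and the follow task reacts by flagging the shard and stopping) can be sketched as below. The exception type, the metadata key `ops_no_longer_available`, and the task class are hypothetical names chosen for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical exception that can carry string metadata, similar in spirit to the
// special metadata the shard changes API attaches when operations are gone.
final class ShardChangesException extends RuntimeException {
    static final String OPS_MISSING_KEY = "ops_no_longer_available"; // hypothetical key
    private final Map<String, String> metadata = new HashMap<>();

    ShardChangesException(String message) {
        super(message);
    }

    ShardChangesException withMetadata(String key, String value) {
        metadata.put(key, value);
        return this;
    }

    boolean hasMetadata(String key) {
        return metadata.containsKey(key);
    }
}

final class ShardFollowTaskSketch {
    private boolean fallenBehind;
    private boolean replicating = true;

    // Handle a failed read from the leader shard.
    void onReadFailure(Exception e) {
        if (e instanceof ShardChangesException
                && ((ShardChangesException) e).hasMetadata(ShardChangesException.OPS_MISSING_KEY)) {
            fallenBehind = true; // the leader no longer has the operations we need
            replicating = false; // abort the ongoing replication
        }
        // Other failures would be retried by a real task; that is omitted here.
    }

    boolean isFallenBehind() { return fallenBehind; }
    boolean isReplicating() { return replicating; }
}
```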
This is related to elastic#35975. We do not want a slow master to fail a recovery from remote process due to a slow put mappings call. This commit increases the master node timeout on this call to 30 minutes.
We simulate remote recovery in ShardFollowTaskReplicationTests by bootstrapping the follower with the safe commit of the leader. Relates #35975
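A safe commit is, roughly, the newest commit whose maximum sequence number does not exceed the global checkpoint, so every operation it contains is known to be on all in-sync copies. A minimal selection sketch under that definition, with a hypothetical `Commit` record; returning empty when no commit qualifies is a simplification of this sketch, and the engine's real selection logic handles more cases.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical view of a commit: its id and the highest sequence number it contains.
record Commit(String id, long maxSeqNo) {}

final class SafeCommitSketch {
    // Commits are ordered oldest to newest. Pick the newest one whose maxSeqNo is at most
    // the global checkpoint; return empty if none qualifies.
    static Optional<Commit> findSafeCommit(List<Commit> commitsOldestFirst, long globalCheckpoint) {
        for (int i = commitsOldestFirst.size() - 1; i >= 0; i--) {
            Commit c = commitsOldestFirst.get(i);
            if (c.maxSeqNo() <= globalCheckpoint) {
                return Optional.of(c);
            }
        }
        return Optional.empty();
    }
}
```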
This is related to #35975. It adds documentation on the remote recovery process. Additionally, it adds documentation about the various settings that can impact the process.
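The remote recovery settings interact. As a rough back-of-the-envelope model (an assumption, not taken from the documentation): one recovery keeps up to `ccr.indices.recovery.max_concurrent_file_chunks` requests of `ccr.indices.recovery.chunk_size` bytes in flight, and the throughput cap `ccr.indices.recovery.max_bytes_per_sec` puts a floor on how long copying a given amount of data can take. `RecoveryBudgetSketch` is hypothetical, and the numbers in the test are example values, not claimed defaults.

```java
final class RecoveryBudgetSketch {
    // Approximate upper bound on bytes in flight for one recovery:
    // concurrent chunk requests multiplied by the chunk size.
    static long maxBytesInFlight(long chunkSizeBytes, int maxConcurrentFileChunks) {
        return chunkSizeBytes * maxConcurrentFileChunks;
    }

    // Lower bound on transfer time in seconds when throttled to maxBytesPerSec.
    static double minTransferSeconds(long totalBytes, long maxBytesPerSec) {
        if (maxBytesPerSec <= 0) {
            throw new IllegalArgumentException("rate must be positive");
        }
        return (double) totalBytes / maxBytesPerSec;
    }
}
```

For example, with 1 MiB chunks and 5 concurrent chunk requests, at most about 5 MiB are in flight, and copying 10 GiB at 40 MiB/s takes at least 256 seconds.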
@tbrooks8 can you check what remains to be done here so that we can close this issue?
Closing as all of the relevant tasks have been completed.
CCR Bootstrap from Remote

Pre-feature freeze

* `CcrRepository` for each remote cluster
* `CcrRepository`s based on settings at node start.
* `Repository#restoreShard`.
* `CcrRepository` restore works with security enabled (Allow system privilege to execute proxied actions #37508)
* `PutFollowAction` through the `RestoreService` using the `CcrRepository`. #35719 (Use `CcrRepository` to init follower index) prototypes this work.
* `PutFollowAction` semantics regarding whether it should wait for the restore to complete (Use `CcrRepository` to init follower index #35719)
* `ShardFollowTasksExecutor` and `ShardFollowNodeTask` have existing mechanisms for followers to update mapping versions as the leader mappings change.
* `put` again if it falls behind (Add test for `PutFollowAction` on a closed index #38236)

Post-feature freeze

* `PutFollowAction` and throw an exception if the leader cluster is on a higher version than the follower (Add rolling upgrade multi cluster test module #38277)
* `ccr.indices.recovery.max_bytes_per_sec` (Add documentation on remote recovery #39483)
* `ccr.indices.recovery.recovery_activity_timeout` (Add documentation on remote recovery #39483)
* `ccr.indices.recovery.internal_action_timeout` (Add documentation on remote recovery #39483)
* `ccr.indices.recovery.chunk_size` (Add documentation on remote recovery #39483)
* `ccr.indices.recovery.max_concurrent_file_chunks` (Add documentation on remote recovery #39483)
* `following` documentation in ccr overview (#39936)
* `overview` docs information into a mechanics of replication page.
* `TransportRequestOptions.Type.REG`, `TransportRequestOptions.Type.PING`. The recovery actions use the `REG` channel type. However, we do support dedicated `RECOVERY` channel types. We could consider adding `RECOVERY` channels to the remote cluster connection profile and using those.
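The connection-profile idea in the last item can be sketched as a per-type channel count with a fallback: recovery requests travel on dedicated `RECOVERY` channels when the profile defines some, and on `REG` channels otherwise. The enum only mirrors the type names mentioned in the item; `ChannelType`, `ConnectionProfileSketch`, and `channelFor` are hypothetical and are not the Elasticsearch `ConnectionProfile` API.

```java
import java.util.EnumMap;
import java.util.Map;

// Mirrors the channel type names discussed above; hypothetical, not TransportRequestOptions.Type.
enum ChannelType { REG, PING, RECOVERY }

final class ConnectionProfileSketch {
    private final Map<ChannelType, Integer> channelsPerType = new EnumMap<>(ChannelType.class);

    ConnectionProfileSketch withChannels(ChannelType type, int count) {
        channelsPerType.put(type, count);
        return this;
    }

    // Use the requested type if the profile has channels for it, else fall back to REG.
    ChannelType channelFor(ChannelType requested) {
        if (channelsPerType.getOrDefault(requested, 0) > 0) {
            return requested;
        }
        if (channelsPerType.getOrDefault(ChannelType.REG, 0) > 0) {
            return ChannelType.REG;
        }
        throw new IllegalStateException("profile has no channels for " + requested + " or REG");
    }
}
```

With this shape, adding `RECOVERY` channels to the remote cluster profile changes where recovery traffic goes without touching the callers, which keep asking for `RECOVERY`.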