Speed up Snapshot Finalization #47283

Member

@original-brownbear original-brownbear commented Sep 30, 2019

As a result of #45689 snapshot finalization started to
take significantly longer than before. This may be a
little unfortunate since it increases the likelihood
of failing to finalize after having written out all
the segment blobs.
This change parallelizes all the metadata writes that
can safely run in parallel in the finalization step to
speed the finalization step up again. Also, this will
generally speed up the snapshot process overall when there
is a large number of indices.

This is also a nice-to-have for #46250 since we add yet
another step (deleting old index- blobs in the shards)
to the finalization.
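
The idea described above can be sketched roughly as follows. This is a minimal, hypothetical illustration and not the code from this PR: `BlobWriter`, `finalizeSnapshot`, and the blob names are assumptions made for the example. It submits the independent metadata writes to an executor, waits for all of them, and only then writes the new repository generation.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelFinalizationSketch {

    /** Hypothetical stand-in for a blob container; not the Elasticsearch API. */
    public interface BlobWriter {
        void write(String blobName, String content) throws IOException;
    }

    /**
     * Writes all independent metadata blobs in parallel, then writes the
     * repository generation blob once every metadata write has succeeded.
     * Blob names here are illustrative assumptions.
     */
    public static void finalizeSnapshot(BlobWriter writer, String snapshotUuid, List<String> indices,
                                        long repositoryStateId, ExecutorService executor) throws IOException {
        List<Callable<Void>> tasks = new ArrayList<>();
        // Root-level blobs do not depend on each other, so they can run concurrently.
        tasks.add(() -> { writer.write("meta-" + snapshotUuid + ".dat", "global metadata"); return null; });
        tasks.add(() -> { writer.write("snap-" + snapshotUuid + ".dat", "snapshot info"); return null; });
        // One task per index: this is where a large number of indices benefits most.
        for (String index : indices) {
            tasks.add(() -> {
                writer.write("indices/" + index + "/meta-" + snapshotUuid + ".dat", "index metadata");
                return null;
            });
        }
        try {
            // invokeAll blocks until every task has completed; get() surfaces the first failure.
            for (Future<Void> future : executor.invokeAll(tasks)) {
                future.get();
            }
        } catch (ExecutionException e) {
            throw new IOException("failed to write snapshot metadata", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("interrupted while writing snapshot metadata", e);
        }
        // This step must stay sequential: it makes the snapshot visible in the repository.
        writer.write("index-" + (repositoryStateId + 1), "repository data");
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> store = new ConcurrentHashMap<>();
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            finalizeSnapshot(store::put, "example-uuid", List.of("logs", "metrics"), 7L, executor);
        } finally {
            executor.shutdown();
        }
        System.out.println(store.keySet());
    }
}
```

In this sketch a single failed metadata write aborts the finalization before the generation blob is written, so a partially written snapshot is never referenced by the repository data.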

@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@original-brownbear
Member Author

Jenkins run elasticsearch-ci/bwc

final RepositoryData updatedRepositoryData = getRepositoryData().addSnapshot(snapshotId, blobStoreSnapshot.state(), indices);
snapshotFormat.write(blobStoreSnapshot, blobContainer(), snapshotId.getUUID(), false);
writeIndexGen(updatedRepositoryData, repositoryStateId);
} catch (FileAlreadyExistsException ex) {
Member Author

This catch is gone now. It was dead code because the line above, where we write the snap- blob, no longer does the exists check for this blob.
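
To illustrate why such a catch becomes unreachable, here is a small sketch using plain `java.nio.file` rather than the repository code. It assumes the trailing `false` in the `write` call above is a fail-if-already-exists flag, which the snippet itself does not name; `OverwriteSketch` and `writeBlob` are hypothetical names for this example only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class OverwriteSketch {

    /**
     * Hypothetical blob write. With failIfAlreadyExists=false the file is
     * overwritten, so FileAlreadyExistsException cannot be thrown here.
     */
    public static void writeBlob(Path dir, String name, String content, boolean failIfAlreadyExists)
            throws IOException {
        Path target = dir.resolve(name);
        byte[] bytes = content.getBytes(StandardCharsets.UTF_8);
        if (failIfAlreadyExists) {
            // CREATE_NEW fails with FileAlreadyExistsException if the file is present.
            Files.write(target, bytes, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
        } else {
            // CREATE + TRUNCATE_EXISTING silently replaces any existing content.
            Files.write(target, bytes, StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("blobs");
        writeBlob(dir, "snap-example.dat", "first", false);
        writeBlob(dir, "snap-example.dat", "second", false); // overwrites, no exception
        try {
            writeBlob(dir, "snap-example.dat", "third", true);
        } catch (FileAlreadyExistsException e) {
            System.out.println("only reachable when failIfAlreadyExists is true");
        }
    }
}
```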

indexMetaDataFormat.write(clusterMetaData.index(index.getName()), indexContainer(index), snapshotId.getUUID(), false);
}
} catch (IOException ex) {
throw new SnapshotException(metadata.name(), snapshotId, "failed to write metadata for snapshot", ex);
Member Author

I removed this specific rethrow because, with this change, we write the index metadata in parallel with the root-level snap- blob anyway, so throwing with a separate message here seemed pointless.


public class MockEventuallyConsistentRepositoryTests extends ESTestCase {

private Environment environment;
Member Author

This is just dead code. I saw it when making adjustments here and removed it because I figured it wasn't worth a separate PR.

Member

@tlrx tlrx left a comment

I left some comments, nothing to worry about as it looks great already

@original-brownbear
Member Author

Thanks @tlrx , all points addressed I think :)

Member

@tlrx tlrx left a comment

LGTM, nice change

@original-brownbear
Member Author

Jenkins run elasticsearch-ci/packaging-sample

@original-brownbear
Member Author

Thanks Tanguy!

@original-brownbear original-brownbear merged commit 5405f2e into elastic:master Sep 30, 2019
@original-brownbear original-brownbear deleted the parallelize-sn-finalization branch September 30, 2019 15:54
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Sep 30, 2019
original-brownbear added a commit that referenced this pull request Sep 30, 2019
mkleen added a commit to crate/crate that referenced this pull request Nov 27, 2019
This pull request is a backport of
elastic/elasticsearch#47283

The purpose of this pull request is to speed up snapshot
finalization. This is achieved by parallelizing the metadata
writes in the snapshot finalization step. It will also
generally speed up the snapshot process overall when there is
a large number of indices.

This improvement makes sense because snapshot finalization
has taken much longer since #9327 was integrated.
mergify bot pushed a commit to crate/crate that referenced this pull request Nov 27, 2019
mergify bot pushed a commit to crate/crate that referenced this pull request Nov 27, 2019
(cherry picked from commit 3091e26)
mkleen added a commit to crate/crate that referenced this pull request Nov 28, 2019
(cherry picked from commit 3091e26)
mergify bot pushed a commit to crate/crate that referenced this pull request Nov 28, 2019
(cherry picked from commit 3091e26)
@original-brownbear original-brownbear restored the parallelize-sn-finalization branch January 6, 2021 14:08