Simplify and optimize deduplication of RepositoryData for a non-caching repository instance #91851

Merged
merged 6 commits into from
Nov 23, 2022

Conversation

original-brownbear
Member

This uses the new deduplicator infrastructure to make deduplication more efficient.
The existing solution hardly ever deduplicated because it would only deduplicate after the repository had entered a consistent state. The adjusted solution is much simpler: it deduplicates so that only a single load of `RepositoryData` is ever in flight at a time, which fixes the memory issues caused by massively concurrent loading of the repo data, as described in #89952.

closes #89952 (judging by heap dumps, this should be all that's needed to fix the memory issue in practice, since the problem really is just the initial set of snapshots going wild with concurrent loading of repo data on the huge snapshot-meta pool)
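
For readers unfamiliar with the pattern, the idea of "only a single loading at a time" can be sketched in plain Java as below. This is not Elasticsearch's `SingleResultDeduplicator` and makes no claim about how that class is implemented; `SingleFlightLoader`, `load()` and `startLoad()` are hypothetical names chosen for illustration, and the sketch simply hands every caller that arrives during a load the same in-flight future.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical illustration only; not an Elasticsearch class.
class SingleFlightLoader<T> {
    private final Supplier<CompletableFuture<T>> loader;
    // Holds the future of the load currently in flight, or null when idle.
    private final AtomicReference<CompletableFuture<T>> inFlight = new AtomicReference<>();

    SingleFlightLoader(Supplier<CompletableFuture<T>> loader) {
        this.loader = loader;
    }

    CompletableFuture<T> load() {
        while (true) {
            CompletableFuture<T> existing = inFlight.get();
            if (existing != null) {
                return existing; // join the load that is already running
            }
            CompletableFuture<T> mine = new CompletableFuture<>();
            if (inFlight.compareAndSet(null, mine)) {
                startLoad(mine); // this caller won the race and owns the load
                return mine;
            }
            // Lost the race to another caller: loop and join its load.
        }
    }

    private void startLoad(CompletableFuture<T> mine) {
        CompletableFuture<T> source;
        try {
            source = loader.get();
        } catch (RuntimeException e) {
            inFlight.compareAndSet(mine, null);
            mine.completeExceptionally(e);
            return;
        }
        source.whenComplete((value, error) -> {
            // Clear the slot before completing so later callers start a fresh load.
            inFlight.compareAndSet(mine, null);
            if (error != null) {
                mine.completeExceptionally(error);
            } else {
                mine.complete(value);
            }
        });
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        CompletableFuture<String> slowLoad = new CompletableFuture<>();
        SingleFlightLoader<String> repoData = new SingleFlightLoader<>(() -> {
            loads.incrementAndGet();
            return slowLoad;
        });
        CompletableFuture<String> first = repoData.load();
        CompletableFuture<String> second = repoData.load(); // joins the first load
        slowLoad.complete("repository data");
        System.out.println("shared future: " + (first == second));
        System.out.println("loads started: " + loads.get());
        System.out.println("result: " + second.join());
    }
}
```

However many callers arrive while a load is running, only one copy of the loaded data is being materialized, which is the property the description relies on to bound heap usage.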

@original-brownbear original-brownbear added >bug :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v8.6.0 v8.7.0 labels Nov 23, 2022
@elasticsearchmachine elasticsearchmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Nov 23, 2022
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine
Collaborator

Hi @original-brownbear, I've created a changelog YAML for you.

Contributor

@DaveCTurner DaveCTurner left a comment

Nice simplification. I left one small comment.

@@ -413,7 +413,7 @@ protected BlobStoreRepository(
         this.namedXContentRegistry = namedXContentRegistry;
         this.basePath = basePath;
         this.maxSnapshotCount = MAX_SNAPSHOTS_SETTING.get(metadata.settings());
-        this.repoDataDeduplicator = new ResultDeduplicator<>(threadPool.getThreadContext());
+        this.repoDataLoadDeduplicator = new SingleResultDeduplicator<>(threadPool.getThreadContext(), this::doGetRepositoryData);
Contributor

I think `SingleResultDeduplicator` risks a stack overflow if the action doesn't fork, which is the case here. Can we fork to `SNAPSHOT_META` here rather than when calling `repoDataLoadDeduplicator::execute`?

Member Author

++ I pushed the commit below, thanks for catching this! It also simplifies the call sites nicely :)

feb751d
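
To make the concern in this thread concrete, here is a rough sketch of why a non-forking action can build up stack depth, and why forking inside the wrapped action avoids it. It assumes a deduplicator that re-runs its action for callers that arrived during the previous run; `RerunningDeduplicator`, `chain` and `maxNestingDepth` are hypothetical names, not Elasticsearch code, and a plain `Executor` stands in for the `SNAPSHOT_META` thread pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical illustration only; not an Elasticsearch class.
class RerunningDeduplicator {
    private final Executor executor;         // where each run of the action is started
    private final Consumer<Runnable> action; // the action receives an "onDone" callback
    private List<Runnable> waiting = null;   // null means idle; non-null means a run is in progress

    RerunningDeduplicator(Executor executor, Consumer<Runnable> action) {
        this.executor = executor;
        this.action = action;
    }

    void execute(Runnable listener) {
        synchronized (this) {
            if (waiting != null) {
                waiting.add(listener); // a run is in progress: wait for the next one
                return;
            }
            waiting = new ArrayList<>();
        }
        run(List.of(listener));
    }

    private void run(List<Runnable> listeners) {
        executor.execute(() -> action.accept(() -> {
            listeners.forEach(Runnable::run);
            List<Runnable> next;
            synchronized (this) {
                if (waiting.isEmpty()) {
                    waiting = null;
                    return;
                }
                next = waiting;
                waiting = new ArrayList<>();
            }
            // With a same-thread executor this call nests inside the current callback.
            run(next);
        }));
    }

    // Listener that requests another execution until `remaining` reaches zero.
    static Runnable chain(RerunningDeduplicator d, int remaining, CountDownLatch done) {
        return () -> {
            if (remaining == 0) {
                done.countDown();
            } else {
                d.execute(chain(d, remaining - 1, done));
            }
        };
    }

    // Returns how many runs of the action were nested inside each other at most.
    static int maxNestingDepth(Executor executor, int chainLength) {
        AtomicInteger depth = new AtomicInteger();
        AtomicInteger maxDepth = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(1);
        RerunningDeduplicator d = new RerunningDeduplicator(executor, onDone -> {
            maxDepth.accumulateAndGet(depth.incrementAndGet(), Math::max);
            try {
                onDone.run(); // the action completes without forking by itself
            } finally {
                depth.decrementAndGet();
            }
        });
        d.execute(chain(d, chainLength, done));
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return maxDepth.get();
    }

    public static void main(String[] args) {
        System.out.println("same thread: " + maxNestingDepth(Runnable::run, 100));
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            System.out.println("forked: " + maxNestingDepth(pool, 100));
        } finally {
            pool.shutdown();
        }
    }
}
```

In this sketch the same-thread variant nests one run per queued caller, so the depth grows with the length of the chain, while the forked variant starts every run on a fresh executor stack. Placing the fork inside the wrapped action, as suggested above, gives that guarantee regardless of which thread calls `execute`.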

Contributor

@DaveCTurner DaveCTurner left a comment

LGTM

repoData.getGenId()
)
)
repoDataLoadDeduplicator.execute(
Contributor

👍

@original-brownbear
Member Author

Thanks David!

@original-brownbear original-brownbear merged commit 88e44a9 into elastic:main Nov 23, 2022
@original-brownbear original-brownbear deleted the 89952 branch November 23, 2022 16:55
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Nov 23, 2022
…ng repository instance (elastic#91851)

elasticsearchmachine pushed a commit that referenced this pull request Jan 3, 2023
…-caching repository instance (#91851) (#91866)

* Simplify and optimize deduplication of RepositoryData for a non-caching repository instance (#91851)

* fix compile
@DaveCTurner
Contributor

Would it be safe to backport this to 7.17 too? We definitely see #89952 affecting 7.17 clusters, and I don't see any obvious reason not to apply this there.

@original-brownbear
Member Author

Yea it should be fine :) let me try ..

@original-brownbear
Member Author

#92661 worked

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jan 4, 2023
…-caching repository instance (elastic#91851) (elastic#91866)

* Simplify and optimize deduplication of RepositoryData for a non-caching repository instance (elastic#91851)

* fix compile
elasticsearchmachine pushed a commit that referenced this pull request Jan 4, 2023
…-caching repository instance (#91851) (#91866) (#92661)

* Simplify and optimize deduplication of RepositoryData for a non-caching repository instance (#91851)

* fix compile
@original-brownbear original-brownbear restored the 89952 branch November 30, 2024 10:08
Labels
>bug :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v7.17.9 v8.6.0 v8.7.0
Development

Successfully merging this pull request may close these issues.

Snapshot creations have huge heap footprint after abrupt full-cluster restart
3 participants