Simplify and optimize deduplication of RepositoryData for a non-caching repository instance #91851
Conversation
…ng repository instance

This makes use of the new deduplicator infrastructure to move to more efficient deduplication mechanics. The existing solution hardly ever deduplicated because it would only deduplicate after the repository entered a consistent state. The adjusted solution is much simpler, in that it simply deduplicates such that only a single loading of `RepositoryData` will ever happen at a time, fixing memory issues from massively concurrent loading of the repo data as described in elastic#89952.

closes elastic#89952
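To illustrate the mechanics described above, here is a minimal, hypothetical Java sketch of a single-flight deduplicator in the spirit of the `SingleResultDeduplicator` this PR switches to: callers that arrive while a load is in flight are queued and served by the next load, so the expensive `RepositoryData` load runs at most once at a time. The class and method names below are illustrative stand-ins, not the Elasticsearch implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Illustrative sketch only (not the Elasticsearch class): deduplicates an expensive
 * async load so that at most one load runs at a time. Callers that arrive while a
 * load is in flight are queued and served by the next load, so nobody is resolved
 * with a result computed before they asked for it.
 */
final class SingleFlightLoader<T> {

    /** Minimal stand-in for ActionListener (hypothetical). */
    interface Listener<T> {
        void onResponse(T value);
        void onFailure(Exception e);
    }

    private final Consumer<Listener<T>> loadAction;   // the expensive load, e.g. reading repo data
    private final List<Listener<T>> queued = new ArrayList<>();
    private boolean loading = false;

    SingleFlightLoader(Consumer<Listener<T>> loadAction) {
        this.loadAction = loadAction;
    }

    void execute(Listener<T> listener) {
        synchronized (this) {
            queued.add(listener);
            if (loading) {
                return;            // a load is in flight; this caller waits for the next one
            }
            loading = true;
        }
        runLoad();
    }

    private void runLoad() {
        final List<Listener<T>> batch;
        synchronized (this) {
            batch = new ArrayList<>(queued);
            queued.clear();
        }
        loadAction.accept(new Listener<T>() {
            @Override
            public void onResponse(T value) {
                batch.forEach(l -> l.onResponse(value));
                maybeRunAgain();
            }

            @Override
            public void onFailure(Exception e) {
                batch.forEach(l -> l.onFailure(e));
                maybeRunAgain();
            }

            private void maybeRunAgain() {
                final boolean more;
                synchronized (SingleFlightLoader.this) {
                    more = queued.isEmpty() == false;
                    loading = more;  // stay "loading" only if another batch is pending
                }
                if (more) {
                    runLoad();       // serve callers that arrived during the previous load
                }
            }
        });
    }
}
```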
Pinging @elastic/es-distributed (Team:Distributed)
Hi @original-brownbear, I've created a changelog YAML for you.
Nice simplification. I left one small comment.
@@ -413,7 +413,7 @@ protected BlobStoreRepository(
         this.namedXContentRegistry = namedXContentRegistry;
         this.basePath = basePath;
         this.maxSnapshotCount = MAX_SNAPSHOTS_SETTING.get(metadata.settings());
-        this.repoDataDeduplicator = new ResultDeduplicator<>(threadPool.getThreadContext());
+        this.repoDataLoadDeduplicator = new SingleResultDeduplicator<>(threadPool.getThreadContext(), this::doGetRepositoryData);
I think `SingleResultDeduplicator` risks a stack overflow exception if the action doesn't fork, which is the case here. Can we fork to SNAPSHOT_META here rather than when calling `repoDataLoadDeduplicator::execute`?
++ I pushed the commit below, thanks for catching this! It also simplifies the call-sites nicely :)
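For illustration of the forking suggestion, here is a hedged sketch building on the hypothetical `SingleFlightLoader` above. The `snapshotMeta` executor and `loadRepositoryDataBlocking` are made-up placeholders standing in for the SNAPSHOT_META pool and the real load: the wrapped action forks onto the pool before doing the work, so a load that completes inline does not grow the caller's stack, and call-sites simply pass their listener to `execute`.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative sketch only: the wrapped action forks onto a dedicated pool
// (a stand-in for the SNAPSHOT_META pool) before doing the blocking work, so a
// load that completes inline cannot recurse on the caller's stack. Names such as
// "snapshotMeta" and "loadRepositoryDataBlocking" are hypothetical placeholders.
final class RepoDataLoading {

    private final ExecutorService snapshotMeta = Executors.newFixedThreadPool(4);

    private final SingleFlightLoader<String> repoDataLoadDeduplicator =
        new SingleFlightLoader<>(listener -> snapshotMeta.execute(() -> {
            try {
                listener.onResponse(loadRepositoryDataBlocking()); // the expensive load
            } catch (Exception e) {
                listener.onFailure(e);
            }
        }));

    // Call-sites no longer fork themselves; they just hand their listener over.
    void getRepositoryData(SingleFlightLoader.Listener<String> listener) {
        repoDataLoadDeduplicator.execute(listener);
    }

    private String loadRepositoryDataBlocking() {
        return "repository-data"; // stand-in for reading and parsing the repo metadata
    }
}
```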
LGTM
repoData.getGenId()
)
)
repoDataLoadDeduplicator.execute(
👍
Thanks David!
…ng repository instance (elastic#91851)

This makes use of the new deduplicator infrastructure to move to more efficient deduplication mechanics. The existing solution hardly ever deduplicated because it would only deduplicate after the repository entered a consistent state. The adjusted solution is much simpler, in that it simply deduplicates such that only a single loading of `RepositoryData` will ever happen at a time, fixing memory issues from massively concurrent loading of the repo data as described in elastic#89952.

closes elastic#89952
…-caching repository instance (#91851) (#91866)

* Simplify and optimize deduplication of RepositoryData for a non-caching repository instance (#91851)

This makes use of the new deduplicator infrastructure to move to more efficient deduplication mechanics. The existing solution hardly ever deduplicated because it would only deduplicate after the repository entered a consistent state. The adjusted solution is much simpler, in that it simply deduplicates such that only a single loading of `RepositoryData` will ever happen at a time, fixing memory issues from massively concurrent loading of the repo data as described in #89952.

closes #89952

* fix compile
Would it be safe to backport this to 7.17 too? We definitely see #89952 affecting it, and I don't see any obvious reasons not to apply this there.
Yea it should be fine :) let me try ..
#92661 worked
…-caching repository instance (elastic#91851) (elastic#91866)

* Simplify and optimize deduplication of RepositoryData for a non-caching repository instance (elastic#91851)

This makes use of the new deduplicator infrastructure to move to more efficient deduplication mechanics. The existing solution hardly ever deduplicated because it would only deduplicate after the repository entered a consistent state. The adjusted solution is much simpler, in that it simply deduplicates such that only a single loading of `RepositoryData` will ever happen at a time, fixing memory issues from massively concurrent loading of the repo data as described in elastic#89952.

closes elastic#89952

* fix compile
…-caching repository instance (#91851) (#91866) (#92661)

* Simplify and optimize deduplication of RepositoryData for a non-caching repository instance (#91851)

This makes use of the new deduplicator infrastructure to move to more efficient deduplication mechanics. The existing solution hardly ever deduplicated because it would only deduplicate after the repository entered a consistent state. The adjusted solution is much simpler, in that it simply deduplicates such that only a single loading of `RepositoryData` will ever happen at a time, fixing memory issues from massively concurrent loading of the repo data as described in #89952.

closes #89952

* fix compile
This makes use of the new deduplicator infrastructure to move to more efficient deduplication mechanics. The existing solution hardly ever deduplicated because it would only deduplicate after the repository entered a consistent state. The adjusted solution is much simpler, in that it simply deduplicates such that only a single loading of `RepositoryData` will ever happen at a time, fixing memory issues from massively concurrent loading of the repo data as described in #89952.

closes #89952 (this should be all that's needed to fix the memory issue in practice; judging by heap dumps, the problem really is just the initial set of snapshots going wild with concurrent loading of repo data on the huge snapshot-meta pool)
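As a quick usage illustration of the concurrent-loading point (again using the hypothetical `SingleFlightLoader` sketch from earlier in this thread, not the actual Elasticsearch classes), the demo below fires many concurrent requests and counts how often the "load" actually runs; the load count typically ends up far below the request count because in-flight loads are shared.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Demo only: many concurrent callers, one shared (hypothetical) loader.
public class DedupDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicInteger loads = new AtomicInteger();
        SingleFlightLoader<String> loader = new SingleFlightLoader<>(listener -> {
            loads.incrementAndGet();                // count how often the "load" really runs
            listener.onResponse("repository-data"); // stand-in for the expensive load result
        });

        int callers = 1000;
        CountDownLatch done = new CountDownLatch(callers);
        for (int i = 0; i < callers; i++) {
            new Thread(() -> loader.execute(new SingleFlightLoader.Listener<String>() {
                public void onResponse(String value) { done.countDown(); }
                public void onFailure(Exception e) { done.countDown(); }
            })).start();
        }
        done.await();
        System.out.println("requests=" + callers + " actual loads=" + loads.get());
    }
}
```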