Simplify and optimize deduplication of RepositoryData for a non-caching repository instance #91851
Conversation
…ng repository instance

This makes use of the new deduplicator infrastructure to move to more efficient deduplication mechanics. The existing solution hardly ever deduplicated because it would only deduplicate after the repository entered a consistent state. The adjusted solution is much simpler, in that it simply deduplicates such that only a single loading of `RepositoryData` will ever happen at a time, fixing memory issues from massively concurrent loading of the repo data as described in elastic#89952.

closes elastic#89952
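To illustrate the mechanics described above, here is a minimal, hypothetical Java sketch of a single-flight deduplicator in the spirit of the `SingleResultDeduplicator` this PR switches to: callers that arrive while a load is in flight are queued and served by the next load, so the expensive `RepositoryData` load runs at most once at a time. The class and method names below are illustrative stand-ins, not the Elasticsearch implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Illustrative sketch only (not the Elasticsearch class): deduplicates an expensive
 * async load so that at most one load runs at a time. Callers that arrive while a
 * load is in flight are queued and served by the next load, so nobody is resolved
 * with a result computed before they asked for it.
 */
final class SingleFlightLoader<T> {

    /** Minimal stand-in for ActionListener (hypothetical). */
    interface Listener<T> {
        void onResponse(T value);
        void onFailure(Exception e);
    }

    private final Consumer<Listener<T>> loadAction;   // the expensive load, e.g. reading repo data
    private final List<Listener<T>> queued = new ArrayList<>();
    private boolean loading = false;

    SingleFlightLoader(Consumer<Listener<T>> loadAction) {
        this.loadAction = loadAction;
    }

    void execute(Listener<T> listener) {
        synchronized (this) {
            queued.add(listener);
            if (loading) {
                return;            // a load is in flight; this caller waits for the next one
            }
            loading = true;
        }
        runLoad();
    }

    private void runLoad() {
        final List<Listener<T>> batch;
        synchronized (this) {
            batch = new ArrayList<>(queued);
            queued.clear();
        }
        loadAction.accept(new Listener<T>() {
            @Override
            public void onResponse(T value) {
                batch.forEach(l -> l.onResponse(value));
                maybeRunAgain();
            }

            @Override
            public void onFailure(Exception e) {
                batch.forEach(l -> l.onFailure(e));
                maybeRunAgain();
            }

            private void maybeRunAgain() {
                final boolean more;
                synchronized (SingleFlightLoader.this) {
                    more = queued.isEmpty() == false;
                    loading = more;  // stay "loading" only if another batch is pending
                }
                if (more) {
                    runLoad();       // serve callers that arrived during the previous load
                }
            }
        });
    }
}
```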
Pinging @elastic/es-distributed (Team:Distributed)
Hi @original-brownbear, I've created a changelog YAML for you.
Nice simplification. I left one small comment.
@@ -413,7 +413,7 @@ protected BlobStoreRepository(
         this.namedXContentRegistry = namedXContentRegistry;
         this.basePath = basePath;
         this.maxSnapshotCount = MAX_SNAPSHOTS_SETTING.get(metadata.settings());
-        this.repoDataDeduplicator = new ResultDeduplicator<>(threadPool.getThreadContext());
+        this.repoDataLoadDeduplicator = new SingleResultDeduplicator<>(threadPool.getThreadContext(), this::doGetRepositoryData);
I think `SingleResultDeduplicator` risks a stack overflow exception if the action doesn't fork, which is the case here. Can we fork to SNAPSHOT_META here rather than when calling `repoDataLoadDeduplicator::execute`?
++ I pushed the commit below, thanks for catching this! It also simplifies the call-sites nicely :)
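For illustration of the forking suggestion, here is a hedged sketch building on the hypothetical `SingleFlightLoader` above. The `snapshotMeta` executor and `loadRepositoryDataBlocking` are made-up placeholders standing in for the SNAPSHOT_META pool and the real load: the wrapped action forks onto the pool before doing the work, so a load that completes inline does not grow the caller's stack, and call-sites simply pass their listener to `execute`.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative sketch only: the wrapped action forks onto a dedicated pool
// (a stand-in for the SNAPSHOT_META pool) before doing the blocking work, so a
// load that completes inline cannot recurse on the caller's stack. Names such as
// "snapshotMeta" and "loadRepositoryDataBlocking" are hypothetical placeholders.
final class RepoDataLoading {

    private final ExecutorService snapshotMeta = Executors.newFixedThreadPool(4);

    private final SingleFlightLoader<String> repoDataLoadDeduplicator =
        new SingleFlightLoader<>(listener -> snapshotMeta.execute(() -> {
            try {
                listener.onResponse(loadRepositoryDataBlocking()); // the expensive load
            } catch (Exception e) {
                listener.onFailure(e);
            }
        }));

    // Call-sites no longer fork themselves; they just hand their listener over.
    void getRepositoryData(SingleFlightLoader.Listener<String> listener) {
        repoDataLoadDeduplicator.execute(listener);
    }

    private String loadRepositoryDataBlocking() {
        return "repository-data"; // stand-in for reading and parsing the repo metadata
    }
}
```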
LGTM
repoData.getGenId()
)
)
repoDataLoadDeduplicator.execute(
👍
Thanks David!
…ng repository instance (elastic#91851)

This makes use of the new deduplicator infrastructure to move to more efficient deduplication mechanics. The existing solution hardly ever deduplicated because it would only deduplicate after the repository entered a consistent state. The adjusted solution is much simpler, in that it simply deduplicates such that only a single loading of `RepositoryData` will ever happen at a time, fixing memory issues from massively concurrent loading of the repo data as described in elastic#89952.

closes elastic#89952
…-caching repository instance (#91851) (#91866)

* Simplify and optimize deduplication of RepositoryData for a non-caching repository instance (#91851)

This makes use of the new deduplicator infrastructure to move to more efficient deduplication mechanics. The existing solution hardly ever deduplicated because it would only deduplicate after the repository entered a consistent state. The adjusted solution is much simpler, in that it simply deduplicates such that only a single loading of `RepositoryData` will ever happen at a time, fixing memory issues from massively concurrent loading of the repo data as described in #89952.

closes #89952

* fix compile
Would it be safe to backport this to 7.17 too? We definitely see #89952 affecting it, and I don't see any obvious reasons not to apply this there.
Yea it should be fine :) let me try ..
#92661 worked
…-caching repository instance (elastic#91851) (elastic#91866)

* Simplify and optimize deduplication of RepositoryData for a non-caching repository instance (elastic#91851)

This makes use of the new deduplicator infrastructure to move to more efficient deduplication mechanics. The existing solution hardly ever deduplicated because it would only deduplicate after the repository entered a consistent state. The adjusted solution is much simpler, in that it simply deduplicates such that only a single loading of `RepositoryData` will ever happen at a time, fixing memory issues from massively concurrent loading of the repo data as described in elastic#89952.

closes elastic#89952

* fix compile
…-caching repository instance (#91851) (#91866) (#92661)

* Simplify and optimize deduplication of RepositoryData for a non-caching repository instance (#91851)

This makes use of the new deduplicator infrastructure to move to more efficient deduplication mechanics. The existing solution hardly ever deduplicated because it would only deduplicate after the repository entered a consistent state. The adjusted solution is much simpler, in that it simply deduplicates such that only a single loading of `RepositoryData` will ever happen at a time, fixing memory issues from massively concurrent loading of the repo data as described in #89952.

closes #89952

* fix compile
This makes use of the new deduplicator infrastructure to move to more efficient deduplication mechanics. The existing solution hardly ever deduplicated because it would only deduplicate after the repository entered a consistent state. The adjusted solution is much simpler, in that it simply deduplicates such that only a single loading of `RepositoryData` will ever happen at a time, fixing memory issues from massively concurrent loading of the repo data as described in #89952.

closes #89952 (this should be all that's needed to fix the memory issue in practice; judging by heap dumps, the problem really is just the initial set of snapshots going wild with concurrent loading of repo data on the huge snapshot-meta pool)
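As a quick usage illustration of the concurrent-loading point (again using the hypothetical `SingleFlightLoader` sketch from earlier in this thread, not the actual Elasticsearch classes), the demo below fires many concurrent requests and counts how often the "load" actually runs; the load count typically ends up far below the request count because in-flight loads are shared.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Demo only: many concurrent callers, one shared (hypothetical) loader.
public class DedupDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicInteger loads = new AtomicInteger();
        SingleFlightLoader<String> loader = new SingleFlightLoader<>(listener -> {
            loads.incrementAndGet();                // count how often the "load" really runs
            listener.onResponse("repository-data"); // stand-in for the expensive load result
        });

        int callers = 1000;
        CountDownLatch done = new CountDownLatch(callers);
        for (int i = 0; i < callers; i++) {
            new Thread(() -> loader.execute(new SingleFlightLoader.Listener<String>() {
                public void onResponse(String value) { done.countDown(); }
                public void onFailure(Exception e) { done.countDown(); }
            })).start();
        }
        done.await();
        System.out.println("requests=" + callers + " actual loads=" + loads.get());
    }
}
```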