Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always use CacheService for caching metadata blobs #70668

Merged
merged 17 commits into from
Mar 24, 2021

Conversation

ywelsch
Copy link
Contributor

@ywelsch ywelsch commented Mar 22, 2021

This PR unifies CachedBlobContainerIndexInput and FrozenIndexInput so that they share the same infrastructure for caching metadata blobs as well as header and footer ranges for data blobs. The idea is to always use CacheService for this, which does not evict the metadata, and which efficiently stores the information on disk (using sparse file support).

This also allows us to align writes in FrozenCacheService to 4KB block sizes in this PR, which addresses an issue when reusing regions from the shared cache, as writes that are not aligned on page cache boundaries causes the existing data (which we don't care about) to be loaded from disk, which comes with a dramatic performance penalty.

In a follow-up, we should also add a test / bootstrap check that ensures that sparse file support is always provided.

Closes #70728
Closes #70763

@ywelsch ywelsch added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement v7.13.0 v8.0.0 labels Mar 23, 2021
@ywelsch ywelsch marked this pull request as ready for review March 23, 2021 09:38
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Mar 23, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@ywelsch ywelsch requested a review from tlrx March 23, 2021 10:45
Copy link
Member

@tlrx tlrx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change makes sense to me. I've done a first pass and left some comments.


@Override
protected void doReadInternal(ByteBuffer b) throws IOException {
final long position = getAbsolutePosition();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should compute the absolute and the relative positions only once here and pass them along as parameters to all other read methods which will be able to use whatever position they want but in a more explicit manner (ie relativePosition + b.remaining() for example).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've addressed the other comments here, but I'm not yet sure how to go about this one. I've pushed aaa69e6 which removes the virtual file logic from FrozenCacheService and which simplifies things a bit further when it comes to position calculations (same for FrozenIndexInput as well as CachedBlobContainerIndexInput now).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, let's keep it as it is. I'll give it a try.

@@ -83,13 +57,13 @@ public FrozenIndexInput(
0L,
0L,
fileInfo.length(),
new CacheFileReference(directory, fileInfo.physicalName(), fileInfo.length()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit torn on the file's length that is passed here. It means that we'll create a potentially large file in the cache service (and cause potential evictions) while we know it will only be used to cache at most directory.getBlobCacheByteRange(name, fileInfo.length()).length() bytes. We could maybe affine this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that CacheService is always supposed to be unbounded (and that's how it should be, the goal is to remove the undocumented setting to even make this configurable, and remove the complexity around that)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I forgot about that point. Also my suggestion was pretty bad since the file length here passed to the sparse file tracker and that would just break things. Sorry for the noise.

@ywelsch
Copy link
Contributor Author

ywelsch commented Mar 23, 2021

@elasticmachine run elasticsearch-ci/2 (timeout downloading stuff from the internet)

dakrone added a commit to dakrone/elasticsearch that referenced this pull request Mar 23, 2021
dakrone added a commit that referenced this pull request Mar 23, 2021
dakrone added a commit that referenced this pull request Mar 23, 2021
@ywelsch ywelsch requested a review from tlrx March 23, 2021 22:45
Copy link
Member

@tlrx tlrx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - I've left only minor things along the review, feel free to apply or ignore

TrackingRepositoryPlugin tracker = null;
for (RepositoryPlugin plugin : getInstanceFromNode(PluginsService.class).filterPlugins(RepositoryPlugin.class)) {
if (plugin instanceof TrackingRepositoryPlugin) {
tracker = ((TrackingRepositoryPlugin) plugin);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: return here directly

@@ -33,6 +34,7 @@
import static org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertAcked;
import static org.hamcrest.Matchers.containsString;

@ESIntegTestCase.ClusterScope(scope = ESIntegTestCase.Scope.TEST)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a single test in this class, why changing the scope to TEST?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's because when you run the test with tests.iters=x, it will reuse the same node, and that breaks the test (RestoreBlockingActionFilter.executed and unblocked are not reset)

@@ -83,13 +57,13 @@ public FrozenIndexInput(
0L,
0L,
fileInfo.length(),
new CacheFileReference(directory, fileInfo.physicalName(), fileInfo.length()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I forgot about that point. Also my suggestion was pretty bad since the file length here passed to the sparse file tracker and that would just break things. Sorry for the noise.

long bytesWritten = positionalWrite(fc, fileChannelPos + bytesCopied, buf);
bytesCopied += bytesWritten;
bytesCopied += (bytesWritten - adjustment); // adjust to not break RangeFileTracker
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the stats should show the exact bytes copied (ie we can confirm the 4K alignment) but progress updater got updated with the adjusted bytes copied?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will defer that to a follow-up as it will break a lot of tests (again)


@Override
protected void doReadInternal(ByteBuffer b) throws IOException {
final long position = getAbsolutePosition();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, let's keep it as it is. I'll give it a try.

@ywelsch ywelsch merged commit 296ac1a into elastic:master Mar 24, 2021
ywelsch added a commit to ywelsch/elasticsearch that referenced this pull request Mar 24, 2021
This PR unifies CachedBlobContainerIndexInput and FrozenIndexInput so that they share the same infrastructure for
caching metadata blobs as well as header and footer ranges for data blobs. The idea is to always use CacheService for
this, which does not evict the metadata, and which efficiently stores the information on disk (using sparse file support).

This also allows us to align writes in FrozenCacheService to 4KB block sizes in this PR, which addresses an issue when
reusing regions from the shared cache, as writes that are not aligned on page cache boundaries causes the existing data
(which we don't care about) to be loaded from disk, which comes with a dramatic performance penalty.

Closes elastic#70728
Closes elastic#70763
ywelsch added a commit that referenced this pull request Mar 24, 2021
This PR unifies CachedBlobContainerIndexInput and FrozenIndexInput so that they share the same infrastructure for
caching metadata blobs as well as header and footer ranges for data blobs. The idea is to always use CacheService for
this, which does not evict the metadata, and which efficiently stores the information on disk (using sparse file support).

This also allows us to align writes in FrozenCacheService to 4KB block sizes in this PR, which addresses an issue when
reusing regions from the shared cache, as writes that are not aligned on page cache boundaries causes the existing data
(which we don't care about) to be loaded from disk, which comes with a dramatic performance penalty.

Closes #70728
Closes #70763
ywelsch added a commit to ywelsch/elasticsearch that referenced this pull request Mar 26, 2021
…lastic#70795)

This PR unifies CachedBlobContainerIndexInput and FrozenIndexInput so that they share the same infrastructure for
caching metadata blobs as well as header and footer ranges for data blobs. The idea is to always use CacheService for
this, which does not evict the metadata, and which efficiently stores the information on disk (using sparse file support).

This also allows us to align writes in FrozenCacheService to 4KB block sizes in this PR, which addresses an issue when
reusing regions from the shared cache, as writes that are not aligned on page cache boundaries causes the existing data
(which we don't care about) to be loaded from disk, which comes with a dramatic performance penalty.

Closes elastic#70728
Closes elastic#70763
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v7.12.1 v7.13.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Failure in CachedBlobContainerIndexInputTests.testRandomReads SearchableSnapshotDirectoryStatsTests failures
4 participants