Improve shards evictions in searchable snapshot cache service #67160

Merged
merged 19 commits into elastic:master from improve-shards-evictions
Jan 14, 2021

Conversation

@tlrx tlrx commented Jan 7, 2021

The searchable snapshot's cache service is notified when cache files of a specific shard must be evicted. The notifications are usually done in a cluster state applier thread that calls the CacheService#markShardAsEvictedInCache method.

The markShardAsEvictedInCache method adds the shard to an internal set of ShardEviction and submits the eviction of the shard to the generic thread pool. Because there is nothing preventing the cache service (and persistent cache service) from being closed before all shard evictions are processed, it is possible that invalidating a cache file fails and trips an assertion (as happened in many recent test failures: #66958, #66730).

This pull request changes the CacheService so that it now waits for the evictions of shards to complete before closing the cache and persistent cache services. As before, it allows searchable snapshot shards that were previously marked as evicted to start by forcing the eviction of any existing cache files. Finally, it removes the KeyedLock used before and improves test coverage.
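For illustration, here is a rough, self-contained sketch of the behavior described above. It is not the merged Elasticsearch code: the class and member names (apart from markShardAsEvictedInCache) are made up, a plain ExecutorService stands in for the generic thread pool, and ShardEviction is reduced to a simple record.

import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for the searchable snapshots CacheService.
class SimplifiedCacheService {

    record ShardEviction(String snapshotUUID, String snapshotIndexName, int shardId) {}

    private final ExecutorService genericPool = Executors.newFixedThreadPool(4); // stand-in for the generic thread pool
    private final Map<ShardEviction, CompletableFuture<Void>> pendingEvictions = new ConcurrentHashMap<>();
    private boolean allowShardsEvictions = true;

    // Usually called from a cluster state applier thread when a shard's cache files must be evicted.
    synchronized void markShardAsEvictedInCache(String snapshotUUID, String snapshotIndexName, int shardId) {
        if (allowShardsEvictions == false) {
            return; // the service is stopping; nothing will reload these files anyway
        }
        final ShardEviction shard = new ShardEviction(snapshotUUID, snapshotIndexName, shardId);
        pendingEvictions.computeIfAbsent(shard, s ->
            CompletableFuture.runAsync(() -> evictShardCacheFiles(s), genericPool)
                .whenComplete((v, e) -> pendingEvictions.remove(s)));
    }

    // The change described in this pull request: stopping waits for in-flight
    // evictions before the cache and persistent cache are closed.
    void stop() {
        synchronized (this) {
            allowShardsEvictions = false; // no new evictions can be submitted from here on
        }
        CompletableFuture.allOf(pendingEvictions.values().toArray(CompletableFuture[]::new)).join();
        // ... it is now safe to close the cache and the persistent cache
    }

    private void evictShardCacheFiles(ShardEviction shard) {
        // invalidate the cache files that belong to the shard (omitted)
    }
}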

@elasticmachine elasticmachine added the Team:Distributed (Obsolete) label Jan 7, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

/**
* Generates one or more cache files using the specified {@link CacheService}. Each cache file has been written to at least once.
*/
protected List<CacheFile> randomCacheFiles(CacheService cacheService) throws Exception {
Member Author

This has been moved from the PersistentCacheTests class and slightly changed so that it always generates at least one random cache file and always reads/writes a range in each cache file.

if (pendingFutures.isEmpty() == false) {
try {
final CountDownLatch latch = new CountDownLatch(pendingFutures.size());
pendingFutures.forEach(completableFuture -> completableFuture.whenComplete((integer, throwable) -> latch.countDown()));
Member Author

I think it is OK to only wait for running shard evictions to complete and not to process shard evictions that have not started yet: if the shard files are still on disk they will be reused or removed after the node starts again, and if the shard files have been deleted from disk they won't be reloaded at node start-up by the persistent cache logic.

Contributor

Agreed. We want to terminate as fast as we can.

I wonder if a simple read-write lock scheme could suffice? Processing the eviction takes the read-lock (and checks that state==started) and stop takes the write lock.
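A rough sketch of that read/write lock scheme, with made-up names (State enum, processShardEviction, and so on) rather than the code that was eventually merged:

import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class ShardsEvictionsLockSketch {

    enum State { STARTED, STOPPING, STOPPED }

    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private volatile State state = State.STARTED;

    // Each shard eviction runs under the read lock, so many evictions can proceed
    // concurrently, but none can start once the service has begun stopping.
    void processShardEviction(Runnable evictCacheFiles) {
        lock.readLock().lock();
        try {
            if (state != State.STARTED) {
                return; // service is stopping; skip the eviction
            }
            evictCacheFiles.run();
        } finally {
            lock.readLock().unlock();
        }
    }

    // stop() takes the write lock, which blocks until all in-flight evictions
    // (read lock holders) have finished, then flips the state.
    void stop() {
        state = State.STOPPING; // new evictions will bail out
        lock.writeLock().lock();
        try {
            state = State.STOPPED;
            // safe to close the cache and persistent cache here
        } finally {
            lock.writeLock().unlock();
        }
    }
}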

tlrx commented Jan 7, 2021

@henningandersen This relates to one of your comments on #66173 - sorry for the time it took me to get there.

tlrx added a commit that referenced this pull request Jan 11, 2021
This commit removes an assertion that makes many tests fail
on CI until #67160 is merged, which has been opened to fix the
underlying issues around that assertion tripping (and brings
back the assertion).

Relates #67160
tlrx added a commit to tlrx/elasticsearch that referenced this pull request Jan 11, 2021
tlrx added a commit to tlrx/elasticsearch that referenced this pull request Jan 11, 2021
tlrx added a commit that referenced this pull request Jan 11, 2021
tlrx added a commit that referenced this pull request Jan 11, 2021
@henningandersen henningandersen (Contributor) left a comment

Thanks for addressing this. I have a few suggestions to simplify the code a bit. I might have missed a finer point on why this cannot work out.

synchronized (shardsEvictionsMutex) {
if (allowShardsEvictions) {
final ShardEviction shardEviction = new ShardEviction(snapshotUUID, snapshotIndexName, shardId);
if (addPendingShardEviction(shardEviction)) {
Contributor

I wonder if we could remove runningShardsEvictions if we make pendingShardsEvictions a Map<ShardEviction, Future> and register the future that threadPool.generic().submit() returns in the map? Since we do this under the lock, we should be able to check whether it exists before submitting and registering in the map.

evictCacheFilesIfNeeded would get from the map (under lock) and wait for the future (not under lock).
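Roughly, the suggestion could look like the sketch below. The identifiers are illustrative stand-ins: a plain ExecutorService replaces threadPool.generic(), and evictShardCacheFiles is a hypothetical placeholder for the actual eviction work.

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ShardsEvictionsSketch {

    record ShardEviction(String snapshotUUID, String snapshotIndexName, int shardId) {}

    private final Object shardsEvictionsMutex = new Object();
    private final Map<ShardEviction, Future<?>> pendingShardsEvictions = new HashMap<>();
    private final ExecutorService generic = Executors.newCachedThreadPool(); // stand-in for threadPool.generic()
    private boolean allowShardsEvictions = true;

    void markShardAsEvictedInCache(ShardEviction shardEviction) {
        synchronized (shardsEvictionsMutex) {
            if (allowShardsEvictions) {
                // submit at most once per shard; the registered future tells us an eviction is in flight
                pendingShardsEvictions.computeIfAbsent(shardEviction,
                    evicted -> generic.submit(() -> evictShardCacheFiles(evicted)));
            }
        }
    }

    void evictCacheFilesIfNeeded(ShardEviction shardEviction) throws Exception {
        final Future<?> future;
        synchronized (shardsEvictionsMutex) {
            future = pendingShardsEvictions.get(shardEviction); // look up under the lock...
        }
        if (future != null) {
            future.get(); // ...but wait for it outside the lock
        }
    }

    private void evictShardCacheFiles(ShardEviction shardEviction) {
        // invalidate the shard's cache files (omitted)
    }
}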

Member Author

I agree this simplifies the code. One of the motivations for differentiating pending and running shard evictions was to allow a searchable snapshot shard to immediately execute the eviction of cache files, whereas with your suggestion the shard will have to wait for the submitted runnable to complete on the generic thread pool before moving forward with recovery. Maybe we are OK with this?

@henningandersen henningandersen (Contributor) Jan 12, 2021

That is a good point, thanks for clarifying.

With the current threading this seems unable to happen, since evictCacheFilesIfNeeded is always called on the generic thread pool. Since the eviction of shard folders must happen before the same searchable snapshot shard is restored again, the initial shard eviction job would be queued on the generic thread pool before a subsequent restore of that shard. Thus if the restore is running, the eviction job must either have started to run or completed.

I would be inclined to rely on this for now. If those were separate threads, I suppose waiting would likely be out of the question, so this would require a new design if we change the threading model.

Member Author

the initial shard evict job would come before a subsequent restore of the same searchable snapshot shard in the generic queue

Thanks Henning, I think you're right. I went with your suggestion of simplification.


try {
final CountDownLatch latch = new CountDownLatch(pendingFutures.size());
pendingFutures.forEach(completableFuture -> completableFuture.whenComplete((integer, throwable) -> latch.countDown()));
if (latch.await(shardsEvictionsStopTimeout.duration(), shardsEvictionsStopTimeout.timeUnit()) == false) {
Contributor

I am not sure we want a timeout here? I have a hard time reasoning through what it means when the wait times out - and if we can safely handle that, we should just avoid the wait altogether (not proposing that). I do see value, though, in first waiting for some seconds, then logging a warning before waiting indefinitely.

Member

I think we can also just do without a timeout; in tests we will interrupt if the shutdown takes too long anyway, so no timeout plus logging the interrupted exception is good enough IMO.

Member Author

Sure. I went with Henning's suggestion of using a read/write lock for this.

@original-brownbear original-brownbear (Member) left a comment

Some smaller comments; I didn't go through all the details yet since there are some open design questions from Henning.

}
if (pendingFutures.isEmpty() == false) {
try {
final CountDownLatch latch = new CountDownLatch(pendingFutures.size());
Member

Maybe it would be better to use a GroupedActionListener + PlainActionFuture here so we collect the Exceptions if any occur instead of suppressing them?
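The Elasticsearch listener classes are not shown here, but as a rough plain-Java analogue of the idea (surface any eviction failure to the caller instead of only counting down a latch), something like:

import java.util.List;
import java.util.concurrent.CompletableFuture;

class AwaitEvictionsSketch {
    // Combines the pending futures so that join() rethrows the first failure
    // (wrapped in a CompletionException) instead of silently swallowing it.
    static void awaitAll(List<CompletableFuture<Integer>> pendingFutures) {
        CompletableFuture.allOf(pendingFutures.toArray(CompletableFuture[]::new)).join();
    }
}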

@@ -127,6 +130,13 @@
Setting.Property.NodeScope
);

public static final Setting<TimeValue> SNAPSHOT_CACHE_SHARD_EVICTIONS_SHUTDOWN_TIMEOUT = Setting.timeSetting(
Member

I don't think this really needs a timeout setting, does it? It's really only relevant for tests to begin with and we can just hard-code a reasonable value like 10s (that's what we did in pretty much all other similar places, like waiting for recoveries to finish/cancel etc.).

Member Author

No it does not; I went with a hard-coded timeout of 10s and then waiting indefinitely.
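Something along these lines, assuming the write lock from the read/write lock scheme sketched earlier; the method name and warning message are made up for the example and are not the merged code:

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class EvictionsShutdownSketch {
    // Try the write lock for a fixed 10 seconds, then warn and wait for as long as it takes.
    static void waitForShardEvictions(ReentrantReadWriteLock shardsEvictionsLock) throws InterruptedException {
        final var writeLock = shardsEvictionsLock.writeLock();
        if (writeLock.tryLock(10L, TimeUnit.SECONDS) == false) {
            System.err.println("shard cache evictions still running after 10s, waiting indefinitely");
            writeLock.lock(); // second wait has no timeout
        }
        try {
            // all in-flight evictions have completed; the caches can now be closed
        } finally {
            writeLock.unlock();
        }
    }
}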

try {
if (evictedShards.remove(shardEviction)) {
runnable.run();
CompletableFuture<Integer> processShardEvictionIfNeeded(ShardEviction shardEviction) {
Member

NIT: This can just return Future

assert runningShardsEvictions.get(shardEviction) == null : "found a running shard eviction for " + shardEviction;
return CompletableFuture.completedFuture(0);

} else if (runningShardsEvictions.containsKey(shardEviction)) {
Member

I think this could just be simplified with computeIfAbsent in an else branch?


@tlrx tlrx merged commit 99340fe into elastic:master Jan 14, 2021
@tlrx tlrx deleted the improve-shards-evictions branch January 14, 2021 14:38
tlrx commented Jan 14, 2021

Thanks a lot @henningandersen for the multiple comments and suggestions!

tlrx added a commit to tlrx/elasticsearch that referenced this pull request Jan 14, 2021
tlrx added a commit to tlrx/elasticsearch that referenced this pull request Jan 14, 2021
tlrx added a commit that referenced this pull request Jan 14, 2021 (#67519)
tlrx added a commit that referenced this pull request Jan 14, 2021 (#67517)
Labels
:Distributed Coordination/Snapshot/Restore, >enhancement, Team:Distributed (Obsolete), v7.11.1, v7.12.0, v8.0.0-alpha1