-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve shards evictions in searchable snapshot cache service #67160
Conversation
Pinging @elastic/es-distributed (Team:Distributed) |
/** | ||
* Generates one or more cache files using the specified {@link CacheService}. Each cache files have been written at least once. | ||
*/ | ||
protected List<CacheFile> randomCacheFiles(CacheService cacheService) throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been moved from the PersistentCacheTests class and sightly changed so that it always generate at least one random cache file and always read/write a range in each cache file.
if (pendingFutures.isEmpty() == false) { | ||
try { | ||
final CountDownLatch latch = new CountDownLatch(pendingFutures.size()); | ||
pendingFutures.forEach(completableFuture -> completableFuture.whenComplete((integer, throwable) -> latch.countDown())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is ok to only wait for running shard evictions to complete and not process other shard evictions that have not been processed yet: if the shard files are still on disk they will be reused or removed after node starts again and if the shard files have been deleted from disk they won't be reloaded at node start up by the persistent cache logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. We want to terminate as fast as we can.
I wonder if a simple read-write lock scheme could suffice? Processing the eviction takes the read-lock (and checks that state==started) and stop takes the write lock.
@henningandersen This relates one of your #66173 (comment) - sorry for the time it took me to get there. |
This commit removes an assertion that makes many tests to fail on CI until elastic#67160 is merged, which has been opened to fix the underlying issues around that assertion tripping (and brings back the assertion). Relates elastic#67160
This commit removes an assertion that makes many tests to fail on CI until elastic#67160 is merged, which has been opened to fix the underlying issues around that assertion tripping (and brings back the assertion). Relates elastic#67160
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing this. I have a few suggestions to simplify the code a bit. I might have missed a finer point on why this cannot work out.
synchronized (shardsEvictionsMutex) { | ||
if (allowShardsEvictions) { | ||
final ShardEviction shardEviction = new ShardEviction(snapshotUUID, snapshotIndexName, shardId); | ||
if (addPendingShardEviction(shardEviction)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we could remove runningShardsEvictions
if we make pendingShardsEvictions
a Map<ShardEviction, Future> and register the future that threadPool.generic().submit()
returns in the map? Since we do this under the lock, we should be able to check whether it exists before submitting and registering in the map.
evictCacheFilesIfNeeded
would get from the map (under lock) and wait for the future (not under lock).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this simplifies the code. One of the motivation to differentiate pending and running shard evictions was to allow a searchable snapshot shard to immediately execute the eviction of cache files, while in your suggestion it would mean that the shard will have to wait for the submitted runnable to complete in the generic thread pool before moving forward with recovery. Maybe we are OK with this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good point, thanks for clarifying.
With the current threading this seems impossible to happen, since evictCacheFilesIfNeeded
is always called on the generic thread pool. Since the evict of shard folders must happen before restoring the same searchable snapshot shard again, the initial shard evict job would come before a subsequent restore of the same searchable snapshot shard in the generic queue. Thus if the restore is running, the evict job must either have started to run or completed.
I would be inclined to rely on this for now. If those were separate threads, I suppose waiting would like be out of the question so would require a new design if we change the threading model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the initial shard evict job would come before a subsequent restore of the same searchable snapshot shard in the generic queue
Thanks Henning, I think you're right. I went with your suggestion of simplification.
if (pendingFutures.isEmpty() == false) { | ||
try { | ||
final CountDownLatch latch = new CountDownLatch(pendingFutures.size()); | ||
pendingFutures.forEach(completableFuture -> completableFuture.whenComplete((integer, throwable) -> latch.countDown())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. We want to terminate as fast as we can.
I wonder if a simple read-write lock scheme could suffice? Processing the eviction takes the read-lock (and checks that state==started) and stop takes the write lock.
try { | ||
final CountDownLatch latch = new CountDownLatch(pendingFutures.size()); | ||
pendingFutures.forEach(completableFuture -> completableFuture.whenComplete((integer, throwable) -> latch.countDown())); | ||
if (latch.await(shardsEvictionsStopTimeout.duration(), shardsEvictionsStopTimeout.timeUnit()) == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure we want a timeout here? I have a hard time reason through what it means when timed out - and if we can safely handle that, we should just avoid the wait altogether (not proposing that). I see value though in first waiting for some seconds, then outputting a warning before waiting indefinitely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can also just do without a timeout, in tests we will interrupt if the shutdown takes too long anyway so just no timeout and logging the interrupted exception is good enough IMO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I went with Henning's suggestion of using a read/write lock for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some smaller comments, didn't go through all details now before since there's some open design questions from Henning.
} | ||
if (pendingFutures.isEmpty() == false) { | ||
try { | ||
final CountDownLatch latch = new CountDownLatch(pendingFutures.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it would be better to use a GroupedActionListener
+ PlainActionFuture
here so we collect the Exceptions
if any occur instead of suppressing them?
@@ -127,6 +130,13 @@ | |||
Setting.Property.NodeScope | |||
); | |||
|
|||
public static final Setting<TimeValue> SNAPSHOT_CACHE_SHARD_EVICTIONS_SHUTDOWN_TIMEOUT = Setting.timeSetting( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this really needs a timeout setting does it? It's really only relevant for tests to begin with and we can just hard-code chose a reasonable value like 10s (that's what we did in pretty much all other similar places like waiting for recoveries to finish/cancel etc).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No it does not, I went with a hard coded timeout of 10s then wait indefinitely.
...-snapshots/src/main/java/org/elasticsearch/xpack/searchablesnapshots/cache/CacheService.java
Outdated
Show resolved
Hide resolved
try { | ||
if (evictedShards.remove(shardEviction)) { | ||
runnable.run(); | ||
CompletableFuture<Integer> processShardEvictionIfNeeded(ShardEviction shardEviction) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: This can just return Future
assert runningShardsEvictions.get(shardEviction) == null : "found a running shard eviction for " + shardEviction; | ||
return CompletableFuture.completedFuture(0); | ||
|
||
} else if (runningShardsEvictions.containsKey(shardEviction)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could just be simplified with computeIfAbsent
in an else
branch?
try { | ||
final CountDownLatch latch = new CountDownLatch(pendingFutures.size()); | ||
pendingFutures.forEach(completableFuture -> completableFuture.whenComplete((integer, throwable) -> latch.countDown())); | ||
if (latch.await(shardsEvictionsStopTimeout.duration(), shardsEvictionsStopTimeout.timeUnit()) == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can also just do without a timeout, in tests we will interrupt if the shutdown takes too long anyway so just no timeout and logging the interrupted exception is good enough IMO
Thanks a lot @henningandersen for the multiple comments and suggestions! |
…c#67160) The searchable snapshot's cache service is notified when cache files of a specific shard must be evicted. The notifications are usually done in a cluster state applier thread that calls the CacheService# markShardAsEvictedInCache method. The markShardAsEvictedInCache adds the shard to an internal set of ShardEviction and submits the eviction of the shard to the generic thread pool. Because there's nothing preventing the cache service (and persistent cache service) to be closed before all shared evictions are processed, it is possible that invalidating a cache file fails and trips an assertion (as it happened in many tests failures recently elastic#66958, elastic#66730). This commit changes the CacheService so that it now waits for the evictions of shards to complete before closing the cache and persistent cache services.
…c#67160) The searchable snapshot's cache service is notified when cache files of a specific shard must be evicted. The notifications are usually done in a cluster state applier thread that calls the CacheService# markShardAsEvictedInCache method. The markShardAsEvictedInCache adds the shard to an internal set of ShardEviction and submits the eviction of the shard to the generic thread pool. Because there's nothing preventing the cache service (and persistent cache service) to be closed before all shared evictions are processed, it is possible that invalidating a cache file fails and trips an assertion (as it happened in many tests failures recently elastic#66958, elastic#66730). This commit changes the CacheService so that it now waits for the evictions of shards to complete before closing the cache and persistent cache services.
#67519) The searchable snapshot's cache service is notified when cache files of a specific shard must be evicted. The notifications are usually done in a cluster state applier thread that calls the CacheService# markShardAsEvictedInCache method. The markShardAsEvictedInCache adds the shard to an internal set of ShardEviction and submits the eviction of the shard to the generic thread pool. Because there's nothing preventing the cache service (and persistent cache service) to be closed before all shared evictions are processed, it is possible that invalidating a cache file fails and trips an assertion (as it happened in many tests failures recently #66958, #66730). This commit changes the CacheService so that it now waits for the evictions of shards to complete before closing the cache and persistent cache services.
#67517) The searchable snapshot's cache service is notified when cache files of a specific shard must be evicted. The notifications are usually done in a cluster state applier thread that calls the CacheService# markShardAsEvictedInCache method. The markShardAsEvictedInCache adds the shard to an internal set of ShardEviction and submits the eviction of the shard to the generic thread pool. Because there's nothing preventing the cache service (and persistent cache service) to be closed before all shared evictions are processed, it is possible that invalidating a cache file fails and trips an assertion (as it happened in many tests failures recently #66958, #66730). This commit changes the CacheService so that it now waits for the evictions of shards to complete before closing the cache and persistent cache services.
The searchable snapshot's cache service is notified when cache files of a specific shard must be evicted. The notifications are usually done in a cluster state applier thread that calls the
CacheService#markShardAsEvictedInCache
method.The
markShardAsEvictedInCache
adds the shard to an internal set ofShardEviction
and submits the eviction of the shard to thegeneric
thread pool. Because there's nothing preventing the cache service (and persistent cache service) to be closed before all shared evictions are processed, it is possible that invalidating a cache file fails and trips an assertion (as it happened in many tests failures recently #66958, #66730).This pull request changes the
CacheService
so that it now waits for the evictions of shards to complete before closing the cache and persistent cache services. Like before, it allows searchable snapshot shards that have been previously marked as evicted to start by forcing the eviction of existing cache files. Finally, it removes theKeyedLock
used before and improves test coverage.