-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add circuit breaker support for file cache #6591
Add circuit breaker support for file cache #6591
Conversation
@@ -586,6 +586,8 @@ protected Node( | |||
pluginCircuitBreakers, | |||
settingsModule.getClusterSettings() | |||
); | |||
// File cache will be initialized by the node once circuit breakers are in place. | |||
nodeEnvironment.initializeFileCache(settings, circuitBreakerService.getBreaker(CircuitBreaker.REQUEST)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The initialization had to be moved out here due to the CircuitBreakerService
initialization. I realize this is not ideal, but the sequence of initialization forces this out.
Please chime in if you have a better approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Digging into this a bit, it seems to me that NodeEnvironment should not hold a fileCache
instance and instead should just hold information about the path of the file cache, as NodeEnvironment is defined as "A component that holds all data paths for a single node."
I think this can be refactored as follows:
- Remove the actual file cache instance from NodeEnvironment
- Create the filecache instance in Node.java where are you currently calling
nodeEnvironment.initializeFileCache
- In Node.java, create a FileCacheCleaner instance using the FileCache instance (there is nothing per-index about the cleaner, so I think only a single instance is file). Pass that FileCacheCleaner instance to IndicesService to wire up as a listener.
- Profit!
What do you think?
Gradle Check (Jenkins) Run Completed with:
|
5714639
to
7828373
Compare
Gradle Check (Jenkins) Run Completed with:
|
Codecov Report
📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more @@ Coverage Diff @@
## main #6591 +/- ##
============================================
- Coverage 70.82% 70.79% -0.03%
- Complexity 59160 59166 +6
============================================
Files 4803 4803
Lines 283188 283200 +12
Branches 40837 40838 +1
============================================
- Hits 200574 200502 -72
- Misses 66129 66214 +85
+ Partials 16485 16484 -1
... and 492 files with indirect coverage changes Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with this implementation is that I could have zero active usage of the file cache, but non-active usage of a bajillion, and the circuit breaker would start to fail things but there would be nothing to start evicting the non-active entries in the cache even though they are eligible for eviction.
return theCache.put(filePath, indexInput); | ||
CachedIndexInput cachedIndexInput = theCache.put(filePath, indexInput); | ||
// This operation ensures that the PARENT breaker is not tripped | ||
circuitBreaker.addEstimateBytesAndMaybeBreak(0, "filecache_entry"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fails after inserting the thing into the cache?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the logic to maintain FC operations post a circuit breaker trip.
In that case, we will be designing for two size constraints - storage (defined by capacity of the cache) and heap memory. If we want the auto fixing behavior, we have no other choice than to track the size of entries going into the cache, which is how I had it designed originally. I think it will be something on the lines of -
The really messy part about it is the size of an entry - which could vary based on the platform/JVM/OS etc. |
7828373
to
b9070f1
Compare
028a7b0
to
20aae7a
Compare
Gradle Check (Jenkins) Run Completed with:
|
@reta Any thoughts on this? |
20aae7a
to
8b80f74
Compare
Gradle Check (Jenkins) Run Completed with:
|
/** | ||
* Returns the current {@link FileCacheStats} | ||
*/ | ||
public FileCacheStats fileCacheStats() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a much better home for this method than NodeEnvironment, but we now do have both FileCache::stats
and FileCache::fileCacheStats
methods here. Perhaps something to revisit in a subsequent PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. I realized that and thought of the same. We can clean it up with another PR, but this looked like the right place for it to be in.
* If the user doesn't configure the cache size, it fails if the node is a data + search node. | ||
* Else it configures the size to 80% of available capacity for a dedicated search node, if not explicitly defined. | ||
*/ | ||
public void initializeFileCache(Settings settings, CircuitBreaker circuitBreaker) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Updated.
/** | ||
* Returns the {@link FileCache} instance for remote search node | ||
*/ | ||
public FileCache fileCache() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this only exposed for testing? If so, is there a way to make it protected or package-private? Otherwise, at least comment that it is exposed only for testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment, it is visible for test purposes across packages.
8b80f74
to
fec8c35
Compare
Gradle Check (Jenkins) Run Completed with:
|
fec8c35
to
86006f6
Compare
Gradle Check (Jenkins) Run Completed with:
|
Signed-off-by: Kunal Kotwani <[email protected]>
86006f6
to
9336243
Compare
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
The backport to
To backport manually, run these commands in your terminal: # Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-6591-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 06128a904964ce43813ebc8164417a5633d414a2
# Push it to GitHub
git push --set-upstream origin backport/backport-6591-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.x Then, create a pull request where the |
The backport to
To backport manually, run these commands in your terminal: # Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-6591-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 06128a904964ce43813ebc8164417a5633d414a2
# Push it to GitHub
git push --set-upstream origin backport/backport-6591-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.x Then, create a pull request where the |
Signed-off-by: Valentin Mitrofanov <[email protected]>
(cherry picked from commit 06128a9)
Backport PR: #7011 |
(cherry picked from commit 06128a9) Signed-off-by: Kunal Kotwani <[email protected]>
(cherry picked from commit 06128a9) Signed-off-by: Kunal Kotwani <[email protected]>
Description
Two possible solutions were assessed -
Approach 1 would be ideal if there was a fixed size of an entry for the cache, but it varies depending on the platform architecture, OS, JVM variations.
Approach 2 solves the problem by keeping tracking of total memory usage instead of individual entries, and checking the breaker as soon as an entry is added.
Issues Resolved
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.