
Tunable Cache Eviction Policies #5445

Open
esatterwhite opened this issue Sep 24, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@esatterwhite
Collaborator

esatterwhite commented Sep 24, 2024

Is your feature request related to a problem? Please describe.

We ingest and search on petabytes of data reaching as far back as 90 days. We maintain multiple terabytes of cache for Quickwit. Our search workload is very front-heavy with respect to time: the further a given document's timestamp is from now, the less it is accessed. Data ingested in the last 3 days is generally the most frequently accessed. However, we have internal jobs that run once a day or once a week that may access older data, and our users may periodically run reports or expensive aggregations across months of data. Because Quickwit uses LRU cache eviction policies, these expensive, one-off, infrequently executed queries can invalidate a large portion of the cache, resulting in degraded performance across the cluster.
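The pollution pattern described above can be sketched with a toy bounded LRU cache (illustrative only; `LruCache` here is a hypothetical stand-in, not Quickwit's actual cache code):

```rust
use std::collections::VecDeque;

// Minimal LRU cache over split IDs (sketch, not Quickwit's implementation).
struct LruCache {
    capacity: usize,
    // Front = most recently used, back = least recently used.
    entries: VecDeque<String>,
}

impl LruCache {
    fn new(capacity: usize) -> Self {
        Self { capacity, entries: VecDeque::new() }
    }

    // Touch a key: move it to the front, evicting the LRU entry if full.
    fn access(&mut self, key: &str) {
        if let Some(pos) = self.entries.iter().position(|k| k == key) {
            self.entries.remove(pos);
        } else if self.entries.len() == self.capacity {
            self.entries.pop_back(); // evict least recently used
        }
        self.entries.push_front(key.to_string());
    }

    fn contains(&self, key: &str) -> bool {
        self.entries.iter().any(|k| k == key)
    }
}

fn main() {
    let mut cache = LruCache::new(3);
    // Hot splits from the last 3 days are accessed repeatedly.
    for _ in 0..10 {
        cache.access("day-1");
        cache.access("day-2");
        cache.access("day-3");
    }
    // A single one-off report scans three old splits...
    cache.access("day-60");
    cache.access("day-61");
    cache.access("day-62");
    // ...and every hot split is gone, despite being accessed far more often.
    assert!(!cache.contains("day-1"));
    assert!(!cache.contains("day-3"));
    assert!(cache.contains("day-62"));
}
```

Recency alone cannot distinguish the hot splits from the cold ones touched by the scan, which is exactly the failure mode this issue describes.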

Describe the solution you'd like
It would be good for our workload to have a way to switch the eviction policy, namely from the default Least Recently Used (LRU) to a Least Frequently Used (LFU) policy. This would allow Quickwit to evict split entries that are much less likely to be accessed again, rather than entries that merely happen to be slightly older than others, keeping hot/relevant data in cache.
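A minimal sketch of the requested LFU behavior, assuming a frequency counter per cached split ID (`LfuCache` is hypothetical, not a Quickwit API):

```rust
use std::collections::HashMap;

// Minimal LFU cache sketch (illustrative; Quickwit's real cache differs).
struct LfuCache {
    capacity: usize,
    counts: HashMap<String, u64>, // key -> access frequency
}

impl LfuCache {
    fn new(capacity: usize) -> Self {
        Self { capacity, counts: HashMap::new() }
    }

    fn access(&mut self, key: &str) {
        if !self.counts.contains_key(key) && self.counts.len() == self.capacity {
            // Evict the least frequently used entry to make room.
            if let Some(victim) = self
                .counts
                .iter()
                .min_by_key(|(_, count)| **count)
                .map(|(k, _)| k.clone())
            {
                self.counts.remove(&victim);
            }
        }
        *self.counts.entry(key.to_string()).or_insert(0) += 1;
    }

    fn contains(&self, key: &str) -> bool {
        self.counts.contains_key(key)
    }
}

fn main() {
    let mut cache = LfuCache::new(3);
    // Hot splits accumulate high access counts.
    for _ in 0..10 {
        cache.access("day-1");
        cache.access("day-2");
        cache.access("day-3");
    }
    // The same one-off scan now only displaces one hot entry: each newly
    // scanned split has count 1 and becomes the next eviction victim itself.
    cache.access("day-60");
    cache.access("day-61");
    cache.access("day-62");
    let hot_survivors = ["day-1", "day-2", "day-3"]
        .iter()
        .filter(|k| cache.contains(k))
        .count();
    assert_eq!(hot_survivors, 2); // most hot data stays cached
    assert!(cache.contains("day-62"));
}
```

Compared with the LRU case, the one-off scan churns through a single slot instead of flushing the whole cache. A production version would also need an aging/decay mechanism so that stale high counts do not pin dead entries forever.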

@esatterwhite esatterwhite added the enhancement New feature or request label Sep 24, 2024
@fulmicoton
Contributor

thanks for the accurate description of your issue!

@esatterwhite
Collaborator Author

esatterwhite commented Oct 10, 2024

@fulmicoton out of curiosity does the split cache operate per index?

Or could it? How complicated would that be? I'm just thinking of ways to remove or reduce the impact of a bad actor or a one-off expensive query so that it doesn't impact the entire cluster.

Even if we moved away from the daily index pattern, the per-customer pattern still leaves us with thousands of indexes and a fairly unbalanced search workload.

@fulmicoton
Contributor

The split cache does not operate per index.
