fix: fix the ghost queue bug with s3fifo #432
Conversation
[Benchmark plots: Master vs. After fix]
Codecov Report: all modified and coverable lines are covered by tests ✅

```
@@            Coverage Diff             @@
##             main     #432      +/-   ##
==========================================
+ Coverage   80.63%   80.82%   +0.19%
==========================================
  Files          53       53
  Lines        7173     7166       -7
==========================================
+ Hits         5784     5792       +8
+ Misses       1389     1374      -15
```

View full report in Codecov by Sentry.
```rust
// Try to read the csv file path by environment variable.
let path = std::env::var("TWITTER_TRACE_PATH").ok();
if let Some(path) = path {
    // Limit the number of keys to read.
    // MAX means read all keys, which may take a really long time.
```
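The excerpt above cuts off before the key limit is applied. A self-contained sketch of the same pattern might look like the following; note that the `TWITTER_TRACE_KEY_LIMIT` variable name and the `key_limit` helper are illustrative assumptions, not foyer's actual code:

```rust
use std::env;

// Hypothetical cap on the number of keys read from the trace.
// usize::MAX means "read all keys", which may take a really long time
// on the full 6GB Twitter trace.
fn key_limit() -> usize {
    env::var("TWITTER_TRACE_KEY_LIMIT") // assumed variable name, not from the PR
        .ok()
        .and_then(|s| s.parse().ok())
        .unwrap_or(usize::MAX)
}

fn main() {
    // Try to read the csv file path by environment variable, as in the PR;
    // the trace benchmark only runs when the path is provided.
    if let Ok(path) = env::var("TWITTER_TRACE_PATH") {
        println!("would read up to {} keys from {path}", key_limit());
    } else {
        println!("TWITTER_TRACE_PATH not set; skipping the trace benchmark");
    }
}
```

Gating the benchmark on an environment variable keeps `cargo test` fast by default while still letting CI or a developer opt in to the full trace run.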
Add an interactive command to download the dataset?
Downloading it is a little bit complex.
> Downloading it is a little bit complex.

Hi @xiaguan, where can I find the dataset? Let me handle the download part.
https://github.com/twitter/cache-trace. I think I used the miss-ratio-related cluster's data.
I used SNIA to download it; that way, I could download just part of it (6GB).
Yep, it is really big. 🤣
Just attaching some discussion of foyer with S3FIFO.
And could you please sync this result to the comments in the
The implementation on the master branch doesn't evict objects from the ghost queue. I mainly focused on fixing this bug.
Oh, heck. You are right. 🥲 I just decreased the reference count. BTW, what do you think of the reference count? I see you removed it.
I just followed the paper. The paper doesn't maintain a reference count in the ghost queue, right?
Okay. Let me check then. I will update this branch directly if necessary and merge it.

I got you.

Hi @PsiACE, do you have any idea about the reference count? It would be great to hear from you as another reference.
S3FIFO does not need to keep a reference count for the ghost queue, as reference counting causes Quick Demotion to fail, that is, an incorrect estimation of popularity. After I noticed this and made the corresponding modifications, my algorithm also approached the state-of-the-art level, but it still needs further optimization.
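To make the fix concrete, here is a minimal, hypothetical sketch (ours, not foyer's implementation) of a bounded ghost queue that tracks key hashes without reference counts and, per the bug discussed above, actually evicts the oldest hash once capacity is reached:

```rust
use std::collections::{HashSet, VecDeque};

/// Illustrative bounded ghost queue: stores key hashes only, no per-entry
/// reference counts. Not foyer's actual data structure.
struct GhostQueue {
    queue: VecDeque<u64>,
    set: HashSet<u64>,
    capacity: usize,
}

impl GhostQueue {
    fn new(capacity: usize) -> Self {
        Self { queue: VecDeque::new(), set: HashSet::new(), capacity }
    }

    fn push(&mut self, hash: u64) {
        // Skip duplicates so membership can be tracked without counting.
        if !self.set.insert(hash) {
            return;
        }
        self.queue.push_back(hash);
        // The fix discussed above: evict the oldest hash once the queue is full.
        if self.queue.len() > self.capacity {
            if let Some(old) = self.queue.pop_front() {
                self.set.remove(&old);
            }
        }
    }

    fn contains(&self, hash: u64) -> bool {
        self.set.contains(&hash)
    }
}
```

On the pre-fix master branch the eviction step was effectively missing, so the ghost queue grew without bound; the reference-count removal is the separate change debated in this thread.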
Even when the ghost queue uses the key hash, not the key itself? 🤔 That looks a bit counterintuitive.
I plan to rewrite a cache benchmark tool.

This PR fixes the ghost queue not adding weights when pushing, but it also removes the hash reference count. I'm a bit concerned about removing the hash reference count, but for most workloads I think it is okay. So let's merge it first.
It can be almost directly reused.
Yep. I've already sent a PR 🥰 |
What's changed and what's your intention?
Fixed the ghost queue issue with s3fifo and added some real-workload tests. Another thing I want to point out is that algorithms that perform really well on zipf usually have lower robustness, meaning they're more limited in which workloads they can be applied to.
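For context on the zipf point, a synthetic zipf workload can be generated with inverse-CDF sampling. The following is a dependency-free sketch (all names are ours, not from any benchmark tool), using a small LCG so the example stays self-contained:

```rust
/// Illustrative Zipf(s) key sampler over n keys, the kind of skewed
/// synthetic workload this PR contrasts with real traces.
struct Zipf {
    cdf: Vec<f64>,
    state: u64,
}

impl Zipf {
    fn new(n: usize, s: f64, seed: u64) -> Self {
        // Weight of key k is 1 / k^s; normalize into a cumulative distribution.
        let weights: Vec<f64> = (1..=n).map(|k| 1.0 / (k as f64).powf(s)).collect();
        let total: f64 = weights.iter().sum();
        let mut acc = 0.0;
        let cdf = weights
            .iter()
            .map(|w| {
                acc += w / total;
                acc
            })
            .collect();
        Self { cdf, state: seed }
    }

    fn next_key(&mut self) -> usize {
        // LCG step, then map the high bits to a uniform draw in [0, 1).
        self.state = self
            .state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        let u = (self.state >> 11) as f64 / (1u64 << 53) as f64;
        // Binary-search the CDF for the sampled key index.
        self.cdf.partition_point(|&c| c < u).min(self.cdf.len() - 1)
    }
}
```

Because a handful of hot keys dominate such a stream, recency-biased policies look excellent on zipf; real traces like the Twitter one exercise the ghost queue and Quick Demotion much harder, which is why the real-workload tests were added.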
Checklist

- Pass `make all` (or `make fast` instead if the old tests are not modified) in my local environment.

Related issues or PRs (optional)