enable KSM for ARC cache #14279
KSM is meant for anonymous pages of processes, i.e. memory that is not backed by files. As far as I know, the KSM code as designed cannot be applied to either the page cache or the ARC, so it is not something that can simply be enabled. The deduplication that @behlendorf mentioned was for cases where ZFS already knows that the buffers are the same, such as when files occupy the same places on disk in snapshots and you are looking at them through those snapshots. It does not apply to files that merely have identical contents.

Offhand, the way that I would expect KSM to work is that it periodically hashes anonymous pages and stores those hashes in a data structure. Upon getting a hit, it marks the page as CoW in both places, verifies that the two are byte-for-byte identical, and then has one point to the other while increasing the page's reference counter. Implementing the idea in the ARC would be non-trivial. I would expect getting it right to require significant effort spanning at least a year. :/
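For illustration, here is a minimal userspace sketch of that scan-and-merge cycle, assuming nothing beyond standard C: hash each page, verify a match byte-for-byte, then share one copy and bump a reference count. Every name in it (struct page, hash_page) is invented for the example; the real logic lives in the kernel's mm/ksm.c, which write-protects PTEs and handles the actual CoW faults.

```c
/* Minimal userspace sketch of the KSM scan-and-merge idea described
 * above. Illustration only, not the kernel's mm/ksm.c: here a "page"
 * is just a 4 KiB buffer and merging is pointer sharing plus a
 * reference count. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096
#define NPAGES    8

struct page {
    unsigned char *data;   /* page contents */
    int refcount;          /* how many users share this page */
    int cow;               /* set when a write must copy the page */
};

/* FNV-1a, standing in for whatever hash the scanner would use. */
static uint64_t hash_page(const unsigned char *p)
{
    uint64_t h = 1469598103934665603ULL;
    for (size_t i = 0; i < PAGE_SIZE; i++)
        h = (h ^ p[i]) * 1099511628211ULL;
    return h;
}

int main(void)
{
    struct page *pages[NPAGES];
    uint64_t hashes[NPAGES];
    int merged = 0;

    /* Fill the pages with two alternating patterns. */
    for (int i = 0; i < NPAGES; i++) {
        pages[i] = malloc(sizeof(struct page));
        pages[i]->data = malloc(PAGE_SIZE);
        pages[i]->refcount = 1;
        pages[i]->cow = 0;
        memset(pages[i]->data, i % 2 ? 0xAA : 0x55, PAGE_SIZE);
    }

    /* One scan pass: hash every page, and on a hash match verify the
     * contents byte-for-byte before merging, precisely because hashes
     * can collide. */
    for (int i = 0; i < NPAGES; i++) {
        hashes[i] = hash_page(pages[i]->data);
        for (int j = 0; j < i; j++) {
            if (pages[j] == pages[i] || hashes[j] != hashes[i])
                continue;
            if (memcmp(pages[j]->data, pages[i]->data, PAGE_SIZE) != 0)
                continue;       /* hash collision, not a duplicate */
            /* Merge: drop i's private copy, share j's page, and mark
             * it CoW so a future write forces a fresh copy. */
            free(pages[i]->data);
            free(pages[i]);
            pages[i] = pages[j];
            pages[i]->refcount++;
            pages[i]->cow = 1;
            merged++;
            break;
        }
    }
    printf("merged %d of %d pages\n", merged, NPAGES);
    return 0;
}
```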
I posted an issue suggesting "lightweight" deduplication, and I wonder if it would cover this case. Basically, in issue #13572 my proposed/preferred solution is to allow deduplication to be enabled only for the contents of the ARC (and L2ARC), rather than for the entire contents of a dataset, to massively reduce the RAM impact of dedup.

The intention of that issue is to enable dedup for file copying, since copying usually involves reading records into the ARC (and thus into the "lightweight" dedup table) shortly before writing out the new copies. Because the records would already be in the dedup table, the copies would be written out as cloned blocks (reflinks) rather than as full copies.

If that issue were implemented first, the same basic mechanism could be used to dedup the contents of the ARC, because there would already be a dedup table to consult: two identical records generate the same hash in the dedup table, so one of them can be eliminated in the ARC (or even retroactively eliminated on disk).
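As a rough sketch of what an ARC-scoped dedup table might look like (my reading of the proposal, not anything that exists in OpenZFS; arc_rec, arc_insert, and the FNV-1a stand-in checksum are all hypothetical, and a real implementation would presumably key off the block pointer's existing checksum rather than rehashing the buffer):

```c
/* Hypothetical sketch of an ARC-scoped dedup table: dedup only what
 * is currently cached, keyed by a checksum of the record contents.
 * None of these names exist in OpenZFS. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define RECSIZE  (128 * 1024)   /* one record */
#define NBUCKETS 1024

struct arc_rec {
    uint64_t cksum;          /* checksum of the record contents */
    unsigned char *data;     /* shared record buffer */
    int refs;                /* cached references to this buffer */
    struct arc_rec *next;    /* hash-bucket chain */
};

static struct arc_rec *table[NBUCKETS];

static uint64_t cksum(const unsigned char *p, size_t n)
{
    uint64_t h = 1469598103934665603ULL;     /* FNV-1a stand-in */
    while (n--)
        h = (h ^ *p++) * 1099511628211ULL;
    return h;
}

/* Insert a record into the cache. If an identical record is already
 * cached, share it and free the duplicate; otherwise keep the copy. */
static struct arc_rec *arc_insert(unsigned char *data)
{
    uint64_t c = cksum(data, RECSIZE);
    struct arc_rec *r;

    for (r = table[c % NBUCKETS]; r != NULL; r = r->next) {
        if (r->cksum == c && memcmp(r->data, data, RECSIZE) == 0) {
            r->refs++;
            free(data);          /* duplicate record: keep one copy */
            return r;
        }
    }
    r = malloc(sizeof(*r));
    r->cksum = c;
    r->data = data;
    r->refs = 1;
    r->next = table[c % NBUCKETS];
    table[c % NBUCKETS] = r;
    return r;
}

int main(void)
{
    /* Two reads of identical content end up as one cached buffer. */
    unsigned char *a = calloc(1, RECSIZE), *b = calloc(1, RECSIZE);
    struct arc_rec *ra = arc_insert(a), *rb = arc_insert(b);
    printf("same buffer: %s, refs: %d\n",
           ra == rb ? "yes" : "no", ra->refs);
    return 0;
}
```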
@Sachiru @behlendorf I checked this, and it does not seem to apply to zfs-2.1.6.

I created 10 identical 100 MB files (test#.dat) with contents from /dev/urandom, dropped the caches with "echo 3 > /proc/sys/vm/drop_caches", and read the files with "cat test*.dat > /dev/null".

Before reading, the ARC was under 200 MB; after reading, it was at 1.2 GB. That means the ARC is not able to detect that the files' contents are identical.
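For anyone who wants to repeat this, here is the same reproduction as one small C program (run as root in a ZFS dataset; the file names and sizes match the commands above, and the ARC size can be read from "size" in /proc/spl/kstat/zfs/arcstats before and after the reads):

```c
/* Reproduction of the test above in one program: write one random
 * 100 MB payload to ten files, drop the caches, then read everything
 * back so all ten copies land in the ARC. If the ARC deduplicated
 * identical records, it would grow by ~100 MB here, not by ~1 GB. */
#include <stdio.h>
#include <unistd.h>

#define FILESIZE (100 * 1024 * 1024)

static unsigned char buf[FILESIZE];   /* one shared random payload */

int main(void)
{
    char name[32];
    FILE *f = fopen("/dev/urandom", "rb");

    if (f == NULL || fread(buf, 1, FILESIZE, f) != FILESIZE)
        return 1;
    fclose(f);

    /* Ten byte-identical files. */
    for (int i = 0; i < 10; i++) {
        snprintf(name, sizeof(name), "test%d.dat", i);
        f = fopen(name, "wb");
        if (f == NULL || fwrite(buf, 1, FILESIZE, f) != FILESIZE)
            return 1;
        fclose(f);
    }

    sync();   /* flush the writes so drop_caches can evict them */

    /* Equivalent of: echo 3 > /proc/sys/vm/drop_caches (needs root). */
    f = fopen("/proc/sys/vm/drop_caches", "w");
    if (f == NULL)
        return 1;
    fputs("3", f);
    fclose(f);

    /* Equivalent of: cat test*.dat > /dev/null */
    for (int i = 0; i < 10; i++) {
        snprintf(name, sizeof(name), "test%d.dat", i);
        f = fopen(name, "rb");
        if (f == NULL)
            return 1;
        while (fread(buf, 1, 1 << 20, f) > 0)
            ;
        fclose(f);
    }
    return 0;
}
```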
So, hereby I am reopening #2772: if the ARC has no internal deduplication, it should benefit from the kernel's standard memory deduplication feature. RAM is a precious resource, and most systems have plenty of unused CPU.