Enable KSM for memory allocated for ARC cache #2772
Comments
Interesting idea. I'm wondering about the performance implications. I think an easy way to evaluate this is to set up a VM with ZoL inside.
The performance penalty for KSM should theoretically be minimal. I'm running three VMs at the moment with relatively lax KSM parameters: sleep_millisecs set to 1000 and pages_to_scan set to 150000, i.e. ksmd scans about 586 MiB of memory (150000 pages of 4 KiB) every second. The processor load this adds to my system, as reported by uptime, is only around 5-12%. We could tune this for ZFS by using a high sleep_millisecs value such as 10000 and a high pages_to_scan value such as 300000, so that it scans large regions infrequently.
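To make the arithmetic behind those two settings concrete, here is a minimal sketch that converts pages_to_scan and sleep_millisecs into an approximate scan rate. The function names are hypothetical, and the 4 KiB page size is an assumption (typical for x86-64). The result is an upper bound, since it ignores the time ksmd spends on the scan itself between sleeps.

```python
from pathlib import Path

# Assumption: 4 KiB pages, as on typical x86-64 systems.
PAGE_SIZE = 4096

# Standard sysfs location of the KSM tunables on Linux.
KSM_DIR = Path("/sys/kernel/mm/ksm")


def ksm_scan_rate_mib_per_s(pages_to_scan, sleep_millisecs, page_size=PAGE_SIZE):
    """Approximate upper bound on memory examined by ksmd per second, in MiB.

    ksmd scans `pages_to_scan` pages, sleeps `sleep_millisecs`, and repeats.
    """
    bytes_per_batch = pages_to_scan * page_size
    return bytes_per_batch * 1000 / sleep_millisecs / (1024 * 1024)


def read_ksm_setting(name):
    """Return the current value of a KSM tunable, or None if unavailable."""
    try:
        return int((KSM_DIR / name).read_text())
    except (OSError, ValueError):
        # KSM not built into the kernel, not Linux, or unreadable file.
        return None


if __name__ == "__main__":
    # The settings quoted in the comment above, and the suggested ZFS tuning.
    print(ksm_scan_rate_mib_per_s(150000, 1000))
    print(ksm_scan_rate_mib_per_s(300000, 10000))
    # Whatever this machine is currently configured with, if readable.
    print(read_ksm_setting("pages_to_scan"), read_ksm_setting("sleep_millisecs"))
```

With the values from the comment, the first setting works out to about 586 MiB per second, and the suggested tuning to about 117 MiB per second on average, delivered as one batch of roughly 1.1 GiB every ten seconds.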
Great idea! Some improvements to KSM handling were suggested on the Linux kernel mailing list (in April, if I remember correctly). I'll see whether I can post them here for reference. Hopefully they'll make it into the kernel soon, so the performance impact should be fairly negligible.
@kpande Does it also affect ZFS? For my kind of workflow, there generally wasn't much data reported as merged: 10-200 MiB, or somewhat more with lots of Firefox and Chromium tabs and PDF files open. On servers it could be a real gain, considering the significantly lower CPU load.
Agreed :)
The situation here will be considerably better in the 0.7.0 release. ARC buffers are now compressed in memory, and the ARC is better about not keeping multiple copies of the same buffer.
Gosh, looking forward to this. Thanks for all your good work, Brian!
…--
Tim Connors
For non-deduplicated datasets or filesystems, the ARC retains full blocks in memory even if they are duplicates of other blocks. On systems that serve as backing storage for users (shared folders containing material/model/CAD libraries, library families, and/or versioned executables), this can leave several duplicate objects in RAM, which wastes ARC capacity.
KSM (Kernel Samepage Merging, http://en.wikipedia.org/wiki/Kernel_SamePage_Merging_(KSM) ) is meant to reduce memory usage, especially for memory-heavy applications, by merging identical pages. Although blocks have variable sizes, they are still allocated as 4k pages in memory (IIRC), which can then be examined and deduplicated.
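For reference, this is roughly how a userspace process opts memory in to KSM today: it marks a private anonymous mapping with madvise(MADV_MERGEABLE), and ksmd then considers those pages for merging. The sketch below is only an illustration of that opt-in, using Python's mmap module (Python 3.8+ on Linux). The helper name is hypothetical, and nothing here shows how ARC memory, which lives in the kernel, would be exposed to KSM.

```python
import mmap

PAGE = mmap.PAGESIZE


def make_mergeable_buffer(n_pages, fill=b"\x42"):
    """Create a buffer of identical pages and ask the kernel to consider it for KSM.

    Returns (buffer, advised), where `advised` is True if the madvise call
    was accepted. Acceptance does not guarantee that pages get merged; that
    depends on ksmd running (/sys/kernel/mm/ksm/run set to 1).
    """
    # KSM works on private anonymous memory, so request MAP_PRIVATE
    # explicitly; Python's default for mmap is MAP_SHARED.
    buf = mmap.mmap(-1, n_pages * PAGE,
                    flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    buf.write(fill * (n_pages * PAGE))  # every page gets identical content

    advised = False
    if hasattr(mmap, "MADV_MERGEABLE") and hasattr(buf, "madvise"):
        try:
            buf.madvise(mmap.MADV_MERGEABLE)
            advised = True
        except OSError:
            # For example EINVAL on kernels built without CONFIG_KSM.
            advised = False
    return buf, advised


if __name__ == "__main__":
    buf, advised = make_mergeable_buffer(1024)
    print("madvise(MADV_MERGEABLE) accepted:", advised)
    buf.close()
```

If ksmd is running, the effect can be observed through the counters in /sys/kernel/mm/ksm, such as pages_shared and pages_sharing, while the buffer is still mapped.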
This is beneficial even if the fraction of ARC memory recovered is small, because ARC allocations are typically large. Even if duplicated data makes up only around 10% of the ARC, the absolute savings can be significant. Deduplicating ARC contents by means of KSM also means that more data fits into the ARC, and that the deduplication code need not be maintained by the ZoL devs, only the glue code that makes the ARC visible to KSM.