Enable KSM for memory allocated for ARC cache #2772

Closed
Sachiru opened this issue Oct 8, 2014 · 9 comments
Labels
Component: Memory Management kernel memory management Type: Feature Feature request or new feature

Comments

@Sachiru

Sachiru commented Oct 8, 2014

For non-deduplicated datasets or filesystems, the ARC retains full blocks in memory even if they are duplicates of other blocks. On systems that serve as backing storage for users (shared folders containing material, model, or CAD libraries, library families, and/or versioned executables), this can leave several duplicate objects in RAM, which wastes ARC capacity.

KSM (Kernel Samepage Merging, http://en.wikipedia.org/wiki/Kernel_SamePage_Merging_(KSM) ) is meant to reduce memory usage, especially for memory-heavy applications, by merging identical pages. Although blocks have variable sizes, they are still allocated as 4k pages in memory (IIRC), which can then be examined and deduplicated.
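
To make the page-granularity point concrete, here is a minimal sketch that counts how many 4 KiB pages in a buffer are exact duplicates of an earlier page. It assumes a 4 KiB page size, and it uses SHA-256 digests only as a stand-in for comparison; KSM itself compares page contents directly inside the kernel. The function name `count_mergeable_pages` is made up for illustration.

```python
import hashlib
from collections import Counter

PAGE_SIZE = 4096  # assumed page size; real systems may differ


def count_mergeable_pages(buf: bytes, page_size: int = PAGE_SIZE) -> int:
    """Return how many whole pages are redundant copies of another page."""
    digests = Counter(
        hashlib.sha256(buf[off:off + page_size]).digest()
        # Only whole, page-aligned chunks are considered; a trailing
        # partial page is ignored.
        for off in range(0, len(buf) - page_size + 1, page_size)
    )
    # Every copy beyond the first of each distinct page is redundant.
    return sum(n - 1 for n in digests.values())


if __name__ == "__main__":
    block = bytes(range(256)) * 16          # one 4 KiB page of sample data
    other = b"\x01" * PAGE_SIZE             # a different page
    cached = block + other + block + block  # the same page held three times
    print(count_mergeable_pages(cached))    # redundant copies of `block`
```

Note that only page-aligned, byte-identical pages count: the same block cached at a different offset within a page would not be merged.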

This is beneficial even if the fraction of ARC memory recovered is small, because ARC allocations are inherently large. Even if duplicated data makes up only around 10% of the ARC, the absolute savings can be significant given typical ARC sizes. Deduplicating ARC contents by means of KSM also means that more data fits into the ARC, and that the deduplication code need not be maintained by ZoL devs, only the glue code that allows the ARC to be seen by KSM.
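
For reference, user-space programs opt memory into KSM with `madvise(MADV_MERGEABLE)`. The sketch below shows that call from Python 3.8+ on Linux. It is only a user-space analogue: ARC buffers live in kernel memory, and how (or whether) they could be registered with KSM is not something this thread establishes. The helper name `make_mergeable_region` is hypothetical.

```python
import mmap


def make_mergeable_region(size: int):
    """Map private anonymous memory and advise the kernel it may be merged.

    Returns (region, accepted). accepted is False when MADV_MERGEABLE is not
    exposed by this Python build or the kernel rejects the advice
    (for example, a kernel built without KSM).
    """
    # KSM only considers private anonymous pages, so request MAP_PRIVATE.
    region = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    advice = getattr(mmap, "MADV_MERGEABLE", None)
    if advice is None:
        return region, False
    try:
        region.madvise(advice)
    except OSError:
        return region, False
    return region, True


if __name__ == "__main__":
    region, accepted = make_mergeable_region(16 * mmap.PAGESIZE)
    region.write(b"\xAA" * len(region))  # fill every page with identical data
    print("advice accepted:", accepted)
    region.close()
```

Even when the advice is accepted, pages are merged only while the KSM daemon is running (`/sys/kernel/mm/ksm/run` set to 1).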

@maci0
Contributor

maci0 commented Oct 8, 2014

Interesting idea. I'm wondering about the performance implications.
If this is implemented, it should be a configurable option.
This also got me thinking about compressing the ARC. Do we do that? I know there is compression for L2ARC, so I'm not sure. #1379

I think an easy way to evaluate this is to set up a VM with ZoL inside,
then enable/disable KSM on the host system. IIRC KVM marks all guest pages MERGEABLE for KSM.
It might be useful to analyze it.
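
For the host-side measurement, KSM exposes its counters under `/sys/kernel/mm/ksm`. A small sketch of reading them before and after enabling KSM might look like the following; the function names are made up for illustration, and the savings figure is an approximation based on the `pages_sharing` counter.

```python
import os
from pathlib import Path

KSM_DIR = Path("/sys/kernel/mm/ksm")  # standard KSM sysfs location on Linux


def read_ksm_counters(ksm_dir: Path = KSM_DIR) -> dict:
    """Read a handful of KSM counters from sysfs as integers."""
    names = ("run", "pages_shared", "pages_sharing", "pages_unshared", "full_scans")
    return {name: int((ksm_dir / name).read_text().strip()) for name in names}


def saved_bytes(counters: dict, page_size: int) -> int:
    """Approximate memory saved by merging.

    pages_sharing counts the additional mappings that point at an already
    shared page, i.e. pages that would otherwise each need their own copy.
    """
    return counters["pages_sharing"] * page_size


if __name__ == "__main__":
    if KSM_DIR.is_dir():
        counters = read_ksm_counters()
        page_size = os.sysconf("SC_PAGE_SIZE")
        print(counters)
        print(f"approx. saved: {saved_bytes(counters, page_size) / 2**20:.1f} MiB")
    else:
        print("KSM sysfs directory not found; kernel may lack KSM support")
```

Sampling these values with the ZoL guest idle and again under a read-heavy workload would give a first estimate of how much ARC content is actually mergeable.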

@Sachiru
Author

Sachiru commented Oct 8, 2014

Performance penalties for KSM should theoretically be minimal. I'm running three VMs at the moment with relatively lax KSM parameters: sleep_millisecs set to 1000 and pages_to_scan set to 150000, i.e. scanning roughly 586 MiB of memory (at 4 KiB per page) every second. The processor load this adds to my system, as reported by uptime, is only around 5-12%. We could tune this for ZFS with a high sleep_millisecs value like 10000 and a high pages_to_scan value like 300000, so that it scans large regions infrequently.
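
As a rough check on those tunables, the scan rate follows directly from pages_to_scan and sleep_millisecs. This is a small sketch assuming 4 KiB pages and ignoring the time ksmd spends on the scan itself, so the figures are upper bounds; the function name is made up for illustration.

```python
def scan_rate_mib_per_s(pages_to_scan: int, sleep_millisecs: int,
                        page_size: int = 4096) -> float:
    """Upper bound on how much memory ksmd scans per second, in MiB."""
    pages_per_second = pages_to_scan * 1000 / sleep_millisecs
    return pages_per_second * page_size / 2**20


if __name__ == "__main__":
    # Settings currently in use: 150000 pages every 1000 ms.
    print(f"{scan_rate_mib_per_s(150000, 1000):.1f} MiB/s")
    # Suggested ZFS tuning: 300000 pages every 10000 ms.
    print(f"{scan_rate_mib_per_s(300000, 10000):.1f} MiB/s")
```

Under these assumptions the suggested ZFS tuning scans in larger batches but covers about a fifth as much memory per second as the current settings.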

@behlendorf behlendorf added Component: Memory Management kernel memory management Type: Feature Feature request or new feature labels Oct 8, 2014
@kernelOfTruth
Contributor

Great idea!

There were some suggested improvements to KSM handling on the Linux kernel mailing list (in April, if I remember correctly). I'll see whether I can post them here for reference; hopefully they'll make it into the kernel soon.

With those improvements, the performance impact should be fairly negligible.

@kernelOfTruth
Contributor

@kpande does it also affect ZFS?

For my kind of workflow, there generally wasn't much data reported as merged: 10-200 MiB, or somewhat more with lots of Firefox and Chromium tabs and PDF files open.

On servers it could be a real gain, given the significantly lower CPU load.

@kernelOfTruth
Contributor

Agreed :)

@behlendorf
Contributor

The situation here will be considerably better in the 0.7.0 release. ARC buffers are now compressed in memory, and the ARC is better about not keeping multiple copies of the same buffer.

@spacelama

spacelama commented Dec 1, 2016 via email

6 participants
@behlendorf @spacelama @Sachiru @maci0 @kernelOfTruth and others