Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return number of pages freed in the __arc_shrunker_func(). #2753

Closed
wants to merge 1 commit into from

Conversation

dweeezil
Copy link
Contributor

@dweeezil dweeezil commented Oct 2, 2014

According to the comments in the kernel's shrinker.h file, the
scan_objects callback should return the number of objects actually
freed. Previously, the shrinker returned the number of objects which
were attempted to be freed. When the return value of a shrinker is
too high, callers may retry shrinking indefinitely.

@dweeezil
Copy link
Contributor Author

dweeezil commented Oct 2, 2014

During my research into corrupted system attributes, I found myself in a position where my kernel would spin in shrink_slab() when an echo 2 > /proc/sys/vm/drop_caches were done. It was looping because the ARC shrinker (and possibly the SPL shrinker; see below) always returned the number of objects attempted to be freed rather than the number of objects which were actually freed.

I've not yet researched whether this logic is correct for older kernels. I'm currently using a 3.17-rc5 for my testing but I do believe this is correct for most newer 3.1X versions.

A similar fix is needed in the SPL generic shrinker (which should also obviate the need for KMC_RECLAIM_ONCE) which I'll be posting shortly as a pull request for SPL.

@dweeezil
Copy link
Contributor Author

dweeezil commented Oct 2, 2014

A quick check shows a number of issues this may fix, such as #2394.

@behlendorf
Copy link
Contributor

@dweeezil I was skeptical of this patch at first but upon closer inspection you're right. There was a change in expected behavior in Linux 3.12 we missed when the .count_objects callback was added.

It used to be that you would register a single function for the Linux shrinker and it would be called in two different ways. When called with nr_to_scan == 0 it was expected to return the number of potentially freeable objects in the cache. When nr_to_scan != 0 it was expected to free what it could but and still return the current cache size. Everyone agreed this was a horrible interface.

So it was changed. A .count_objects callback was added which retained the behavior of nr_to_scan == 0 and was expected to return the cache size. However, it appears we missed that fact that the return value for .scan_objects was changed also.

To cleanly handle this for all kernels it looks like we're going to need to make slightly more extensive changes. I'm thinking along the lines of allowing the callers to register both a .count_objects and a .scan_objects callback. This would bring us back in sync with the current interface which is far far far more sane.

@dweeezil
Copy link
Contributor Author

dweeezil commented Oct 2, 2014

@behlendorf Thanks for the clarification. That's why I added the weasel wording in my follow-on comment regarding kernel versions. Unfortunately I burned a fair bit of time trying to figure what was going on with this one (kept running into it while dealing with the corrupted SA issues) and wanted to get some sort of dialog going regarding this issue (and the parallel issue in SPL).

Unfortunately, this is likely to start hitting a lot of people given the popularity of Ubuntu 14.04 which is using a 3.13 kernel (it hit me on one of our production servers).

I'll try to get this cleaned up sufficiently to support all the kernels we need. Speaking of which, how old of a kernel are we supposed to support?

@tomposmiko
Copy link

Sorry, to jump in, but I want to make sure, if I understand correctly (because I have Trusty servers:)

One can hit this issue only by running the command 'echo 2/3 > /proc/sys/vm/drop_caches', right?
If no one do it, the system is in safe?

@behlendorf
Copy link
Contributor

@dweeezil Funny you should mention supported kernel versions. Historically we've supported back to 2.6.26 and because of that we're dragging along a ton of compatibility code. I think the time has probably come to drop a lot of that dead code. Since the oldest kernel version we're regularly testing with is 2.6.32, I'd suggest we use that. If all goes well I'll propose a patch stack either today or tomorrow which removes the old code. I'm sure it will be a nice simplification.

Unfortunately, I'm not sure how much that is going to help with with reworking the shrinkers. That interface has under gone numerous changes since 2.6.32. I'm all for reworking the compatibility code if you can work up something cleaner given the new interface. I'd even seriously flirt with the idea of pulling it out of the SPL and in to the ZFS code. It doesn't look like any of the existing kmem_cache_create() callers provide a reclaim function so the existing slab implementation doesn't strictly require it.

This could be one more step along the path of chipping away at the SPL to pulling it in to the ZFS code to simplify things. Anyway, what the cleanest thing to do here needs some thought.

@tomposmiko yes it looks like you might hit this issue by running that command. However, this code may also be called for other reasons you still might have issues. Although, if you haven't seen them by now I wouldn't worry to much.

@behlendorf
Copy link
Contributor

You know, this would explain the rare failures I've seen running the SPLAT linux:shrinker test.

@chrisrd
Copy link
Contributor

chrisrd commented Oct 16, 2014

FWIW... since upgrading from linux-3.10 to linux-3.14 I have been struggling with zfs and other processes (ceph-osd) battling it out, with calls to try_to_free_pages() dominating the perf charts during load spikes over 100 and the box becoming unresponsive.

This patch (dweeezil/zfs@af59f0e) and the analogous SPL patch (dweeezil/spl@efbda78) look to have completely fixed the issue (applied on top of openzfs/spl@2bf35fb7 and @fbeddd6), with the box now having been up 26 hours under significant load, which it wouldn't have survived previously.

@behlendorf behlendorf added this to the 0.6.4 milestone Oct 16, 2014
@behlendorf
Copy link
Contributor

@chrisrd Thanks for the independent confirmation that this patch addresses the issue. We just need to rework it a little bit so it will handle both kernels older and newer than 3.12 where the this issue was introduced.

@dweeezil dweeezil force-pushed the arc-shrinker branch 2 times, most recently from 51288af to 2358ef8 Compare October 20, 2014 12:29
The new shrinker API as of Linux 3.12 modifies "struct shrinker" by
replacing the @shrink callback with the pair of @count_objects and
@scan_objects.  It also requires the return value of @count_objects to
return the number of objects actually freed whereas the previous @shrink
callback returned the number of remaining freeable objects.

This patch adds support for the new @scan_objects return value semantics.

References:
    torvalds/linux@24f7c6b
@behlendorf
Copy link
Contributor

Repalced by #2837

@behlendorf behlendorf closed this Oct 27, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants