
Update arc_available_memory() to check freemem #3639

Closed
wants to merge 2 commits

Conversation

behlendorf
Contributor

While Linux doesn't provide detailed information about the state of
the VM, it does provide us the total number of free pages. This
information should be incorporated into the arc_available_memory()
calculation rather than solely relying on a signal from direct reclaim.

It is also desirable that the amount of reclaim be tunable on a
target system. While the default values are expected to work well
for most workloads there may be cases where custom values are needed.

zfs_arc_lotsfree - Threshold in bytes for what the ARC should consider
to be a lot of free memory on the system.

zfs_arc_desfree - Threshold in bytes for what the ARC should consider
to be the desired available free memory on the system.

Note that zfs_arc_lotsfree and zfs_arc_desfree are defined in terms
of bytes unlike the illumos globals lotsfree and desfree. This was
done to make reading and setting the values easier. The current values
are available in /proc/spl/kstat/zfs/arcstats.

Signed-off-by: Brian Behlendorf [email protected]
Issue #3637
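A minimal user-space sketch of the threshold policy described above. The function name mirrors arc_available_memory(), but the signature, page size, and example values are assumptions for illustration, not the actual ZoL C code:

```python
PAGESIZE = 4096

def arc_available_memory(freemem, lotsfree, desfree):
    """Return a positive byte count when memory is plentiful, or a
    negative deficit when the ARC should shrink (illumos convention).

    freemem, lotsfree and desfree are in pages here; the proposed
    zfs_arc_lotsfree / zfs_arc_desfree tunables are in bytes, so a
    caller would convert before comparing."""
    # The tighter of the two margins drives reclaim.
    lowest = min(freemem - lotsfree, freemem - desfree)
    return lowest * PAGESIZE

# Example: 4000 free pages against an 8192-page lotsfree threshold.
print(arc_available_memory(freemem=4000, lotsfree=8192, desfree=4096))
```

A negative result signals the ARC to evict buffers; a positive one leaves it alone.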

@behlendorf
Contributor Author

Depends on openzfs/spl#467.

@behlendorf
Contributor Author

Refreshed. The core of the patch remains unchanged, but I merged the two module parameters into one. Now there is a single zfs_arc_sys_free module option with a clearly explained purpose. Internally, it's mapped to illumos's lotsfree variable, which has the advantage of letting us leave more upstream code largely unchanged.

       zfs_arc_sys_free (ulong)
                   The  target  number  of  bytes the ARC should leave as free
                   memory on the system.  Defaults to the larger  of  1/64  of
                   physical  memory  or  512K but may be overridden by setting
zfs_arc_sys_free to a specific value.

                   Default value: 0.
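The documented default can be sketched as below; the constants come straight from the man-page excerpt, while the function name and example machine size are assumptions:

```python
def default_arc_sys_free(physmem_bytes):
    """Larger of 1/64 of physical memory or 512K, per the excerpt above.
    (A module-option value of 0 means "use this computed default".)"""
    return max(physmem_bytes // 64, 512 * 1024)

# On a 16 GiB machine the default works out to 256 MiB kept free;
# on a tiny 1 MiB system the 512K floor wins instead.
print(default_arc_sys_free(16 * 1024**3))
print(default_arc_sys_free(1024**2))
```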

While Linux doesn't provide detailed information about the state of
the VM, it does provide us the total number of free pages.  This
information should be incorporated into the arc_available_memory()
calculation rather than solely relying on a signal from direct
reclaim.  Conceptually
this brings arc_available_memory() back in sync with illumos.

It is also desirable that the target amount of free memory be tunable
on a system.  While the default values are expected to work well
for most workloads there may be cases where custom values are needed.
The zfs_arc_sys_free module option was added for this purpose.

zfs_arc_sys_free - The target number of bytes the ARC should leave
                   as free memory on the system.  This value can be
                   checked in /proc/spl/kstat/zfs/arcstats, and
                   setting this module option will override the
                   default value.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#3637
This brings the behavior of arc_memory_throttle() back in sync with
illumos.  The updated memory throttling policy, as used by illumos,
roughly goes like this:

* Never throttle if more than 10% of memory is free.  This threshold
  is configurable with the zfs_arc_lotsfree_percent module option.

* Minimize any throttling of kswapd even when free memory is below
  the set threshold.  Allow it to write out pages as quickly as
  possible to help alleviate the memory pressure.

* Delay all other threads when free memory is below the set threshold
  in order to avoid compounding the memory pressure.  Buffers will be
  evicted from the ARC to reduce the issue.

The Linux specific zfs_arc_memory_throttle_disable module option has
been removed in favor of the existing zfs_arc_lotsfree_percent tuning.
Setting zfs_arc_lotsfree_percent=0 will have the same effect as
zfs_arc_memory_throttle_disable and it was therefore redundant.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#3637
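The three throttling rules above can be condensed into a small decision helper. This is an illustrative sketch, not the actual arc_memory_throttle() implementation; the function name and percentage inputs are assumptions:

```python
def should_throttle(free_pct, lotsfree_percent=10, is_kswapd=False):
    """Decide whether a writing thread should be delayed.

    * Never throttle when more than lotsfree_percent of memory is free;
      setting it to 0 disables throttling entirely, which is why the
      old zfs_arc_memory_throttle_disable option became redundant.
    * Never delay kswapd, so it can keep writing out pages.
    * Delay all other threads while free memory stays below the
      threshold, to avoid compounding the memory pressure.
    """
    if lotsfree_percent == 0:
        return False            # throttling disabled
    if free_pct > lotsfree_percent:
        return False            # plenty of free memory
    if is_kswapd:
        return False            # let reclaim make progress
    return True                 # everyone else waits

print(should_throttle(free_pct=5))                  # ordinary writer, tight memory
print(should_throttle(free_pct=5, is_kswapd=True))  # kswapd is never delayed
```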
@behlendorf
Contributor Author

Refreshed with @dweeezil suggested updates. I've also added a second patch to the stack which brings arc_memory_throttle back in line with illumos. Please review.

@behlendorf behlendorf added this to the 0.6.5 milestone Jul 28, 2015
@snajpa
Contributor

snajpa commented Jul 29, 2015

Slightly OT question: how does the reported value reflect memory fragmentation?

If I understand things correctly, SLAB caches are allocated with room for more objects of the same type in the same allocation, in the hope that new objects of that type will be required soon.
Now, fast-forward some time on a busy system and we end up with chunks of memory that each hold several live objects while some not-insignificant part of the allocation remains unused.

If this is correct, then such wasted memory is accounted not as free but as used, is that so?

On a busy system, like the ones I have the fortune to be responsible for, there can be over 7k processes total and 90 containers hammering a correspondingly large number of datasets. Add to the mix random rsync/rdiff-backup runs across really wild dir structures and you get a perfect recipe for instant memory fragmentation.

It only takes a few days to get a system to a state like this :)

Thinking about all this, it would probably be best if I created a ticket describing my workload and all the possible weird situations/obvious bugs we encounter, because I strongly believe that once we get ZoL running well on vpsFree.cz systems, it'll be ready for anything (ZPL related; we don't use ZVOLs at all).

@behlendorf
Contributor Author

@snajpa good question. The proposed patches update the code to use the kernel's nr_free_pages() function to determine the total number of free pages spread across all the NUMA zones. This is a count of total pages, and when the value is low it most likely represents non-contiguous pages scattered all over memory. Regardless, it provides a good way to assess how pressed for memory the system is.

The zfs_arc_sys_free=1/64-of-total-memory option can then be used to set a target amount of memory to keep free, and the zfs_arc_lotsfree_percent=10% option is used to throttle new writes if necessary when memory is low. The nice thing is that this finally became possible now that all supported Linux kernels let us check free memory (this wasn't always the case) and after some upstream illumos ARC refactoring.
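The accounting described here can be modeled roughly as follows. The zone names and page counts are made up for illustration; the point is that a global free-page sum in the style of nr_free_pages() measures overall pressure even when the pages are scattered:

```python
PAGESIZE = 4096

def total_free_bytes(zones):
    """Sum free pages across all NUMA zones, nr_free_pages() style.
    The pages may be non-contiguous, but the total still tells us
    how pressed for memory the system is."""
    return sum(zones.values()) * PAGESIZE

# Hypothetical per-zone free-page counts on a two-node machine.
zones = {"node0_normal": 20000, "node0_dma32": 4000, "node1_normal": 18000}
free = total_free_bytes(zones)

arc_sys_free = 256 * 1024 * 1024   # example target: keep 256 MiB free
print(free < arc_sys_free)          # below target => ARC should give memory back
```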

If this is correct, then such wasted memory is accounted not as free but as used, is that so?

That's right, and we've been systematically working to limit that wasted memory in two major ways.

  • As of 0.6.4.3 all slab caches with small objects use the Linux caches by default, which suffer less fragmentation because their slabs are smaller.
  • Once merged, the ABD patches will remove the need for slab caches with large objects entirely. That will almost entirely eliminate the fragmentation issue for ARC buffers.

So things are moving in the right direction. @snajpa it would be great if you could test out these patches with your abusive workload. They're actually pretty straightforward, so I expect they'll be merged shortly, but if you could test them soon that would be helpful!

@siebenmann
Contributor

I'm happy to test this PR, but it looks like ZoL git tip has moved since it was made. Is it safe to use these changes in combination with git tip, given the thread priority changes in git tip and current SPL git tip?

@behlendorf
Contributor Author

Merged as:

7e8bddd Update arc_memory_throttle() to check pageout
11f552f Update arc_available_memory() to check freemem

@siebenmann I've just wrapped up my testing of these patches and confirmed they're working as designed for my test workloads, so I've merged them to master. It would be great if you could test the latest master source, which includes these patches, with your real workloads. I'm quite happy with how all the ARC sync-up work came together, and I've subjectively found the ARC more responsive, but the broader the cross-section of workloads we can throw at it, the better.

@behlendorf behlendorf closed this Jul 30, 2015
@behlendorf behlendorf deleted the issue-3637 branch April 19, 2021 19:37