sio_cache_0 kernel slab high memory usage during scrub #8662

Closed
mailinglists35 opened this issue Apr 23, 2019 · 17 comments

Comments

@mailinglists35

mailinglists35 commented Apr 23, 2019

System information

Type Version/Name
Distribution Name ubuntu
Distribution Version 18.04lts
Linux Kernel 4.18
Architecture amd64
ZFS Version 0.8.0-rc4
SPL Version 0.8.0-rc4

Describe the problem you're observing

Detailed attachments (arc_summary, /proc/spl/kmem/slab, dmesg) are at https://zfsonlinux.topicbox.com/groups/zfs-discuss/T225c012532a7c86c

When I scrub the pool (10TB 3-way mirror with SSD cache and SSD log), I observe that the kernel slab memory occupied by sio_cache_0 fills up to the point where the kernel starts the OOM killer on innocent apps. These slabs show as unreclaimable in the OOM killer dmesg output.

If I set spl_kmem_cache_expire to 0x01 (Illumos style, 15-second aging return of objects), memory consumption stabilizes around 3GB, down from 10GB with the default 0x02, but I still find that enormous.

Does sio_cache_0 really need all that RAM during a scrub?
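
For reference, the expire-mode change mentioned above can be applied like this (a sketch, assuming the 0.8.x SPL module, which exposes spl_kmem_cache_expire as a writable parameter; the modprobe.d file name is illustrative):

# switch SPL caches to Illumos-style 15-second object aging (0x01)
# instead of the Linux low-memory-notification default (0x02)
echo 1 > /sys/module/spl/parameters/spl_kmem_cache_expire

# or persistently, applied the next time the module loads
echo "options spl spl_kmem_cache_expire=1" > /etc/modprobe.d/spl-expire.conf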

Describe how to reproduce the problem

zpool scrub poolname

Include any warning/errors/backtraces from the system logs

@behlendorf
Contributor

@mailinglists35 thanks for reporting this. The sio_cache by default is limited to 5% of system memory during the scrub, though we may exceed this somewhat due to memory fragmentation. Could you please post the output of the following command when it's consuming 10G. That should let us determine if the core issue here is fragmentation.

cat /proc/slabinfo  | grep sio_cache

It wasn't clear from the mailing list thread how much memory is in your system. Could you include that information as well.

You can further reduce the amount of memory ZFS is allowed to use for the scan by setting the zfs_scan_mem_lim_fact and zfs_scan_mem_lim_soft_fact module options.

zfs_scan_mem_lim_fact (int)

Maximum fraction of RAM used for I/O sorting by sequential scan algorithm.
This tunable determines the hard limit for I/O sorting memory usage.
When the hard limit is reached we stop scanning metadata and start issuing
data verification I/O. This is done until we get below the soft limit.

Default value: 20 which is 5% of RAM (1/20).

zfs_scan_mem_lim_soft_fact (int)

The fraction of the hard limit used to determine the soft limit for I/O sorting
by the sequential scan algorithm. When we cross this limit from below no action
is taken. When we cross this limit from above it is because we are issuing
verification I/O. In this case (unless the metadata scan is done) we stop
issuing verification I/O and start scanning metadata again until we get to the
hard limit.

Default value: 20 which is 5% of RAM (1/20).
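
As a sketch, tightening the hard limit to roughly 1% of RAM would look something like this (assuming the parameter is set via modprobe options; the file name is illustrative):

# cap scan I/O sorting memory at ~1% of RAM (1/100) instead of the default 5% (1/20)
echo "options zfs zfs_scan_mem_lim_fact=100" > /etc/modprobe.d/zfs-scan.conf
# takes effect the next time the zfs module is loaded (or after a reboot)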

@mailinglists35
Author

The system has 16GB of physical RAM.

When slabtop reports 3GB (which is ~20% of RAM), the output of the requested command is:

sio_cache_2       121752 123120    168   48    2 : tunables    0    0    0 : slabdata   2565   2565      0
sio_cache_1       607524 628686    152   53    2 : tunables    0    0    0 : slabdata  11862  11862      0
sio_cache_0       22944850 22945260    136   60    2 : tunables    0    0    0 : slabdata 382421 382421      0
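
For reference, the /proc/slabinfo columns after the cache name are <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>, so the footprint can be estimated with a quick one-liner (a sketch, nothing ZFS-specific):

# approximate bytes held by the sio caches: num_objs * objsize
grep sio_cache /proc/slabinfo | awk '{sum += $3 * $4} END {printf "%.1f GiB\n", sum/1024/1024/1024}'

For the sio_cache_0 line above that is 22945260 * 136 bytes, roughly 2.9 GiB, which matches the ~3GB slabtop reports.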

I will update as well when it reaches the peak.

@mailinglists35
Author

I can't seem to trigger it again, but please leave this open until fragmentation (if that was the cause) gets high enough for it to occur again.

@tcaputi
Contributor

tcaputi commented Apr 24, 2019

@mailinglists35 Even 20% is 15% higher than it should be. Can you please do the following to enable dbgmsg logging and the contained dprintf messages:

echo $(($(cat /sys/module/zfs/parameters/zfs_flags) | 1)) > /sys/module/zfs/parameters/zfs_flags
echo 1 > /sys/module/zfs/parameters/zfs_dbgmsg_enable

Then please provide all of the relevant dbgmsg logs, like this:

cat /proc/spl/kstat/zfs/dbgmsg | grep dsl_scan

@mailinglists35
Author

Thank you, will do that.

Related, I stumbled upon this:

I have seen total ZoL slab allocated space be as high as 10 GB (on this 16 GB machine) despite the ARC only reporting a 5 GB size. As you can see, this stuff can fluctuate back and forth during normal usage.
Sidebar: Accurately tracking ZoL slab memory usage

To accurately track ZoL memory usage you must defeat SLUB slab merging somehow. You can turn it off entirely with the slub_nomerge kernel parameter or hack the spl ZoL kernel module to defeat it (see the sidebar here).

Because you can set spl_kmem_cache_slab_limit as a module parameter for the spl ZoL kernel module, I believe that you can set it to zero to avoid having any ZoL slabs be native kernel slabs. This avoids SLUB slab merging entirely and also makes it so that all ZoL slabs appear in /proc/spl/kmem/slab. It may be somewhat less efficient.

Does that still apply to current master? To accurately measure the data, should I boot with slub_nomerge, and/or should I set spl_kmem_cache_slab_limit to zero?
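
For reference, the measurement setup the quoted post describes would look roughly like this (a sketch; slub_nomerge is a stock kernel boot parameter, spl_kmem_cache_slab_limit is an SPL module option, and the modprobe.d file name is illustrative):

# option 1: boot with SLUB slab merging disabled by adding this token to the
#           kernel command line (e.g. GRUB_CMDLINE_LINUX in /etc/default/grub,
#           then update-grub and reboot):
#   slub_nomerge

# option 2: keep SPL caches out of the native kernel slabs so they all show up
#           in /proc/spl/kmem/slab instead of being merged by SLUB
echo "options spl spl_kmem_cache_slab_limit=0" > /etc/modprobe.d/spl-slab.conf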

@tcaputi
Contributor

tcaputi commented Apr 25, 2019

I wouldn't change anything from default until we have a better understanding of what's going on. One of the statements in dbgmsg will look something like this:

current scan memory usage: 0 bytes

This number covers almost all of the memory currently being used by the scan, which is primarily the sio caches but includes other things as well, so don't expect the numbers to line up exactly. If this number is less than the total memory usage of all the sio caches, that would be a reason to start looking at the SPL and the memory allocator. At that point I would say we should try the tunables you mentioned, but I want to sanity check that the scanning code's memory limiting is working properly in the first place.
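
A quick way to make that comparison (a sketch; the grep pattern follows the dbgmsg line quoted above, and the slabinfo sum is simply object count times object size):

# what the scanning code thinks it is using
grep 'scan memory' /proc/spl/kstat/zfs/dbgmsg | tail -n 5

# what the sio caches actually occupy according to the kernel
grep sio_cache /proc/slabinfo | awk '{sum += $3 * $4} END {print sum " bytes"}'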

@tcaputi
Contributor

tcaputi commented Apr 29, 2019

@mailinglists35 any update on this?

@mailinglists35
Author

I can't make it use 10-12GB again :(

@mailinglists35
Author

If this is an issue, I can close it and then reopen it when it occurs again.

@tcaputi
Contributor

tcaputi commented May 14, 2019

@mailinglists35 should I close this issue?

@richardelling
Contributor

Related comment for the archives: I've got a telegraf collector that records /proc/slabinfo into a TSDB. It is unclear to me that there is demand beyond a few of us geeks who track down these errors, so at this point I'm not planning to upstream it to telegraf. Given enough demand, I'll do it, so let me know if it would help.

@mailinglists35
Author

@tcaputi if it prevents a release or causes other administrative trouble, please close it. But can I later reopen or restart the discussion here without having to open a new issue?

@tcaputi
Contributor

tcaputi commented May 15, 2019

Yes, that's fine.

@awnz

awnz commented Oct 9, 2023

I think I'm experiencing this on a homelab/test server now (Proxmox 7.4.17, zfs-2.1.11-pve1). It's happening during a disk resilver. It also happens during a scrub, which I had to abort (the slab freed up when I aborted it).

Admittedly this is a low-memory node (8GB), but when it is busy with the scanning part of the resilver it eats that memory up fast, in a matter of minutes. I've attached two screenshots of btop, slabtop, zpool status and meminfo output, taken about a minute apart, to show the rate at which it balloons. The load on this node is some storage (Linstor) but no active VMs or containers (evacuated because of the instability).
If I don't offline the volume that's resilvering, it will OOM and then panic.

While typing this I noticed there's a version mismatch between zfs-utils (2.1.11-pve1) and the kernel module (2.1.9-pve1), despite all packages being up to date. It's part of a three-node cluster with the same versions across all nodes, but node 2 seems to be the only node suffering from this. Nodes 1 and 3 completed their weekly scrubs on Sunday without issue, but node 2 OOMed and panicked as above.

Any suggestions on what to look at next to isolate/debug this?

[Two screenshots attached: btop, slabtop, zpool status, and meminfo output, taken about a minute apart]
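
One quick check for the userland/kernel-module mismatch mentioned above (a sketch; on 2.x, zfs version prints both the userland and the loaded module version):

# userland tools vs. loaded kernel module
zfs version

# version string of the module currently loaded
cat /sys/module/zfs/version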

@awnz

awnz commented Oct 9, 2023

In the meantime I've stumbled across the zfs_scan_mem_lim_fact and _soft_fact parameters, which seem relevant: https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/Module%20Parameters.html#zfs-scan-mem-lim-fact
sio_cache is mentioned in that documentation, and it is what I'm seeing spinning out of control above.
Mine are set to the default 20 (divisor of physical memory), which does not seem to be honoured. I'm trying 100 and will report back.
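
For reference, a runtime change like that would look something like this (a sketch; the parameter is documented as dynamically changeable, and a value written this way does not survive a reboot):

# check the current hard-limit divisor (default 20, i.e. 1/20 of RAM)
cat /sys/module/zfs/parameters/zfs_scan_mem_lim_fact

# tighten it to 1/100 of RAM for the next scrub/resilver
echo 100 > /sys/module/zfs/parameters/zfs_scan_mem_lim_fact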

@amotin
Member

amotin commented Oct 9, 2023

@awnz IIRC scrub should log some status updates to dbgmsg in procfs on every txg. It seems it should also report memory usage there if the ZFS module is built with ZFS_DEBUG and you enable dprintfs via echo 1 > /sys/module/zfs/parameters/zfs_flags. I wonder if it is an accounting issue, a memory leak, or the limits not working right.

@awnz

awnz commented Oct 9, 2023

It seems not to be compiled with debug. I've installed the relevant packages and will have another go at this this weekend.

In the meantime my dirty workaround is to watch for the memory runaway when the resilver enters the scan stage (or reboot from the kernel panic if I've missed it), then offline and online the resilvering disk to break the scans into more manageably sized chunks that actually fit in memory. The memory is eaten while the scan runs but then released as the resilver stage progresses. (Nope, that didn't work.)

Went for options zfs zfs_scan_legacy=1 instead.
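
For anyone trying the same workaround, the persistent form would be roughly this (a sketch; the file name is illustrative, and on Proxmox/Debian the initramfs may need refreshing so it applies at boot):

# fall back to the pre-0.8 non-sorting scrub/resilver code path
echo "options zfs zfs_scan_legacy=1" > /etc/modprobe.d/zfs-scan-legacy.conf
update-initramfs -u

# or flip it at runtime without reloading the module
echo 1 > /sys/module/zfs/parameters/zfs_scan_legacy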

Since a scrub seems to result in the same behaviour, I'll retest with Proxmox debug packages installed and no workload on this node this weekend and report back. (edit 19/10: sorry was unable to test last weekend, will try again when I can)
