kernel slab high memory usage during scrub OOM kill other applications #11429

Closed
ufou opened this issue Jan 4, 2021 · 1 comment
Labels
Status: Triage Needed (New issue which needs to be triaged), Type: Defect (Incorrect behavior, e.g. crash, hang)

Comments


ufou commented Jan 4, 2021

System information

Type Version/Name
Distribution Name Ubuntu
Distribution Version 18.04
Linux Kernel 5.4.0-58-generic
Architecture amd64
ZFS Version 0.8.3-1ubuntu12.5
SPL Version 0.8.3-1ubuntu12.5

Describe the problem you're observing

We run the Ubuntu HWE kernel, which means we get the 0.8.* version of zfs/spl. Our issue is probably the same as #8662.

We run MySQL (MariaDB, actually) using ZFS volumes for data and backup space (separate volumes). A scrub runs from cron every 4 weeks and takes ~4 hours. On our replicas the scrub generally completes without issue, but on the primary we have seen MySQL crash (OOM killed on the most recent occurrence).
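For context, the scrub is driven by a cron entry roughly like the following (an illustrative sketch only; the actual schedule line is not part of this report):

# illustrative /etc/cron.d entry: scrub the data pool at 02:00 on the 1st of each month
0 2 1 * * root /sbin/zpool scrub mysqldata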

The servers are Intel Xeon Gold with 512 GB RAM; the disks are 6 x 3.8 TB Intel S4510 SSDs arranged as 3 mirrored pairs.
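That layout corresponds to a pool created along these lines (a sketch only; the device names here are placeholders, not taken from the report):

zpool create mysqldata \
  mirror /dev/disk/by-id/ssd-S4510-1 /dev/disk/by-id/ssd-S4510-2 \
  mirror /dev/disk/by-id/ssd-S4510-3 /dev/disk/by-id/ssd-S4510-4 \
  mirror /dev/disk/by-id/ssd-S4510-5 /dev/disk/by-id/ssd-S4510-6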

Describe how to reproduce the problem

Start a scrub on the data pool, then watch /proc/meminfo for SUnreclaim growth:

zpool scrub mysqldata
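The snapshots below come from a watch loop along these lines (inferred from the "Every 2.0s" header; not quoted verbatim from the report):

watch 'cat /proc/meminfo | grep claim'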
Every 2.0s: cat /proc/meminfo | grep claim                                                                                                                          
Mon Jan  4 19:04:27 2021

KReclaimable:    2442512 kB
SReclaimable:    2442512 kB
SUnreclaim:      1932272 kB 

About 30 seconds later:

Every 2.0s: cat /proc/meminfo | grep claim                                                                                                                          
Mon Jan  4 19:05:02 2021

KReclaimable:    2442976 kB
SReclaimable:    2442976 kB
SUnreclaim:      7637196 kB

Then stop the scrub:

zpool scrub -s mysqldata

Check again:

Every 2.0s: cat /proc/meminfo | grep claim                                                                                                                          
Mon Jan  4 19:06:05 2021

KReclaimable:    2442976 kB
SReclaimable:    2442976 kB
SUnreclaim:      1970984 kB

I was unable to alter the SUnreclaim growth by changing /sys/module/zfs/parameters/zfs_scan_mem_lim_fact or /sys/module/zfs/parameters/zfs_scan_mem_lim_soft_fact, or by writing to /sys/module/zfs/parameters/zfs_scrub_delay (permission denied even as root).
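For reference, those attempts looked roughly like this (a sketch; 40 is an arbitrary example value, not a recommendation):

# divisors controlling how much memory the sorted scrub may use for queued I/Os
cat /sys/module/zfs/parameters/zfs_scan_mem_lim_fact
cat /sys/module/zfs/parameters/zfs_scan_mem_lim_soft_fact
# tighten the hard limit, e.g. to 1/40th of RAM
echo 40 > /sys/module/zfs/parameters/zfs_scan_mem_lim_fact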

Include any warning/errors/backtraces from the system logs

cat /proc/meminfo | grep claim
KReclaimable:    2453676 kB
SReclaimable:    2453676 kB
SUnreclaim:     16378036 kB
cat /proc/slabinfo  | grep sio_cache
sio_cache_2       2310396 2310528    168   48    2 : tunables    0    0    0 : slabdata  48136  48136      0
sio_cache_1       237122 237122    152   53    2 : tunables    0    0    0 : slabdata   4474   4474      0
sio_cache_0       106508040 106508040    136   30    1 : tunables    0    0    0 : slabdata 3550268 3550268      0
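As a rough cross-check (arithmetic added for illustration, not from the original report): 106,508,040 sio_cache_0 objects at 136 bytes each is about 13.5 GiB, which accounts for the bulk of the ~15.6 GiB of SUnreclaim above. A per-cache estimate can be pulled straight from slabinfo:

# approximate memory per sio_cache slab cache (num_objs * objsize)
awk '/^sio_cache/ { printf "%-14s %8.1f MiB\n", $1, $3 * $4 / 1048576 }' /proc/slabinfo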
ufou added the Status: Triage Needed and Type: Defect labels on Jan 4, 2021
behlendorf added a commit to behlendorf/zfs that referenced this issue May 29, 2021
For small objects the kernel's slab implementation is very fast and
space efficient. However, as the allocation size increases to
require multiple pages performance suffers. The SPL kmem cache
allocator was designed to better handle these large allocation
sizes. Therefore, on Linux the kmem_cache_* compatibility wrappers
prefer to use the kernel's slab allocator for small objects and
the custom SPL kmem cache allocator for larger objects.

This logic was effectively disabled for all architectures using
a non-4K page size which caused all kmem caches to only use the
SPL implementation. Functionally this is fine, but the SPL code
which calculates the target number of objects per-slab does not
take into account that __vmalloc() always returns page-aligned
memory. This can result in a massive amount of wasted space when
allocating tiny objects on a platform using large pages (64k).

To resolve this issue we set the spl_kmem_cache_slab_limit cutoff
to PAGE_SIZE on systems using larger pages. Since 16,384 bytes
was experimentally determined to yield the best performance on
4K page systems this is used as the cutoff. This means on 4K
page systems there is no functional change.

This particular change does not attempt to update the logic used
to calculate the optimal number of pages per slab. This remains
an issue which should be addressed in a future change.

Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#11429
Closes openzfs#11574
Closes openzfs#12150
behlendorf added a commit to behlendorf/zfs that referenced this issue Jun 2, 2021
behlendorf added a commit to behlendorf/zfs that referenced this issue Jun 3, 2021
For small objects the kernel's slab implementation is very fast and
space efficient. However, as the allocation size increases to
require multiple pages performance suffers. The SPL kmem cache
allocator was designed to better handle these large allocation
sizes. Therefore, on Linux the kmem_cache_* compatibility wrappers
prefer to use the kernel's slab allocator for small objects and
the custom SPL kmem cache allocator for larger objects.

This logic was effectively disabled for all architectures using
a non-4K page size which caused all kmem caches to only use the
SPL implementation. Functionally this is fine, but the SPL code
which calculates the target number of objects per-slab does not
take into account that __vmalloc() always returns page-aligned
memory. This can result in a massive amount of wasted space when
allocating tiny objects on a platform using large pages (64k).

To resolve this issue we set the spl_kmem_cache_slab_limit cutoff
to 16K for all architectures. 

This particular change does not attempt to update the logic used
to calculate the optimal number of pages per slab. This remains
an issue which should be addressed in a future change.

Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: Tony Nguyen <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#12152
Closes openzfs#11429
Closes openzfs#11574
Closes openzfs#12150
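On a live system the cutoff the commit describes is exposed as an SPL module parameter. A sketch of checking it and applying the 16K value, assuming the running module allows runtime writes (otherwise it has to be set at module load time, and a runtime change only affects caches created afterwards):

# current cutoff: objects at or below this size use the kernel's slab allocator
cat /sys/module/spl/parameters/spl_kmem_cache_slab_limit
# apply the value chosen by the fix
echo 16384 > /sys/module/spl/parameters/spl_kmem_cache_slab_limit
# or set it persistently for the next module load
echo "options spl spl_kmem_cache_slab_limit=16384" > /etc/modprobe.d/spl.conf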
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Jun 4, 2021
behlendorf added a commit to behlendorf/zfs that referenced this issue Jun 8, 2021
behlendorf added a commit to behlendorf/zfs that referenced this issue Jun 9, 2021
tonyhutter pushed a commit that referenced this issue Jun 23, 2021
manojkumardevisetty commented

server3@server3:~$ cat /proc/meminfo | grep claim
KReclaimable: 262152 kB
SReclaimable: 262152 kB
SUnreclaim: 49721732 kB
server3@server3:~$ cat /proc/meminfo | grep claim
KReclaimable: 263072 kB
SReclaimable: 263072 kB
SUnreclaim: 49905428 kB
server3@server3:~$ zpool scrub -s pool
cannot cancel scrubbing pool: permission denied
server3@server3:~$ sudo zpool scrub -s pool
cannot cancel scrubbing pool: currently resilvering
server3@server3:~$

I want to cancel the resilvering process, because I have 512 GB of RAM and the slab is slowly eating all of it. Can you help me with this?
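A resilver cannot be cancelled the way a scrub can (the "currently resilvering" error above is ZFS refusing the stop request); the most that can be done is to watch its progress and the slab growth, along these lines (a sketch, assuming the pool is named pool as above):

zpool status -v pool            # resilver progress and estimated completion
grep sio_cache /proc/slabinfo   # the scan I/O caches behind the SUnreclaim growth
slabtop -o -s c | head -n 20    # top slab caches by cache size, printed once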
