arc_reclaim takes lots of cpu and memory disappears #292
Comments
Oh, I also get this in the logs:
Jun 24 12:08:26 xback1 kernel: [ 1910.782053] kworker/2:1: page allocation failure. order:0, mode:0x20
There's also a trace from the 1st crash, which appears to be from within zfs:
Jun 24 02:18:22 xback1 kernel: [147666.084757] arc_reclaim: page allocation failure. order:0, mode:0x4020
Thought it might have been https://github.com/behlendorf/zfs/issues/251. I tried setting zfs_arc_max to 3 GB (3221225472) via the module options on a 4 GB machine, but all the memory is still gone:
[root@xback1 ~]# cat /proc/meminfo
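(For reference, not something taken from this report: the usual way to cap the ARC on ZFS-on-Linux is a module option, and on builds where the parameter is writable it can also be changed at runtime. The /etc/modprobe.d/zfs.conf file name is an assumption; only the zfs_arc_max value comes from the comment above.)

echo "options zfs zfs_arc_max=3221225472" >> /etc/modprobe.d/zfs.conf    (persistent, applied at module load; file name assumed)
echo 3221225472 > /sys/module/zfs/parameters/zfs_arc_max                 (runtime, standard module-parameter path, if writable on your build)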
The memory isn't lost; it looks like it's being consumed by the ARC. The ARC will be by far the biggest vmalloc consumer on your system, and you can see this is where the bulk of the memory is: VmallocUsed: 3200212 kB. What's odd is that for some reason the ARC can't release that memory. The only time that should happen is if it's all dirty, since dropping it would then result in data loss. Something must be preventing it from being safely flushed to disk. In your second comment I do see memory allocation failures from the SCSI driver, so perhaps that's preventing the I/O. This may be a bit of a catch-22 due to getting into a low-memory condition. You may be able to prevent this in the short term by further reducing your zfs_arc_max to, say, 2GB. Longer term this memory handling will be improved.
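(One way to confirm that the ARC really is what is holding the vmalloc space is to compare its reported size against VmallocUsed. The kstat file below is the standard ZFS-on-Linux location and the field names are the usual arcstats ones; this is an illustrative sketch, not output captured from this machine.)

grep -E '^(size|c|c_max) ' /proc/spl/kstat/zfs/arcstats    (current ARC size, target, and limit, in bytes)
grep VmallocUsed /proc/meminfo                             (kernel-wide vmalloc accounting for comparison)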
Thanks - I've reverted to 2.6.35 and it seems to be working much better there, though the memory usage gets close to maximum at times.
The Linux 3.1 kernel has introduced the concept of per-filesystem shrinkers which are directly associated with a super block. Prior to this change there was one shared global shrinker. The zfs code relied on being able to call the global shrinker when the arc_meta_limit was exceeded. This would cause the VFS to drop references on a fraction of the dentries in the dcache. The ARC could then safely reclaim the memory used by these entries and honor the arc_meta_limit. Unfortunately, when per-filesystem shrinkers were added the old interfaces were made unavailable.

This change adds support to use the new per-filesystem shrinker interface so we can continue to honor the arc_meta_limit. The major benefit of the new interface is that we can now target only the zfs filesystem for dentry and inode pruning. Thus we can minimize any impact on the caching of other filesystems.

In the context of making this change several other important issues related to managing the ARC were addressed; they include:

* The dnlc_reduce_cache() function which was called by the ARC to drop dentries for the POSIX layer was replaced with a generic zfs_prune_t callback. The ZPL layer now registers a callback to drop these dentries, removing a layering violation which dates back to the Solaris code. This callback can also be used by other ARC consumers such as Lustre.
  arc_add_prune_callback()
  arc_remove_prune_callback()

* The arc_reduce_dnlc_percent module option has been changed to arc_meta_prune for clarity. The dnlc functions are specific to Solaris's VFS and have already been largely eliminated. The replacement tunable now represents the number of bytes the prune callback will request when invoked.

* Less aggressively invoke the prune callback. We used to call this whenever we exceeded the arc_meta_limit; however, that's not strictly correct since it results in overzealous reclaim of dentries and inodes. It is now only called once the arc_meta_limit is exceeded and every effort has been made to evict other data from the ARC cache.

* More promptly manage exceeding the arc_meta_limit. When reading metadata into the cache, if a buffer was unable to be recycled, notify the arc_reclaim thread to invoke the required prune.

* Added an arcstat_prune kstat which is incremented when the ARC is forced to request that a consumer prune its cache. Remember this will only occur when the ARC has no other choice. If it can evict buffers safely without invoking the prune callback it will.

* This change is also expected to resolve the unexpected collapses of the ARC cache. These would occur because, when just the arc_meta_limit was exceeded, reclaim pressure was exerted on the arc_c value via arc_shrink(). This effectively shrunk the entire cache when really we just needed to reclaim metadata.

Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#466
Closes openzfs#292
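(For anyone tracking whether this fix is doing its job, the commit above introduces a tunable and a kstat that can be inspected from userspace. A rough sketch, assuming the module parameter is exposed as zfs_arc_meta_prune and that the counter shows up in arcstats as arc_prune:)

cat /sys/module/zfs/parameters/zfs_arc_meta_prune                                    (bytes requested from the prune callback per invocation; parameter name assumed)
grep -E '^(arc_prune|arc_meta_used|arc_meta_limit) ' /proc/spl/kstat/zfs/arcstats    (how often consumers were asked to prune, plus metadata usage vs. limit)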
Adding a test case to fetch specific snapshot information using target.
Signed-off-by: mayank <[email protected]>
This includes some incidental breaking changes to the CLI. Specifically, subcommand names and arg long names are all kebab-case, and a few argument names are simplified.
arc_reclaim was taking most of the CPU. The system had no free memory, but all the user processes were swapped out. The machine stopped responding during shutdown. It seems to have happened again (while running rsync). I caught it early and killed the rsync, but a lot of memory is "missing" according to top (sorted by memory):
Mem: 4024644k total, 3839860k used, 184784k free, 5392k buffers
Swap: 4194300k total, 26844k used, 4167456k free, 17356k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2148 root 20 0 200m 1668 1048 S 0.0 0.0 0:31.63 snmpd
3047 root 20 0 15268 1292 812 R 1.6 0.0 0:00.02 top
2551 root 20 0 106m 1024 672 S 0.0 0.0 0:00.07 bash
arc_reclaim has stopped using lots of CPU after killing rsync.
[root@xback1 logs]# cat /proc/meminfo
MemTotal: 4024644 kB
MemFree: 173504 kB
Buffers: 13532 kB
Cached: 20672 kB
SwapCached: 8668 kB
Active: 7128 kB
Inactive: 38660 kB
Active(anon): 1172 kB
Inactive(anon): 10408 kB
Active(file): 5956 kB
Inactive(file): 28252 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 4194300 kB
SwapFree: 4167768 kB
Dirty: 4 kB
Writeback: 0 kB
AnonPages: 5212 kB
Mapped: 4064 kB
Shmem: 0 kB
Slab: 403116 kB
SReclaimable: 7940 kB
SUnreclaim: 395176 kB
KernelStack: 2848 kB
PageTables: 4424 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 6206620 kB
Committed_AS: 160528 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 3334504 kB
VmallocChunk: 34355911836 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 13888 kB
DirectMap2M: 2050048 kB
DirectMap1G: 2097152 kB
This is with 2.6.38.8 running rc4. Is there any debugging I should try? Should I run the latest git version?