
3.17 kernel, BUG: using smp_processor_id() in preemptible - caller is zio_nowait #2769

Closed
kernelOfTruth opened this issue Oct 7, 2014 · 8 comments

@kernelOfTruth
Contributor

Hi,

I just finished booting up a 3.17.0-based kernel.

While the kernel modules were loading, the following was shown:

[  110.691372] BUG: using smp_processor_id() in preemptible [00000000] code: zpool/7429
[  110.691384] caller is zio_nowait+0x157/0x170 [zfs]
[  110.691386] CPU: 1 PID: 7429 Comm: zpool Tainted: P           O   3.17.0-v2+ #1
[  110.691387] Hardware name: ASUS All Series/P9D WS, BIOS 2004 06/05/2014
[  110.691388]  ffffffff82ecfa7b ffffffff82a3e876 0000000000000001 ffffffff8254334b
[  110.691390]  ffff8800dc66d7c0 ffff8807df9e4540 0000000000000000 ffffffffc0381377
[  110.691392]  0000000000000000 ffff8807dc4e3940 0000000000000000 ffff8807dfe67998
[  110.691393] Call Trace:
[  110.691397]  [<ffffffff82a3e876>] ? dump_stack+0x4a/0x75
[  110.691400]  [<ffffffff8254334b>] ? check_preemption_disabled+0xeb/0xf0
[  110.691404]  [<ffffffffc0381377>] ? zio_nowait+0x157/0x170 [zfs]
[  110.691408]  [<ffffffffc02e42b7>] ? arc_read+0x2b7/0x990 [zfs]
[  110.691413]  [<ffffffffc02ebd5a>] ? dbuf_prefetch+0x13a/0x250 [zfs]
[  110.691419]  [<ffffffffc0300925>] ? dmu_tx_fini+0x115/0x1a0 [zfs]
[  110.691425]  [<ffffffffc0301166>] ? dmu_zfetch+0x4d6/0x1990 [zfs]
[  110.691430]  [<ffffffffc02eb397>] ? dbuf_read+0x5a7/0xda0 [zfs]
[  110.691435]  [<ffffffffc0302a50>] ? dnode_hold_impl+0x1c0/0x550 [zfs]
[  110.691441]  [<ffffffffc02f36f6>] ? dmu_buf_hold_noread+0x26/0x150 [zfs]
[  110.691446]  [<ffffffffc02f3845>] ? dmu_buf_hold+0x25/0x80 [zfs]
[  110.691453]  [<ffffffffc034e7b5>] ? zap_lockdir+0x55/0x820 [zfs]
[  110.691457]  [<ffffffffc02e9d89>] ? dbuf_rele_and_unlock+0x289/0x470 [zfs]
[  110.691461]  [<ffffffffc02ea1a1>] ? dmu_buf_rele+0x11/0x620 [zfs]
[  110.691469]  [<ffffffffc034aef9>] ? fzap_lookup+0x119/0x150 [zfs]
[  110.691473]  [<ffffffff820f7819>] ? get_parent_ip+0x9/0x20
[  110.691475]  [<ffffffff820f7877>] ? preempt_count_add+0x47/0xa0
[  110.691478]  [<ffffffff82a45622>] ? _raw_spin_lock_irqsave+0x12/0x60
[  110.691485]  [<ffffffffc034f618>] ? zap_lookup_norm+0x38/0x1b0 [zfs]
[  110.691491]  [<ffffffffc034f7ba>] ? zap_lookup+0x2a/0x30 [zfs]
[  110.691499]  [<ffffffffc032d546>] ? spa_lookup_by_guid+0x13a6/0x2a30 [zfs]
[  110.691507]  [<ffffffffc032ef13>] ? spa_tryimport+0x83/0x430 [zfs]
[  110.691513]  [<ffffffffc035d4a9>] ? zfs_secpolicy_smb_acl+0x1a99/0x4ef0 [zfs]
[  110.691519]  [<ffffffffc035fa71>] ? zfs_secpolicy_smb_acl+0x4061/0x4ef0 [zfs]
[  110.691522]  [<ffffffff821f7ef0>] ? do_vfs_ioctl+0x2d0/0x4a0
[  110.691524]  [<ffffffff821e6daf>] ? __fput+0x13f/0x1c0
[  110.691527]  [<ffffffff82203a29>] ? mntput_no_expire+0x9/0x130
[  110.691529]  [<ffffffff820ec224>] ? task_work_run+0x94/0xd0
[  110.691531]  [<ffffffff821f80f6>] ? SyS_ioctl+0x36/0x80
[  110.691533]  [<ffffffff82a45f62>] ? int_signal+0x12/0x17
[  110.691535]  [<ffffffff82a45cd6>] ? system_call_fastpath+0x1a/0x1f
[  110.691547] BUG: using smp_processor_id() in preemptible [00000000] code: zpool/7429
[  110.691552] caller is zio_nowait+0x157/0x170 [zfs]
[  110.691553] CPU: 1 PID: 7429 Comm: zpool Tainted: P           O   3.17.0-v2+ #1
[  110.691554] Hardware name: ASUS All Series/P9D WS, BIOS 2004 06/05/2014
[  110.691554]  ffffffff82ecfa7b ffffffff82a3e876 0000000000000001 ffffffff8254334b
[  110.691556]  ffff8800dc66d7c0 ffff8807dc4231a8 0000000000000000 ffffffffc0381377
[  110.691557]  0000000000000000 ffff8807dc4e3940 0000000000000000 ffff8807dfe67948
[  110.691558] Call Trace:
[  110.691560]  [<ffffffff82a3e876>] ? dump_stack+0x4a/0x75
[  110.691562]  [<ffffffff8254334b>] ? check_preemption_disabled+0xeb/0xf0
[  110.691566]  [<ffffffffc0381377>] ? zio_nowait+0x157/0x170 [zfs]
[  110.691570]  [<ffffffffc02e42b7>] ? arc_read+0x2b7/0x990 [zfs]
[  110.691574]  [<ffffffffc02ebd5a>] ? dbuf_prefetch+0x13a/0x250 [zfs]
[  110.691580]  [<ffffffffc0300925>] ? dmu_tx_fini+0x115/0x1a0 [zfs]
[  110.691585]  [<ffffffffc0301166>] ? dmu_zfetch+0x4d6/0x1990 [zfs]
[  110.691590]  [<ffffffffc02eb397>] ? dbuf_read+0x5a7/0xda0 [zfs]
[  110.691595]  [<ffffffffc0302a50>] ? dnode_hold_impl+0x1c0/0x550 [zfs]
[  110.691600]  [<ffffffffc02f36f6>] ? dmu_buf_hold_noread+0x26/0x150 [zfs]
[  110.691605]  [<ffffffffc02f3845>] ? dmu_buf_hold+0x25/0x80 [zfs]
[  110.691612]  [<ffffffffc034e7b5>] ? zap_lockdir+0x55/0x820 [zfs]
[  110.691616]  [<ffffffffc02e9d89>] ? dbuf_rele_and_unlock+0x289/0x470 [zfs]
[  110.691620]  [<ffffffffc02ea1a1>] ? dmu_buf_rele+0x11/0x620 [zfs]
[  110.691627]  [<ffffffffc034aef9>] ? fzap_lookup+0x119/0x150 [zfs]
[  110.691629]  [<ffffffff820f7819>] ? get_parent_ip+0x9/0x20
[  110.691631]  [<ffffffff820f7877>] ? preempt_count_add+0x47/0xa0
[  110.691633]  [<ffffffff82a45622>] ? _raw_spin_lock_irqsave+0x12/0x60
[  110.691639]  [<ffffffffc034f618>] ? zap_lookup_norm+0x38/0x1b0 [zfs]
[  110.691645]  [<ffffffffc034f7ba>] ? zap_lookup+0x2a/0x30 [zfs]
[  110.691653]  [<ffffffffc032d546>] ? spa_lookup_by_guid+0x13a6/0x2a30 [zfs]
[  110.691661]  [<ffffffffc032ef13>] ? spa_tryimport+0x83/0x430 [zfs]
[  110.691668]  [<ffffffffc035d4a9>] ? zfs_secpolicy_smb_acl+0x1a99/0x4ef0 [zfs]
[  110.691673]  [<ffffffffc035fa71>] ? zfs_secpolicy_smb_acl+0x4061/0x4ef0 [zfs]
[  110.691675]  [<ffffffff821f7ef0>] ? do_vfs_ioctl+0x2d0/0x4a0
[  110.691676]  [<ffffffff821e6daf>] ? __fput+0x13f/0x1c0
[  110.691678]  [<ffffffff82203a29>] ? mntput_no_expire+0x9/0x130
[  110.691680]  [<ffffffff820ec224>] ? task_work_run+0x94/0xd0
[  110.691681]  [<ffffffff821f80f6>] ? SyS_ioctl+0x36/0x80
[  110.691683]  [<ffffffff82a45f62>] ? int_signal+0x12/0x17
[  110.691685]  [<ffffffff82a45cd6>] ? system_call_fastpath+0x1a/0x1f
[  110.802751] BUG: using smp_processor_id() in preemptible [00000000] code: zpool/7429
[  110.802757] caller is zio_nowait+0x157/0x170 [zfs]
[  110.802759] CPU: 1 PID: 7429 Comm: zpool Tainted: P           O   3.17.0-v2+ #1
[  110.802760] Hardware name: ASUS All Series/P9D WS, BIOS 2004 06/05/2014
[  110.802761]  ffffffff82ecfa7b ffffffff82a3e876 0000000000000001 ffffffff8254334b
[  110.802762]  ffff8800dc66d7c0 ffff8807de81f9d8 0000000000000000 ffffffffc0381377
[  110.802764]  0000000000000000 ffff8807dc4e39e0 0000000000000000 ffff8807dfe676c8
[  110.802765] Call Trace:
[  110.802767]  [<ffffffff82a3e876>] ? dump_stack+0x4a/0x75
[  110.802769]  [<ffffffff8254334b>] ? check_preemption_disabled+0xeb/0xf0
[  110.802774]  [<ffffffffc0381377>] ? zio_nowait+0x157/0x170 [zfs]
[  110.802777]  [<ffffffffc02e42b7>] ? arc_read+0x2b7/0x990 [zfs]
[  110.802782]  [<ffffffffc03812cb>] ? zio_nowait+0xab/0x170 [zfs]
[  110.802786]  [<ffffffffc02ebd5a>] ? dbuf_prefetch+0x13a/0x250 [zfs]
[  110.802792]  [<ffffffffc0300925>] ? dmu_tx_fini+0x115/0x1a0 [zfs]
[  110.802798]  [<ffffffffc0301166>] ? dmu_zfetch+0x4d6/0x1990 [zfs]
[  110.802802]  [<ffffffffc02eb397>] ? dbuf_read+0x5a7/0xda0 [zfs]
[  110.802807]  [<ffffffffc02f2465>] ? dmu_return_arcbuf+0x175/0x590 [zfs]
[  110.802812]  [<ffffffffc02f2ee2>] ? dmu_read+0x92/0x320 [zfs]
[  110.802820]  [<ffffffffc03206ea>] ? metaslab_init+0x1ba/0x280 [zfs]
[  110.802829]  [<ffffffffc033c9d8>] ? vdev_metaslab_init+0xf8/0x220 [zfs]
[  110.802836]  [<ffffffffc033fda0>] ? vdev_load+0xb0/0xc0 [zfs]
[  110.802842]  [<ffffffffc033fd1b>] ? vdev_load+0x2b/0xc0 [zfs]
[  110.802851]  [<ffffffffc032d63d>] ? spa_lookup_by_guid+0x149d/0x2a30 [zfs]
[  110.802858]  [<ffffffffc032ef13>] ? spa_tryimport+0x83/0x430 [zfs]
[  110.802865]  [<ffffffffc035d4a9>] ? zfs_secpolicy_smb_acl+0x1a99/0x4ef0 [zfs]
[  110.802870]  [<ffffffffc035fa71>] ? zfs_secpolicy_smb_acl+0x4061/0x4ef0 [zfs]
[  110.802872]  [<ffffffff821f7ef0>] ? do_vfs_ioctl+0x2d0/0x4a0
[  110.802874]  [<ffffffff821e6daf>] ? __fput+0x13f/0x1c0
[  110.802876]  [<ffffffff82203a29>] ? mntput_no_expire+0x9/0x130
[  110.802877]  [<ffffffff820ec224>] ? task_work_run+0x94/0xd0
[  110.802879]  [<ffffffff821f80f6>] ? SyS_ioctl+0x36/0x80
[  110.802881]  [<ffffffff82a45f62>] ? int_signal+0x12/0x17
[  110.802883]  [<ffffffff82a45cd6>] ? system_call_fastpath+0x1a/0x1f

full error message:

http://pastebin.com/G6YkLSeb

Is this a ZFS-related issue, or rather an SPL-related one?
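For context on what the check complains about: with CONFIG_DEBUG_PREEMPT enabled, check_preemption_disabled() (visible in the traces above) prints this BUG line whenever smp_processor_id() is called while preemption is still enabled, because the task could migrate to another CPU immediately after the id is read. A minimal userspace model of that check — preempt_count and raw_cpu_id() here are simplified stand-ins for the kernel's real per-task/per-CPU state:

```c
#include <stdio.h>

/* Userspace model of the kernel's CONFIG_DEBUG_PREEMPT check; in the real
 * kernel, preempt_count lives in per-task state and the id comes from
 * per-CPU data. */
static int preempt_count;

static void preempt_disable(void) { preempt_count++; }
static void preempt_enable(void)  { preempt_count--; }

static int raw_cpu_id(void) { return 1; }  /* stand-in "hardware" CPU id */

/* smp_processor_id() is only meaningful with preemption disabled; otherwise
 * the task could be migrated right after the read and the id would be stale,
 * so the debug build complains. */
static int smp_processor_id(const char *caller)
{
    if (preempt_count == 0)
        printf("BUG: using smp_processor_id() in preemptible code: "
               "caller is %s\n", caller);
    return raw_cpu_id();
}
```

Calling smp_processor_id("zio_nowait") without first calling preempt_disable() reproduces the shape of the message in the log; with preemption disabled, the same call is silent.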

@behlendorf
Contributor

@kernelOfTruth is this the very latest master code? If so, this is likely a regression due to e022864, which was just merged.

@behlendorf behlendorf added this to the 0.6.4 milestone Oct 7, 2014
@behlendorf
Contributor

That's exactly what happened. Let me quickly push a patch to fix it; if you could verify that fix, that would be great.

@behlendorf
Contributor

@kernelOfTruth #2770 addresses the issue. If you could verify the fix, that would be great. I'd like to get this merged faster than usual, since I'm not keen on leaving the master branch in its current state for long.
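For readers hitting the same warning elsewhere: the usual remedies for this class of bug (a sketch of the general pattern, not the actual #2770 diff) are either to pin the task with get_cpu()/put_cpu() around the per-CPU access, or to use raw_smp_processor_id() when an occasionally stale CPU id is harmless, e.g. for per-CPU statistics. A userspace model with simplified stand-ins:

```c
#define NR_CPUS 2

/* Simplified stand-ins for kernel state. */
static int current_cpu = 0;          /* "hardware" CPU the task runs on */
static int preempt_count = 0;
static long per_cpu_hits[NR_CPUS];

static void preempt_disable(void) { preempt_count++; }
static void preempt_enable(void)  { preempt_count--; }

/* get_cpu()/put_cpu(): disable preemption and return a CPU id that stays
 * valid until put_cpu() re-enables preemption. */
static int get_cpu(void) { preempt_disable(); return current_cpu; }
static void put_cpu(void) { preempt_enable(); }

/* raw_smp_processor_id(): skips the debug check; acceptable when the id is
 * only a hint, e.g. picking a statistics bucket, and a rare
 * off-by-one-CPU update doesn't matter. */
static int raw_smp_processor_id(void) { return current_cpu; }

/* Preemption-safe per-CPU counter bump: the task cannot migrate between
 * reading the CPU id and using it. */
static void count_hit_safe(void)
{
    int cpu = get_cpu();
    per_cpu_hits[cpu]++;
    put_cpu();
}
```

Which remedy is right depends on whether the code actually needs to stay on the same CPU (pin it) or merely wants a cheap bucket index (raw id is enough).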

@kernelOfTruth
Contributor Author

@behlendorf unfortunately I can't say whether it helps, beyond the following experience:

After booting into the kernel with the patched ZFS code, I get stalls reported by rcu_preempt.

I've done 4 reboots so far, with no luck:

1st boot) Booted into composited Xfce4 and opened a few apps (Tomboy Notes, Klipper, gnome-terminal). While attempting to modify a file (~/.xinitrc), a letter got stuck and kept repeating. Num Lock still worked for some time, but the mouse couldn't move and the desktop was frozen; eventually even the magic SysRq key didn't work. Only a reset helped (via the button on the PC case).

2nd boot) Rebooted, ran zfs mount -a, still in a VT; if I remember correctly, that was all. Then an error message appeared; unfortunately the first part of it (possibly even 2 traces) scrolled by too fast, and only the 3rd one I could capture in a picture:

INFO: rcu_preempt detected stalls on CPUs/tasks:
0: (2 GPs behind) idle= ... softirq= ... last_accelerate: ... , nonlazy_posted: ..., ....
1: (1 GPs behind) idle= ... softirq= ... last_accelerate: ... , nonlazy_posted: ..., ....
(detected by 3, t=9602 jiffies, g=1827, c=1826, q=29)
Task dump for CPU0:
kworker/0:2

...

Call Trace:
cache_reap
process_one_work
worker_thread
rescuer_thread
kthread+0xca/0xe0
kthread_create_on_node
ret_from_fork
kthread_create_on_node

Task dump for CPU 1:
txg_sync
...
Call Trace:
spa_sync
__wake_up
txg_delay
txg_delay
__thread_exit
__thread_exit
kthread
kthread_create_on_node
ret_from_fork
kthread_create_on_node

Hardlock after some time; only a reset via the button helped.

3rd boot) Ran zfs mount -a, then /etc/init.d/slim restart (firing up the display manager). When entering the usual desktop environment, LXQt, it would hardlock a few seconds after starting up apps (e.g. Klipper, Tomboy Notes, ...).

4th boot) Ran zfs mount -a, then /etc/init.d/slim restart (firing up the display manager). When entering the usual desktop environment, LXQt, it would hardlock a few seconds after starting up Chromium.

After this, the lockups would occur even when booting into the 3.17-based kernel with the ZFS code that had not been patched. This got me worried that ~/.xinitrc or the pool was broken.

The data seems fine so far, and I'm currently writing from a 3.16.4-based kernel with the latest code from yesterday (CEST). To be honest, I don't know whether the illumos 5176 code change was already included, and I'm a little scared, so I'll stay on this kernel for some time.

Sorry for being unable to provide more info (call trace, BUG name, etc.). I tried to get the error message, but none is shown; the last kernel messages can't be retrieved either, since it's a hardlock that happens rather fast.

@kernelOfTruth
Contributor Author

I'll try again, and wipe everything related to 3.17 kernels from root and /lib/modules.

Hopefully that improves things ...

@behlendorf
Contributor

@kernelOfTruth thanks for the update. The root cause of the original BUG you've posted has been addressed in master. But please let us know if you observe any other strange symptoms.

@kernelOfTruth
Contributor Author

@behlendorf

I compiled a new kernel with only the zram & btrfs patches added on top, and there have been no hardlocks so far.

I suspect there was an issue with the graphics driver (or even VirtualBox, since I haven't installed that module this time) that interacted badly with ZFS.

So far it has survived 2 incremental data transfers via rsync to a Btrfs and a ZFS volume (1.8 TB of data).

dmesg | grep -i zfs
[ 100.907717] ZFS: Loaded module v0.6.3-106_ga215ee1, ZFS pool version 5000, ZFS filesystem version 5

dmesg | grep -i spl
[ 100.893328] SPL: Loaded module v0.6.3-11_g81857a3
[ 100.919120] SPL: using hostid 0x00000000

No additional (problem-indicating) messages so far in dmesg.

Thanks 👍

@kernelOfTruth
Copy link
Contributor Author

FYI:

There seems to be a deeper issue with 3.17.y kernels and RCU stalls:

http://marc.info/?l=linux-kernel&m=141288512930863 [RCU stalls -> lockup linux-3.17]

So I'm not the only one, and it's most probably NOT related to ZFS or SPL.
