Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

arc_adapt hangs in "D" state while reading data from a snapshot #1215

Closed
mailinglists35 opened this issue Jan 17, 2013 · 37 comments
Closed

arc_adapt hangs in "D" state while reading data from a snapshot #1215

mailinglists35 opened this issue Jan 17, 2013 · 37 comments
Milestone

Comments

@mailinglists35
Copy link

I get this combination of "VERIFY(!RW_LOCK_HELD(&l->l_rwlock)) failed" + "zap.c" + "arc_adapt" remaining in D state every time I try to read significant amount data from a snapshot, on every version I've installed since half a year ago or so.

To continue using pool/fs I must hard reboot the machine or shortcut via "echo b > /proc/sysrq-trigger"

Reposting latest kernel log, echo t > /proc/sysrq-trigger and spl dump. I can provide ssh if needed or run debugging version etc
As logs says, running daily ppa 0.6.0.92-0ubuntu1~oneiric1 on debian 3.2.0-4-amd64

http://pastebin.com/nQJ4Kcup
http://dl.transfer.ro/transfer_ro-17jan-2bae4ca17cf4.zip


root@homerouter:~# zpool get all homebackup |egrep -v 'default|-'
NAME        PROPERTY               VALUE                  SOURCE
homebackup  guid                   9392157595122154710    local
homebackup  version                28                     local
homebackup  failmode               continue               local
homebackup  ashift                 12                     local
homebackup  freeing                0                      local
homebackup  feature@async_destroy  disabled               local
homebackup  feature@empty_bpobj    disabled               local

root@homerouter:~# zfs get all homebackup/homerouter |egrep -v 'default|-'
NAME                   PROPERTY              VALUE                       SOURCE
homebackup/homerouter  mountpoint            /mnt/seagate910/homerouter  local
homebackup/homerouter  atime                 off                         inherited from homebackup
homebackup/homerouter  primarycache          metadata                    inherited from homebackup
homebackup/homerouter  logbias               throughput                  inherited from homebackup
@mailinglists35
Copy link
Author

Also kernel log and dump from from a second machine, daily 0.6.0.92-0ubuntu1~precise1 on ubuntu 3.2.0-35-generic

http://dl.transfer.ro/transfer_ro-17jan-1cd8a1f9976c0341.zip
http://dl.transfer.ro/transfer_ro-17jan-43afbdee1088bc6.zip

@nedbass
Copy link
Contributor

nedbass commented Jan 18, 2013

openzfs/spl#97 reference link to original issue with more debug data

@nedbass
Copy link
Contributor

nedbass commented Jan 18, 2013

@mailinglists35 Can you apply the above debug patch to the SPL and try to reproduce the issue? If you hit the bug please post console output to this issue. Thanks

@mailinglists35
Copy link
Author

Hi,
Thanks for the patch but giving up after losing three sleep hours trying to get zfs to compile.
Tried -Wno-error, -fpermissive, stackoverflow.com, linuxquestions.org, etc, don't know what is the proper way to get this to compile.
Using debian wheezy (=gcc 4.7)

I downloaded:
https://github.com/nedbass/spl/archive/b_issue1215.zip
http://archive.zfsonlinux.org/downloads/zfsonlinux/zfs/zfs-0.6.0-rc13.tar.gz

I compiled spl ok, but zfs refuses to compile here:

make[3]: Entering directory `/usr/src/linux-headers-3.2.0-4-amd64'
CC [M] /usr/local/src/git-zfs/zfs-0.6.0-rc13/module/zfs/../../module/zfs/zfs_vfsops.o
/usr/local/src/git-zfs/zfs-0.6.0-rc13/module/zfs/../../module/zfs/zfs_vfsops.c: In function ‘zfs_sb_teardown’:
/usr/local/src/git-zfs/zfs-0.6.0-rc13/module/zfs/../../module/zfs/zfs_vfsops.c:1040:3: error: implicit declaration of function ‘spl_invalidate_inodes’ [-Werror=implicit-function-declaration]
cc1: some warnings being treated as errors
make[7]: *** [/usr/local/src/git-zfs/zfs-0.6.0-rc13/module/zfs/../../module/zfs/zfs_vfsops.o] Error 1

@behlendorf
Copy link
Contributor

Update ZFS to the latest master source. That symbol was recently removed from the SPL because the latest ZFS version doesn't use it.

@mailinglists35
Copy link
Author

http://fpaste.org/v1Dp/

@mailinglists35
Copy link
Author

ran it twice just to make sure
http://slexy.org/view/s2GZRX4uD4

@mailinglists35
Copy link
Author

the first call trace happens when pressing TAB key in bash to complete an .zfs/sapshot/path
the kernel bug happens when doing a "find" on that path

@mailinglists35
Copy link
Author

Exact console output:

root@homerouter:~# find /mnt/seagate910/homerouter/.zfs/snapshot/zfs-auto-snap_daily-2012-12-25-0640/ 2>&1 > /dev/null

Message from syslogd@homerouter at Jan 21 04:40:09 ...
kernel:[ 283.584787] Oops: 0000 [#1] SMP

Message from syslogd@homerouter at Jan 21 04:40:09 ...
kernel:[ 283.588477] Stack:

Message from syslogd@homerouter at Jan 21 04:40:09 ...
kernel:[ 283.588477] Call Trace:

Message from syslogd@homerouter at Jan 21 04:40:09 ...
kernel:[ 283.588477] Code: 5b 5d e9 d2 7e ca e0 53 48 89 f7 c7 46 28 01 00 00 00 48 89 f3 e8 fa fe ff ff 85 c0 74 1f 48 8b 73 20 48 c7 c7 5a 1b 6f a0 31 c0 <8b> 96 e4 01 00 00 48 81 c6 98 03 00 00 e8 fc 16 ca e0 48 89 df

Message from syslogd@homerouter at Jan 21 04:40:09 ...
kernel:[ 283.588477] CR2: 00000000000001e4

root@homerouter:~# screendump
000
[ 283.588477] 0000000000000000 ffffffffa065a823 ffff8800d62e9d00 ffffffffa065c
160
[ 283.588477] Call Trace:
[ 283.588477] [] ? dbuf_evict_user+0x37/0x5a [zfs]
[ 283.588477] [] ? dbuf_clear+0x14/0x103 [zfs]
[ 283.588477] [] ? dbuf_evict+0x9/0x12 [zfs]
[ 283.588477] [] ? dbuf_do_evict+0x36/0x4d [zfs]
[ 283.588477] [] ? arc_do_user_evicts+0x48/0xbf [zfs]
[ 283.588477] [] ? arc_adjust_meta+0x171/0x171 [zfs]
[ 283.588477] [] ? arc_adapt_thread+0x65/0xb4 [zfs]
[ 283.588477] [] ? thread_generic_wrapper+0x6a/0x75 [spl]
[ 283.588477] [] ? __thread_create+0x2be/0x2be [spl]
[ 283.588477] [] ? kthread+0x76/0x7e
[ 283.588477] [] ? kernel_thread_helper+0x4/0x10
[ 283.588477] [] ? kthread_worker_fn+0x139/0x139
[ 283.588477] [] ? gs_change+0x13/0x13
[ 283.588477] Code: 5b 5d e9 d2 7e ca e0 53 48 89 f7 c7 46 28 01 00 00 00 48 89
f3 e8 fa fe ff ff 85 c0 74 1f 48 8b 73 20 48 c7 c7 5a 1b 6f a0 31 c0 <8b> 96 e4
01 00 00 48 81 c6 98 03 00 00 e8 fc 16 ca e0 48 89 df
[ 283.588477] RIP [] zap_leaf_pageout+0x24/0x78 [zfs]
[ 283.588477] RSP
[ 283.588477] CR2: 00000000000001e4
[ 283.592469] ---[ end trace dc87b297ca9fa742 ]---

nedbass added a commit to nedbass/spl that referenced this issue Jan 21, 2013
If destroying a held lock, log who's holding it before panicking.  If
releasing a lock that's being destroyed, dump a stack trace.  This
should tell us something about what the lock holder is doing when the
lock is destroyed.
@nedbass
Copy link
Contributor

nedbass commented Jan 21, 2013

@mailinglists35 Sorry for the buggy debug patch. I'm guessing it tried to dereference a NULL rwp->rw_owner. I refreshed the patch to avoid that. Would you mind trying it again with the updated patch? Thanks

@mailinglists35
Copy link
Author

root@homerouter:~# find /mnt/seagate910/homerouter/.zfs/snapshot/zfs-auto-snap_hourly-2013-01-20-1517/ 2>&1 > /dev/null
Jan 21 21:09:30 homerouter kernel: [59218.492562] Issue 1215: freeing rwlock held by ?(0)
Jan 21 21:09:30 homerouter kernel: [59218.492618] VERIFY(!RW_LOCK_HELD(&l->l_rwlock)) failed

Message from syslogd@homerouter at Jan 21 21:09:30 ...
kernel:[59218.492618] VERIFY(!RW_LOCK_HELD(&l->l_rwlock)) failed
Jan 21 21:09:30 homerouter kernel: [59218.492664] SPLError: 10011:0:(zap.c:445:zap_leaf_pageout()) SPL PANIC
Jan 21 21:09:30 homerouter kernel: [59218.492709] SPL: Showing stack for process 10011
Jan 21 21:09:30 homerouter kernel: [59218.492760] Pid: 10011, comm: arc_adapt Tainted: P W O 3.2.0-4-amd64 #1 Debian 3.2.35-2
Jan 21 21:09:30 homerouter kernel: [59218.492828] Call Trace:
Jan 21 21:09:30 homerouter kernel: [59218.492876] [] ? spl_debug_dumpstack+0x24/0x2a [spl]
Jan 21 21:09:30 homerouter kernel: [59218.492925] [] ? spl_debug_bug+0x7f/0xc8 [spl]
Jan 21 21:09:30 homerouter kernel: [59218.492996] [] ? zap_leaf_pageout+0x78/0x86 [zfs]
Jan 21 21:09:30 homerouter kernel: [59218.493054] [] ? dbuf_evict_user+0x37/0x5a [zfs]
Jan 21 21:09:30 homerouter kernel: [59218.493112] [] ? dbuf_clear+0x14/0x103 [zfs]
Jan 21 21:09:30 homerouter kernel: [59218.493170] [] ? dbuf_evict+0x9/0x12 [zfs]
Jan 21 21:09:30 homerouter kernel: [59218.493227] [] ? dbuf_do_evict+0x36/0x4d [zfs]
Jan 21 21:09:30 homerouter kernel: [59218.493282] [] ? arc_do_user_evicts+0x48/0xbf [zfs]
Jan 21 21:09:30 homerouter kernel: [59218.493340] [] ? arc_adjust_meta+0x171/0x171 [zfs]
Jan 21 21:09:30 homerouter kernel: [59218.493427] [] ? arc_adapt_thread+0x65/0xb4 [zfs]
Jan 21 21:09:30 homerouter kernel: [59218.493486] [] ? thread_generic_wrapper+0x6a/0x75 [spl]
Jan 21 21:09:30 homerouter kernel: [59218.493545] [] ? __thread_create+0x2be/0x2be [spl]
Jan 21 21:09:30 homerouter kernel: [59218.493597] [] ? kthread+0x76/0x7e
Jan 21 21:09:30 homerouter kernel: [59218.493707] [] ? kernel_thread_helper+0x4/0x10
Jan 21 21:09:30 homerouter kernel: [59218.493753] [] ? kthread_worker_fn+0x139/0x139
Jan 21 21:09:30 homerouter kernel: [59218.493799] [] ? gs_change+0x13/0x13
Jan 21 21:09:30 homerouter kernel: [59218.494177] SPL: Dumping log to /tmp/spl-log.1358791770.10011

Message from syslogd@homerouter at Jan 21 21:09:30 ...
kernel:[59218.492664] SPLError: 10011:0:(zap.c:445:zap_leaf_pageout()) SPL PANIC

If I uploaded the spl dump, http://dl.transfer.ro/transfer_ro-21jan-95eb2d56ac.zip is the kernel stack trace still needed?
Uploading anyway http://fpaste.org/jOOW/

@nedbass
Copy link
Contributor

nedbass commented Jan 21, 2013

Thanks for the testing. I was hoping to identify which process was holding the lock being freed, but your results suggest no one is in fact holding it. Perhaps we have a race in the test itself.

@nedbass
Copy link
Contributor

nedbass commented Jan 21, 2013

Can you post the output of

grep CONFIG_RWSEM_GENERIC_SPINLOCK /boot/config-3.2.0*

@mailinglists35
Copy link
Author

# CONFIG_RWSEM_GENERIC_SPINLOCK is not set

@nedbass
Copy link
Contributor

nedbass commented Jan 22, 2013

Also can you post the config.log file from your spl build directory? Thanks

@mailinglists35
Copy link
Author

@nedbass
Copy link
Contributor

nedbass commented Jan 22, 2013

So a NULL rw_owner means someone may be holding the rwlock as a reader. Since we never see the debug message about dropping a freed lock, perhaps someone takes the lock and forgets to drop it. Looking around the code for places where this might happen, I think I've found one case in zap.c:

1245 int                                                                             
1246 fzap_cursor_move_to_key(zap_cursor_t *zc, zap_name_t *zn)                       
1247 {                                                                               
1248         int err;                                                                
1249         zap_leaf_t *l;                                                          
1250         zap_entry_handle_t zeh;                                                 
1251                                                                                 
1252         if (zn->zn_key_orig_numints * zn->zn_key_intlen > ZAP_MAXNAMELEN)       
1253                 return (ENAMETOOLONG);                                          
1254                                                                                 
1255         err = zap_deref_leaf(zc->zc_zap, zn->zn_hash, NULL, RW_READER, &l);     
1256         if (err != 0)                                                           
1257                 return (err);                                                   
1258
/*****************  Now holding l->l_rwlock as reader via
 *****************  zap_deref_leaf->zap_get_leaf_byblk->rw_enter */
1259         err = zap_leaf_lookup(l, zn, &zeh);                                     
1260         if (err != 0)                                                           
1261                 return (err);  /* Missing  zap_put_leaf(l) ? */
1262                                                                                 
1263         zc->zc_leaf = l;                                                        
1264         zc->zc_hash = zeh.zeh_hash;                                             
1265         zc->zc_cd = zeh.zeh_cd;                                                 
1266                                                                                 
1267         return (err);                                                           
1268 }                                                                               
1269 

Have no idea if this happens in the case of this bug, but if there is in fact a missing put here we could fix it and see if it helps.

@behlendorf
Copy link
Contributor

@nedbass This does look like a real bug but it's unclear to me how it could be a problem in this context. The {f}zap_cursor_move_to_key() functions were added by Ricardo for use by Lustre and are not used by the ZPL.

Also by my reading of this code we should be dropping the leaf->l_rwlock on failure and success in fzap_cursor_move_to_key(). See fzap_cursor_retrieve() is basically performs the same operation and is careful to drop the lock before returning. So are all other callers of zap_deref_leaf(). All paths here should exit through rw_exit(&zc->zc_leaf->l_rwlock);.

@nedbass
Copy link
Contributor

nedbass commented Jan 22, 2013

@behlendorf thanks for the analysis.

Note that zfsctl_snapdir_lookup() calls dmu_snapshot_id() which calls zap_cursor_move_to_key().

@behlendorf
Copy link
Contributor

Ahh, somehow I missed that call site. Your right. Then this is a very good explanation for the issue, this is all ZoL specific code. I'd expect that dropping the lock as described above should resolve the issue.

@nedbass
Copy link
Contributor

nedbass commented Jan 22, 2013

@mailinglists35 please try the above patch. Thanks

@mailinglists35
Copy link
Author

thanks. I might be rushing to report, but apparently the patch resolved the issue.
I'll test the second machine in a couple of hours.

however now I get this when first accessing the .zfs/snapshot directory (pressing tab autocomplete in bash)
I'll update the comment after I check the second box. iirc this happened before but got somehow fixed recently.


Jan 23 02:37:14 homerouter kernel: [104702.400761] ------------[ cut here ]-----
-------
Jan 23 02:37:14 homerouter kernel: [104702.400810] WARNING: at /build/buildd-lin
ux_3.2.35-2-amd64-v9djlH/linux-3.2.35/fs/dcache.c:1291 d_set_d_op+0x2a/0x96()
Jan 23 02:37:14 homerouter kernel: [104702.400876] Hardware name: HP Compaq dc58
00 Microtower
Jan 23 02:37:14 homerouter kernel: [104702.400918] Modules linked in: zfs(P) zun
icode(P) zavl(P) zcommon(P) znvpair(P) spl(O) zlib_deflate nf_conntrack_netlink
nfnetlink pppoe pppox ppp_generic slhc bluetooth drbd lru_cache tun ip6table_filter ip6_tables ebtable_nat ebtables ipt_LOG ipt_REJECT xt_recent xt_multiport xt_conntrack iptable_filter xt_TCPMSS xt_tcpmss iptable_mangle ipt_MASQUERADE xt_tcpudp iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack ip_tables x_tables ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse bridge stp smsc47b397 loop kvm_intel kvm tpm_infineon snd_hda_codec_analog hp_wmi sparse_keymap rfkill evdev asix snd_hda_intel snd_hda_codec coretemp snd_hwdep snd_pcm_oss tpm_tis snd_mixer_oss tpm tpm_bios mcs7830 usbnet psmouse acpi_cpufreq mperf snd_pcm serio_raw wmi i915 processor video button drm_kms_helper drm thermal_sys i2c_algo_bit iTCO_wdt i2c_core iTCO_vendor_support snd_page_alloc snd_timer snd pcspkr soundcor
Jan 23 02:37:14 homerouter kernel: e ext4 crc16 jbd2 mbcache md_mod ses enclosure dm_mirror dm_region_hash dm_log dm_mod sg sd_mod crc_t10dif usb_storage floppy e100 mii ata_generic uhci_hcd ata_piix xhci_hcd ehci_hcd libata scsi_mod e1000e usbcore usb_common [last unloaded: zlib_deflate]
Jan 23 02:37:14 homerouter kernel: [104702.403731] Pid: 20801, comm: bash Tainted: P           O 3.2.0-4-amd64 #1 Debian 3.2.35-2
Jan 23 02:37:14 homerouter kernel: [104702.403792] Call Trace:
Jan 23 02:37:14 homerouter kernel: [104702.403834]  [] ? warn_slowpath_common+0x78/0x8c
Jan 23 02:37:14 homerouter kernel: [104702.403879]  [] ? d_set_d_op+0x2a/0x96
Jan 23 02:37:14 homerouter kernel: [104702.403943]  [] ? zpl_snapdir_lookup+0x7b/0x91 [zfs]
Jan 23 02:37:14 homerouter kernel: [104702.403990]  [] ? d_alloc_and_lookup+0x3a/0x60
Jan 23 02:37:14 homerouter kernel: [104702.404061]  [] ? walk_component+0x219/0x406
Jan 23 02:37:14 homerouter kernel: [104702.404107]  [] ? path_lookupat+0x7c/0x2bd
Jan 23 02:37:14 homerouter kernel: [104702.404153]  [] ? should_resched+0x5/0x23
Jan 23 02:37:14 homerouter kernel: [104702.404206]  [] ? _cond_resched+0x7/0x1c
Jan 23 02:37:14 homerouter kernel: [104702.404257]  [] ? do_path_lookup+0x1c/0x87
Jan 23 02:37:14 homerouter kernel: [104702.404311]  [] ? user_path_at_empty+0x47/0x7b
Jan 23 02:37:14 homerouter kernel: [104702.404364]  [] ? do_page_fault+0x2fc/0x337
Jan 23 02:37:14 homerouter kernel: [104702.404368]  [] ? vma_link+0x6d/0x8b
Jan 23 02:37:14 homerouter kernel: [104702.404372]  [] ? vfs_fstatat+0x32/0x60
Jan 23 02:37:14 homerouter kernel: [104702.404375]  [] ? __set_task_blocked+0x5e/0x65
Jan 23 02:37:14 homerouter kernel: [104702.404379]  [] ? sys_newstat+0x12/0x2b
Jan 23 02:37:14 homerouter kernel: [104702.404382]  [] ? set_current_blocked+0x2d/0x43
Jan 23 02:37:14 homerouter kernel: [104702.404385]  [] ? sigprocmask+0x5c/0x63
Jan 23 02:37:14 homerouter kernel: [104702.404387]  [] ? sys_rt_sigprocmask+0x63/0x9d
Jan 23 02:37:14 homerouter kernel: [104702.404391]  [] ? system_call_fastpath+0x16/0x1b
Jan 23 02:37:14 homerouter kernel: [104702.404393] ---[ end trace 3e9bd0ee9d143772 ]---
Jan 23 02:37:14 homerouter kernel: [104702.404395] ------------[ cut here ]------------
Jan 23 02:37:14 homerouter kernel: [104702.404397] WARNING: at /build/buildd-linux_3.2.35-2-amd64-v9djlH/linux-3.2.35/fs/dcache.c:1295 d_set_d_op+0x50/0x96()
Jan 23 02:37:14 homerouter kernel: [104702.404400] Hardware name: HP Compaq dc5800 Microtower
Jan 23 02:37:14 homerouter kernel: [104702.404401] Modules linked in: zfs(P) zunicode(P) zavl(P) zcommon(P) znvpair(P) spl(O) zlib_deflate nf_conntrack_netlink nfnetlink pppoe pppox ppp_generic slhc bluetooth drbd lru_cache tun ip6table_filter ip6_tables ebtable_nat ebtables ipt_LOG ipt_REJECT xt_recent xt_multiport xt_conntrack iptable_filter xt_TCPMSS xt_tcpmss iptable_mangle ipt_MASQUERADE xt_tcpudp iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack ip_tables x_tables ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse bridge stp smsc47b397 loop kvm_intel kvm tpm_infineon snd_hda_codec_analog hp_wmi sparse_keymap rfkill evdev asix snd_hda_intel snd_hda_codec coretemp snd_hwdep snd_pcm_oss tpm_tis snd_mixer_oss tpm tpm_bios mcs7830 usbnet psmouse acpi_cpufreq mperf snd_pcm serio_raw wmi i915 processor video button drm_kms_helper drm thermal_sys i2c_algo_bit iTCO_wdt i2c_core iTCO_vendor_support snd_page_alloc snd_timer snd pcspkr soundcor
Jan 23 02:37:14 homerouter kernel: e ext4 crc16 jbd2 mbcache md_mod ses enclosure dm_mirror dm_region_hash dm_log dm_mod sg sd_mod crc_t10dif usb_storage floppy e100 mii ata_generic uhci_hcd ata_piix xhci_hcd ehci_hcd libata scsi_mod e1000e usbcore usb_common [last unloaded: zlib_deflate]
Jan 23 02:37:14 homerouter kernel: [104702.404455] Pid: 20801, comm: bash Tainted: P        W  O 3.2.0-4-amd64 #1 Debian 3.2.35-2
Jan 23 02:37:14 homerouter kernel: [104702.404457] Call Trace:
Jan 23 02:37:14 homerouter kernel: [104702.404460]  [] ? warn_slowpath_common+0x78/0x8c
Jan 23 02:37:14 homerouter kernel: [104702.404463]  [] ? d_set_d_op+0x50/0x96
Jan 23 02:37:14 homerouter kernel: [104702.404479]  [] ? zpl_snapdir_lookup+0x7b/0x91 [zfs]
Jan 23 02:37:14 homerouter kernel: [104702.404482]  [] ? d_alloc_and_lookup+0x3a/0x60
Jan 23 02:37:14 homerouter kernel: [104702.404485]  [] ? walk_component+0x219/0x406
Jan 23 02:37:14 homerouter kernel: [104702.404488]  [] ? path_lookupat+0x7c/0x2bd
Jan 23 02:37:14 homerouter kernel: [104702.404491]  [] ? should_resched+0x5/0x23
Jan 23 02:37:14 homerouter kernel: [104702.404494]  [] ? _cond_resched+0x7/0x1c
Jan 23 02:37:14 homerouter kernel: [104702.404497]  [] ? do_path_lookup+0x1c/0x87
Jan 23 02:37:14 homerouter kernel: [104702.404500]  [] ? user_path_at_empty+0x47/0x7b
Jan 23 02:37:14 homerouter kernel: [104702.404504]  [] ? do_page_fault+0x2fc/0x337
Jan 23 02:37:14 homerouter kernel: [104702.404507]  [] ? vma_link+0x6d/0x8b
Jan 23 02:37:14 homerouter kernel: [104702.404510]  [] ? vfs_fstatat+0x32/0x60
Jan 23 02:37:14 homerouter kernel: [104702.404513]  [] ? __set_task_blocked+0x5e/0x65
Jan 23 02:37:14 homerouter kernel: [104702.404516]  [] ? sys_newstat+0x12/0x2b
Jan 23 02:37:14 homerouter kernel: [104702.404518]  [] ? set_current_blocked+0x2d/0x43
Jan 23 02:37:14 homerouter kernel: [104702.404521]  [] ? sigprocmask+0x5c/0x63
Jan 23 02:37:14 homerouter kernel: [104702.404524]  [] ? sys_rt_sigprocmask+0x63/0x9d
Jan 23 02:37:14 homerouter kernel: [104702.404527]  [] ? system_call_fastpath+0x16/0x1b
Jan 23 02:37:14 homerouter kernel: [104702.404529] ---[ end trace 3e9bd0ee9d143773 ]---

@nedbass
Copy link
Contributor

nedbass commented Jan 23, 2013

Jan 23 02:37:14 homerouter kernel: [104702.400810] WARNING: at /build/buildd-linux_3.2.35-2-amd64-v9djlH/linux-3.2.35/fs/dcache.c:1291 d_set_d_op+0x2a/0x96()

That is #1230, a regression recently introduced into master. I believe it should be harmless aside from the scary backtrace, but a fix for it should land soon.

Glad to hear the initial test results are good.

@mailinglists35
Copy link
Author

tested the second box, spl no longer panics. I have started a read of all snapshots overnight on both machines and post update when finished, but as far as it seems the bug is gone. where do you want the beer delivered? :)

nedbass added a commit to nedbass/zfs that referenced this issue Jan 23, 2013
Callers of zap_deref_leaf() must be careful to drop leaf->l_rwlock
since that function returns with the lock held on success.  All other
callers drop the lock correctly but it seems fzap_cursor_move_to_key()
does not.  This may block writers or cause VERIFY failures when the
lock is freed.

Fixes openzfs#1215
Fixes openzfs/spl#143
Fixes openzfs/spl#97
@ahrens
Copy link
Member

ahrens commented Jan 23, 2013

I agree that fzap_cursor_move_to_key() should drop the lock. However, I believe that this routine is being used incorrectly, and should be removed.

dmu_snapshot_id() should return the ID of the named snapshot. However, if the snapshot does not exist, it returns a different snapshot's ID. Instead, we should use dsl_dataset_snap_lookup().

@mailinglists35
Copy link
Author

I let this run overnight on both machines:
cd /path/to/.zfs/snapshot; ls -1|while read dir; do rsync -axPH -n --delete / ./$dir; sleep 1; done

one machine is now with arc_adapt running 100% cpu and on the other rsync is blocked in D state and no other zfs commands complete and some kernel warnings on dmesg. will post details when I get some time.

@nedbass
Copy link
Contributor

nedbass commented Jan 23, 2013

@ahrens would it not be a layering violation to use dsl_dataset_snap_lookup() from the POSIX layer? The ZFS architectural diagram shows the ZPL accessing DSL indirectly through the ZAP and DMU layers:

zfs-components

Also, I'm not sure I understand your comment about dmu_snapshot_id() returning a different snapshot's ID. It looks to me like it returns ENOENT and leaves *idp unmodified if the snapshot doesn't exist.

@ahrens
Copy link
Member

ahrens commented Jan 23, 2013

@nedbass dmu_snapshot_id() moves the cursor to the given snapshot name, then it retrieves the cursor. If the snapshot is deleted after the call to zap_cursor_move_to_key() but before the call to zap_cursor_retrieve(), the entry you expect will not exist when zap_cursor_retrieve() is called, so it will return the next entry.

As far as layering goes, you could make dmu_snapshot_id() be a one-liner:

    return (dsl_dataset_snap_lookup(dmu_objset_ds(os), snapname, idp));

Although personally I'd just inline that into the one caller. The layering diagram is idealized, and I don't think it really hurts anything for the ZPL to pass the dataset to the DSL (as evidenced by many other uses of dmu_objset_ds() in the ZPL and ZIL).

@nedbass
Copy link
Contributor

nedbass commented Jan 23, 2013

@ahrens Thanks for the helpful explanation.

@behlendorf
Copy link
Contributor

@ahrens I know this functionality was originally added for Lustre, but after looking at the latest Lustre source I don't see any zap_cursor_move_to_key() consumers. I'll verify with the Lustre guys if they still need this functionality and if they don't we can consider dropping it. In the meanwhile there's no good reason to leave the function broken so I'll pull in @nedbass's locking fix.

But you raise a good point about dmu_snapshot_id(). This function which was originally added for the ZoL snapshot implementation when it was still quite rough. You may have noticed that ZoL is forced to handle the .zfs directories quite differently than OpenSolaris to accommodate Linux. But I'm all for cleaning up this cruft.

It looks like OpenSolaris's zfsctl_snapdir_lookup() relies on dmu_objset_hold() which we could use to get the objset id. Although that seems a bit heavy weight. Or we could just call dsl_dataset_snap_lookup() directory, although since it's static we'd need to add it to the headers. I'll rework this code a little bit and try and come up with something a big cleaner. With a little work we can probably address #1133 too. That often confuses people since until they traverse in to the directory it won't be mounted and they won't see the attributes for the root inode.

@mailinglist35 If you can post stack traces we can identify if this are different issues.

@mailinglists35
Copy link
Author

stack traces:
http://slexy.org/view/s20gQaMmc8 - note the kswapd0 in D state
http://paste.kde.org/654986 - note the arc_adapt in R state

oh, maybe this helps, there are some nonstandard kernel values on each box:


root@homerouter:~# cat /etc/sysctl.d/*conf|egrep -v '^#|^ *$'
vm.swappiness=0
net.ipv4.ip_forward=1
vm.overcommit_memory=2
net.ipv6.conf.all.disable_ipv6=1
kernel.hung_task_timeout_secs=15


root@mailhost:~# cat /etc/sysctl.d/*conf|egrep -v '^#|^ *$'
kernel.printk = 4 4 1 7
net.ipv6.conf.all.use_tempaddr = 2
net.ipv6.conf.default.use_tempaddr = 2
kernel.kptr_restrict = 1
net.ipv4.conf.default.rp_filter=1
net.ipv4.conf.all.rp_filter=1
net.ipv4.tcp_syncookies=1
kernel.yama.ptrace_scope = 1
vm.mmap_min_addr = 65536
net.core.wmem_max=1048576
net.core.rmem_max=1048576
net.core.wmem_default=1048576
net.core.rmem_default=1048576
net.ipv4.tcp_mem=1048576
net.ipv4.tcp_rmem=1048576
net.ipv4.tcp_wmem=1048576
kernel.hung_task_timeout_secs=15

@nedbass
Copy link
Contributor

nedbass commented Jan 23, 2013

@mailinglists35 The first log (homerouter) looks like #1224, so probably unrelated to this bug.

As to the second log (mailhost) I guess the arc_adapt is busy trying to reclaim cached metadata, similar to the report in #790. Again it looks unrelated to this bug, but you could try tuning zfs_arc_meta_limit, or just avoid such intensive metadata workloads on that box.

@maxximino
Copy link
Contributor

@nedbass , in mailhost SPL Panic'ed (see line 823). Maybe there is another use of the lock without release?

(@nedbass, sorry for the noise, I didn't see the reboot before your message)

@nedbass
Copy link
Contributor

nedbass commented Jan 23, 2013

@maxximino I thought that at first too, but the box was rebooted since then (around line 4934)

@mailinglists35
Copy link
Author

@nedbass apologies, I forgot to rotate the logs first.

@mailinglists35
Copy link
Author

thanks everyone. I'll wait for the patch to be merged & the ppa package updated and then move on to other suggested issues

dajhorn referenced this issue in zfsonlinux/pkg-zfs Jan 24, 2013
Callers of zap_deref_leaf() must be careful to drop leaf->l_rwlock
since that function returns with the lock held on success.  All other
callers drop the lock correctly but it seems fzap_cursor_move_to_key()
does not.  This may block writers or cause VERIFY failures when the
lock is freed.

Signed-off-by: Brian Behlendorf <[email protected]>
Closes #1215
Closes openzfs/spl#143
Closes openzfs/spl#97
@behlendorf
Copy link
Contributor

The remaining dmu_snapshot_id() change has been filed as #1238.

unya pushed a commit to unya/zfs that referenced this issue Dec 13, 2013
Callers of zap_deref_leaf() must be careful to drop leaf->l_rwlock
since that function returns with the lock held on success.  All other
callers drop the lock correctly but it seems fzap_cursor_move_to_key()
does not.  This may block writers or cause VERIFY failures when the
lock is freed.

Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#1215
Closes openzfs/spl#143
Closes openzfs/spl#97
unya pushed a commit to unya/zfs that referenced this issue Dec 13, 2013
Retire the dmu_snapshot_id() function which was introduced in the
initial .zfs control directory implementation.  There is already
an existing dsl_dataset_snap_lookup() which does exactly what we
need, and the dmu_snapshot_id() function as implemented is racy.

openzfs#1215 (comment)

Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#1238
FransUrbo pushed a commit to FransUrbo/pkg-zfsonlinux-zfs that referenced this issue Dec 31, 2013
Retire the dmu_snapshot_id() function which was introduced in the
initial .zfs control directory implementation.  There is already
an existing dsl_dataset_snap_lookup() which does exactly what we
need, and the dmu_snapshot_id() function as implemented is racy.

openzfs/zfs#1215 (comment)

Signed-off-by: Brian Behlendorf <[email protected]>
Closes #1238
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants