

segfault when using rollback while doing dd #795

Closed
Pinkbyte opened this issue Jun 23, 2012 · 3 comments

@Pinkbyte

System:

zfstest ~ # uname -a
Linux zfstest 3.2.12-gentoo-ZFSTEST #2 SMP Sun Jun 17 03:40:10 MSK 2012 x86_64 Intel(R) Core(TM)2 Duo CPU T7700 @ 2.40GHz GenuineIntel GNU/Linux

zfstest ~ # cat /proc/meminfo | grep Mem
MemTotal:        4046560 kB
MemFree:         3817324 kB

zfstest ~ # zfs list
NAME                 USED  AVAIL  REFER  MOUNTPOINT
bledge              1.81G  7.91G    30K  none
bledge/root         1.44G  7.91G  1.37G  /
bledge/usr           382M  7.91G    30K  none
bledge/usr/portage   382M  7.91G   382M  /usr/portage

zfstest ~ # zfs list -t snapshot
NAME                            USED  AVAIL  REFER  MOUNTPOINT
bledge/root@before_updates      658K      -  1.36G  -
bledge/root@eix_installed       112K      -  1.36G  -
bledge/root@eix_cache_created   115K      -  1.37G  -
bledge/root@world_updated       192K      -  1.37G  -
bledge/root@sshd_enabled         73K      -  1.37G  -
bledge/root@create_test_sh       78K      -  1.37G  -

ZFS and SPL versions: 0.6.0_rc9

How to reproduce this situation (it doesn't happen every time, but very often):

  1. open two terminals;
  2. in the first terminal, run: dd if=/dev/zero of=/zerofile bs=1M
  3. in the second terminal, run: zfs rollback bledge/root@create_test_sh

dd aborts with an input/output error, but I think that is expected in this situation...

Second terminal:

zfstest ~ # zfs rollback bledge/root@create_test_sh
internal error: unable to open /etc/mtab
zfstest ~ # ls /etc/mtab
ls: cannot access /etc/mtab: Input/output error
zfstest ~ # rm /etc/mtab
Segmentation fault

First terminal:

general protection fault: 0000 [#1] SMP
CPU 1
Modules linked in: zfs(P) zunicode(P) zavl(P) zcommon(P) znvpair(P) spl(O) scsi_wait_scan

Pid: 3393, comm: rm Tainted: P W O 3.2.12-gentoo-ZFSTEST #2 Bochs Bochs
RIP: 0010:[<ffffffffa01221eb>] [<ffffffffa01221eb>] zfs_inode_destroy+0x6b/0x100 [zfs]
RSP: 0018:ffff8801143cda78 EFLAGS: 00010282
RAX: ffff8800c90e4980 RBX: ffff8800c90e49a8 RCX: dead000000100100
RDX: dead000000200200 RSI: dead000000100100 RDI: ffff8800cab29420
RBP: ffff8800cab29000 R08: ff18f1e80b868e03 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800c90e4800
R13: ffff8800cab29420 R14: 0000000000000600 R15: 0000000000000000
FS: 00007f069cb3f700(0000) GS:ffff88011fc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f069c66d3b0 CR3: 00000001143b6000 CR4: 00000000000006a0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rm (pid: 3393, threadinfo ffff8801143cc000, task ffff8800c6aac1a0)
Stack:
 0000000000000000 ffff8800cab29000 ffff8800c90e49a8 0000000000000000
 ffff8800c90ae180 ffffffffa01227dc 000000000000002c 000000000001d02c
 ffff8800c90e4920 0000000000000000 0000000000050008 ffff8800c8366e08
Call Trace:
 [<ffffffffa01227dc>] ? zfs_inode_update+0x55c/0x670 [zfs]
 [<ffffffffa0124a04>] ? zfs_zget+0x154/0x1e0 [zfs]
 [<ffffffffa01056f0>] ? zfs_dirent_lock+0x470/0x570 [zfs]
 [<ffffffffa011d72d>] ? zfs_remove+0x12d/0x450 [zfs]
 [<ffffffff815f5f95>] ? _raw_spin_lock+0x5/0x10
 [<ffffffffa0132010>] ? zpl_fallocate_common+0x610/0x710 [zfs]
 [<ffffffff810eb1dd>] ? vfs_unlink+0x8d/0x100
 [<ffffffff810eb3f1>] ? do_unlinkat+0x1a1/0x1d0
 [<ffffffff810e1c4f>] ? sys_newfstatat+0x1f/0x50
 [<ffffffff815f68d2>] ? system_call_fastpath+0x16/0x1b
Code: ad 20 04 00 00 4c 89 ef e8 e3 28 4d e1 48 be 00 01 10 00 00 00 ad de 4c 89 ef 4c 89 e0 48 03 85 00 04 00 00 48 8b 08 48 8b 50 08 <48> 89 51 08 48 89 0a 48 89 30 48 b9 00 02 20 00 00 00 ad de 48
RIP [<ffffffffa01221eb>] zfs_inode_destroy+0x6b/0x100 [zfs]
 RSP <ffff8801143cda78>
---[ end trace 596d4c2a441ea645 ]---

After that, any command in an already-opened terminal hangs that terminal; only a reboot helps...

@behlendorf
Contributor

Thanks for the reproducer, it should make this easier to run down.

@nedbass
Contributor

nedbass commented Dec 27, 2012

The following procedure reliably reproduces a variation of this bug for me.

I'm not sure if rolling back a mounted filesystem is even supposed to work. The old Solaris ZFS administration guide says a mounted filesystem is unmounted and remounted, and the rollback fails if the unmount fails. However, the current code doesn't seem to attempt this, so that documentation may be obsolete. Also, I verified that this test works on OpenIndiana, so there may be a reasonable way to fix this on Linux.

zfs create tank/fish
zfs snapshot tank/fish@a
touch /tank/fish/a
tail -f /tank/fish/a &
sleep 1
zfs rollback tank/fish@a
sleep 1
touch /tank/fish/b

The last touch triggers a WARNING and a GPF, then segfaults:

[  365.184184] ------------[ cut here ]------------
[  365.184193] WARNING: at /build/buildd/linux-3.5.0/fs/inode.c:964 unlock_new_inode+0x79/0x90()
[  365.184195] Hardware name: Bochs
[  365.184196] Modules linked in: zfs(PO) zcommon(PO) zunicode(PO) znvpair(PO) zavl(PO) splat(O) spl(O) zlib_deflate stap_1e31e8a81d4a0393e2aa39814f900635_2290(O) parport_pc rfcomm bnep bluetooth ppdev kvm snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq_midi microcode snd_rawmidi snd_seq_midi_event snd_seq snd_timer psmouse snd_seq_device serio_raw snd soundcore virtio_balloon snd_page_alloc i2c_piix4 mac_hid lp parport floppy
[  365.184228] Pid: 2843, comm: touch Tainted: P           O 3.5.0-17-generic #28-Ubuntu
[  365.184229] Call Trace:
[  365.184247]  [<ffffffff81051c4f>] warn_slowpath_common+0x7f/0xc0
[  365.184250]  [<ffffffff81051caa>] warn_slowpath_null+0x1a/0x20
[  365.184253]  [<ffffffff8119bc69>] unlock_new_inode+0x79/0x90
[  365.184289]  [<ffffffffa03d2633>] zfs_znode_alloc+0x5c3/0x780 [zfs]
[  365.184309]  [<ffffffffa03634c6>] ? sa_idx_tab_rele+0x66/0x1b0 [zfs]
[  365.184332]  [<ffffffffa03d30b4>] zfs_mknode+0x8c4/0xed0 [zfs]
[  365.184356]  [<ffffffffa03cb98a>] zfs_create+0x59a/0x760 [zfs]
[  365.184379]  [<ffffffffa03ee7a8>] zpl_create+0xa8/0x1d0 [zfs]
[  365.184382]  [<ffffffff8118eb04>] vfs_create+0xb4/0x120
[  365.184384]  [<ffffffff8118ff4b>] do_last+0x8ab/0xa10
[  365.184388]  [<ffffffff81191399>] path_openat+0xd9/0x430
[  365.184393]  [<ffffffff81689ad6>] ? ftrace_call+0x5/0x2b
[  365.184396]  [<ffffffff81191811>] do_filp_open+0x41/0xa0
[  365.184400]  [<ffffffff8119e906>] ? alloc_fd+0xc6/0x110
[  365.184403]  [<ffffffff81181355>] do_sys_open+0xf5/0x230
[  365.184406]  [<ffffffff811814b1>] sys_open+0x21/0x30
[  365.184408]  [<ffffffff81689d29>] system_call_fastpath+0x16/0x1b
[  365.184410] ---[ end trace c6ce396af0377dc6 ]---
[  365.184424] general protection fault: 0000 [#1] SMP 
[  365.184428] CPU 10 
[  365.184429] Modules linked in: zfs(PO) zcommon(PO) zunicode(PO) znvpair(PO) zavl(PO) splat(O) spl(O) zlib_deflate stap_1e31e8a81d4a0393e2aa39814f900635_2290(O) parport_pc rfcomm bnep bluetooth ppdev kvm snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq_midi microcode snd_rawmidi snd_seq_midi_event snd_seq snd_timer psmouse snd_seq_device serio_raw snd soundcore virtio_balloon snd_page_alloc i2c_piix4 mac_hid lp parport floppy
[  365.184453] 
[  365.184455] Pid: 2843, comm: touch Tainted: P        W  O 3.5.0-17-generic #28-Ubuntu Bochs Bochs
[  365.184461] RIP: 0010:[<ffffffffa03d1d3e>]  [<ffffffffa03d1d3e>] zfs_inode_destroy+0x9e/0x1e0 [zfs]
[  365.184488] RSP: 0018:ffff880051de7708  EFLAGS: 00010292
[  365.184489] RAX: ffff8800527276b0 RBX: ffff8800527276d8 RCX: dead000000100100
[  365.184491] RDX: dead000000200200 RSI: dead000000200200 RDI: ffff880052710530
[  365.184492] RBP: ffff880051de7748 R08: f018000000000000 R09: 00527277780c0000
[  365.184493] R10: ff8f8d9d2491de03 R11: 0000000000000006 R12: ffff880052710000
[  365.184494] R13: ffff880052727528 R14: ffff880052710530 R15: 0000000000000000
[  365.184496] FS:  00007f7389e0e700(0000) GS:ffff88007fd40000(0000) knlGS:0000000000000000
[  365.184498] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  365.184499] CR2: 0000000000407b10 CR3: 00000000791d9000 CR4: 00000000000006e0
[  365.184503] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  365.184506] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  365.184508] Process touch (pid: 2843, threadinfo ffff880051de6000, task ffff880052708000)
[  365.184509] Stack:
[  365.184510]  ffff880051de7778 ffffffff81689ad6 ffffffffa03eed70 0000000000000006
[  365.184514]  ffff8800527276d8 ffff880052727760 ffffffffa0402fe0 ffffffffa0402fe0
[  365.184518]  ffff880051de7778 ffffffffa03eed93 ffff880051de7778 ffffffff8119c19e
[  365.184522] Call Trace:
[  365.184525]  [<ffffffff81689ad6>] ? ftrace_call+0x5/0x2b
[  365.184547]  [<ffffffffa03eed70>] ? zpl_dirty_inode+0x10/0x10 [zfs]
[  365.184569]  [<ffffffffa03eed93>] zpl_inode_destroy+0x23/0x70 [zfs]
[  365.184572]  [<ffffffff8119c19e>] ? __destroy_inode+0x2e/0xf0
[  365.184575]  [<ffffffff8119c29c>] destroy_inode+0x3c/0x70
[  365.184577]  [<ffffffff8119c3f3>] evict+0x123/0x1b0
[  365.184580]  [<ffffffff8119c589>] iput+0x109/0x210
[  365.184602]  [<ffffffffa03d263b>] zfs_znode_alloc+0x5cb/0x780 [zfs]
[  365.184621]  [<ffffffffa03634c6>] ? sa_idx_tab_rele+0x66/0x1b0 [zfs]
[  365.184645]  [<ffffffffa03d30b4>] zfs_mknode+0x8c4/0xed0 [zfs]
[  365.184668]  [<ffffffffa03cb98a>] zfs_create+0x59a/0x760 [zfs]
[  365.184690]  [<ffffffffa03ee7a8>] zpl_create+0xa8/0x1d0 [zfs]
[  365.184693]  [<ffffffff8118eb04>] vfs_create+0xb4/0x120
[  365.184695]  [<ffffffff8118ff4b>] do_last+0x8ab/0xa10
[  365.184699]  [<ffffffff81191399>] path_openat+0xd9/0x430
[  365.184702]  [<ffffffff81689ad6>] ? ftrace_call+0x5/0x2b
[  365.184705]  [<ffffffff81191811>] do_filp_open+0x41/0xa0
[  365.184708]  [<ffffffff8119e906>] ? alloc_fd+0xc6/0x110
[  365.184710]  [<ffffffff81181355>] do_sys_open+0xf5/0x230
[  365.184713]  [<ffffffff811814b1>] sys_open+0x21/0x30
[  365.184716]  [<ffffffff81689d29>] system_call_fastpath+0x16/0x1b
[  365.184717] Code: 84 24 18 05 00 00 0f 84 13 01 00 00 4c 89 e8 49 03 84 24 10 05 00 00 48 be 00 02 20 00 00 00 ad de 4c 89 f7 48 8b 08 48 8b 50 08 <48> 89 51 08 48 89 0a 48 b9 00 01 10 00 00 00 ad de 48 89 08 48 
[  365.184752] RIP  [<ffffffffa03d1d3e>] zfs_inode_destroy+0x9e/0x1e0 [zfs]
[  365.184773]  RSP <ffff880051de7708>
[  365.184776] ---[ end trace c6ce396af0377dc7 ]---

The WARN from unlock_new_inode() fires because the inode doesn't have the I_NEW flag set:

        WARN_ON(!(inode->i_state & I_NEW));                                     

Regarding the GPF, zfs_inode_destroy+0x9e resolves to __list_del(), so it seems that zfs_inode_destroy() is calling list_remove() on a corrupt zsb->z_all_znodes list.
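
Note that the register values in the trace (0xdead000000100100 and 0xdead000000200200) are the kernel's LIST_POISON1/LIST_POISON2 pointers, which list_del() writes into an entry after unlinking it, so this looks like a second removal of an already-removed list entry. A toy Python model of the kernel's doubly linked list (hypothetical, for illustration only; not the actual ZFS/kernel code) shows why that blows up:

```python
# Stand-ins for the kernel's poison pointers (0xdead000000100100 /
# 0xdead000000200200); dereferencing them faults, just as assigning
# through a string attribute fails here.
LIST_POISON1 = "poison1"
LIST_POISON2 = "poison2"

class ListHead:
    """Circular doubly linked list node, like struct list_head."""
    def __init__(self):
        self.prev = self.next = self

def list_add(node, head):
    node.next = head.next
    node.prev = head
    head.next.prev = node
    head.next = node

def list_del(node):
    # __list_del(): unlink, then poison the pointers like the kernel does
    node.prev.next = node.next   # faults if node.prev is already poisoned
    node.next.prev = node.prev
    node.next = LIST_POISON1
    node.prev = LIST_POISON2

head = ListHead()
n = ListHead()
list_add(n, head)
list_del(n)          # fine: the list is empty again
caught = False
try:
    list_del(n)      # double removal dereferences the poison values
except AttributeError:
    caught = True    # the kernel instead takes a general protection fault
```

In the kernel there is no exception to catch; dereferencing the poison pointer is the GPF seen in the trace.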

Finally, I've determined that insert_inode_locked() is returning EBUSY when called from zfs_znode_alloc(), causing control to jump to the error label:

error:
        unlock_new_inode(ip);
        iput(ip);
        return NULL;

where unlock_new_inode() triggers the WARN and iput() leads to the GPF.

@behlendorf
Contributor

@Pinkbyte If you get a chance, can you test #1214? It addresses your rollback issue. It passes all of my testing, but a little more never hurts.

behlendorf added a commit to behlendorf/zfs that referenced this issue Jan 17, 2013
Rolling back a mounted filesystem with open file handles and
cached dentries+inodes never worked properly in ZoL.  The
major issue was that Linux provides no easy mechanism for
modules to invalidate the inode cache for a file system.

Because of this it was possible that an inode from the previous
filesystem would not get properly dropped from the cache during
the rollback.  Then a new inode with the same inode number would
be created and collide with the existing cached inode.  Ideally
this would trigger a VERIFY() but in practice the error wasn't
handled and it would just dereference NULL.

Luckily, this issue can be resolved by sprucing up the existing
Solaris zfs_rezget() functionality for the Linux VFS.

The way it works now is that when a file system is rolled back
all the cached inodes will be traversed and refetched from disk.
If a version of the cached inode exists on disk the in-core
copy will be updated accordingly.  If there is no match for that
object on disk it will be unhashed from the inode cache and
marked as stale.

This will effectively make the inode unfindable for lookups
allowing the inode number to be immediately recycled.  The inode
will then only be accessible from the cached dentries.  Subsequent
dentry lookups which reference a stale inode will result in the
dentry being invalidated.  Once invalidated the dentry will drop
its reference on the inode allowing it to be safely pruned from
the cache.

Special care is taken for negative dentries since they do not
reference any inode.  These dentries will be invalidated based
on when they were added to the dentry cache.  Entries added
before the last rollback will be invalidated to prevent them
from masking real files in the dataset.

Two nice side effects of this fix are:

* Removes the dependency on spl_invalidate_inodes(), it can now
  be safely removed from the SPL when we choose to do so.

* zfs_znode_alloc() no longer requires a dentry to be passed.
  This effectively reverts this portion of the code to its
  upstream counterpart.  The dentry is now instantiated more
  correctly in the Linux ZPL layer.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#795
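
The invalidation scheme the commit message describes can be sketched as a toy model (hypothetical Python, not the actual zfs_rezget() code): on rollback, every cached inode is refetched from disk; matches are updated in place, non-matches are unhashed and marked stale so their inode numbers can be recycled immediately.

```python
class Inode:
    def __init__(self, num, data):
        self.num, self.data, self.stale = num, data, False

class InodeCache:
    def __init__(self):
        self.hashed = {}          # inode cache, keyed by inode number

    def rollback(self, on_disk):
        # Traverse all cached inodes and refetch each one from disk.
        for num, ino in list(self.hashed.items()):
            if num in on_disk:
                ino.data = on_disk[num]   # update the in-core copy
            else:
                del self.hashed[num]      # unhash: lookups can't find it
                ino.stale = True          # dentries will drop it lazily

cache = InodeCache()
cache.hashed[1] = Inode(1, "kept")
cache.hashed[2] = Inode(2, "created-after-snapshot")
survivor = cache.hashed[1]
victim = cache.hashed[2]

# Roll back: on disk, only object 1 exists (in its snapshot version).
cache.rollback(on_disk={1: "snapshot-version"})
```

After the rollback, inode 1 survives with refreshed contents, while inode 2 is unhashed and stale, so inode number 2 cannot collide with a newly created file, which is exactly the collision the original bug hit.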
behlendorf added a commit to openzfs/spl that referenced this issue Jan 17, 2013
This functionality is no longer required by ZFS, see commit
openzfs/zfs@7b3e34b.
Since there are no other consumers, and because it adds
additional autoconf complexity which must be maintained,
the spl_invalidate_inodes() function has been removed.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs/zfs#795
behlendorf added a commit to behlendorf/spl that referenced this issue Jan 18, 2013
unya pushed a commit to unya/zfs that referenced this issue Dec 13, 2013
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ned Bass <[email protected]>
Closes openzfs#795
pcd1193182 pushed a commit to pcd1193182/zfs that referenced this issue Sep 26, 2023
Blocks are ingested to the zettacache in 2 cases:
* when writing a new object, we add all its blocks to the zettacache
* when reading a block which is not in the zettacache, we get the object
  and add all its (not-already-present) blocks to the zettacache

In both cases, we are adding many blocks (up to ~300) that are part of
the same object.  The current code mostly handles each block
individually, causing repeated work, especially to lock and unlock
various data structures.

This commit streamlines the batch insertion of all the
(not-already-present) blocks in an object.  There are 3 main aspects to
this:
* bulk lookup: `zettacache::Inner::lookup_all_impl()` takes a list of
  keys and executes a callback for each of them, providing the
  IndexValue.  The Locked lock is obtained at most twice.
* bulk insert: `zettacache::Inner::insert_all_impl()` takes a list of
  keys and buffers, and writes all of them to disk.  The Locked lock is
  obtained once.
* bulk LockedKey: the new `RangeLock` is used to lock the range of keys
  covered by the object.  Only one lock is obtained for the object,
  instead of one lock from the LockSet for each block.

The performance of ingesting via reading random blocks is improved by
100-200% (performance is 2-3x what it was before).
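
The lock-amortization idea behind the bulk paths above can be sketched with a toy model (hypothetical Python names, not the actual zettacache code): per-block insertion takes the lock once per block, while bulk insertion takes it once per object.

```python
import threading

class ToyCache:
    def __init__(self):
        self._lock = threading.Lock()
        self.lock_acquisitions = 0    # count lock round-trips
        self.blocks = {}

    def insert_one(self, key, buf):
        # Per-block path: one lock acquisition per block.
        with self._lock:
            self.lock_acquisitions += 1
            self.blocks.setdefault(key, buf)

    def insert_all(self, entries):
        # Bulk path: the lock is obtained once for the whole object.
        with self._lock:
            self.lock_acquisitions += 1
            for key, buf in entries:
                self.blocks.setdefault(key, buf)

# Ingest the ~300 blocks of one object both ways.
per_block = ToyCache()
for i in range(300):
    per_block.insert_one(i, b"x")

bulk = ToyCache()
bulk.insert_all([(i, b"x") for i in range(300)])
```

The cached state ends up identical, but the per-block path pays 300 lock round-trips where the bulk path pays one, which is the repeated work the commit eliminates.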