Unable to handle kernel NULL pointer #5295

Closed · leelists opened this issue Oct 18, 2016 · 20 comments

@leelists commented Oct 18, 2016

Kernel oops with:
spl 0.7.0-rc1_8_g0d26756
zfs 0.7.0-rc1_125_gb60eac3

on Ubuntu 16.04 / 4.4.0-43-generic.

This system is a zfs receive target and an rsync target.

[ 1980.612566] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 1980.612570] IP: [] __mutex_lock_slowpath+0x98/0x130
[ 1980.612574] PGD 0
[ 1980.612575] Oops: 0002 [#4] SMP
[ 1980.612577] Modules linked in: ipt_REJECT nf_reject_ipv4 xt_multiport iptable_filter ip_tables x_tables 8021q garp mrp vxlan ip6_udp_tunnel udp_tunnel bridge stp llc binfmt_misc zfs(POE) zunicode(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) zavl(POE) kvm_intel kvm ppdev irqbypass crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper serio_raw cryptd parport_pc parport autofs4 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear raid10 raid1 e1000e psmouse ahci libahci ptp pps_core
[ 1980.612609] CPU: 1 PID: 885 Comm: z_upgrade Tainted: P D OE 4.4.0-43-generic #63-Ubuntu
[ 1980.612611] Hardware name: /DH67BL, BIOS BLH6710H.86A.0160.2012.1204.1156 12/04/2012
[ 1980.612612] task: ffff8802142bd280 ti: ffff8800d4a8c000 task.ti: ffff8800d4a8c000
[ 1980.612613] RIP: 0010:[] [] __mutex_lock_slowpath+0x98/0x130
[ 1980.612615] RSP: 0018:ffff8800d4a8fd80 EFLAGS: 00010282
[ 1980.612616] RAX: 0000000000000000 RBX: ffff8800bd635510 RCX: 0000000000000001
[ 1980.612617] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8800bd635514
[ 1980.612618] RBP: ffff8800d4a8fdd0 R08: 0000000000000002 R09: 0000000100150003
[ 1980.612619] R10: ffff88021f3a0300 R11: 0000000000000000 R12: ffff8800bd635514
[ 1980.612620] R13: ffff8802142bd280 R14: 00000000ffffffff R15: ffff8800bd635518
[ 1980.612621] FS: 0000000000000000(0000) GS:ffff88021ea80000(0000) knlGS:0000000000000000
[ 1980.612622] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1980.612623] CR2: 0000000000000000 CR3: 000000021de0a000 CR4: 00000000000406e0
[ 1980.612624] Stack:
[ 1980.612625] ffff8800bd635518 0000000000000000 ffff8801c8ac3d80 000000003267c79e
[ 1980.612627] 00000000c0284e3a ffff8800bd635510 ffff8800bd635538 ffff8800bd635510
[ 1980.612629] ffff880210db2c60 ffff88021f0da100 ffff8800d4a8fde8 ffffffff8182f73f
[ 1980.612630] Call Trace:
[ 1980.612633] [] mutex_lock+0x1f/0x30
[ 1980.612668] [] dmu_objset_upgrade_task_cb+0x27/0xb0 [zfs]
[ 1980.612673] [] taskq_thread+0x255/0x440 [spl]
[ 1980.612677] [] ? wake_up_q+0x70/0x70
[ 1980.612680] [] ? taskq_cancel_id+0x130/0x130 [spl]
[ 1980.612682] [] kthread+0xd8/0xf0
[ 1980.612684] [] ? kthread_create_on_node+0x1e0/0x1e0
[ 1980.612686] [] ret_from_fork+0x3f/0x70
[ 1980.612687] [] ? kthread_create_on_node+0x1e0/0x1e0
[ 1980.612688] Code: e8 ae 1f 00 00 8b 03 83 f8 01 0f 84 94 00 00 00 48 8b 43 10 4c 8d 7b 08 48 89 63 10 41 be ff ff ff ff 4c 89 3c 24 48 89 44 24 08 <48> 89 20 4c 89 6c 24 10 eb 1f 49 c7 45 00 02 00 00 00 4c 89 e7
[ 1980.612706] RIP [] __mutex_lock_slowpath+0x98/0x130
[ 1980.612708] RSP
[ 1980.612708] CR2: 0000000000000000
[ 1980.612710] ---[ end trace bc05199d55331642 ]---

@leelists

This seems to be related to the [z_upgrade] thread.

@leelists

Same system, after recompiling with --enable-debug:

[ 1194.503581] VERIFY(!dmu_objset_is_dirty(os, t)) failed
[ 1194.503586] PANIC at dmu_objset.c:767:dmu_objset_evict()
[ 1194.503588] Showing stack for process 1344
[ 1194.503592] CPU: 0 PID: 1344 Comm: txg_sync Tainted: P OE 4.4.0-43-generic #63-Ubuntu
[ 1194.503594] Hardware name: /DH67BL, BIOS BLH6710H.86A.0160.2012.1204.1156 12/04/2012
[ 1194.503596] 0000000000000286 000000001d8b9c6d ffff88020d1ff7a0 ffffffff813f1f93
[ 1194.503600] ffffffffc0b05a97 00000000000002ff ffff88020d1ff7b0 ffffffffc0215e52
[ 1194.503602] ffff88020d1ff938 ffffffffc0215f1b ffff88020d1ff7d8 ffffffff00000028
[ 1194.503605] Call Trace:
[ 1194.503613] [] dump_stack+0x63/0x90
[ 1194.503625] [] spl_dumpstack+0x42/0x50 [spl]
[ 1194.503632] [] spl_panic+0xbb/0xf0 [spl]
[ 1194.503636] [] ? mutex_lock+0x12/0x30
[ 1194.503688] [] ? dnode_setdirty+0x134/0x4d0 [zfs]
[ 1194.503726] [] ? dbuf_dirty+0x6c0/0x1290 [zfs]
[ 1194.503781] [] ? zap_unlockdir+0x72/0x90 [zfs]
[ 1194.503823] [] dmu_objset_evict+0x53/0x170 [zfs]
[ 1194.503867] [] dsl_dataset_clone_swap_sync_impl+0x210/0xa30 [zfs]
[ 1194.503870] [] ? avl_find+0x61/0xf0 [zavl]
[ 1194.503923] [] ? unique_insert+0x15d/0x1c0 [zfs]
[ 1194.503976] [] ? dsl_dataset_hold_obj+0x7bd/0xb90 [zfs]
[ 1194.504021] [] dsl_dataset_rollback_sync+0x12c/0x1f0 [zfs]
[ 1194.504071] [] ? refcount_add_many+0x84/0xf0 [zfs]
[ 1194.504109] [] ? dbuf_rele+0x38/0x70 [zfs]
[ 1194.504147] [] ? dmu_buf_rele+0xe/0x10 [zfs]
[ 1194.504191] [] ? dsl_dataset_rollback_check+0x1e4/0x290 [zfs]
[ 1194.504238] [] dsl_sync_task_sync+0x11d/0x120 [zfs]
[ 1194.504284] [] dsl_pool_sync+0x526/0x620 [zfs]
[ 1194.504336] [] spa_sync+0x426/0x1210 [zfs]
[ 1194.504390] [] txg_sync_thread+0x3ac/0x6b0 [zfs]
[ 1194.504444] [] ? txg_init+0x270/0x270 [zfs]
[ 1194.504451] [] thread_generic_wrapper+0x7a/0xc0 [spl]
[ 1194.504458] [] ? __thread_exit+0x20/0x20 [spl]
[ 1194.504462] [] kthread+0xd8/0xf0
[ 1194.504464] [] ? kthread_create_on_node+0x1e0/0x1e0
[ 1194.504468] [] ret_from_fork+0x3f/0x70
[ 1194.504470] [] ? kthread_create_on_node+0x1e0/0x1e0

@leelists

Aren't z_upgrade and zfs rollback compatible?

@leelists commented Oct 18, 2016

For now I'm disabling zfs receive until the upgrade is finished :-(

@behlendorf

@leelists thanks for reporting this. Yes, this looks like a case that was overlooked and slipped through testing. @jxiong, can you please investigate this?

behlendorf added this to the 0.7.0 milestone on Oct 18, 2016
@jxiong commented Oct 21, 2016

The first stack trace looks like the objset has been freed. dmu_objset_userobjspace_upgrade() assumes that the objset is owned by the caller, so that the upgrade task can be canceled at disown time. Is it possible that the dataset acting as the zfs receive target is not owned?

@leelists for the 2nd stack trace: is the upgrade task still running? I will try to reproduce it on my local node.
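
In code terms, the ownership lifecycle described here looks roughly like the sketch below (0.7-era interfaces from memory; the wrapper function is hypothetical and error handling is abbreviated, so this is not actual ZoL source):

/*
 * Simplified sketch: the upgrade task is only safe because the owner's
 * teardown path stops it before the objset can go away.
 */
static void
owned_objset_lifecycle(const char *name, void *tag)
{
	objset_t *os;

	/* An owner (e.g. zfsvfs_create() on mount) takes the objset... */
	VERIFY0(dmu_objset_own(name, DMU_OST_ZFS, B_FALSE, tag, &os));

	/* ...which may dispatch the asynchronous z_upgrade task. */
	dmu_objset_userobjspace_upgrade(os);

	/*
	 * Disown stops the task (via dmu_objset_upgrade_stop()) before
	 * the objset is released. A receive stream that never owned the
	 * objset skips this step, leaving the task to run on a freed
	 * objset.
	 */
	dmu_objset_disown(os, tag);
}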

jxiong pushed a commit to jxiong/zfs that referenced this issue Oct 24, 2016
'zfs recv' could disown a live objset without calling
dmu_objset_disown(). As a result, the objset may be released while the
upgrade thread is still running.

This patch calls dmu_objset_upgrade_stop() in dmu_recv_end().

ZoL-bug-id: openzfs#5295

Signed-off-by: Jinshan Xiong <[email protected]>
jxiong pushed a commit to jxiong/zfs that referenced this issue Oct 25, 2016
'zfs recv' could disown a live objset without calling
dmu_objset_disown(). As a result, the objset may be released while the
upgrade thread is still running.

This patch avoids the problem by checking whether a dataset is a
snapshot before calling dmu_objset_userobjspace_upgrade().

ZoL-bug-id: openzfs#5295

Signed-off-by: Jinshan Xiong <[email protected]>
@ab-oe commented Oct 27, 2017

I recently also hit this issue on 0.7.2 and reproduced it with the following scenario:

  1. Create two pools, each containing one dataset on an old ZFS filesystem version: one for the source and one for the destination.
  2. In a loop, create snapshots every 15 s and send them to the destination. (I used znapzend as it does what I want. It executes send and receive with the following parameters: zfs send -I src/vol00 autosnap_2017-10-27-094600 src/vol00 autosnap_2017-10-27-094615 | zfs recv -F dst/vol00.)
  3. Write data to the source. (I used 30 GB of small files; this is 1/3 of the whole pool size.)
  4. Update ZFS.
  5. Reboot the system.
  6. Upgrade ZFS and reboot the system.
  7. After the reboot, z_upgrade will be running and znapzend starts to do its send/receive. The system may crash.
  8. If z_upgrade is no longer running and the system didn't crash, execute zfs umount -a && zfs mount -a.
    Sometimes the 8th step has to be repeated several times before the system crashes.

dmu_recv_end_sync() calls dsl_dataset_clone_swap_sync_impl(), which evicts the origin dataset's objset even though it may still be in use by a running upgrade thread. This causes the following NULL pointer dereference:

[  346.265216] BUG: unable to handle kernel NULL pointer dereference at 0000000000000238
[  346.265222] IP: [<ffffffffa03959d5>] dmu_objset_sync+0x35/0x400 [zfs]
[  346.265282] PGD bb0ac067 PUD b8c71067 PMD 0 
[  346.265286] Oops: 0000 [#1] SMP 
[  346.265289] Modules linked in: iscsi_scst(O) scst_vdisk(O) scst(O) sha256_generic drbg target_core_iblock target_core_pscsi iscsi_target_mod target_core_mod ib_iser iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) crc32c_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd acpi_cpufreq processor ac button vmxnet3(O) nls_iso8859_1 nls_cp437 sg pvscsi(O) pata_acpi ata_generic vfat fat aufs
[  346.265318] CPU: 1 PID: 7987 Comm: txg_sync Tainted: P           O    4.4.45-oe64-gb9dcf0a #14
[  346.265321] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/21/2015
[  346.265323] task: ffff880219b8b200 ti: ffff8800acaf0000 task.ti: ffff8800acaf0000
[  346.265325] RIP: 0010:[<ffffffffa03959d5>]  [<ffffffffa03959d5>] dmu_objset_sync+0x35/0x400 [zfs]
[  346.265368] RSP: 0018:ffff8800acaf3bf8  EFLAGS: 00010282
[  346.265370] RAX: ffff8800754bf780 RBX: 0000000000000000 RCX: 000000000002a4c0
[  346.265372] RDX: 000000000002a4bf RSI: 0000000000000000 RDI: 000000000000000f
[  346.265373] RBP: ffff880235169f00 R08: 00000000000186c0 R09: 0000000000000000
[  346.265375] R10: ffff880200dc4a00 R11: ffff880221bdc000 R12: ffff8800754bf780
[  346.265376] R13: ffff8801e635ae40 R14: ffff8800acaf3d10 R15: ffff8801e635ae40
[  346.265379] FS:  0000000000000000(0000) GS:ffff88023fc80000(0000) knlGS:0000000000000000
[  346.265381] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  346.265382] CR2: 0000000000000238 CR3: 00000000bbabe000 CR4: 00000000000406e0
[  346.265455] Stack:
[  346.265457]  ffff8800aca1c370 ffff8800aca1c350 ffff880235169f00 ffff8800b8fdc000
[  346.265460]  ffff8800ab519a90 ffff8800aca1c448 ffffffff
[  346.265463]  ffff880200dc4200 ffff8800aca1c350 ffff880235169f00 ffff8800ab519998
[  346.265466] Call Trace:
[  346.265517]  [<ffffffffa03a3ca5>] ? dnode_setdirty+0xb5/0x100 [zfs]
[  346.265556]  [<ffffffffa038964e>] ? dbuf_dirty+0x40e/0x7f0 [zfs]
[  346.265599]  [<ffffffffa03a7467>] ? dsl_dataset_sync+0x67/0x300 [zfs]
[  346.265645]  [<ffffffffa03b7a5b>] ? dsl_pool_sync+0x8b/0x3c0 [zfs]
[  346.265697]  [<ffffffffa03d3569>] ? spa_sync+0x3d9/0xd00 [zfs]
[  346.265751]  [<ffffffffa03e4e51>] ? txg_sync_thread+0x301/0x4c0 [zfs]
[  346.265804]  [<ffffffffa03e4b50>] ? txg_delay+0x150/0x150 [zfs]
[  346.265832]  [<ffffffffa023f120>] ? thread_generic_wrapper+0x70/0x80 [spl]
[  346.265839]  [<ffffffffa023f0b0>] ? __thread_exit+0x10/0x10 [spl]
[  346.265843]  [<ffffffff81077b7a>] ? kthread+0xca/0xe0
[  346.265845]  [<ffffffff81077ab0>] ? kthread_create_on_node+0x170/0x170
[  346.265850]  [<ffffffff8170ff8f>] ? ret_from_fork+0x3f/0x70
[  346.265852]  [<ffffffff81077ab0>] ? kthread_create_on_node+0x170/0x170
[  346.265854] Code: 55 41 54 49 89 f5 55 53 48 89 d5 48 89 fb 48 c7 c2 20 5a 4a a0 31 f6 48 81 ec 90 00 00 00 bf 80 00 00 00 e8 ce 72 ea ff 49 89 c4 <48> 8b 83 38 02 00 00 48 8b 10 49 89 14 24 48 8b 50 08 49 89 54 
[  346.265887] RIP  [<ffffffffa03959d5>] dmu_objset_sync+0x35/0x400 [zfs]
[  346.265930]  RSP <ffff8800acaf3bf8>
[  346.265931] CR2: 0000000000000238
[  346.265934] ---[ end trace b713caff89008bb1 ]---

Occasionally it causes the same error as reported by @leelists. This may also happen when a rollback is performed, since rollback also calls dsl_dataset_clone_swap_sync_impl().
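
Schematically, the collision between the two threads looks like this (function names taken from the traces above; the bodies are illustrative stubs, not real ZoL code):

/* Thread 1: txg_sync, completing the receive in syncing context. */
static void
recv_sync_side(void)
{
	/*
	 * dmu_recv_end_sync() -> dsl_dataset_clone_swap_sync_impl()
	 * -> dmu_objset_evict(): the origin head's objset, including
	 * its os_upgrade_lock, is torn down.
	 */
}

/* Thread 2: the z_upgrade taskq, still working on the same objset. */
static void
upgrade_side(objset_t *os)
{
	mutex_enter(&os->os_upgrade_lock);	/* os freed by thread 1: oops */
	/* ...userobj accounting upgrade... */
	mutex_exit(&os->os_upgrade_lock);
}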

For testing purposes I patched the ZoL source to return EBUSY when there is an upgradable objset on a dirty dataset, and it resolved the issue. The upgrade task can't be stopped here because forcing synchronization causes a deadlock in this path. For the rollback code it seems to be safe to execute dmu_objset_upgrade_stop().

diff --git a/include/sys/dmu_objset.h b/include/sys/dmu_objset.h
index 11b8fc6..e53aa00 100644
--- a/include/sys/dmu_objset.h
+++ b/include/sys/dmu_objset.h
@@ -212,6 +212,7 @@ void dmu_objset_do_userquota_updates(objset_t *os, dmu_tx_t *tx);
 void dmu_objset_userquota_get_ids(dnode_t *dn, boolean_t before, dmu_tx_t *tx);
 boolean_t dmu_objset_userused_enabled(objset_t *os);
 int dmu_objset_userspace_upgrade(objset_t *os);
+void dmu_objset_upgrade_stop(objset_t *os);
 boolean_t dmu_objset_userspace_present(objset_t *os);
 boolean_t dmu_objset_userobjused_enabled(objset_t *os);
 boolean_t dmu_objset_userobjspace_upgradable(objset_t *os);
diff --git a/module/zfs/dmu_objset.c b/module/zfs/dmu_objset.c
index 609e43f..ab837ec 100644
--- a/module/zfs/dmu_objset.c
+++ b/module/zfs/dmu_objset.c
@@ -82,7 +82,6 @@ int dmu_rescan_dnode_threshold = 1 << DN_MAX_INDBLKSHIFT;
 static void dmu_objset_find_dp_cb(void *arg);
 
 static void dmu_objset_upgrade(objset_t *os, dmu_objset_upgrade_cb_t cb);
-static void dmu_objset_upgrade_stop(objset_t *os);
 
 void
 dmu_objset_init(void)
@@ -1334,7 +1333,7 @@ dmu_objset_upgrade(objset_t *os, dmu_objset_upgrade_cb_t cb)
 	mutex_exit(&os->os_upgrade_lock);
 }
 
-static void
+void
 dmu_objset_upgrade_stop(objset_t *os)
 {
 	mutex_enter(&os->os_upgrade_lock);
diff --git a/module/zfs/dmu_send.c b/module/zfs/dmu_send.c
index 1984e71..dc2a49f 100644
--- a/module/zfs/dmu_send.c
+++ b/module/zfs/dmu_send.c
@@ -3831,10 +3831,20 @@ dmu_recv_end_check(void *arg, dmu_tx_t *tx)
 
 	if (!drc->drc_newfs) {
 		dsl_dataset_t *origin_head;
+		objset_t *os = NULL;
 
 		error = dsl_dataset_hold(dp, drc->drc_tofs, FTAG, &origin_head);
 		if (error != 0)
 			return (error);
+
+		os = origin_head->ds_objset;
+		if (os && dmu_objset_userobjspace_upgradable(os)
+		    && dsl_dataset_is_dirty(origin_head)) {
+			/* drop the hold taken above before bailing out */
+			dsl_dataset_rele(origin_head, FTAG);
+			return (SET_ERROR(EBUSY));
+		}
+
 		if (drc->drc_force) {
 			/*
 			 * We will destroy any snapshots in tofs (i.e. before
diff --git a/module/zfs/zfs_ioctl.c b/module/zfs/zfs_ioctl.c
index 5f333a4..8c0b923 100644
--- a/module/zfs/zfs_ioctl.c
+++ b/module/zfs/zfs_ioctl.c
@@ -3751,6 +3751,7 @@ zfs_ioc_rollback(const char *fsname, nvlist_t *innvl, nvlist_t *outnvl)
 		dsl_dataset_t *ds;
 
 		ds = dmu_objset_ds(zfsvfs->z_os);
+		dmu_objset_upgrade_stop(zfsvfs->z_os);
 		error = zfs_suspend_fs(zfsvfs);
 		if (error == 0) {
 			int resume_err;

@behlendorf

@ab-oe thanks for investigating this. Since the upgrade only runs when a filesystem is mounted, and thus has a long hold, I would have thought this case would already be covered by the dmu_recv_end_check() -> dsl_dataset_clone_swap_check_impl() -> dsl_dataset_handoff_check() check. Were you able to determine why this isn't the case?

@ab-oe commented Oct 30, 2017

@behlendorf I checked dsl_dataset_handoff_check(); dsl_dataset_long_held() always returns B_FALSE. I did some debugging and saw that it goes through the following path:

  • mount: zfsvfs_create() -> dmu_objset_own() takes a long hold of the dataset with the tag ffff8801322aa000 (the address of the zfsvfs) and starts the upgrade task.
  • receive: dsl_dataset_handoff_check() performs a dsl_dataset_long_rele() on this dataset with the tag ffff8801322aa000, so the reference counter for this tag drops to 0 and dsl_dataset_long_held(ds) returns B_FALSE. dsl_dataset_long_hold() is then called again for this dataset and tag.

There are no other long holds/releases for the tested dataset.

zfs_ioc_userobjspace_upgrade() takes an additional long hold for the whole upgrade task, but the other places where dmu_objset_userobjspace_upgrade() is called do not.
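
For reference, dsl_dataset_handoff_check() behaves roughly as sketched below (reconstructed from the 0.7-era source; details may differ). The key point is that a hold taken under the owner's own tag can never make the dataset look busy here:

static int
handoff_check_sketch(dsl_dataset_t *ds, void *owner)
{
	boolean_t held;

	/* The owner's long hold is dropped before the check... */
	if (owner != NULL)
		dsl_dataset_long_rele(ds, owner);

	/* ...so only holds under *other* tags are visible here. */
	held = dsl_dataset_long_held(ds);

	/* The owner's hold is restored before returning. */
	if (owner != NULL)
		dsl_dataset_long_hold(ds, owner);

	return (held ? SET_ERROR(EBUSY) : 0);
}

This is why taking the upgrade's hold under a dedicated tag, as in the patch below, makes the receive correctly fail with EBUSY.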

@ab-oe commented Oct 30, 2017

@behlendorf I secured the upgrade code with a long hold instead of the previous work-around and it also works well. Tests have been running for 3 h now without any crash. The previous work-around also led to some race conditions that end with:

[  387.915300] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  387.915306] IP: [<ffffffff8170e10f>] __mutex_lock_slowpath+0x6f/0x100
[  387.915315] PGD 1f7686067 PUD 1f7a64067 PMD 0
[  387.915319] Oops: 0002 [#1] SMP
[  387.915460] Call Trace:
[  387.915542]  [<ffffffffa03bc8aa>] ? txg_wait_synced+0x10a/0x120 [zfs]
[  387.915547]  [<ffffffff8170e1ae>] ? mutex_lock+0xe/0x20
[  387.915590]  [<ffffffffa036b651>] ? dmu_objset_upgrade_task_cb+0x81/0xc0 [zfs]
[  387.915598]  [<ffffffffa015919e>] ? taskq_thread+0x24e/0x450 [spl]
[  387.915601]  [<ffffffff8107e880>] ? wake_up_q+0x60/0x60
[  387.915608]  [<ffffffffa0158f50>] ? taskq_cancel_id+0x100/0x100 [spl]
[  387.915611]  [<ffffffff81077b7a>] ? kthread+0xca/0xe0
[  387.915613]  [<ffffffff81077ab0>] ? kthread_create_on_node+0x170/0x170
[  387.915616]  [<ffffffff8170ff8f>] ? ret_from_fork+0x3f/0x70
[  387.915619]  [<ffffffff81077ab0>] ? kthread_create_on_node+0x170/0x170
[  387.915620] Code: e8 37 17 00 00 8b 03 83 f8 01 0f 84 92 00 00 00 48 8b 43 10 4c 8d 7b 08 48 89 63 10 41 be ff ff ff ff 4c 89 3c 24 48 89 44 24 08 <48> 89 20 4c 89 64 24 10 eb 19 49 c7 04 24 02 00 00 00 c6 43 04
[  387.915647] RIP  [<ffffffff8170e10f>] __mutex_lock_slowpath+0x6f/0x100
[  387.915651]  RSP <ffff88021cdd7db0>
[  387.915653] CR2: 0000000000000000
[  387.915656] ---[ end trace 65a12a065d544c34 ]---
diff --git a/module/zfs/dmu_objset.c b/module/zfs/dmu_objset.c
index 609e43f..3dc320f 100644
--- a/module/zfs/dmu_objset.c
+++ b/module/zfs/dmu_objset.c
@@ -64,6 +64,8 @@
  */
 krwlock_t os_lock;
 
+static char *upgrade_tag = "z_upgrade";
+
 /*
  * Tunable to overwrite the maximum number of threads for the parallelization
  * of dmu_objset_find_dp, needed to speed up the import of pools with many
@@ -1313,6 +1315,7 @@ dmu_objset_upgrade_task_cb(void *data)
        os->os_upgrade_exit = B_TRUE;
        os->os_upgrade_id = 0;
        mutex_exit(&os->os_upgrade_lock);
+       dsl_dataset_long_rele(dmu_objset_ds(os), upgrade_tag);
 }
 
 static void
@@ -1321,6 +1324,7 @@ dmu_objset_upgrade(objset_t *os, dmu_objset_upgrade_cb_t cb)
        if (os->os_upgrade_id != 0)
                return;
 
+       dsl_dataset_long_hold(dmu_objset_ds(os), upgrade_tag);
        mutex_enter(&os->os_upgrade_lock);
        if (os->os_upgrade_id == 0 && os->os_upgrade_status == 0) {
                os->os_upgrade_exit = B_FALSE;
@@ -1328,8 +1332,10 @@ dmu_objset_upgrade(objset_t *os, dmu_objset_upgrade_cb_t cb)
                os->os_upgrade_id = taskq_dispatch(
                    os->os_spa->spa_upgrade_taskq,
                    dmu_objset_upgrade_task_cb, os, TQ_SLEEP);
-               if (os->os_upgrade_id == TASKQID_INVALID)
+               if (os->os_upgrade_id == TASKQID_INVALID) {
+                       dsl_dataset_long_rele(dmu_objset_ds(os), upgrade_tag);
                        os->os_upgrade_status = ENOMEM;
+               }
        }
        mutex_exit(&os->os_upgrade_lock);
 }

@behlendorf

@ab-oe OK, that makes sense. Once you're happy with your local tests it would be great if you could open a PR for review with the long-hold version of the fix. There's an existing similar test case in tests/functional/upgrade/ where you can add the test case you posted above.

@ab-oe commented Oct 31, 2017

OK, I can prepare the PR with the fix. Could you give me a tip on how to make sure that the upgrade is running on the objset during the send/receive?

@behlendorf

I'd suggest using upgrade_userobj_001_pos, which does this, as an example. Although it looks like you may need to move the pool create/destroy out of setup/cleanup to make sure you're starting with a non-upgraded pool.

@ab-oe commented Nov 2, 2017

@behlendorf thank you, but I still have problems with this test. I have no idea how to make the upgrade task run long enough within the testing environment's restrictions. Could you give me more clues?

@behlendorf

@ab-oe I see. We could potentially use zinject to artificially slow down the upgrade, but I don't think it's worth the added complexity. I'm OK with relying on the manual testing you've done which reproduced the issue, since the root cause is clearly understood and addressed.

@ab-oe commented Nov 6, 2017

@behlendorf thank you. I already tried with a delay, but it seems that the upgrade starts much later and the delay doesn't affect the duration of the upgrade itself, or I can't catch it. However, the receive/rollback always ended with success instead of returning a busy error. I have one more question: this mechanism handles the case where we do a rollback/receive while the upgrade is running. What about the opposite situation, when the upgrade is started while a receive/rollback is in progress?

@behlendorf

@ab-oe good question. In the opposite case I don't believe there's an issue, since an upgrade can't be started on a snapshot, which is what both zfs recv and zfs rollback will be operating on. After the receive/rollback is complete they'll be promoted, at which point the upgrade is possible and safe.
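
In code, that guard amounts to something like the sketch below (illustrative, in the spirit of jxiong's Oct 25 patch above; the wrapper name is hypothetical):

static void
maybe_upgrade(objset_t *os)
{
	/* Sketch: never dispatch the upgrade task for a snapshot. */
	if (dmu_objset_is_snapshot(os) ||
	    !dmu_objset_userobjspace_upgradable(os))
		return;
	dmu_objset_userobjspace_upgrade(os);
}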

@ab-oe commented Nov 7, 2017

@behlendorf I was thinking of the origin dataset here, not the snapshot. If the upgrade starts after dsl_dataset_handoff_check() (for example, after a remount), the origin objset will be evicted in dsl_dataset_clone_swap_sync_impl(). Am I right?

@behlendorf

Yes, but I don't believe the origin dataset can have an upgrade running at this time, so it should be safe to evict.

@ab-oe commented Nov 8, 2017

@behlendorf I understand. Thank you for clearing this up.

behlendorf pushed a commit that referenced this issue Nov 10, 2017
If a receive or rollback is performed while the filesystem is upgrading,
the objset may be evicted in `dsl_dataset_clone_swap_sync_impl`. This
leads to a NULL pointer dereference when the upgrade tries to access the
evicted objset.

This commit adds a long hold on the dataset for the whole upgrade
process. Receive and rollback will return an EBUSY error until the
upgrade is finished.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Arkadiusz Bubała <[email protected]>
Closes #5295 
Closes #6837
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Nov 21, 2017 (same message as above)
Nasf-Fan pushed a commit to Nasf-Fan/zfs that referenced this issue Jan 29, 2018 (same message as above)
Nasf-Fan pushed a commit to Nasf-Fan/zfs that referenced this issue Feb 13, 2018 (same message as above)
FransUrbo pushed a commit to FransUrbo/zfs that referenced this issue Apr 28, 2019 (same message as above)
citrus-it added a commit to citrus-it/zfs that referenced this issue Dec 18, 2020
After porting the fix for openzfs#5295
over to illumos, we started hitting an assertion failure when running the
testsuite:

	assertion failed: rc->rc_count == number, file: .../refcount.c

and the unexpected hold has this stack:

	dsl_dataset_long_hold+0x59
	dmu_objset_upgrade+0x73
	dmu_objset_id_quota_upgrade+0x15
	dmu_objset_own+0x14f

The simplest reproducer for this in illumos is

    zpool create -f -O version=1 testpool c3t0d0; zpool destroy testpool

which is run as part of the zpool_create_tempname test, but I can't get this
to trigger on FreeBSD. This appears to be because the call to
txg_wait_synced() in dmu_objset_upgrade_stop() (which was missing in
illumos) slows down dmu_objset_disown() enough to avoid the condition.

Signed-off-by: Andy Fiddaman <[email protected]>
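
For context, ZoL's dmu_objset_upgrade_stop() with the txg_wait_synced() call referred to above looks roughly like this (a reconstructed sketch; details may differ from the actual source; upgrade_tag is the tag introduced by the long-hold fix above):

static void
dmu_objset_upgrade_stop_sketch(objset_t *os)
{
	mutex_enter(&os->os_upgrade_lock);
	os->os_upgrade_exit = B_TRUE;	/* tell the task callback to bail */
	if (os->os_upgrade_id != 0) {
		taskqid_t id = os->os_upgrade_id;

		os->os_upgrade_id = 0;
		mutex_exit(&os->os_upgrade_lock);

		/* if the task never ran, drop its long hold ourselves */
		if (taskq_cancel_id(os->os_spa->spa_upgrade_taskq, id) == 0)
			dsl_dataset_long_rele(dmu_objset_ds(os), upgrade_tag);

		/* the wait that was missing in illumos */
		txg_wait_synced(os->os_spa->spa_dsl_pool, 0);
	} else {
		mutex_exit(&os->os_upgrade_lock);
	}
}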
citrus-it added two further commits to citrus-it/zfs that referenced this issue Dec 18, 2020 (same message as above)
behlendorf pushed a commit that referenced this issue Dec 21, 2020 (same message as above; Reviewed-by: Brian Behlendorf <[email protected]>; Closes #11368)
ghost pushed a commit to zfsonfreebsd/ZoF that referenced this issue Dec 23, 2020 (same message as above)
behlendorf pushed a commit that referenced this issue Dec 23, 2020 (same message as above)
jsai20 pushed a commit to jsai20/zfs that referenced this issue Mar 30, 2021 (same message as above)
sempervictus pushed a commit to sempervictus/zfs that referenced this issue May 31, 2021 (same message as above)