Unable to handle kernel NULL pointer #5295

Closed · leelists opened this issue Oct 18, 2016 · 20 comments

@leelists commented Oct 18, 2016

Kernel oops with:
spl 0.7.0-rc1_8_g0d26756
zfs 0.7.0-rc1_125_gb60eac3

on Ubuntu 16.04 / 4.4.0-43-generic.

This system is a zfs receive target and an rsync target.

[ 1980.612566] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 1980.612570] IP: [] __mutex_lock_slowpath+0x98/0x130
[ 1980.612574] PGD 0
[ 1980.612575] Oops: 0002 [#4] SMP
[ 1980.612577] Modules linked in: ipt_REJECT nf_reject_ipv4 xt_multiport iptable_filter ip_tables x_tables 8021q garp mrp vxlan ip6_udp_tunnel udp_tunnel bridge stp llc binfmt_misc zfs(POE) zunicode(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) zavl(POE) kvm_intel kvm ppdev irqbypass crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper serio_raw cryptd parport_pc parport autofs4 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear raid10 raid1 e1000e psmouse ahci libahci ptp pps_core
[ 1980.612609] CPU: 1 PID: 885 Comm: z_upgrade Tainted: P D OE 4.4.0-43-generic #63-Ubuntu
[ 1980.612611] Hardware name: /DH67BL, BIOS BLH6710H.86A.0160.2012.1204.1156 12/04/2012
[ 1980.612612] task: ffff8802142bd280 ti: ffff8800d4a8c000 task.ti: ffff8800d4a8c000
[ 1980.612613] RIP: 0010:[] [] __mutex_lock_slowpath+0x98/0x130
[ 1980.612615] RSP: 0018:ffff8800d4a8fd80 EFLAGS: 00010282
[ 1980.612616] RAX: 0000000000000000 RBX: ffff8800bd635510 RCX: 0000000000000001
[ 1980.612617] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8800bd635514
[ 1980.612618] RBP: ffff8800d4a8fdd0 R08: 0000000000000002 R09: 0000000100150003
[ 1980.612619] R10: ffff88021f3a0300 R11: 0000000000000000 R12: ffff8800bd635514
[ 1980.612620] R13: ffff8802142bd280 R14: 00000000ffffffff R15: ffff8800bd635518
[ 1980.612621] FS: 0000000000000000(0000) GS:ffff88021ea80000(0000) knlGS:0000000000000000
[ 1980.612622] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1980.612623] CR2: 0000000000000000 CR3: 000000021de0a000 CR4: 00000000000406e0
[ 1980.612624] Stack:
[ 1980.612625] ffff8800bd635518 0000000000000000 ffff8801c8ac3d80 000000003267c79e
[ 1980.612627] 00000000c0284e3a ffff8800bd635510 ffff8800bd635538 ffff8800bd635510
[ 1980.612629] ffff880210db2c60 ffff88021f0da100 ffff8800d4a8fde8 ffffffff8182f73f
[ 1980.612630] Call Trace:
[ 1980.612633] [] mutex_lock+0x1f/0x30
[ 1980.612668] [] dmu_objset_upgrade_task_cb+0x27/0xb0 [zfs]
[ 1980.612673] [] taskq_thread+0x255/0x440 [spl]
[ 1980.612677] [] ? wake_up_q+0x70/0x70
[ 1980.612680] [] ? taskq_cancel_id+0x130/0x130 [spl]
[ 1980.612682] [] kthread+0xd8/0xf0
[ 1980.612684] [] ? kthread_create_on_node+0x1e0/0x1e0
[ 1980.612686] [] ret_from_fork+0x3f/0x70
[ 1980.612687] [] ? kthread_create_on_node+0x1e0/0x1e0
[ 1980.612688] Code: e8 ae 1f 00 00 8b 03 83 f8 01 0f 84 94 00 00 00 48 8b 43 10 4c 8d 7b 08 48 89 63 10 41 be ff ff ff ff 4c 89 3c 24 48 89 44 24 08 <48> 89 20 4c 89 6c 24 10 eb 1f 49 c7 45 00 02 00 00 00 4c 89 e7
[ 1980.612706] RIP [] __mutex_lock_slowpath+0x98/0x130
[ 1980.612708] RSP
[ 1980.612708] CR2: 0000000000000000
[ 1980.612710] ---[ end trace bc05199d55331642 ]---

@leelists

This seems to be related to the [z_upgrade] thread.

@leelists

Same system, after recompiling with --enable-debug:

[ 1194.503581] VERIFY(!dmu_objset_is_dirty(os, t)) failed
[ 1194.503586] PANIC at dmu_objset.c:767:dmu_objset_evict()
[ 1194.503588] Showing stack for process 1344
[ 1194.503592] CPU: 0 PID: 1344 Comm: txg_sync Tainted: P OE 4.4.0-43-generic #63-Ubuntu
[ 1194.503594] Hardware name: /DH67BL, BIOS BLH6710H.86A.0160.2012.1204.1156 12/04/2012
[ 1194.503596] 0000000000000286 000000001d8b9c6d ffff88020d1ff7a0 ffffffff813f1f93
[ 1194.503600] ffffffffc0b05a97 00000000000002ff ffff88020d1ff7b0 ffffffffc0215e52
[ 1194.503602] ffff88020d1ff938 ffffffffc0215f1b ffff88020d1ff7d8 ffffffff00000028
[ 1194.503605] Call Trace:
[ 1194.503613] [] dump_stack+0x63/0x90
[ 1194.503625] [] spl_dumpstack+0x42/0x50 [spl]
[ 1194.503632] [] spl_panic+0xbb/0xf0 [spl]
[ 1194.503636] [] ? mutex_lock+0x12/0x30
[ 1194.503688] [] ? dnode_setdirty+0x134/0x4d0 [zfs]
[ 1194.503726] [] ? dbuf_dirty+0x6c0/0x1290 [zfs]
[ 1194.503781] [] ? zap_unlockdir+0x72/0x90 [zfs]
[ 1194.503823] [] dmu_objset_evict+0x53/0x170 [zfs]
[ 1194.503867] [] dsl_dataset_clone_swap_sync_impl+0x210/0xa30 [zfs]
[ 1194.503870] [] ? avl_find+0x61/0xf0 [zavl]
[ 1194.503923] [] ? unique_insert+0x15d/0x1c0 [zfs]
[ 1194.503976] [] ? dsl_dataset_hold_obj+0x7bd/0xb90 [zfs]
[ 1194.504021] [] dsl_dataset_rollback_sync+0x12c/0x1f0 [zfs]
[ 1194.504071] [] ? refcount_add_many+0x84/0xf0 [zfs]
[ 1194.504109] [] ? dbuf_rele+0x38/0x70 [zfs]
[ 1194.504147] [] ? dmu_buf_rele+0xe/0x10 [zfs]
[ 1194.504191] [] ? dsl_dataset_rollback_check+0x1e4/0x290 [zfs]
[ 1194.504238] [] dsl_sync_task_sync+0x11d/0x120 [zfs]
[ 1194.504284] [] dsl_pool_sync+0x526/0x620 [zfs]
[ 1194.504336] [] spa_sync+0x426/0x1210 [zfs]
[ 1194.504390] [] txg_sync_thread+0x3ac/0x6b0 [zfs]
[ 1194.504444] [] ? txg_init+0x270/0x270 [zfs]
[ 1194.504451] [] thread_generic_wrapper+0x7a/0xc0 [spl]
[ 1194.504458] [] ? __thread_exit+0x20/0x20 [spl]
[ 1194.504462] [] kthread+0xd8/0xf0
[ 1194.504464] [] ? kthread_create_on_node+0x1e0/0x1e0
[ 1194.504468] [] ret_from_fork+0x3f/0x70
[ 1194.504470] [] ? kthread_create_on_node+0x1e0/0x1e0

@leelists

Aren't z_upgrade and zfs rollback compatible?

@leelists commented Oct 18, 2016

For now I'm disabling zfs receive until the upgrade is finished :-(

@behlendorf

@leelists thanks for reporting this. Yes, this looks like a case that was overlooked and slipped through testing. @jxiong, can you please investigate this?

behlendorf added this to the 0.7.0 milestone on Oct 18, 2016
@jxiong commented Oct 21, 2016

The first stack trace looks like the objset has been freed. dmu_objset_userobjspace_upgrade() assumes that the objset is owned by the caller, so that the upgrade task can be canceled at disown time. Is it possible that the dataset acting as the zfs receive target is not owned?

@leelists for the 2nd stack trace: is the upgrade task still running? I will try to reproduce it on my local node.
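
In code terms, the ownership lifecycle described here looks roughly like the sketch below (0.7-era interfaces from memory; the wrapper function is hypothetical and error handling is abbreviated, so this is not actual ZoL source):

/*
 * Simplified sketch: the upgrade task is only safe because the owner's
 * teardown path stops it before the objset can go away.
 */
static void
owned_objset_lifecycle(const char *name, void *tag)
{
	objset_t *os;

	/* An owner (e.g. zfsvfs_create() on mount) takes the objset... */
	VERIFY0(dmu_objset_own(name, DMU_OST_ZFS, B_FALSE, tag, &os));

	/* ...which may dispatch the asynchronous z_upgrade task. */
	dmu_objset_userobjspace_upgrade(os);

	/*
	 * Disown stops the task (via dmu_objset_upgrade_stop()) before
	 * the objset is released. A receive stream that never owned the
	 * objset skips this step, leaving the task to run on a freed
	 * objset.
	 */
	dmu_objset_disown(os, tag);
}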

jxiong pushed a commit to jxiong/zfs that referenced this issue Oct 24, 2016
'zfs recv' could disown a live objset without calling
dmu_objset_disown(). As a result, the objset may be released while the
upgrade thread is still running.

This patch calls dmu_objset_upgrade_stop() in dmu_recv_end().

ZoL-bug-id: openzfs#5295

Signed-off-by: Jinshan Xiong <[email protected]>
jxiong pushed a commit to jxiong/zfs that referenced this issue Oct 25, 2016
'zfs recv' could disown a live objset without calling
dmu_objset_disown(). As a result, the objset may be released while the
upgrade thread is still running.

This patch avoids the problem by checking whether a dataset is a
snapshot before calling dmu_objset_userobjspace_upgrade().

ZoL-bug-id: openzfs#5295

Signed-off-by: Jinshan Xiong <[email protected]>
@ab-oe commented Oct 27, 2017

I recently also hit this issue on 0.7.2 and reproduced it with the following scenario:

  1. Create two pools, each containing one dataset on an old ZFS filesystem version: one for the source and one for the destination.
  2. In a loop, create snapshots every 15 s and send them to the destination. (I used znapzend as it does what I want. It executes send and receive with the following parameters: zfs send -I src/vol00 autosnap_2017-10-27-094600 src/vol00 autosnap_2017-10-27-094615 | zfs recv -F dst/vol00.)
  3. Write data to the source. (I used 30 GB of small files; this is 1/3 of the whole pool size.)
  4. Update ZFS.
  5. Reboot the system.
  6. Upgrade ZFS and reboot the system.
  7. After the reboot, z_upgrade will be running and znapzend starts to do its send/receive. The system may crash.
  8. If z_upgrade is no longer running and the system didn't crash, execute zfs umount -a && zfs mount -a.
    Sometimes the 8th step has to be repeated several times before the system crashes.

dmu_recv_end_sync() calls dsl_dataset_clone_swap_sync_impl(), which evicts the origin dataset's objset even though it may still be in use by a running upgrade thread. This causes the following NULL pointer dereference:

[  346.265216] BUG: unable to handle kernel NULL pointer dereference at 0000000000000238
[  346.265222] IP: [<ffffffffa03959d5>] dmu_objset_sync+0x35/0x400 [zfs]
[  346.265282] PGD bb0ac067 PUD b8c71067 PMD 0 
[  346.265286] Oops: 0000 [#1] SMP 
[  346.265289] Modules linked in: iscsi_scst(O) scst_vdisk(O) scst(O) sha256_generic drbg target_core_iblock target_core_pscsi iscsi_target_mod target_core_mod ib_iser iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) crc32c_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd acpi_cpufreq processor ac button vmxnet3(O) nls_iso8859_1 nls_cp437 sg pvscsi(O) pata_acpi ata_generic vfat fat aufs
[  346.265318] CPU: 1 PID: 7987 Comm: txg_sync Tainted: P           O    4.4.45-oe64-gb9dcf0a #14
[  346.265321] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/21/2015
[  346.265323] task: ffff880219b8b200 ti: ffff8800acaf0000 task.ti: ffff8800acaf0000
[  346.265325] RIP: 0010:[<ffffffffa03959d5>]  [<ffffffffa03959d5>] dmu_objset_sync+0x35/0x400 [zfs]
[  346.265368] RSP: 0018:ffff8800acaf3bf8  EFLAGS: 00010282
[  346.265370] RAX: ffff8800754bf780 RBX: 0000000000000000 RCX: 000000000002a4c0
[  346.265372] RDX: 000000000002a4bf RSI: 0000000000000000 RDI: 000000000000000f
[  346.265373] RBP: ffff880235169f00 R08: 00000000000186c0 R09: 0000000000000000
[  346.265375] R10: ffff880200dc4a00 R11: ffff880221bdc000 R12: ffff8800754bf780
[  346.265376] R13: ffff8801e635ae40 R14: ffff8800acaf3d10 R15: ffff8801e635ae40
[  346.265379] FS:  0000000000000000(0000) GS:ffff88023fc80000(0000) knlGS:0000000000000000
[  346.265381] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  346.265382] CR2: 0000000000000238 CR3: 00000000bbabe000 CR4: 00000000000406e0
[  346.265455] Stack:
[  346.265457]  ffff8800aca1c370 ffff8800aca1c350 ffff880235169f00 ffff8800b8fdc000
[  346.265460]  ffff8800ab519a90 ffff8800aca1c448 ffffffff
[  346.265463]  ffff880200dc4200 ffff8800aca1c350 ffff880235169f00 ffff8800ab519998
[  346.265466] Call Trace:
[  346.265517]  [<ffffffffa03a3ca5>] ? dnode_setdirty+0xb5/0x100 [zfs]
[  346.265556]  [<ffffffffa038964e>] ? dbuf_dirty+0x40e/0x7f0 [zfs]
[  346.265599]  [<ffffffffa03a7467>] ? dsl_dataset_sync+0x67/0x300 [zfs]
[  346.265645]  [<ffffffffa03b7a5b>] ? dsl_pool_sync+0x8b/0x3c0 [zfs]
[  346.265697]  [<ffffffffa03d3569>] ? spa_sync+0x3d9/0xd00 [zfs]
[  346.265751]  [<ffffffffa03e4e51>] ? txg_sync_thread+0x301/0x4c0 [zfs]
[  346.265804]  [<ffffffffa03e4b50>] ? txg_delay+0x150/0x150 [zfs]
[  346.265832]  [<ffffffffa023f120>] ? thread_generic_wrapper+0x70/0x80 [spl]
[  346.265839]  [<ffffffffa023f0b0>] ? __thread_exit+0x10/0x10 [spl]
[  346.265843]  [<ffffffff81077b7a>] ? kthread+0xca/0xe0
[  346.265845]  [<ffffffff81077ab0>] ? kthread_create_on_node+0x170/0x170
[  346.265850]  [<ffffffff8170ff8f>] ? ret_from_fork+0x3f/0x70
[  346.265852]  [<ffffffff81077ab0>] ? kthread_create_on_node+0x170/0x170
[  346.265854] Code: 55 41 54 49 89 f5 55 53 48 89 d5 48 89 fb 48 c7 c2 20 5a 4a a0 31 f6 48 81 ec 90 00 00 00 bf 80 00 00 00 e8 ce 72 ea ff 49 89 c4 <48> 8b 83 38 02 00 00 48 8b 10 49 89 14 24 48 8b 50 08 49 89 54 
[  346.265887] RIP  [<ffffffffa03959d5>] dmu_objset_sync+0x35/0x400 [zfs]
[  346.265930]  RSP <ffff8800acaf3bf8>
[  346.265931] CR2: 0000000000000238
[  346.265934] ---[ end trace b713caff89008bb1 ]---

Occasionally it causes the same error as reported by @leelists. This may also happen when a rollback is performed, since rollback also calls dsl_dataset_clone_swap_sync_impl().
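
Schematically, the collision between the two threads looks like this (function names taken from the traces above; the bodies are illustrative stubs, not real ZoL code):

/* Thread 1: txg_sync, completing the receive in syncing context. */
static void
recv_sync_side(void)
{
	/*
	 * dmu_recv_end_sync() -> dsl_dataset_clone_swap_sync_impl()
	 * -> dmu_objset_evict(): the origin head's objset, including
	 * its os_upgrade_lock, is torn down.
	 */
}

/* Thread 2: the z_upgrade taskq, still working on the same objset. */
static void
upgrade_side(objset_t *os)
{
	mutex_enter(&os->os_upgrade_lock);	/* os freed by thread 1: oops */
	/* ...userobj accounting upgrade... */
	mutex_exit(&os->os_upgrade_lock);
}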

For testing purposes I patched the ZoL source to return EBUSY when there is an upgradable objset on a dirty dataset, and it resolved the issue. The upgrade task can't be stopped here because forcing synchronization causes a deadlock in this path. For the rollback code it seems to be safe to execute dmu_objset_upgrade_stop().

diff --git a/include/sys/dmu_objset.h b/include/sys/dmu_objset.h
index 11b8fc6..e53aa00 100644
--- a/include/sys/dmu_objset.h
+++ b/include/sys/dmu_objset.h
@@ -212,6 +212,7 @@ void dmu_objset_do_userquota_updates(objset_t *os, dmu_tx_t *tx);
 void dmu_objset_userquota_get_ids(dnode_t *dn, boolean_t before, dmu_tx_t *tx);
 boolean_t dmu_objset_userused_enabled(objset_t *os);
 int dmu_objset_userspace_upgrade(objset_t *os);
+void dmu_objset_upgrade_stop(objset_t *os);
 boolean_t dmu_objset_userspace_present(objset_t *os);
 boolean_t dmu_objset_userobjused_enabled(objset_t *os);
 boolean_t dmu_objset_userobjspace_upgradable(objset_t *os);
diff --git a/module/zfs/dmu_objset.c b/module/zfs/dmu_objset.c
index 609e43f..ab837ec 100644
--- a/module/zfs/dmu_objset.c
+++ b/module/zfs/dmu_objset.c
@@ -82,7 +82,6 @@ int dmu_rescan_dnode_threshold = 1 << DN_MAX_INDBLKSHIFT;
 static void dmu_objset_find_dp_cb(void *arg);
 
 static void dmu_objset_upgrade(objset_t *os, dmu_objset_upgrade_cb_t cb);
-static void dmu_objset_upgrade_stop(objset_t *os);
 
 void
 dmu_objset_init(void)
@@ -1334,7 +1333,7 @@ dmu_objset_upgrade(objset_t *os, dmu_objset_upgrade_cb_t cb)
 	mutex_exit(&os->os_upgrade_lock);
 }
 
-static void
+void
 dmu_objset_upgrade_stop(objset_t *os)
 {
 	mutex_enter(&os->os_upgrade_lock);
diff --git a/module/zfs/dmu_send.c b/module/zfs/dmu_send.c
index 1984e71..dc2a49f 100644
--- a/module/zfs/dmu_send.c
+++ b/module/zfs/dmu_send.c
@@ -3831,10 +3831,20 @@ dmu_recv_end_check(void *arg, dmu_tx_t *tx)
 
 	if (!drc->drc_newfs) {
 		dsl_dataset_t *origin_head;
+		objset_t *os = NULL;
 
 		error = dsl_dataset_hold(dp, drc->drc_tofs, FTAG, &origin_head);
 		if (error != 0)
 			return (error);
+
+		os = origin_head->ds_objset;
+		if (os && dmu_objset_userobjspace_upgradable(os)
+		    && dsl_dataset_is_dirty(origin_head)) {
+			/* drop the hold taken above before bailing out */
+			dsl_dataset_rele(origin_head, FTAG);
+			return (SET_ERROR(EBUSY));
+		}
+
 		if (drc->drc_force) {
 			/*
 			 * We will destroy any snapshots in tofs (i.e. before
diff --git a/module/zfs/zfs_ioctl.c b/module/zfs/zfs_ioctl.c
index 5f333a4..8c0b923 100644
--- a/module/zfs/zfs_ioctl.c
+++ b/module/zfs/zfs_ioctl.c
@@ -3751,6 +3751,7 @@ zfs_ioc_rollback(const char *fsname, nvlist_t *innvl, nvlist_t *outnvl)
 		dsl_dataset_t *ds;
 
 		ds = dmu_objset_ds(zfsvfs->z_os);
+		dmu_objset_upgrade_stop(zfsvfs->z_os);
 		error = zfs_suspend_fs(zfsvfs);
 		if (error == 0) {
 			int resume_err;

@behlendorf

@ab-oe thanks for investigating this. Since the upgrade only runs when a filesystem is mounted, and thus has a long hold, I would have thought this case would already be covered by the dmu_recv_end_check() -> dsl_dataset_clone_swap_check_impl() -> dsl_dataset_handoff_check() check. Were you able to determine why this isn't the case?

@ab-oe commented Oct 30, 2017

@behlendorf I checked dsl_dataset_handoff_check(); dsl_dataset_long_held() always returns B_FALSE. I did some debugging and saw that it goes through the following path:

  • mount: zfsvfs_create() -> dmu_objset_own() takes a long hold of the dataset with the tag ffff8801322aa000 (the address of the zfsvfs) and starts the upgrade task.
  • receive: dsl_dataset_handoff_check() performs a dsl_dataset_long_rele() on this dataset with the tag ffff8801322aa000, so the reference counter for this tag drops to 0 and dsl_dataset_long_held(ds) returns B_FALSE. dsl_dataset_long_hold() is then called again for this dataset and tag.

There are no other long holds/releases for the tested dataset.

zfs_ioc_userobjspace_upgrade() takes an additional long hold for the whole upgrade task, but the other places where dmu_objset_userobjspace_upgrade() is called do not.
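
For reference, dsl_dataset_handoff_check() behaves roughly as sketched below (reconstructed from the 0.7-era source; details may differ). The key point is that a hold taken under the owner's own tag can never make the dataset look busy here:

static int
handoff_check_sketch(dsl_dataset_t *ds, void *owner)
{
	boolean_t held;

	/* The owner's long hold is dropped before the check... */
	if (owner != NULL)
		dsl_dataset_long_rele(ds, owner);

	/* ...so only holds under *other* tags are visible here. */
	held = dsl_dataset_long_held(ds);

	/* The owner's hold is restored before returning. */
	if (owner != NULL)
		dsl_dataset_long_hold(ds, owner);

	return (held ? SET_ERROR(EBUSY) : 0);
}

This is why taking the upgrade's hold under a dedicated tag, as in the patch below, makes the receive correctly fail with EBUSY.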

@ab-oe commented Oct 30, 2017

@behlendorf I secured the upgrade code with a long hold instead of the previous work-around and it also works well. Tests have been running for 3 h now without any crash. The previous work-around also led to some race conditions that end with:

[  387.915300] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  387.915306] IP: [<ffffffff8170e10f>] __mutex_lock_slowpath+0x6f/0x100
[  387.915315] PGD 1f7686067 PUD 1f7a64067 PMD 0
[  387.915319] Oops: 0002 [#1] SMP
[  387.915460] Call Trace:
[  387.915542]  [<ffffffffa03bc8aa>] ? txg_wait_synced+0x10a/0x120 [zfs]
[  387.915547]  [<ffffffff8170e1ae>] ? mutex_lock+0xe/0x20
[  387.915590]  [<ffffffffa036b651>] ? dmu_objset_upgrade_task_cb+0x81/0xc0 [zfs]
[  387.915598]  [<ffffffffa015919e>] ? taskq_thread+0x24e/0x450 [spl]
[  387.915601]  [<ffffffff8107e880>] ? wake_up_q+0x60/0x60
[  387.915608]  [<ffffffffa0158f50>] ? taskq_cancel_id+0x100/0x100 [spl]
[  387.915611]  [<ffffffff81077b7a>] ? kthread+0xca/0xe0
[  387.915613]  [<ffffffff81077ab0>] ? kthread_create_on_node+0x170/0x170
[  387.915616]  [<ffffffff8170ff8f>] ? ret_from_fork+0x3f/0x70
[  387.915619]  [<ffffffff81077ab0>] ? kthread_create_on_node+0x170/0x170
[  387.915620] Code: e8 37 17 00 00 8b 03 83 f8 01 0f 84 92 00 00 00 48 8b 43 10 4c 8d 7b 08 48 89 63 10 41 be ff ff ff ff 4c 89 3c 24 48 89 44 24 08 <48> 89 20 4c 89 64 24 10 eb 19 49 c7 04 24 02 00 00 00 c6 43 04
[  387.915647] RIP  [<ffffffff8170e10f>] __mutex_lock_slowpath+0x6f/0x100
[  387.915651]  RSP <ffff88021cdd7db0>
[  387.915653] CR2: 0000000000000000
[  387.915656] ---[ end trace 65a12a065d544c34 ]---
diff --git a/module/zfs/dmu_objset.c b/module/zfs/dmu_objset.c
index 609e43f..3dc320f 100644
--- a/module/zfs/dmu_objset.c
+++ b/module/zfs/dmu_objset.c
@@ -64,6 +64,8 @@
  */
 krwlock_t os_lock;
 
+static char *upgrade_tag = "z_upgrade";
+
 /*
  * Tunable to overwrite the maximum number of threads for the parallelization
  * of dmu_objset_find_dp, needed to speed up the import of pools with many
@@ -1313,6 +1315,7 @@ dmu_objset_upgrade_task_cb(void *data)
        os->os_upgrade_exit = B_TRUE;
        os->os_upgrade_id = 0;
        mutex_exit(&os->os_upgrade_lock);
+       dsl_dataset_long_rele(dmu_objset_ds(os), upgrade_tag);
 }
 
 static void
@@ -1321,6 +1324,7 @@ dmu_objset_upgrade(objset_t *os, dmu_objset_upgrade_cb_t cb)
        if (os->os_upgrade_id != 0)
                return;
 
+       dsl_dataset_long_hold(dmu_objset_ds(os), upgrade_tag);
        mutex_enter(&os->os_upgrade_lock);
        if (os->os_upgrade_id == 0 && os->os_upgrade_status == 0) {
                os->os_upgrade_exit = B_FALSE;
@@ -1328,8 +1332,10 @@ dmu_objset_upgrade(objset_t *os, dmu_objset_upgrade_cb_t cb)
                os->os_upgrade_id = taskq_dispatch(
                    os->os_spa->spa_upgrade_taskq,
                    dmu_objset_upgrade_task_cb, os, TQ_SLEEP);
-               if (os->os_upgrade_id == TASKQID_INVALID)
+               if (os->os_upgrade_id == TASKQID_INVALID) {
+                       dsl_dataset_long_rele(dmu_objset_ds(os), upgrade_tag);
                        os->os_upgrade_status = ENOMEM;
+               }
        }
        mutex_exit(&os->os_upgrade_lock);
 }

@behlendorf

@ab-oe OK, that makes sense. Once you're happy with your local tests it would be great if you could open a PR for review with the long-hold version of the fix. There's an existing similar test case in tests/functional/upgrade/ where you can add the test case you posted above.

@ab-oe commented Oct 31, 2017

OK, I can prepare the PR with the fix. Could you give me a tip on how to make sure that the upgrade is running on the objset during the send/receive?

@behlendorf

I'd suggest using upgrade_userobj_001_pos, which does this, as an example. Although it looks like you may need to move the pool create/destroy out of setup/cleanup to make sure you're starting with a non-upgraded pool.

@ab-oe commented Nov 2, 2017

@behlendorf thank you, but I still have problems with this test. I have no idea how to make the upgrade task run long enough within the testing environment's restrictions. Could you give me more clues?

@behlendorf

@ab-oe I see. We could potentially use zinject to artificially slow down the upgrade, but I don't think it's worth the added complexity. I'm OK with relying on the manual testing you've done which reproduced the issue, since the root cause is clearly understood and addressed.

@ab-oe commented Nov 6, 2017

@behlendorf thank you. I already tried with a delay, but it seems that the upgrade starts much later and the delay doesn't affect the duration of the upgrade itself, or I can't catch it. However, the receive/rollback always ended with success instead of returning a busy error. I have one more question: this mechanism handles the case where we do a rollback/receive while the upgrade is running. What about the opposite situation, when the upgrade is started while a receive/rollback is in progress?

@behlendorf

@ab-oe good question. In the opposite case I don't believe there's an issue, since an upgrade can't be started on a snapshot, which is what both zfs recv and zfs rollback will be operating on. After the receive/rollback is complete they'll be promoted, at which point the upgrade is possible and safe.
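
In code, that guard amounts to something like the sketch below (illustrative, in the spirit of jxiong's Oct 25 patch above; the wrapper name is hypothetical):

static void
maybe_upgrade(objset_t *os)
{
	/* Sketch: never dispatch the upgrade task for a snapshot. */
	if (dmu_objset_is_snapshot(os) ||
	    !dmu_objset_userobjspace_upgradable(os))
		return;
	dmu_objset_userobjspace_upgrade(os);
}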

@ab-oe commented Nov 7, 2017

@behlendorf I was thinking of the origin dataset here, not the snapshot. If the upgrade starts after dsl_dataset_handoff_check() (for example, after a remount), the origin objset will be evicted in dsl_dataset_clone_swap_sync_impl(). Am I right?

@behlendorf

Yes, but I don't believe the origin dataset can have an upgrade running at this time, so it should be safe to evict.

@ab-oe commented Nov 8, 2017

@behlendorf I understand. Thank you for clearing this up.

behlendorf pushed a commit that referenced this issue Nov 10, 2017
If a receive or rollback is performed while the filesystem is upgrading,
the objset may be evicted in `dsl_dataset_clone_swap_sync_impl`. This
leads to a NULL pointer dereference when the upgrade tries to access the
evicted objset.

This commit adds a long hold on the dataset for the whole upgrade
process. Receive and rollback will return an EBUSY error until the
upgrade is finished.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Arkadiusz Bubała <[email protected]>
Closes #5295 
Closes #6837
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Nov 21, 2017 (same message as above)
Nasf-Fan pushed a commit to Nasf-Fan/zfs that referenced this issue Jan 29, 2018 (same message as above)
Nasf-Fan pushed a commit to Nasf-Fan/zfs that referenced this issue Feb 13, 2018 (same message as above)
FransUrbo pushed a commit to FransUrbo/zfs that referenced this issue Apr 28, 2019 (same message as above)
citrus-it added a commit to citrus-it/zfs that referenced this issue Dec 18, 2020
After porting the fix for openzfs#5295
over to illumos, we started hitting an assertion failure when running the
testsuite:

	assertion failed: rc->rc_count == number, file: .../refcount.c

and the unexpected hold has this stack:

	dsl_dataset_long_hold+0x59
	dmu_objset_upgrade+0x73
	dmu_objset_id_quota_upgrade+0x15
	dmu_objset_own+0x14f

The simplest reproducer for this in illumos is

    zpool create -f -O version=1 testpool c3t0d0; zpool destroy testpool

which is run as part of the zpool_create_tempname test, but I can't get this
to trigger on FreeBSD. This appears to be because the call to
txg_wait_synced() in dmu_objset_upgrade_stop() (which was missing in
illumos) slows down dmu_objset_disown() enough to avoid the condition.

Signed-off-by: Andy Fiddaman <[email protected]>
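
For context, ZoL's dmu_objset_upgrade_stop() with the txg_wait_synced() call referred to above looks roughly like this (a reconstructed sketch; details may differ from the actual source; upgrade_tag is the tag introduced by the long-hold fix above):

static void
dmu_objset_upgrade_stop_sketch(objset_t *os)
{
	mutex_enter(&os->os_upgrade_lock);
	os->os_upgrade_exit = B_TRUE;	/* tell the task callback to bail */
	if (os->os_upgrade_id != 0) {
		taskqid_t id = os->os_upgrade_id;

		os->os_upgrade_id = 0;
		mutex_exit(&os->os_upgrade_lock);

		/* if the task never ran, drop its long hold ourselves */
		if (taskq_cancel_id(os->os_spa->spa_upgrade_taskq, id) == 0)
			dsl_dataset_long_rele(dmu_objset_ds(os), upgrade_tag);

		/* the wait that was missing in illumos */
		txg_wait_synced(os->os_spa->spa_dsl_pool, 0);
	} else {
		mutex_exit(&os->os_upgrade_lock);
	}
}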
citrus-it added two further commits to citrus-it/zfs that referenced this issue Dec 18, 2020 (same message as above)
behlendorf pushed a commit that referenced this issue Dec 21, 2020 (same message as above; Reviewed-by: Brian Behlendorf <[email protected]>; Closes #11368)
ghost pushed a commit to zfsonfreebsd/ZoF that referenced this issue Dec 23, 2020 (same message as above)
behlendorf pushed a commit that referenced this issue Dec 23, 2020 (same message as above)
jsai20 pushed a commit to jsai20/zfs that referenced this issue Mar 30, 2021 (same message as above)
sempervictus pushed a commit to sempervictus/zfs that referenced this issue May 31, 2021 (same message as above)