
draid2 hang on rebuild #8

Closed

prod-feng opened this issue Aug 29, 2017 · 3 comments

@prod-feng

System information

Type Version/Name
Distribution Name CentOS
Distribution Version 7.3.1611
Linux Kernel 3.10.0-514.26.2.el7
Architecture x86_64
ZFS Version 0.7.0-rc3_218_g6e39308
SPL Version 0.7.0-12_g9df9692

Describe the problem you're observing

Using 10 loop devices:

dd if=/dev/zero of=hd201.img bs=1024K count=1024
...

losetup  -o 1048576  /dev/loop201 /data/users/hd201.img
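
The other nine backing files and loop devices are set up the same way; roughly as follows (the hd200-hd209 file names, the /data/users path, and the loop numbers 200-209 are assumptions based on the two commands above and the zpool create line below):

for i in $(seq 200 209); do
    # 1 GiB sparse-free backing file per child drive
    dd if=/dev/zero of=/data/users/hd${i}.img bs=1024K count=1024
    # attach it at a 1 MiB offset, matching the loop201 example above
    losetup -o 1048576 /dev/loop${i} /data/users/hd${i}.img
done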

The nvl configuration file reads as:

draidcfg -r 10.nvl 
dRAID2 vdev of 10 child drives: 1 x (7 data + 2 parity) and 1 distributed spare
Using 32 base permutations
   5, 9, 6, 1, 3, 2, 8, 7, 4, 0,
   7, 3, 2, 4, 9, 1, 8, 6, 5, 0,
   3, 8, 5, 9, 1, 6, 2, 0, 4, 7,
   4, 7, 3, 9, 2, 8, 1, 5, 6, 0,
   7, 6, 0, 1, 3, 4, 8, 2, 9, 5,
   1, 5, 6, 0, 8, 3, 4, 9, 7, 2,
   7, 8, 1, 6, 2, 0, 3, 5, 4, 9,
   7, 4, 1, 6, 2, 0, 8, 5, 9, 3,
   6, 8, 5, 1, 3, 7, 0, 2, 9, 4,
   1, 7, 8, 4, 9, 3, 0, 6, 2, 5,
   4, 8, 5, 1, 7, 6, 0, 3, 2, 9,
   8, 2, 3, 6, 4, 9, 5, 1, 7, 0,
   3, 8, 2, 6, 9, 0, 5, 7, 4, 1,
   0, 5, 8, 6, 3, 9, 7, 4, 1, 2,
   1, 9, 4, 6, 7, 3, 0, 8, 5, 2,
   1, 0, 2, 3, 8, 9, 5, 6, 7, 4,
   3, 5, 9, 1, 0, 7, 6, 4, 2, 8,
   2, 0, 9, 6, 1, 3, 7, 4, 5, 8,
   7, 6, 9, 3, 1, 0, 5, 2, 8, 4,
   6, 9, 4, 7, 3, 0, 2, 5, 1, 8,
   0, 2, 3, 7, 4, 9, 8, 5, 1, 6,
   3, 7, 1, 9, 0, 2, 4, 6, 8, 5,
   6, 5, 8, 9, 2, 0, 3, 4, 1, 7,
   0, 1, 7, 2, 9, 4, 3, 5, 6, 8,
   4, 0, 9, 2, 8, 5, 1, 6, 7, 3,
   3, 5, 6, 4, 2, 0, 8, 1, 9, 7,
   8, 5, 7, 0, 4, 1, 3, 9, 2, 6,
   5, 3, 6, 9, 7, 2, 4, 0, 8, 1,
   8, 5, 7, 0, 1, 3, 4, 9, 2, 6,
   5, 2, 9, 3, 4, 6, 8, 0, 7, 1,
   6, 9, 8, 1, 7, 0, 5, 4, 2, 3,
   1, 5, 4, 0, 2, 8, 7, 6, 3, 9,

Describe how to reproduce the problem

Run the following commands:

zpool create -f s10d draid2 cfg=10.nvl /dev/loop20{0,1,2,3,4,5,6,7,8,9}
zfs create s10d/test
rsync -av /var/log /s10d/test/
zpool offline s10d loop203
zpool replace s10d loop203 '$draid2-0-s0' -o ashift=9

The error then occurs, as shown below.

I also tried draid1 with the same loop devices, with an nvl file that reads as:

draidcfg -r 10p1.nvl 
dRAID1 vdev of 10 child drives: 1 x (8 data + 1 parity) and 1 distributed spare
Using 32 base permutations
   5, 1, 3, 4, 9, 6, 8, 0, 2, 7,
   5, 4, 8, 1, 7, 2, 0, 3, 6, 9,
   6, 4, 0, 5, 8, 3, 7, 1, 2, 9,
   5, 3, 1, 7, 8, 9, 2, 0, 4, 6,
   8, 9, 4, 6, 3, 7, 5, 1, 0, 2,
   5, 3, 7, 2, 1, 9, 0, 8, 6, 4,
   5, 9, 6, 7, 0, 1, 2, 4, 3, 8,
   9, 8, 1, 6, 5, 7, 4, 2, 3, 0,
   9, 6, 3, 8, 5, 1, 4, 2, 7, 0,
   9, 1, 5, 8, 7, 0, 4, 2, 6, 3,
   0, 1, 4, 9, 8, 7, 5, 3, 2, 6,
   9, 1, 3, 5, 7, 4, 8, 6, 0, 2,
   8, 4, 6, 1, 7, 9, 0, 5, 2, 3,
   3, 4, 7, 1, 0, 6, 9, 8, 2, 5,
   1, 7, 8, 0, 6, 9, 5, 2, 4, 3,
   7, 5, 1, 9, 6, 2, 4, 0, 3, 8,
   6, 1, 0, 8, 7, 3, 9, 2, 4, 5,
   2, 5, 4, 9, 0, 6, 7, 3, 8, 1,
   8, 0, 6, 3, 7, 2, 5, 1, 4, 9,
   4, 5, 2, 6, 7, 3, 1, 9, 0, 8,
   5, 9, 1, 7, 6, 2, 8, 3, 0, 4,
   5, 6, 2, 9, 0, 3, 7, 1, 8, 4,
   5, 0, 6, 9, 1, 4, 2, 3, 8, 7,
   2, 8, 1, 0, 6, 3, 9, 4, 7, 5,
   9, 5, 8, 0, 4, 6, 3, 1, 2, 7,
   6, 3, 0, 1, 4, 5, 8, 9, 7, 2,
   8, 0, 5, 9, 4, 7, 3, 6, 2, 1,
   8, 6, 9, 4, 5, 1, 3, 7, 2, 0,
   4, 7, 0, 8, 6, 5, 3, 9, 2, 1,
   2, 6, 0, 4, 5, 3, 7, 9, 1, 8,
   2, 8, 7, 4, 1, 6, 0, 9, 3, 5,
   4, 0, 9, 2, 8, 5, 3, 7, 1, 6,

The draid1 configuration works fine, without any issue.

Include any warning/errors/backtraces from the system logs

System panic message, shown as a pop-up message on the terminal:

Message from syslogd@master at Aug 29 15:13:15 ...
 kernel:VERIFY3(size <= (1ULL << 24)) failed (16780288 <= 16777216)

Message from syslogd@master at Aug 29 15:13:15 ...
 kernel:PANIC at abd.c:591:abd_alloc()
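
If I read the numbers right (just my guess, not confirmed): 16 MiB is 32768 sectors of 512 bytes; rounded up to a whole number of rows across the 7 data drives, that is ceil(32768 / 7) = 4682 sectors per drive, and 4682 * 7 * 512 = 16780288 bytes, exactly the size that trips the 1ULL << 24 = 16777216 limit in abd_alloc(). An 8-data-drive layout divides 32768 evenly, which may be why the draid1 (8 data + 1 parity) case above does not hit it.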

dmesg shows:

[ 2017.861020] MS (0 at 0K) segment: 0K + 9K
[ 2017.861789] 	Fixing: 0K + 9K (dRAID)
[ 2017.861798] MS (0 at 0K) segment: 13K + 18K
[ 2017.862090] 	Fixing: 13K + 18K (dRAID)
[ 2017.862099] MS (0 at 0K) segment: 36K + 175K
[ 2017.862968] 	Fixing: 36K + 175K (dRAID)
[ 2017.862979] MS (0 at 0K) segment: 243K + 4K
[ 2017.863467] 	Fixing: 243K + 4K (dRAID)
[ 2017.863475] MS (0 at 0K) segment: 279K + 9K
[ 2017.863715] 	Fixing: 279K + 9K (dRAID)
[ 2017.863722] MS (0 at 0K) segment: 292K + 31K
[ 2017.864117] 	Fixing: 292K + 31K (dRAID)
[ 2017.864252] MS (0 at 0K) segment: 337K + 43312K
[ 2017.864302] VERIFY3(size <= (1ULL << 24)) failed (16780288 <= 16777216)
[ 2017.864314] PANIC at abd.c:591:abd_alloc()
[ 2017.864318] Showing stack for process 1435
[ 2017.864327] CPU: 2 PID: 1435 Comm: spa_scan Tainted: P           OE  ------------   3.10.0-514.26.2.el7.x86_64 #1
[ 2017.864332] Hardware name: HP HP Notebook/81F5, BIOS F.13 07/21/2016
[ 2017.864337]  ffffffffa1b0af30 0000000069c8e90e ffff880067a87ab8 ffffffff81687133
[ 2017.864348]  ffff880067a87ac8 ffffffffa0b3d284 ffff880067a87c50 ffffffffa0b3d359
[ 2017.864355]  0000000000000246 ffff880100000030 ffff880067a87c60 ffff880067a87c00
[ 2017.864363] Call Trace:
[ 2017.864380]  [<ffffffff81687133>] dump_stack+0x19/0x1b
[ 2017.864405]  [<ffffffffa0b3d284>] spl_dumpstack+0x44/0x50 [spl]
[ 2017.864425]  [<ffffffffa0b3d359>] spl_panic+0xc9/0x110 [spl]
[ 2017.864437]  [<ffffffff811dcab1>] ? __slab_free+0x81/0x2f0
[ 2017.864454]  [<ffffffffa0b3951d>] ? spl_kmem_cache_free+0x14d/0x1d0 [spl]
[ 2017.864463]  [<ffffffff811dcedb>] ? kmem_cache_free+0x1bb/0x1f0
[ 2017.864479]  [<ffffffffa0b3951d>] ? spl_kmem_cache_free+0x14d/0x1d0 [spl]
[ 2017.864573]  [<ffffffffa1961e23>] abd_alloc+0x4d3/0x500 [zfs]
[ 2017.864583]  [<ffffffff811ddabc>] ? __kmalloc_node+0x5c/0x2b0
[ 2017.864601]  [<ffffffffa0b38037>] ? spl_kmem_alloc+0xc7/0x170 [spl]
[ 2017.864611]  [<ffffffff8168a982>] ? mutex_lock+0x12/0x2f
[ 2017.864711]  [<ffffffffa19d199c>] ? zfs_scan_delay+0xbc/0x160 [zfs]
[ 2017.864821]  [<ffffffffa1a00898>] spa_scan_rebuild+0x3c8/0x8c0 [zfs]
[ 2017.864932]  [<ffffffffa1a01375>] spa_scan_thread+0x5e5/0xac0 [zfs]
[ 2017.865043]  [<ffffffffa1a00d90>] ? spa_scan_rebuild+0x8c0/0x8c0 [zfs]
[ 2017.865060]  [<ffffffffa0b39f91>] thread_generic_wrapper+0x71/0x80 [spl]
[ 2017.865079]  [<ffffffffa0b39f20>] ? __thread_exit+0x20/0x20 [spl]
[ 2017.865090]  [<ffffffff810b0a4f>] kthread+0xcf/0xe0
[ 2017.865098]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
[ 2017.865106]  [<ffffffff81697758>] ret_from_fork+0x58/0x90
[ 2017.865115]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
[ 2160.700164] INFO: task spa_scan:1435 blocked for more than 120 seconds.
[ 2160.700175] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2160.700181] spa_scan        D ffff8800a58ac000     0  1435      2 0x00000080
[ 2160.700192]  ffff880067a87ab8 0000000000000046 ffff880114e24e70 ffff880067a87fd8
[ 2160.700199]  ffff880067a87fd8 ffff880067a87fd8 ffff880114e24e70 ffffffffa1b0af30
[ 2160.700206]  000000000000024f ffffffffa1af5401 ffffffffa1b0b398 ffff8800a58ac000
[ 2160.700214] Call Trace:
[ 2160.700236]  [<ffffffff8168c7f9>] schedule+0x29/0x70
[ 2160.700263]  [<ffffffffa0b3d385>] spl_panic+0xf5/0x110 [spl]
[ 2160.700276]  [<ffffffff811dcab1>] ? __slab_free+0x81/0x2f0
[ 2160.700293]  [<ffffffffa0b3951d>] ? spl_kmem_cache_free+0x14d/0x1d0 [spl]
[ 2160.700302]  [<ffffffff811dcedb>] ? kmem_cache_free+0x1bb/0x1f0
[ 2160.700318]  [<ffffffffa0b3951d>] ? spl_kmem_cache_free+0x14d/0x1d0 [spl]
[ 2160.700413]  [<ffffffffa1961e23>] abd_alloc+0x4d3/0x500 [zfs]
[ 2160.700422]  [<ffffffff811ddabc>] ? __kmalloc_node+0x5c/0x2b0
[ 2160.700438]  [<ffffffffa0b38037>] ? spl_kmem_alloc+0xc7/0x170 [spl]
[ 2160.700447]  [<ffffffff8168a982>] ? mutex_lock+0x12/0x2f
[ 2160.700546]  [<ffffffffa19d199c>] ? zfs_scan_delay+0xbc/0x160 [zfs]
[ 2160.700657]  [<ffffffffa1a00898>] spa_scan_rebuild+0x3c8/0x8c0 [zfs]
[ 2160.700764]  [<ffffffffa1a01375>] spa_scan_thread+0x5e5/0xac0 [zfs]
[ 2160.700872]  [<ffffffffa1a00d90>] ? spa_scan_rebuild+0x8c0/0x8c0 [zfs]
[ 2160.700889]  [<ffffffffa0b39f91>] thread_generic_wrapper+0x71/0x80 [spl]
[ 2160.700904]  [<ffffffffa0b39f20>] ? __thread_exit+0x20/0x20 [spl]
[ 2160.700914]  [<ffffffff810b0a4f>] kthread+0xcf/0xe0
[ 2160.700923]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
[ 2160.700931]  [<ffffffff81697758>] ret_from_fork+0x58/0x90
[ 2160.700939]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
[ 2280.704932] INFO: task spa_scan:1435 blocked for more than 120 seconds.
[ 2280.704944] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2280.704949] spa_scan        D ffff8800a58ac000     0  1435      2 0x00000080
[ 2280.704960]  ffff880067a87ab8 0000000000000046 ffff880114e24e70 ffff880067a87fd8
[ 2280.704969]  ffff880067a87fd8 ffff880067a87fd8 ffff880114e24e70 ffffffffa1b0af30
[ 2280.704976]  000000000000024f ffffffffa1af5401 ffffffffa1b0b398 ffff8800a58ac000
[ 2280.704984] Call Trace:
[ 2280.705005]  [<ffffffff8168c7f9>] schedule+0x29/0x70
[ 2280.705031]  [<ffffffffa0b3d385>] spl_panic+0xf5/0x110 [spl]
[ 2280.705044]  [<ffffffff811dcab1>] ? __slab_free+0x81/0x2f0
[ 2280.705062]  [<ffffffffa0b3951d>] ? spl_kmem_cache_free+0x14d/0x1d0 [spl]
[ 2280.705071]  [<ffffffff811dcedb>] ? kmem_cache_free+0x1bb/0x1f0
[ 2280.705088]  [<ffffffffa0b3951d>] ? spl_kmem_cache_free+0x14d/0x1d0 [spl]
[ 2280.705180]  [<ffffffffa1961e23>] abd_alloc+0x4d3/0x500 [zfs]
[ 2280.705189]  [<ffffffff811ddabc>] ? __kmalloc_node+0x5c/0x2b0
[ 2280.705205]  [<ffffffffa0b38037>] ? spl_kmem_alloc+0xc7/0x170 [spl]
[ 2280.705213]  [<ffffffff8168a982>] ? mutex_lock+0x12/0x2f
[ 2280.705313]  [<ffffffffa19d199c>] ? zfs_scan_delay+0xbc/0x160 [zfs]
[ 2280.705424]  [<ffffffffa1a00898>] spa_scan_rebuild+0x3c8/0x8c0 [zfs]
[ 2280.705532]  [<ffffffffa1a01375>] spa_scan_thread+0x5e5/0xac0 [zfs]
[ 2280.705640]  [<ffffffffa1a00d90>] ? spa_scan_rebuild+0x8c0/0x8c0 [zfs]
[ 2280.705657]  [<ffffffffa0b39f91>] thread_generic_wrapper+0x71/0x80 [spl]
[ 2280.705672]  [<ffffffffa0b39f20>] ? __thread_exit+0x20/0x20 [spl]
[ 2280.705683]  [<ffffffff810b0a4f>] kthread+0xcf/0xe0
[ 2280.705691]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
[ 2280.705700]  [<ffffffff81697758>] ret_from_fork+0x58/0x90
[ 2280.705708]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140

zpool status shows:

# zpool status
  pool: s10d
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
	Sufficient replicas exist for the pool to continue functioning in a
	degraded state.
action: Online the device using 'zpool online' or replace the device with
	'zpool replace'.
  scan: rebuild in progress since Tue Aug 29 15:13:14 2017
	248K scanned out of 324M at 709/s, 133h3m to go
	27.5K rebuilt, 0.07% done
config:

	NAME                STATE     READ WRITE CKSUM
	s10d                DEGRADED     0     0     0
	  draid2-0          DEGRADED     0     0     0
	    loop200         ONLINE       0     0     0  (repairing)
	    loop201         ONLINE       0     0     0
	    loop202         ONLINE       0     0     0
	    spare-3         DEGRADED     0     0     0
	      loop203       OFFLINE      0     0     0
	      $draid2-0-s0  ONLINE       0     0     0  (repairing)
	    loop204         ONLINE       0     0     0
	    loop205         ONLINE       0     0     0
	    loop206         ONLINE       0     0     0
	    loop207         ONLINE       0     0     0
	    loop208         ONLINE       0     0     0
	    loop209         ONLINE       0     0     0
	spares
	  $draid2-0-s0      INUSE     currently in use

errors: No known data errors

At this point the file system is still mounted and I can see the files there, but any zpool or zfs command hangs.

After I rebooted the computer, I could import the pool successfully:

zpool import s10d

The pool is found and mounted, and the rebuild then continues without issue. It seems the spa_scan process hangs the pool, but a reboot fixes that.

@thegreatgazoo (Owner)

This is a known bug that we've already fixed in our internal repo, and you've already found the workaround: use a power-of-2 number of data drives in a redundancy group. But I can't update this PR with the latest code until openzfs#5182 has been merged.

@prod-feng (Author) commented Aug 31, 2017

Thanks, thegreatgazoo!

Some questions about draidcfg.

It looks like it only supports a few PDDL layouts, while the others use random permutations.

  1. When generating random permutations, is it OK to let the user set the random seed, so that the process is easier to repeat (without sharing the nvl file)?

  2. Is it possible to let users set the layout manually? For example, write the layout in a text file and then use a tool (like draidcfg) to convert it to nvl.

  3. Is there any way to ensure the random permutations are random enough? How can I monitor the usage of each underlying disk of a pool (see the commands just below)? If one finds at some point that the data are NOT distributed evenly, is there any way to redistribute them?
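
For the per-disk usage part of question 3, I assume the per-vdev views from zpool are the starting point, e.g.:

zpool iostat -v s10d 5   # per-child read/write ops and bandwidth, refreshed every 5 seconds
zpool list -v s10d       # per-child allocated and free space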

@thegreatgazoo (Owner)

I believe this has been fixed with the latest code: openzfs#7078
