PANIC at arc.c:2493:arc_evict_state_impl() VERIFY failed #4912

Closed
JuliaVixen opened this issue Aug 1, 2016 · 3 comments
JuliaVixen commented Aug 1, 2016

Well, I did this:

zpool create -O atime=off -O compression=lz4 -O exec=off -O devices=off \
 -O recordsize=1M -O setuid=off -O checksum=sha256 \
 -o ashift=12 -o feature@lz4_compress=enabled -o feature@embedded_data=enabled \
swappy /dev/disk/by-id/ata-ST4000DM000-1F2168_W300KAQ5

And then I did this:

localhost ~ # zfs snapshot l/Lightroom@2016_Jul_30
localhost ~ # zfs snapshot l/Jul_29_2016@2016_Jul_30
localhost ~ # zfs send -eLv -R l/Lightroom@2016_Jul_30  | zfs recv -evsF swappy
full send of l/Lightroom@2016_Jul_30 estimated size is 357G
total estimated size is 357G
TIME        SENT   SNAPSHOT
receiving full stream of l/Lightroom@2016_Jul_30 into swappy/Lightroom@2016_Jul_30
09:22:10    771K   l/Lightroom@2016_Jul_30
[...]
09:23:05   1.89G   l/Lightroom@2016_Jul_30
09:23:06   1.93G   l/Lightroom@2016_Jul_30

And then while that zfs send|recv was going, I did this in another terminal:

localhost ~ # zfs create swappy/recovered
localhost ~ # zfs set copies=2 swappy/recovered
localhost ~ # cp -avi  "/mnt/temp/STUFF.ISO" /swappy/re <tab>

... And the tab completion never completed, and everything doing any I/O on either of my two pools was blocked.

[261664.320743] VERIFY(!(((*({ __attribute__((unused)) typeof((hash_lock)->m_owner) __var = ( typeof((hash_lock)->m_owner)) 0; (volatile typeof((hash_lock)->m_owner) *)&((hash_lock)->m_owner); }))) == get_current())) failed
[261664.320906] PANIC at arc.c:2493:arc_evict_state_impl()
[261664.321001] Showing stack for process 7428
[261664.321003] CPU: 2 PID: 7428 Comm: send_traverse Tainted: P           O    4.4.6-gentoo-debug #1
[261664.321004] Hardware name: Supermicro X10SLL-F/X10SLL-SF/X10SLL-F/X10SLL-SF, BIOS 1.0a 06/11/2013
[261664.321005]  0000000000000000 ffff8804a2bceec8 ffffffff8130fa9d ffffffffa1325817
[261664.321007]  ffffffffa13257ff ffff8804a2bceed8 ffffffffa0a7c2d6 ffff8804a2bcf068
[261664.321009]  ffffffffa0a7c48e ffff8804a2bceef8 ffffffff00000028 ffff8804a2bcf078
[261664.321010] Call Trace:
[261664.321015]  [<ffffffff8130fa9d>] dump_stack+0x4d/0x63
[261664.321021]  [<ffffffffa0a7c2d6>] spl_dumpstack+0x3d/0x3f [spl]
[261664.321023]  [<ffffffffa0a7c48e>] spl_panic+0xa9/0xdc [spl]
[261664.321039]  [<ffffffffa1225420>] arc_evict_state+0x2df/0x9c4 [zfs]
[261664.321048]  [<ffffffffa1225b89>] arc_adjust_impl.constprop.25+0x2a/0x31 [zfs]
[261664.321056]  [<ffffffffa1225ed3>] arc_adjust+0x343/0x4b8 [zfs]
[261664.321064]  [<ffffffffa1227f96>] arc_shrink+0x10d/0x10f [zfs]
[261664.321072]  [<ffffffffa1228042>] __arc_shrinker_func.isra.24+0xaa/0x118 [zfs]
[261664.321079]  [<ffffffffa12280bc>] arc_shrinker_func_scan_objects+0xc/0x1d [zfs]
[261664.321082]  [<ffffffff810c5962>] shrink_slab.part.56.constprop.61+0x16b/0x1fc
[261664.321083]  [<ffffffff810c787e>] shrink_zone+0x6e/0x144
[261664.321085]  [<ffffffff810c8173>] do_try_to_free_pages+0x1b3/0x2bc
[261664.321086]  [<ffffffff810c8380>] try_to_free_pages+0x89/0x90
[261664.321088]  [<ffffffff810c000f>] __alloc_pages_nodemask+0x57c/0x8d4
[261664.321090]  [<ffffffff81059377>] ? ttwu_do_wakeup+0x12/0x7f
[261664.321092]  [<ffffffff8105f7ea>] ? set_next_entity+0x26/0x52f
[261664.321094]  [<ffffffff810e9b58>] cache_alloc_refill+0x26d/0x4a3
[261664.321096]  [<ffffffff810e9de6>] kmem_cache_alloc+0x58/0xae
[261664.321098]  [<ffffffffa0a793e9>] spl_kmem_cache_alloc+0x6f/0x65a [spl]
[261664.321101]  [<ffffffff814d0c84>] ? mutex_lock+0xf/0x21
[261664.321103]  [<ffffffffa0a7def0>] ? cv_wait_common+0xcc/0x108 [spl]
[261664.321111]  [<ffffffffa1221cb4>] ? arc_space_consume+0x8f/0x8f [zfs]
[261664.321123]  [<ffffffffa12e9575>] zio_buf_alloc+0x53/0x57 [zfs]
[261664.321130]  [<ffffffffa122211b>] arc_get_data_buf+0x355/0x47c [zfs]
[261664.321137]  [<ffffffffa1220c7f>] ? add_reference+0xef/0x1ed [zfs]
[261664.321145]  [<ffffffffa1227b42>] ? arc_read+0xcc6/0xcc6 [zfs]
[261664.321152]  [<ffffffffa122762a>] arc_read+0x7ae/0xcc6 [zfs]
[261664.321154]  [<ffffffff814d0a10>] ? mutex_unlock+0x9/0xb
[261664.321164]  [<ffffffffa124b6ad>] traverse_visitbp+0x4ab/0x783 [zfs]
[261664.321174]  [<ffffffffa124b600>] traverse_visitbp+0x3fe/0x783 [zfs]
[261664.321184]  [<ffffffffa124b600>] traverse_visitbp+0x3fe/0x783 [zfs]
[261664.321194]  [<ffffffffa124b600>] traverse_visitbp+0x3fe/0x783 [zfs]
[261664.321204]  [<ffffffffa124b600>] traverse_visitbp+0x3fe/0x783 [zfs]
[261664.321213]  [<ffffffffa124b600>] traverse_visitbp+0x3fe/0x783 [zfs]
[261664.321222]  [<ffffffffa124b600>] traverse_visitbp+0x3fe/0x783 [zfs]
[261664.321232]  [<ffffffffa124be90>] traverse_dnode+0xb3/0x161 [zfs]
[261664.321241]  [<ffffffffa124b843>] traverse_visitbp+0x641/0x783 [zfs]
[261664.321243]  [<ffffffffa0a7a27f>] ? taskq_dispatch+0x16b/0x17d [spl]
[261664.321252]  [<ffffffffa124bcd4>] traverse_impl+0x34f/0x458 [zfs]
[261664.321254]  [<ffffffff8106176d>] ? enqueue_task_fair+0xb83/0xbe3
[261664.321264]  [<ffffffffa124729b>] ? dump_bytes_cb+0x13e/0x13e [zfs]
[261664.321273]  [<ffffffffa124c15b>] traverse_dataset_resume+0x4d/0x4f [zfs]
[261664.321282]  [<ffffffffa1247403>] ? dmu_recv_existing_end+0x6f/0x6f [zfs]
[261664.321291]  [<ffffffffa12472d3>] send_traverse_thread+0x38/0x75 [zfs]
[261664.321294]  [<ffffffffa0a79cf0>] thread_generic_wrapper+0x6c/0x79 [spl]
[261664.321296]  [<ffffffffa0a79c84>] ? __thread_create+0x112/0x112 [spl]
[261664.321298]  [<ffffffff8105533a>] kthread+0xcd/0xd5
[261664.321299]  [<ffffffff8105526d>] ? kthread_freezable_should_stop+0x43/0x43
[261664.321300]  [<ffffffff814d265f>] ret_from_fork+0x3f/0x70
[261664.321302]  [<ffffffff8105526d>] ? kthread_freezable_should_stop+0x43/0x43

This is the Git version from around July 4, 2016. I have ECC memory in this computer, and no disk errors have been reported.

JuliaVixen (Author) commented

I rebooted, and then did this:

localhost ~ # zfs send -eLv -t 1-c23ec650b-d8-789c636064000310a500c4ec50360710e72765a526973030ec1581a8c1904f4b2b4e2d618003903c1b927c5265496a3190be11b7a50a9bfe92fcf4d2cc14060683dd4cde26fff31b0c90e439c1f27989b9a90c0c39fa3e99e9192545f9f9b90e46068666f15ea539f1c6060c12507360ee4fcd4d4a4dc9cf06f3015cdc1e64  | zfs recv -evsF swappy
resume token contents:
nvlist version: 0
    object = 0x14bd
    offset = 0x0
    bytes = 0x7ab45ed8
    toguid = 0x806fff344b02bb30
    toname = l/Lightroom@2016_Jul_30
    embedok = 1
full send of l/Lightroom@2016_Jul_30 estimated size is 355G
TIME        SENT   SNAPSHOT
receiving full stream of l/Lightroom@2016_Jul_30 into swappy/Lightroom@2016_Jul_30
10:07:14    157M   l/Lightroom@2016_Jul_30
10:07:15    376M   l/Lightroom@2016_Jul_30
10:07:16    615M   l/Lightroom@2016_Jul_30
10:07:17    847M   l/Lightroom@2016_Jul_30
10:07:18   1.04G   l/Lightroom@2016_Jul_30
10:07:19   1.25G   l/Lightroom@2016_Jul_30
[...]

And everything seemed to be working fine... until...

[...]
10:54:28    258G   l/Lightroom@2016_Jul_30
10:54:29    258G   l/Lightroom@2016_Jul_30
10:54:30    258G   l/Lightroom@2016_Jul_30
10:54:31    258G   l/Lightroom@2016_Jul_30
10:54:32    258G   l/Lightroom@2016_Jul_30

... And then this happened, and all I/O blocked again.

[ 4356.035411] VERIFY(!(((*({ __attribute__((unused)) typeof((hash_lock)->m_owner) __var = ( typeof((hash_lock)->m_owner)) 0; (volatile typeof((hash_lock)->m_owner) *)&((hash_lock)->m_owner); }))) == get_current())) failed
[ 4356.035588] PANIC at arc.c:2493:arc_evict_state_impl()
[ 4356.035687] Showing stack for process 11485
[ 4356.035689] CPU: 1 PID: 11485 Comm: send_traverse Tainted: P           O    4.4.6-gentoo-debug #1
[ 4356.035690] Hardware name: Supermicro X10SLL-F/X10SLL-SF/X10SLL-F/X10SLL-SF, BIOS 1.0a 06/11/2013
[ 4356.035691]  0000000000000000 ffff8804ecc9ae28 ffffffff8130fa9d ffffffffa0fc2817
[ 4356.035693]  ffffffffa0fc27ff ffff8804ecc9ae38 ffffffffa0a692d6 ffff8804ecc9afc8
[ 4356.035694]  ffffffffa0a6948e ffff88082c1fe800 ffff880800000028 ffff8804ecc9afd8
[ 4356.035695] Call Trace:
[ 4356.035699]  [<ffffffff8130fa9d>] dump_stack+0x4d/0x63
[ 4356.035705]  [<ffffffffa0a692d6>] spl_dumpstack+0x3d/0x3f [spl]
[ 4356.035708]  [<ffffffffa0a6948e>] spl_panic+0xa9/0xdc [spl]
[ 4356.035723]  [<ffffffffa0ec2420>] arc_evict_state+0x2df/0x9c4 [zfs]
[ 4356.035726]  [<ffffffff814d0a10>] ? mutex_unlock+0x9/0xb
[ 4356.035733]  [<ffffffffa0ec2b89>] arc_adjust_impl.constprop.25+0x2a/0x31 [zfs]
[ 4356.035741]  [<ffffffffa0ec2eea>] arc_adjust+0x35a/0x4b8 [zfs]
[ 4356.035748]  [<ffffffffa0ec4f96>] arc_shrink+0x10d/0x10f [zfs]
[ 4356.035755]  [<ffffffffa0ec5042>] __arc_shrinker_func.isra.24+0xaa/0x118 [zfs]
[ 4356.035762]  [<ffffffffa0ec50bc>] arc_shrinker_func_scan_objects+0xc/0x1d [zfs]
[ 4356.035765]  [<ffffffff810c5962>] shrink_slab.part.56.constprop.61+0x16b/0x1fc
[ 4356.035766]  [<ffffffff810c787e>] shrink_zone+0x6e/0x144
[ 4356.035767]  [<ffffffff810c8173>] do_try_to_free_pages+0x1b3/0x2bc
[ 4356.035768]  [<ffffffff810c8380>] try_to_free_pages+0x89/0x90
[ 4356.035770]  [<ffffffff810c000f>] __alloc_pages_nodemask+0x57c/0x8d4
[ 4356.035783]  [<ffffffffa0f46db8>] ? vdev_raidz_io_start+0x1f6/0x33f [zfs]
[ 4356.035785]  [<ffffffff810e9b58>] cache_alloc_refill+0x26d/0x4a3
[ 4356.035787]  [<ffffffff810e9de6>] kmem_cache_alloc+0x58/0xae
[ 4356.035789]  [<ffffffffa0a663e9>] spl_kmem_cache_alloc+0x6f/0x65a [spl]
[ 4356.035802]  [<ffffffffa0f43ea8>] ? vdev_mirror_io_start+0xda/0x1ac [zfs]
[ 4356.035813]  [<ffffffffa0f435f4>] ? vdev_config_sync+0x1b5/0x1b5 [zfs]
[ 4356.035820]  [<ffffffffa0ebecfb>] ? buf_cons+0x47/0x4d [zfs]
[ 4356.035827]  [<ffffffffa0ebecb4>] ? arc_space_consume+0x8f/0x8f [zfs]
[ 4356.035839]  [<ffffffffa0f86575>] zio_buf_alloc+0x53/0x57 [zfs]
[ 4356.035846]  [<ffffffffa0ebf11b>] arc_get_data_buf+0x355/0x47c [zfs]
[ 4356.035853]  [<ffffffffa0ec462a>] arc_read+0x7ae/0xcc6 [zfs]
[ 4356.035860]  [<ffffffffa0ec4b42>] ? arc_read+0xcc6/0xcc6 [zfs]
[ 4356.035871]  [<ffffffffa0ee7e13>] traverse_prefetch_metadata+0xa5/0xa7 [zfs]
[ 4356.035882]  [<ffffffffa0ee7e6a>] prefetch_dnode_metadata+0x55/0xa7 [zfs]
[ 4356.035891]  [<ffffffffa0ee8700>] traverse_visitbp+0x4fe/0x783 [zfs]
[ 4356.035901]  [<ffffffffa0ee8600>] traverse_visitbp+0x3fe/0x783 [zfs]
[ 4356.035910]  [<ffffffffa0ee8600>] traverse_visitbp+0x3fe/0x783 [zfs]
[ 4356.035912]  [<ffffffff814d0a10>] ? mutex_unlock+0x9/0xb
[ 4356.035921]  [<ffffffffa0ee8600>] traverse_visitbp+0x3fe/0x783 [zfs]
[ 4356.035923]  [<ffffffff814d0a10>] ? mutex_unlock+0x9/0xb
[ 4356.035932]  [<ffffffffa0ee8600>] traverse_visitbp+0x3fe/0x783 [zfs]
[ 4356.035933]  [<ffffffff814d0a10>] ? mutex_unlock+0x9/0xb
[ 4356.035943]  [<ffffffffa0ee8600>] traverse_visitbp+0x3fe/0x783 [zfs]
[ 4356.035944]  [<ffffffff814d0a10>] ? mutex_unlock+0x9/0xb
[ 4356.035953]  [<ffffffffa0ee8600>] traverse_visitbp+0x3fe/0x783 [zfs]
[ 4356.035960]  [<ffffffffa0ec4b2d>] ? arc_read+0xcb1/0xcc6 [zfs]
[ 4356.035970]  [<ffffffffa0ee8e90>] traverse_dnode+0xb3/0x161 [zfs]
[ 4356.035979]  [<ffffffffa0ee8843>] traverse_visitbp+0x641/0x783 [zfs]
[ 4356.035981]  [<ffffffffa0a6727f>] ? taskq_dispatch+0x16b/0x17d [spl]
[ 4356.035990]  [<ffffffffa0ee8cd4>] traverse_impl+0x34f/0x458 [zfs]
[ 4356.035992]  [<ffffffff8106176d>] ? enqueue_task_fair+0xb83/0xbe3
[ 4356.036001]  [<ffffffffa0ee429b>] ? dump_bytes_cb+0x13e/0x13e [zfs]
[ 4356.036010]  [<ffffffffa0ee915b>] traverse_dataset_resume+0x4d/0x4f [zfs]
[ 4356.036020]  [<ffffffffa0ee4403>] ? dmu_recv_existing_end+0x6f/0x6f [zfs]
[ 4356.036029]  [<ffffffffa0ee42d3>] send_traverse_thread+0x38/0x75 [zfs]
[ 4356.036031]  [<ffffffffa0a66cf0>] thread_generic_wrapper+0x6c/0x79 [spl]
[ 4356.036033]  [<ffffffffa0a66c84>] ? __thread_create+0x112/0x112 [spl]
[ 4356.036035]  [<ffffffff8105533a>] kthread+0xcd/0xd5
[ 4356.036036]  [<ffffffff8105526d>] ? kthread_freezable_should_stop+0x43/0x43
[ 4356.036038]  [<ffffffff814d265f>] ret_from_fork+0x3f/0x70
[ 4356.036039]  [<ffffffff8105526d>] ? kthread_freezable_should_stop+0x43/0x43

I've been doing this kind of thing for a month, and there were no crashes until now. (Although I was using raidz pools before, not a single drive like this time.) I'm trying to move some data between two large pools, but I don't have enough SATA ports to connect all of the drives at the same time.
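For what it's worth, the `-t` token used in the retry above comes from the partially received dataset; as a sketch (assuming the standard `receive_resume_token` property that a resumable receive, i.e. `zfs recv -s`, records on the target), it can be read back and reused like this:

```
# Read the resume token recorded on the partially received dataset.
zfs get -H -o value receive_resume_token swappy/Lightroom

# Restart the interrupted send/receive from that token.
zfs send -eLv -t "$(zfs get -H -o value receive_resume_token swappy/Lightroom)" | zfs recv -evsF swappy
```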

tuxoko (Contributor) commented Aug 1, 2016

send_traverse_thread needs to set spl_fstrans_mark.

behlendorf (Contributor) commented

Agreed. Just like traverse_prefetch_thread.
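For reference, the fix being suggested looks roughly like this (a hedged sketch only, following the existing spl_fstrans_mark()/spl_fstrans_unmark() pattern used in traverse_prefetch_thread(); the actual patch may structure it differently):

```c
/*
 * Sketch: mark the send traverse thread with PF_FSTRANS so that any
 * memory allocation it makes cannot recurse into filesystem reclaim.
 * Without this, an allocation on the deep traversal stack can invoke
 * the ARC shrinker while the thread already holds an ARC hash lock,
 * tripping the VERIFY in arc_evict_state_impl().
 */
static void
send_traverse_thread(void *arg)
{
	fstrans_cookie_t cookie = spl_fstrans_mark();

	/* ... existing dataset traversal / send work ... */

	spl_fstrans_unmark(cookie);
}
```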

@behlendorf behlendorf added this to the 0.7.0 milestone Aug 1, 2016
dweeezil added a commit to dweeezil/zfs that referenced this issue Aug 21, 2016
As is the case with traverse_prefetch_thread(), the deep stacks caused
by traversal require disabling reclaim in the send traverse thread.

Fixes: openzfs#4912
dweeezil added a commit to dweeezil/zfs that referenced this issue Aug 22, 2016
As is the case with traverse_prefetch_thread(), the deep stacks caused
by traversal require disabling reclaim in the send traverse thread.

Also, do the same for receive_writer_thread() in which similar problems
have been observed.

Fixes: openzfs#4912
DeHackEd pushed a commit to DeHackEd/zfs that referenced this issue Oct 19, 2016
As is the case with traverse_prefetch_thread(), the deep stacks caused
by traversal require disabling reclaim in the send traverse thread.

Also, do the same for receive_writer_thread() in which similar problems
have been observed.

Signed-off-by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#4912
Closes openzfs#4998
DeHackEd pushed a commit to DeHackEd/zfs that referenced this issue Oct 29, 2016
As is the case with traverse_prefetch_thread(), the deep stacks caused
by traversal require disabling reclaim in the send traverse thread.

Also, do the same for receive_writer_thread() in which similar problems
have been observed.

Signed-off-by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#4912
Closes openzfs#4998