SPL PANIC when deleting a snapshot #1499
The pool is now unimportable due to the same kind of panic.
`zdb -lu /some/disk` output requested by ryao; also additional screencaps.
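For context, `zdb -l` dumps a device's vdev labels and `-u` adds the uberblocks, which is what was being requested here. A minimal sketch of collecting that output (the device path is the same placeholder used above):

```sh
# Dump the vdev labels (-l) and uberblocks (-u) of one pool member.
# /some/disk is a placeholder; substitute the actual device, e.g. /dev/sdb1.
zdb -lu /some/disk > zdb-lu.txt
```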
ZoL development is quite dynamic; code that is 4-5 months old may well be too old. Give the developers a chance to help you by first updating to the latest -daily code!
I had a chat with @duidalus and one of his colleagues about this issue last night. They were able to do an import using an OmniOS LiveCD, either OmniOS_Text_r151006c.iso or OmniOS_Text_bloody_20130208.iso; I am not sure which (clarification by @duidalus would be helpful). They did a read-only import without issue on OmniOS. I am not sure whether that is because OmniOS was built without assertions or because Illumos has improved its import code. The last we spoke, they were preparing to copy their data off the pool. From our chat last night, I am under the impression that the system crashed (or hung and was rebooted) while snapshots were being deleted in rapid succession. If the deletion was concurrent, this might be related to issue #1495. @duidalus would need to clarify these things. My guess is that the barrier regression fixed by d9b0ebb caused pool corruption following a crash or hard reboot.
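For anyone landing here in a similar state: a read-only import of the kind described can be attempted as sketched below (the pool name is a placeholder):

```sh
# Import the pool read-only so nothing is written to the damaged pool.
# "tank" is a placeholder; -f forces import of a pool that was last
# active on another system (e.g. when importing from a LiveCD).
zpool import -f -o readonly=on tank
```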
Basically the flow of events started with deleting snapshots in rapid succession, and then the SPL panic hit. I am moderately sure that our ZFS version was recent enough to include d9b0ebb but cannot confirm that just yet. We were able to mount the pool on OmniOS as read-only (read-write caused the same kind of assertion as the one we had prior to the SPL panic).
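To make the triggering workload concrete (a hypothetical reconstruction, not the exact commands that were run), "deleting snapshots in rapid succession" means something along these lines:

```sh
# Hypothetical reconstruction of the workload: destroy a batch of
# snapshots back-to-back. The dataset name is a placeholder.
for snap in $(zfs list -H -t snapshot -o name -r tank/data); do
    zfs destroy "$snap"
done
```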
Depending on how old the code was, this might also be related to #541, which was fixed. Were you able to import and mount the pool read-only on Linux as well, or only on OmniOS?
Only on OmniOS; SPL seemed to have an issue mounting the pool read-only on Linux. The ASSERT triggered on the read-only import is in this screencap: http://d.adm.fi/ro.png
Okay, after unpacking the zImage of the kernel that was used, I am fairly certain the SPL version was v0.6.0-rc14, which may have also been the ZFS version (the tag predates the barrier fix). In any case, we were able to recover our pool by moving the data into a newly created one. The only real issue with this operation was the analysis of what happened and how to recover (which was made harder in part by the failed read-only import on Linux).
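A "move the data into a newly created pool" recovery can be sketched roughly as follows (hedged: pool, vdev, and snapshot names are placeholders, and the actual copy may have been done with a file-level tool instead; note that a pool imported with readonly=on cannot take new snapshots, so an existing snapshot has to serve as the source):

```sh
# Create a fresh pool and replicate the datasets into it.
# All names below are placeholders.
zpool create newtank mirror /dev/sdc /dev/sdd
zfs send -R oldtank/data@rescue | zfs receive -Fdu newtank
```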
@duidalus The assert triggered when importing the pool read-only on Linux was fixed 4 months ago, see issue #1332. The fix is in the 0.6.1 tag; were you using an older version?
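For reference, the installed and running SPL/ZFS module versions can usually be checked like this (assuming a typical ZoL install; the sysfs paths exist only while the modules are loaded):

```sh
# Installed module versions:
modinfo spl | grep -i '^version'
modinfo zfs | grep -i '^version'

# Versions of the currently loaded modules:
cat /sys/module/spl/version /sys/module/zfs/version
```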
Yeah, we were using a version prior to that. I did, however, examine the sources for the kernel used and found that they included the d9b0ebb fix, so that is unlikely to be the cause of the initial problem. I guess #1495 could be the source of the problems, but I have no real idea on that. I'm not sure whether it is relevant, but the pool did have some corrupt data from its Solaris days, before we moved it onto zfsonlinux.
@duidalus Depending on the corruption it could be responsible for the
@behlendorf Considering we haven't heard from @atonkyra in a year, close this as stale?
I'm OK with that since the pool is long gone (thus impossible to debug any more).
Encountered an SPL panic when a snapshot was deleted. The platform is Ubuntu 12.04 and the zfsonlinux version is some 4-5 months old; I can add details later. Dmesg:
```
[8419543.548971] SPLError: 4802:0:(space_map.c:109:space_map_add()) SPL PANIC
[8419543.548972] SPL: Showing stack for process 4802
[8419543.548974] Pid: 4802, comm: z_fr_iss/8 Tainted: G W 3.7.9-fsolstorage+2 #4
[8419543.548974] Call Trace:
[8419543.548980] [] ? spl_debug_dumpstack+0x1d/0x40
[8419543.548982] [] ? spl_debug_bug+0x73/0xd0
[8419543.548987] [] ? space_map_add+0xee/0x3b0
[8419543.548991] [] ? __mutex_lock_slowpath+0x56/0x150
[8419543.548993] [] ? __mutex_lock_slowpath+0x56/0x150
[8419543.548997] [] ? metaslab_free_dva+0x125/0x200
[8419543.548999] [] ? metaslab_free+0x84/0xb0
[8419543.549002] [] ? zio_dva_free+0x17/0x30
[8419543.549004] [] ? zio_execute+0x95/0x100
[8419543.549006] [] ? taskq_thread+0x216/0x4c0
[8419543.549009] [] ? try_to_wake_up+0x2a0/0x2a0
[8419543.549013] [] ? task_expire+0x110/0x110
[8419543.549015] [] ? task_expire+0x110/0x110
[8419543.549018] [] ? kthread+0xce/0xe0
[8419543.549020] [] ? kthread_parkme+0x30/0x30
[8419543.549023] [] ? ret_from_fork+0x7c/0xb0
[8419543.549025] [] ? kthread_parkme+0x30/0x30
[8419543.549100] SPLError: 4801:0:(space_map.c:95:space_map_add()) SPL PANIC
[8419543.549143] SPL: Showing stack for process 4801
[8419543.549145] Pid: 4801, comm: z_fr_iss/7 Tainted: G W 3.7.9-fsolstorage+2 #4
[8419543.549147] Call Trace:
[8419543.549151] [] ? spl_debug_dumpstack+0x1d/0x40
[8419543.549154] [] ? spl_debug_bug+0x73/0xd0
[8419543.549157] [] ? space_map_add+0x2ce/0x3b0
[8419543.549160] [] ? kmalloc_nofail+0x28/0xc0
[8419543.549163] [] ? __mutex_lock_slowpath+0x56/0x150
[8419543.549165] [] ? __mutex_lock_slowpath+0x56/0x150
[8419543.549168] [] ? metaslab_free_dva+0x125/0x200
[8419543.549170] [] ? metaslab_free+0x84/0xb0
[8419543.549173] [] ? zio_dva_free+0x17/0x30
[8419543.549175] [] ? zio_execute+0x95/0x100
[8419543.549177] [] ? taskq_thread+0x216/0x4c0
[8419543.549181] [] ? try_to_wake_up+0x2a0/0x2a0
[8419543.549183] [] ? task_expire+0x110/0x110
[8419543.549185] [] ? task_expire+0x110/0x110
[8419543.549188] [] ? kthread+0xce/0xe0
[8419543.549190] [] ? kthread_parkme+0x30/0x30
[8419543.549193] [] ? ret_from_fork+0x7c/0xb0
[8419543.549195] [] ? kthread_parkme+0x30/0x30
```