PANIC at btree.c:1780:zfs_btree_remove() #10989
Comments
@pcd1193182 @sdimitro any thoughts on how this might be possible? It's the first report I've seen like it.
@hicotton02 Could you make a shortlist of the services you are running and the load on said server?
It is an Unraid server running a file share, Docker with about 20 containers, and KVM running 6 virtual machines. The CPU is 64-core/128-thread and averages 5-15% utilization overall. Here is the ZFS pool status:
tower-diagnostics-20200928-1415.zip Attached is a diagnostic output of the system. Feel free to use what you want from it. If there is some info still missing, let me know.
It looks like the crash is happening as we try to move entries from the defer tree into the allocatable tree during sync_done. btree_remove should only be called directly if we're in a gap-supporting range tree, but it's possible there's another caller that got omitted from the stack trace.
I disassembled the source, and it looks like that's the only caller. So for whatever reason, it thinks we're processing a gap-supporting range tree, which none of the metaslab trees are. Not sure how exactly that happened.
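For context on the code path being discussed, below is a minimal C sketch of the control flow described in the two comments above. It is not the actual OpenZFS source: the struct fields, stand-in helper functions, and the `rt_gap` name are illustrative assumptions, and only the general shape is taken from the discussion (the remove-by-value path that can panic inside zfs_btree_remove() is meant to be reached only for gap-supporting range trees, while the defer/allocatable metaslab trees remove segments by index after a find).

```c
/*
 * Minimal sketch of the remove path described above -- NOT the actual
 * OpenZFS source.  Field names and the stand-in helpers below are
 * illustrative; only the overall shape is assumed from the discussion:
 * remove-by-value (which panics if the element is absent) should only
 * be reached for gap-supporting range trees, while the defer and
 * allocatable metaslab trees take the find + remove-by-index path.
 */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

typedef struct range_seg {
	uint64_t rs_start;
	uint64_t rs_end;
} range_seg_t;

typedef struct range_tree {
	uint64_t rt_gap;	/* nonzero only for gap-supporting trees */
	/* rt_root: the underlying b-tree of segments (omitted from sketch) */
} range_tree_t;

/* Stand-in for zfs_btree_remove(): remove by value, panic if absent. */
static void
btree_remove_by_value(range_tree_t *rt, range_seg_t *rs)
{
	(void)rt;
	(void)rs;
	int found = 0;	/* pretend the lookup failed; real code searches rt_root */
	if (!found) {
		fprintf(stderr, "PANIC: removing nonexistent btree element\n");
		abort();	/* mirrors the VERIFY that fires in btree.c */
	}
}

/* Stand-in for the find + remove-by-index pair used by normal trees. */
static void
btree_remove_by_index(range_tree_t *rt, range_seg_t *rs)
{
	(void)rt;
	(void)rs;
	/* locate the segment first, then remove it at that index */
}

static void
range_tree_remove_sketch(range_tree_t *rt, range_seg_t *rs)
{
	if (rt->rt_gap != 0) {
		/* Gap trees may merge segments and drop the old one by value. */
		btree_remove_by_value(rt, rs);
	} else {
		/* Normal metaslab trees (defer/allocatable) remove by index. */
		btree_remove_by_index(rt, rs);
	}
}

int
main(void)
{
	range_seg_t rs = { .rs_start = 0, .rs_end = 4096 };
	range_tree_t metaslab_tree = { .rt_gap = 0 };

	/*
	 * The reported panic implies the gap branch was somehow taken for
	 * a metaslab tree like this one, which never has rt_gap set.
	 */
	range_tree_remove_sketch(&metaslab_tree, &rs);
	return (0);
}
```

Under that reading, the panic at btree.c:1780 would mean the remove-by-value branch failed to find the segment it was asked to drop, which is why the comments above focus on how a non-gap metaslab tree could have ended up on that branch at all.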
@hicotton02 FYI 2.0.0-rc4 contains fix d8091c9 from @pcd1193182 for this issue. Assuming this issue doesn't repro easily enough to verify the fix, I think this issue can probably be closed until proven otherwise.
@adamdmoss that fix, while in a similar area, shouldn't be a fix for this issue. That issue presents as a failure after the btree_remove call, not in the btree_remove call.
Really? Darn, looked so close. Sorry.
Heh... I think I hit this today. No idea why. Tried to do zpool sync, but it hangs, and only on one pool? ZFS version: 2.1.99-1
Hmm. Non-fatal panic, but I/O is still dead to that pool. The deadman kicked in eventually.
Last edit: For the sake of posterity, I looked at the logs above from @hicotton02; most of those drives are throwing errors, no? I see errors even from the SSDs:
In my case I half suspect there was some other environmental issue, given the UPSs show a power dip around the same time. The cause of that, and why it would have affected this host, is unclear. I don't see errors from any devices though, so it's not clear why ZFS wouldn't be able to recover.
Sigh. I couldn't unfreeze the pool; even zfs send hung. On reboot the host rolled back to some point I'm assuming is before the panic, losing everything written to that pool in the process. FUCK. The host appeared fine.
This might be fixed by #13861. Edit: In fact, #13861 explains the
Coverity static analysis found these. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Neal Gompa <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes openzfs#10989 Closes openzfs#13861
Coverity static analysis found these. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Neal Gompa <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes openzfs#10989 Closes openzfs#13861 (cherry picked from commit 13f2b8f)
System information
Describe the problem you're observing
ZFS pools becoming unresponsive
Describe how to reproduce the problem
Letting the system run for a couple of days
Include any warning/errors/backtraces from the system logs