Kernel panics when I start incremental recv. #7097
Comments
Did you say on IRC you were using encryption on this pool? |
@beren12 No. I don't use encryption on either system. |
Hello again. I tried to send with the resume token and server B panicked again with the same log. After that I ran a scrub on the destination:
(The feature is encryption, which I don't use; nothing important.) As you can see, the pool has 0 issues. See you soon, dear friends. |
Usually crashing like that is caused when the code hits a call to @morphinz and @kpande |
@kpande. With the configure options there is a good chance it might not hard panic like that and you won't need the cable. You can also get to a raw terminal (not blocked by X) with Ctrl + alt + f5 (or any other f-key). |
Sorry for the late answer. I ran a scrub on both pools: "repaired 0B - with 0 errors". After that I tried a few weird things. On server A, which is the source, I decided to try a FULL send-recv within the same pool. I'm confused. What is going on???
Guess what? Everything was JUST FINE... I didn't get a panic. If I clone the first snapshot and write some files to it, I am able to RESUME. @kpande Yes, I don't use encryption. I was trying zfs-git and the feature was encryption. @tcaputi I have bigger datasets than "xXx", but I only have the problem on 2 datasets. One of them is 27T and the other is 80T. |
@morphinz I'm a little confused at the moment. Does this issue only happen when you use a resume token? In your comment from yesterday you don't mention the resume token in your commands. When the kernel panics is the stack trace the same as the one you posted, or are you unable to get a stack trace at all? @kpande any updates here? |
@tcaputi When I opened the issue I thought this was all about the resume token, but now I'm 100% sure it is not. I also don't use the resume token anymore; I have the issue with or without it. Today I also created a bookmark from that snapshot and deleted the snapshot, because I thought the snapshot was causing the problem and that deleting it would solve it, but it did not. I got the same result and the same stack trace. As you can see below, my source is "Apool/xXx" and the destination is "Apool/xXx-testing" in the same pool.
And I got a panic. |
@tcaputi sorry for the long delay. I was busy with other things. I also thought I had found a workaround for this problem, and that bought me some time. However, my workaround failed and this is a top priority for me again :( Since it's been a while, I've set up a fresh new platform in order to make the problem perfectly reproducible. I have tried a couple of distributions, kernels and zfs versions. Finally I've compiled the latest spl bits with DEBUG parameters ( First let me share my various distro and version results. Arch Linux
Arch Linux
Centos 7
Centos 7
And finally here are the results of the latest bits of spl/zfs with SPL DEBUG flags. Centos 7 - SPL DEBUG
As supplementary information here is my pool: (encryption & project_quota features are not enabled)
Here is my dataset:
Here is more info about the dataset
zfs get receive_resume_token fkmmedium/images
Finally I have a couple of things with DEBUG enabled latest bits:
This issue is perfectly reproducible. Please let me know if I can provide more information. Also, as this is an isolated test environment, I can provide private access if needed. |
@morphinz thank you very much for the detailed report. If access to the machine is available, that would probably be the most convenient thing for me to work with, since this bug is probably related to something about the send files you already have. You can email credentials to me at [email protected] and I will look into it today. Otherwise I can reproduce this myself tomorrow. |
Currently, when the receive_object() code wants to reclaim an object, it always assumes that the dnode is the legacy 512 bytes, even when the incoming bonus buffer exceeds this length. This causes a buffer overflow if --enable-debug is not provided and triggers an ASSERT if it is. This patch resolves this issue and adds an ASSERT to ensure this can't happen again. Fixes: openzfs#7097 Signed-off-by: Tom Caputi <[email protected]>
Currently, when the receive_object() code wants to reclaim an object, it always assumes that the dnode is the legacy 512 bytes, even when the incoming bonus buffer exceeds this length. This causes a buffer overflow if --enable-debug is not provided and triggers an ASSERT if it is. This patch resolves this issue and adds an ASSERT to ensure this can't happen again. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #7097 Closes #7433
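To make the commit message above more concrete, here is a rough sketch of the size arithmetic it describes: a dnode that is assumed to be the legacy 512 bytes has only a limited amount of room for a bonus buffer, so a larger incoming bonus buffer needs a larger dnode. This is not the actual patch. The constants (a 64-byte fixed header and 128-byte block pointers) and every `sketch_`-prefixed name are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative constants for this sketch (assumptions, not quoted from the thread). */
#define SKETCH_DNODE_MIN_SIZE  512u  /* the "legacy 512 bytes" dnode */
#define SKETCH_DNODE_CORE_SIZE 64u   /* assumed size of the fixed dnode header */
#define SKETCH_BLKPTR_SIZE     128u  /* assumed size of one block pointer */

/* Bytes left for the bonus buffer in a dnode of dnode_bytes holding nblkptr block pointers. */
static size_t
sketch_max_bonuslen(size_t dnode_bytes, unsigned nblkptr)
{
	size_t used = SKETCH_DNODE_CORE_SIZE + (size_t)nblkptr * SKETCH_BLKPTR_SIZE;

	return (dnode_bytes > used ? dnode_bytes - used : 0);
}

/* Smallest number of 512-byte slots whose dnode can hold bonuslen bytes. */
static unsigned
sketch_slots_for_bonus(size_t bonuslen, unsigned nblkptr)
{
	unsigned slots = 1;

	while (sketch_max_bonuslen((size_t)slots * SKETCH_DNODE_MIN_SIZE, nblkptr) < bonuslen)
		slots++;
	return (slots);
}

/* Hypothetical guard in the spirit of the added ASSERT: refuse a bonus that cannot fit. */
static void
sketch_check_bonus_fits(size_t dnode_bytes, unsigned nblkptr, size_t bonuslen)
{
	assert(bonuslen <= sketch_max_bonuslen(dnode_bytes, nblkptr));
}
```

Under these assumed constants, a one-slot dnode with a single block pointer leaves 320 bytes for the bonus buffer, so a larger bonus copied into a dnode that is treated as 512 bytes would run past its end. That is the kind of overflow the commit message describes.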
System information
Describe the problem you're observing
When I start zfs send/recv with a resume token, the kernel panics.
I have 2 pools on 2 servers and run replication between them over a WAN.
From pool "A" to pool "B" I replicate more than 10 datasets, but only 1 of them has this problem.
When I start send-recv on this dataset with a resume token to B, node "B" panics every time.
Other datasets are just fine. I don't see any log messages when I start them.
I tried older and newer kernels and "pti=off"; nothing changed.
And I have this problem on only 1 dataset. For this reason I think the problem is related to zfs.
Include any warning/errors/backtraces from the system logs
When I start zfs send with a resume token, the kernel gives this log and panics: