list_del corruption. next->prev should be XXX, but was dead000000000200 and crash in userquota_updates_task #7997
Comments
@verygreen to help narrow this down, can you confirm you weren't observing this issue with, say, 0.7.9 or an earlier 0.7.x tag?
Well, due to some oversight the previous release I was on was some checkout of 0.7.0, as it was reporting itself, though after some double-checking the last commit was at the end of May 2018, git hash 1a5b96b. That checkout did not exhibit any problems. I can go back to actual 0.7.9 to make sure, if you think that would be helpful.
Just to make it clearer, hopefully: I hit this issue on the 0.7.11 tag; I did not hit this issue on the checkout of 1a5b96b. We also don't seem to have any signs of this in our testing on 0.7.9, but that does not mean too much, since apparently our test run for 0.7.11 in the standard test environment did not hit this either.
@verygreen could you give us some more detail on how to reproduce this? I'd like to see if I can hit it on our Lustre test cluster.
I have a bunch of VMs running various Lustre tests in a loop. The ones that hit these failures are:
For best results, have your kernel built with DEBUG_PAGEALLOC so that accesses to freed memory lead to a crash (rather than having to just monitor for the error messages).
Ah, by the way, a bit of the setup is missing; the other variables I set are as below.
You do not really need a "Lustre cluster" for these tests to run. Every VM is a stand-alone "mds+oss+client" kind of config; the more of them you run, the more chances something will break. I have 120 instances in my test set.
Currently, dnode_check_slots_free() works by checking dn->dn_type in the dnode to determine if the dnode is reclaimable. However, there is a small window of time between when dnode_free_sync() runs in the first call to dsl_dataset_sync() and when the user accounting code is run; during this window the type is set to DMU_OT_NONE, but the dnode is not yet evictable, leading to crashes. This patch adds the ability for dnodes to track which txg they were last dirtied in and adds a check for this before performing the reclaim. This patch also corrects several instances where dn_dirty_link was treated as a list_node_t when it is technically a multilist_node_t.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Requires-spl: spl-0.7-release
Issue openzfs#7147
Issue openzfs#7388
Issue openzfs#7997
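For illustration only, here is a minimal C sketch of the check described in that commit message; it is not the upstream patch, and the struct fields and helper names are simplified assumptions. The old-style check treats dn_type == DMU_OT_NONE alone as "reclaimable", which is exactly what goes wrong in the window described above; the fixed-style check also refuses dnodes dirtied in a txg that has not finished syncing.

```c
/*
 * Illustrative sketch only -- simplified from the commit message above,
 * not the upstream ZFS patch.  The struct layout and helper names here
 * are assumptions made for the example.
 */
#include <stdbool.h>
#include <stdint.h>

typedef enum { DMU_OT_NONE = 0, DMU_OT_SOME_DATA } dmu_object_type_t; /* stand-in enum */

typedef struct dnode {
	dmu_object_type_t dn_type;
	uint64_t dn_dirty_txg;	/* txg in which this dnode was last dirtied */
} dnode_t;

/*
 * Old-style check: type alone.  A dnode freed by dnode_free_sync() already
 * has dn_type == DMU_OT_NONE, so it is treated as reclaimable even though
 * the user accounting code has not run yet -- the race described above.
 */
static bool
slot_reclaimable_old(const dnode_t *dn)
{
	return (dn->dn_type == DMU_OT_NONE);
}

/*
 * Fixed-style check, per the commit message: also require that the dnode
 * was not dirtied in a txg that has not yet finished syncing.
 */
static bool
slot_reclaimable_new(const dnode_t *dn, uint64_t last_synced_txg)
{
	return (dn->dn_type == DMU_OT_NONE &&
	    dn->dn_dirty_txg <= last_synced_txg);
}

int
main(void)
{
	/* Dnode freed in txg 105 while only txg 100 has fully synced. */
	dnode_t dn = { .dn_type = DMU_OT_NONE, .dn_dirty_txg = 105 };

	/* Old check says "reclaim now" (the bug); new check says "not yet". */
	return ((slot_reclaimable_old(&dn) && !slot_reclaimable_new(&dn, 100)) ? 0 : 1);
}
```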
@verygreen can you apply #8005 to the ZFS version you're testing? It should resolve the list corruption you're seeing. This issue was accidentally introduced by the dnode send/recv fixes and was fixed in master by edc1e71, which I've backported: 45f0437 Fix 'zfs recv' of non large_dnode send streams
Ok, perfect timing; I was just preparing to update my test setup for a new testing round. Will likely have some results in the morning. Thanks!
@behlendorf In which ZoL release was the race introduced? Is the 0.7.9 release affected?
@shodanshok those 3 commits first landed in the ill-fated 0.7.10. So assuming the bug is solely from those, 0.7.9 would not be affected, and the eventual 0.7.12 would have the fix.
Seen this for the first time with ZFS 0.7.11 on Linux 4.18.6 in Debian. No problems with 0.7.9.
So far so good: 0 crashes in the past 9.5 hours. Without the patch I had at least one crash per hour on my setup.
@rincebrain Interesting. In #7933 (comment) I tried to replicate it with a very heavy workload based on concurrent instances of
@shodanshok in order to hit the bug, you need to write to a file and then unlink the file while it's being actively written to disk. Once the data has been synced out you won't be able to hit it. You might be able to reproduce it with something like
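The concrete command in the comment above was cut off in extraction; as a purely hypothetical sketch of the write-then-unlink-before-sync pattern it describes, a loop like the following (path, write size, and iteration counts are arbitrary assumptions) might exercise the window:

```c
/*
 * Hypothetical reproducer sketch (not the command from the comment above):
 * repeatedly create, write, and immediately unlink a file without fsync,
 * so the unlink races with the txg that is still syncing the data out.
 * The path, write size, and counts are arbitrary assumptions.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	const char *path = "/tank/fs/churn.tmp";	/* assumed ZFS dataset path */
	static char buf[1 << 20];			/* 1 MiB per write() */

	memset(buf, 0xab, sizeof (buf));

	for (;;) {
		int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
		if (fd < 0) {
			perror("open");
			return (1);
		}
		/* Dirty a few MiB but deliberately skip fsync()... */
		for (int i = 0; i < 8; i++)
			(void) write(fd, buf, sizeof (buf));
		(void) close(fd);
		/* ...then unlink while the data may still be unsynced. */
		(void) unlink(path);
	}
}
```

Running several instances in parallel, each on its own file in the same pool, would presumably increase the odds of an unlink landing in the problematic window.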
~24 hours and still no crashes, so from my perspective this seems to be fixed now. Thanks!
The latest comment suggests this was indeed fixed by b32f127 (zfs-0.7.12), closing.
For the record, I still saw list_del corruption a few times in 0.7.13, as reported in #9068. The message was slightly different, however.
This is similar to #7933 on the latest RHEL 7.5 kernel and 0.7.11 checked out from the git tree, but I am hitting it with Lustre testing, so I guess it is materially different.
Typically this hits in recovery testing, shortly after the FS is brought back up following recovery. First comes the list corruption, then a crash in userquota_updates_task stepping on a bad pointer:
I have crash dumps and can reproduce this reasonably easily if there are any ideas for fixes.