Revert "Revert "Revert "Fix unlink/xattr deadlock""" #2408
This reverts commit 7973e46. That commit had been intended to work around a deadlock involving zfs_zget(), which was fixed by 6f9548c. The workaround had the side effect of making zfs_zinactive() cause excessive CPU utilization in zfs_iput_taskq by queuing an iteration over all objects in a dataset on every unlink on a directory that had extended attributes. That resulted in many issue reports about iput_taskq spinning. Since the original rationale for the change is no longer valid, we can safely revert it to resolve all of those issue reports.
Conflicts:
module/zfs/zfs_dir.c
Closes:
openzfs#457
openzfs#2058
openzfs#2128
openzfs#2240
After updating my Gentoo Linux system from ZFS on Linux 0.6.2 to ZoL 0.6.3, I noticed that ryao/zfs@08c0a3d reduces this time to only around 2–5 seconds with an empty cache, and to less than 1 second with a non-empty cache.
@ryao My concern with reverting this change is that we'll reintroduce the deadlock described in commit b00131d. It's certainly possible that the 6f9548c fix resolves this case, but I haven't spent enough time refreshing my memory to say for certain. If you could verify that this is no longer possible and explain why, that would help speed up getting this merged. Here's the original reproducer.
FYI, I haven't been able to trigger the deadlock with the original reproducer during my work on #1548. I'm still in the middle of testing against a handful of different kernels (I suspect the post-3.0 kernel shrinker changes may be involved). It would be interesting to know which kernel versions were involved.
@behlendorf Issue #266 has a stack trace for that deadlock. 6f9548c includes a stack trace that is essentially the same deadlock. The only difference is that we enter this code path through With that in mind, I believe that 6f9548c was the right fix for #266 and that 7973e46 was not. While 7973e46 did help, it did not cover all cases, and the improvements in the cases where it did help came at the expense of a serious regression in the form of #457. Now that 6f9548c is in the tree, there is no reason to keep 7973e46 any longer.
@dweeezil I know I was able to reproduce the original issue under RHEL6, so kernel 2.6.32. @ryao OK, I've refreshed my memory on this issue. However, there's a lot of history we need to cover, so let me try to summarize the critical points.
So from my point of view, we can only safely merge this patch once we get to the bottom of #1176. If we can convince ourselves that case is no longer possible, then this patch can be merged.
@behlendorf I finally got 2.6.32 (actually, a stock 2.6.32.61) running on my testing rig with an appropriately old spl (openzfs/spl@372c257) and zfs (6f0cf71), plus a few manual patches to get them to compile with gcc 4.8.2 (its static analysis finds some old sizeof bugs, etc.). The original #266 reproducer listed above still deadlocks. After updating spl and zfs to master, reverting b00131d and 6f9548c, and rerunning the test on my 2.6.32 kernel, everything works just fine (no deadlock). Now that I've got an old kernel working on my otherwise new-ish test system, I'll do some targeted bisection to see if I can find out where the problem was fixed or worked around.
@dweeezil If you're going to bisect, start with e89260a. This is the commit that should cleanly solve the original issue by ensuring iprune_super() never tries to evict an xattr directory inode while one of its xattr file inodes is still cached. The lingering question in my mind is whether reverting these changes reintroduces #1176.
@behlendorf Yep, it was e89260a. For reference, I pushed dweeezil/zfs@9f5dfac to document the commit history of the "working" state. If dweeezil/zfs@e2a6ebc is reverted from that branch (which re-introduces b00131d), the deadlock occurs. I'm going to try to hack up a pool with an unlinked set (which we ought to add to zfsonlinux/zfs-images) in an attempt to create a reproducer for #1176.
@dweeezil Great. It's nice to have that fix confirmed. It should be pretty easy to create a large unlinked set, and I agree it would be great to add a pool like this to the zfs-images for testing.
@dweeezil Why did you revert 6f9548c in your testing? Did you hit the infinite loop @behlendorf predicted might happen?
@ryao No, I didn't hit it. I already knew the #266 reproducer didn't deadlock with recent master code, and I simply wanted to make sure that neither b00131d nor 6f9548c was the reason (it didn't deadlock). In retrospect, I had no reason to touch 6f9548c, since my previous attempts at the #266 reproducer were made long before that patch was committed. Now I'm trying to get a solid reproducer for the #1176 issue by creating pools (actually, filesystems) with unlinked sets, but it looks like I need to be a bit more clever so that the unlinked set isn't in just the right order to avoid the deadlock.
Here's git master with this patch (http://bpaste.net/show/420834/): it doesn't mount /var/tmp/portage, and it thinks it's mounted when it isn't (at least not according to zfs mount). Kernel 3.15.2.
More debug output: http://bpaste.net/show/wkTQ7igorQb7IA5zJYiF/
@ryao said that this confirmed @behlendorf's concern that it could cause the lockup, so yay. I'm on the second boot now, and it seems fine (until it's not).
@prometheanfire Yes, http://bpaste.net/show/420834/ was exactly what I was concerned about. We need to explain exactly how that lockup is possible and address it before we can revert this patch. |
This was 'zfs mount -a'.
@behlendorf I suspect that the right thing to do here is to make each active SA handle hold a reference on the corresponding struct inode. I will take time to play with that idea after I get a handle on another mystery that I am debugging.
Just for future reference. Fully reverting to the Illumos code where
This reverts commit 7973e46, which brings the basic flow of the code back in line with the other ZFS implementations. This was made possible by the work done in these previous commits:
e89260a Directory xattr znodes hold a reference on their parent
26cb948 Avoid 128K kmem allocations in mzap_upgrade()
4acaaf7 Add zfs_iput_async() interface
Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#2408
Issue openzfs#457
Issue openzfs#2058
Issue openzfs#2128
Issue openzfs#2240