zil_commit_waiter can stall forever #3
Comments
Adding some classy debugging printfs:
We get the following output:
There is no signal to 0xffffff90ac9b3ee8 |
Removing some 'static' in front of those two functions, we can indeed confirm neither is called:
|
Removing all
We end up with this log in its entirety, from start to hang: dtrace zil*
|
The closest we come to |
or, the call for |
They turn out to be skipped due to:
and
Might have to call upon @behlendorf to see if he has some insight. |
Ok, it would seem related to:
if replaced with:
Perhaps we have some signed vs unsigned issue going for it with Should I |
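As a rough illustration of the kind of signed vs unsigned problem suspected here, the sketch below stores a "timed out" return value of -1 in a signed and in an unsigned variable and then tests it with `>= 0`. All names (`fake_timedwait`, `signed_ticks_t`, `unsigned_ticks_t`, the `timeout_seen_*` helpers) are hypothetical stand-ins, not the real ZFS or SPL code, and the -1 convention is an assumption made for the example.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for a platform tick type that is signed on
 * one platform and unsigned on another. */
typedef long signed_ticks_t;
typedef unsigned long unsigned_ticks_t;

/* Hypothetical timed wait: returns -1 on timeout, a positive value
 * when it was woken before the deadline. */
static long fake_timedwait(bool timed_out)
{
	return timed_out ? -1 : 1;
}

/* Signed storage: -1 stays negative, so the timeout is noticed. */
bool timeout_seen_signed(bool timed_out)
{
	signed_ticks_t left = fake_timedwait(timed_out);
	return !(left >= 0);
}

/* Unsigned storage: -1 wraps to the largest value, so `left >= 0`
 * is always true and the timeout is never noticed. */
bool timeout_seen_unsigned(bool timed_out)
{
	unsigned_ticks_t left = (unsigned_ticks_t)fake_timedwait(timed_out);
	return !(left >= 0);
}

/* One possible portable check: compare against the -1 sentinel
 * explicitly instead of relying on the sign. */
bool timeout_seen_explicit(bool timed_out)
{
	unsigned_ticks_t left = (unsigned_ticks_t)fake_timedwait(timed_out);
	return left == (unsigned_ticks_t)-1;
}
```

With the unsigned variant, a loop that says "keep waiting unless the wait timed out" would keep waiting forever, which would match a waiter that never leaves its loop. Compilers can flag the always-true comparison (for example GCC's -Wtype-limits), which may be a cheap way to find similar spots.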
It looks like upstream may suffer from this same issue, openzfs/zfs#10440. It may be just a little harder to hit. Would you mind upstreaming your fix? |
It's in the queue, but will move it forward and push out today |
I don't see how it could affect Linux, but doesn't hurt to try |
Me either. It's a signed type on Linux so it should be fine, but we'll need this change anyway for macos. |
Added a review comment, but in case it gets lost I have the following concern:
|
Likewise:
|
Did this PR pass the FreeBSD bots? After rebasing I get test hangs. |
The CI did in fact catch this, but due to the other recent issues with FreeBSD head I didn't catch this at the time and it slipped in. Sorry! |
@lundman - thanks for the explanation, I think 'git grep' failed me when looking for what was the normal behavior. Glad I sort of accidentally found a bug in the freebsd SPL though. 🤣 |
Mixing ZIL and normal allocations has several problems:

1. The ZIL allocations are allocated, written to disk, and then a few seconds later freed. This leaves behind holes (free segments) where the ZIL blocks used to be, which increases fragmentation, which negatively impacts performance.
2. When under moderate load, ZIL allocations are of 128KB. If the pool is fairly fragmented, there may not be many free chunks of that size. This causes ZFS to load more metaslabs to locate free segments of 128KB or more. The loading happens synchronously (from zil_commit()), and can take around a second even if the metaslab's spacemap is cached in the ARC. All concurrent synchronous operations on this filesystem must wait while the metaslab is loading. This can cause a significant performance impact.
3. If the pool is very fragmented, there may be zero free chunks of 128KB or more. In this case, the ZIL falls back to txg_wait_synced(), which has an enormous performance impact.

These problems can be eliminated by using a dedicated log device ("slog"), even one with the same performance characteristics as the normal devices.

This change sets aside one metaslab from each top-level vdev that is preferentially used for ZIL allocations (vdev_log_mg, spa_embedded_log_class). From an allocation perspective, this is similar to having a dedicated log device, and it eliminates the above-mentioned performance problems.

Log (ZIL) blocks can be allocated from the following locations. Each one is tried in order until the allocation succeeds:

1. dedicated log vdevs, aka "slog" (spa_log_class)
2. embedded slog metaslabs (spa_embedded_log_class)
3. other metaslabs in normal vdevs (spa_normal_class)

The space required for the embedded slog metaslabs is usually between 0.5% and 1.0% of the pool, and comes out of the existing 3.2% of "slop" space that is not available for user data.

On an all-ssd system with 4TB storage, 87% fragmentation, 60% capacity, and recordsize=8k, testing shows a ~50% performance increase on random 8k sync writes. On even more fragmented systems (which hit problem #3 above and call txg_wait_synced()), the performance improvement can be arbitrarily large (>100x).

Reviewed-by: Serapheim Dimitropoulos <[email protected]>
Reviewed-by: George Wilson <[email protected]>
Reviewed-by: Don Brady <[email protected]>
Reviewed-by: Mark Maybee <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #11389
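The "tried in order until the allocation succeeds" behaviour described in the commit message above can be sketched roughly as follows. This is a toy model, not the ZFS allocator: `toy_pool_t`, `alloc_class_t`, `try_alloc` and `alloc_log_block` are hypothetical names, and each class is reduced to a single "largest free chunk" counter purely for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the three allocation classes, in the order
 * the commit message lists them. */
typedef enum {
	CLASS_LOG,          /* dedicated log vdevs ("slog") */
	CLASS_EMBEDDED_LOG, /* embedded slog metaslabs */
	CLASS_NORMAL,       /* other metaslabs in normal vdevs */
	CLASS_COUNT
} alloc_class_t;

/* Toy pool: one free-space counter per class (an assumption made
 * only to keep the example small). */
typedef struct {
	size_t largest_free[CLASS_COUNT];
} toy_pool_t;

/* Try to take `size` bytes from one class; false if it cannot fit. */
static bool try_alloc(toy_pool_t *p, alloc_class_t c, size_t size)
{
	if (p->largest_free[c] < size)
		return false;
	p->largest_free[c] -= size;
	return true;
}

/* Walk the classes in priority order and return the first one that
 * satisfied the request, or CLASS_COUNT if every class failed. */
alloc_class_t alloc_log_block(toy_pool_t *p, size_t size)
{
	for (int c = CLASS_LOG; c < CLASS_COUNT; c++) {
		if (try_alloc(p, (alloc_class_t)c, size))
			return (alloc_class_t)c;
	}
	return CLASS_COUNT;
}
```

In this model a return of `CLASS_COUNT` stands for the case where no class has a large enough free chunk, which the commit message says is where the ZIL falls back to txg_wait_synced().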
System information
git HEAD
Describe the problem you're observing
A call to fsync can stall forever in zil_commit_waiter.
Describe how to reproduce the problem
Easily repeatable, first test encountered with zfs-tests.
Include any warning/errors/backtraces from the system logs
The stalled command:
The main stack:
Duration: 9.98s
Steps: 999 (10ms sampling interval)
zfs stack
There do not appear to be any other threads in play; generally they are all idle. For example:
other stacks
The code that is stuck is here:
https://github.com/openzfsonosx/openzfs/blob/master/module/zfs/zil.c#L2690
and it is spinning rather quickly, easily taking out a "core" on its own. But increasing the timeout does not fix the problem, just lowers the loadavg. It is simply not signalled, or if it is, it is missed.
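To make the "wakes up on timeout, finds nothing changed, waits again" pattern concrete, here is a minimal userland sketch using POSIX threads. It is not the code at the link above: `toy_waiter_t` and the `toy_waiter_*` functions are hypothetical, and the `max_timeouts` cap exists only so the example terminates.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical waiter state; not the real zil_commit_waiter_t. */
typedef struct {
	pthread_mutex_t lock;
	pthread_cond_t cv;
	bool done;
} toy_waiter_t;

void toy_waiter_init(toy_waiter_t *w)
{
	pthread_mutex_init(&w->lock, NULL);
	pthread_cond_init(&w->cv, NULL);
	w->done = false;
}

/* Mark the waiter done and wake any thread blocked on it. */
void toy_waiter_signal(toy_waiter_t *w)
{
	pthread_mutex_lock(&w->lock);
	w->done = true;
	pthread_cond_broadcast(&w->cv);
	pthread_mutex_unlock(&w->lock);
}

/* Wait until `done` is set, waking every `timeout_ms`. Returns the
 * number of timeouts observed, giving up after `max_timeouts`. */
int toy_waiter_wait(toy_waiter_t *w, long timeout_ms, int max_timeouts)
{
	int timeouts = 0;

	pthread_mutex_lock(&w->lock);
	while (!w->done && timeouts < max_timeouts) {
		struct timespec deadline;

		/* Build an absolute deadline timeout_ms from now. */
		clock_gettime(CLOCK_REALTIME, &deadline);
		deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
		deadline.tv_sec += timeout_ms / 1000 +
		    deadline.tv_nsec / 1000000000L;
		deadline.tv_nsec %= 1000000000L;

		/* The timeout path is the only exit if nobody signals. */
		int rc = pthread_cond_timedwait(&w->cv, &w->lock, &deadline);
		if (rc == ETIMEDOUT)
			timeouts++;
	}
	pthread_mutex_unlock(&w->lock);
	return timeouts;
}
```

If nothing ever calls `toy_waiter_signal`, every pass through the loop ends in a timeout; without the cap, a longer timeout would only make the loop spin more slowly, which is consistent with the observation above that raising the timeout lowers the load but does not end the stall.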