Busy hang #539
Comments
Another spike, almost exactly 60 minutes after the previous one:
And:
Can you retest with commit ec2b410 reverted from the spl? It looks like we accidentally introduced a small race which might be part of your problem. We'll be fixing it today, but if you want to test sooner, just run: git revert ec2b410
Yup, that's still ok after an hour: a lot longer than before. linux-3.1.10, zfs-b4b599d, spl-87d1123 w/ ec2b410 reverted
Sorry, you just happened to catch a regression which slipped in for 24 hours. It's been fixed properly now by commits openzfs/spl@0bb43ca and openzfs/spl@3c6ed54. Go ahead and update to the latest spl master if you like.
Great! With those 2 commits it's been up and under load for 5 hours with no busy hangs.
Note: this whole issue may have had to do with #513 rather than openzfs/spl@ec2b410. See that issue for more...
Hi,
I'm getting a lot of "busy hangs" in ZoL. The symptoms are that the file system activity stops (as seen by 'zpool iostat'), no progress is being made by userland processes writing to ZFS (as seen by 'ps' time and 'strace') and the system load stays up around 2 (with no userland activity). The only way I've found out of this state is to reboot.
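A load that stays around 2 with no userland activity may point at tasks stuck in uninterruptible sleep, since Linux counts those toward the load average. As a rough sketch for anyone trying to narrow this down, the script below prints /proc/loadavg and lists tasks whose state in /proc/<pid>/stat is "D". The function names are made up for illustration, and it only scans top-level /proc entries, not per-thread entries under /proc/<pid>/task.

```python
import os


def parse_stat_line(line):
    """Split one /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so use the first '(' and the last ')' as delimiters.
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left].strip())
    comm = line[left + 1:right]
    state = line[right + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc_root="/proc"):
    """Return (pid, comm) for every task in uninterruptible sleep ('D')."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # task exited, or the line was not in the expected form
        if state == "D":
            blocked.append((pid, comm))
    return blocked


if __name__ == "__main__":
    with open("/proc/loadavg") as f:
        print("loadavg:", f.read().strip())
    for pid, comm in blocked_tasks():
        print(pid, comm)
```

Running it a few times during a hang and comparing the lists would show whether the same tasks stay blocked.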
This has been occurring with at least these combinations:
linux-3.0.14, zfs-30a9524, spl-e05bec8
linux-3.0.14, zfs-b4b599d, spl-87d1123
linux-3.1.10, zfs-b4b599d, spl-87d1123 <<< currently on this
What can I do to help further diagnose the problem?
The load is an rsync of a large directory structure writing to the ZFS.
The iostat looks like this:
The spl and zfs stats in /proc are pretty quiet:
And in case there's something interesting in there:
...oh, hold on, whilst putting this together I'd left the 'zpool iostat' from above running, and I just noticed that after 216 lines (2160 seconds, or 36 minutes) of zeros, I got a single non-zero line before it reverted to zeros:
And the /proc stats have changed a bit more:
...and there's been no further iostat activity in the 48 minutes since that previous single line.
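The stall length quoted above is just the number of all-zero lines times the sampling interval (216 lines at 10 seconds each). A minimal sketch of automating that count over captured 'zpool iostat' output follows. It assumes each data line ends with four numeric columns (read/write operations and read/write bandwidth) and treats a line as idle when all four are "0". The pool name "tank" and the function names are hypothetical.

```python
def is_idle(line, stat_columns=4):
    """True if the last `stat_columns` fields of an iostat line are all '0'."""
    fields = line.split()
    if len(fields) < stat_columns + 1:
        return False  # header, separator, or blank line
    return all(f == "0" for f in fields[-stat_columns:])


def longest_idle_run(lines):
    """Length of the longest run of consecutive idle lines."""
    longest = current = 0
    for line in lines:
        if is_idle(line):
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def stall_seconds(idle_lines, interval_s=10):
    """Convert a count of idle samples into seconds of stalled I/O."""
    return idle_lines * interval_s


if __name__ == "__main__":
    # Hypothetical capture: 216 idle samples, one active sample, then idle again.
    idle = "tank  1.2T  800G  0  0  0  0"
    busy = "tank  1.2T  800G  0  12  0  1.5M"
    sample = [idle] * 216 + [busy] + [idle] * 3
    run = longest_idle_run(sample)
    print(run, "idle lines =", stall_seconds(run), "seconds")
```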
Cheers,
Chris