ZFS hanging, eventually leading to system lockup #11754
What about tracking disk latency with the zpool iostat latency counters (like -vlTd or -w) to see whether this is related to the disks, which may announce themselves as failing this way? In my experience, hung-task/latency issues may also be caused by disk problems or storage driver/attachment issues.
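For reference, the latency views suggested above can be produced with something like the following (assuming a pool named `tank`; pool name and interval are placeholders):

```shell
# Per-vdev latency statistics, repeated every 5 seconds:
#   -v   per-vdev breakdown
#   -l   latency columns (total/disk/syncq/asyncq wait, scrub)
#   -T d print a date timestamp before each interval
zpool iostat -vlTd tank 5

# Histogram view of request latencies, useful for spotting a single
# slow disk hiding behind healthy-looking averages:
zpool iostat -w tank
```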
I had been watching the aggregate counters for the pool, already, for a little while, to see if there was anything building up:
When running with -vlTd, the individual drives do not deviate from those numbers.
SMART status across all drives shows no errors, and all look exactly like this:
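The per-drive health summaries above come from smartctl; a quick loop to re-check every drive might look like this (the /dev/sd{a..j} device names are an assumption based on the 10-drive backplane described below):

```shell
# Print the overall SMART health assessment and the error log
# for each of the 10 drives on the backplane.
for d in /dev/sd{a..j}; do
    echo "== $d =="
    smartctl -H -l error "$d"
done
```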
The two main services running on this machine are NFS and SCST (for the iSCSI target). Other details that may or may not be useful to know:
Strike that. The ashift is 13 on this pool, so the lack of compression savings makes sense with the default volblocksize of 8k.
This has been reported at least twice by now; a possible duplicate of issue #11480?
Worth noting: ZSTD is not advised for use with deduplication, since it is built with updating in mind (its compressed output can change between versions, which defeats dedup's matching of on-disk blocks).
Good to know. A question:
Update: Additional stuff to note that I hadn't mentioned before:
And here's the latest stack trace: INFO: task z_wr_iss:9171 blocked for more than 122 seconds.
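The "blocked for more than 122 seconds" messages come from the kernel's hung-task watchdog. Its threshold and behavior can be inspected, and adjusted for debugging, via sysctl; a sketch, assuming root access:

```shell
# Current hung-task detection settings (the timeout defaults to ~120s,
# which is why the traces report "more than 122 seconds"):
sysctl kernel.hung_task_timeout_secs
sysctl kernel.hung_task_warnings

# Optionally panic on a hung task to capture a crash dump, instead of
# letting the system limp along until it locks up entirely:
# sysctl -w kernel.hung_task_panic=1
```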
One of our three dedicated servers is repeatedly experiencing the same symptoms, although I can't even get the trace due to a lack of physical access/IPMI/serial. They're all Intel quad-cores with 64GB of RAM, running Debian 10, Linux 4.19, and ZFS 0.8.6-1 from buster-backports: no dedup, no ZSTD, no regular pool or SMART errors. Backports has moved to 2.0.3-1 by now, which an apt upgrade just installed on the third machine, but I can still roll back to 0.8.6-1 easily; I haven't even rebooted into the new modules yet. Either way, I suspect I'll see another lockup in about three weeks or less.
Is this addressed in 2.0.5?
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
Closing. I believe this was resolved in the 2.1.0 release by a377bde. That said, if this has been observed in the 2.1.x releases please let me know and I'll reopen this. |
System information
Describe the problem you're observing
After an unspecified amount of time, under seemingly random conditions (i.e., the system is not particularly loaded, and I don't know of any specific activity that prompts it), I start getting a ton of these in the system logs. Eventually it gets bad enough that iSCSI commands time out (this system is an iSCSI target) and, after a while longer, the system becomes completely unresponsive.
dmesg output from when it starts:
This occurs on every boot of the machine, eventually.
Describe how to reproduce the problem
I am unsure what the trigger is. I do not see it on an identical system attached to the same SAS backplane, using 10 identical drives.