txg_sync blocked for more than 120 seconds #3835
Any ideas regarding this issue? Do you think it is related to #3834? Is there a possible workaround?
@dweeezil Is it possible to set spl_taskq_thread_dynamic=0 on a running system, or is it a boot option only?
@tjikkun It should be set when the spl module is loaded (
Just got this on 0.6.5.2:
@tjikkun Is this with (the default)
Yes, this is with the default setting for spl_taskq_thread_dynamic. I rebooted this server with spl_taskq_thread_dynamic=0 after the crash. Output of ps aufx | egrep 'z_|spl' from the time of the crash:
As additional info, a zfs send was running at that time, and it looks like the receiving end was rebooted at about the time of the hang.
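After a reboot like the one described above, it can help to confirm which value of spl_taskq_thread_dynamic is actually in effect. The following is a minimal sketch, assuming the spl module exposes its parameters under the usual sysfs layout /sys/module/spl/parameters/; the helper name read_module_param is hypothetical and not part of any ZFS tooling.

```python
from pathlib import Path

# Assumption: spl parameters appear under the standard sysfs layout
# /sys/module/<module>/parameters/<name>.
PARAM_DIR = Path("/sys/module/spl/parameters")


def read_module_param(name, param_dir=PARAM_DIR):
    """Return the parameter's current value as a string, or None if unreadable."""
    try:
        return (Path(param_dir) / name).read_text().strip()
    except OSError:
        # Module not loaded, parameter not exported, or no read permission.
        return None


if __name__ == "__main__":
    value = read_module_param("spl_taskq_thread_dynamic")
    print("spl_taskq_thread_dynamic =", value if value is not None else "unavailable")
```

Recording this value alongside each hang report makes it easier to tell which crashes happened with dynamic taskqs disabled.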
@tjikkun Nothing terribly interesting looking there. What's the memory situation when this is happening (arcstats, /proc/vmstat, etc.)? I'll try to reproduce this locally on a memory-constrained system. In the meantime, does disabling dynamic taskqs avoid the problem?
@dweeezil Next time it happens I'll try to get the contents of /proc/vmstat and the output of arcstat.py
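Because the hang may make interactive commands unreliable, it can be useful to capture both sources in one go. The following is a minimal sketch, not the arcstat.py tool itself: it assumes ZFS on Linux exposes its ARC counters at /proc/spl/kstat/zfs/arcstats in the usual "name type data" kstat layout, and the function names are hypothetical.

```python
from pathlib import Path
import time

# /proc/vmstat is standard on Linux; the arcstats path is an assumption
# about where ZFS on Linux exposes its kstats.
VMSTAT = "/proc/vmstat"
ARCSTATS = "/proc/spl/kstat/zfs/arcstats"


def parse_vmstat(text):
    """Parse 'name value' lines into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def parse_arcstats(text):
    """Parse kstat 'name type data' rows; header lines do not match and are skipped."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[2].isdigit():
            stats[parts[0]] = int(parts[2])
    return stats


def snapshot():
    """Read both files once; an unreadable file yields an empty dict."""
    result = {"time": time.time()}
    sources = (("vmstat", VMSTAT, parse_vmstat),
               ("arcstats", ARCSTATS, parse_arcstats))
    for key, path, parser in sources:
        try:
            result[key] = parser(Path(path).read_text())
        except OSError:
            result[key] = {}
    return result


if __name__ == "__main__":
    snap = snapshot()
    print(len(snap["vmstat"]), "vmstat counters,",
          len(snap["arcstats"]), "arcstats counters")
```

With a snapshot in hand, fields such as arc_meta_used and arc_meta_limit can be compared directly to see whether ARC metadata is above its limit at the time of the hang.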
OK, I have now had a crash with spl_taskq_thread_dynamic=0
@tjikkun There's plenty of free memory but the ARC is overshooting its meta limit. Are any of the system tasks such as
@dweeezil
@tjikkun If you weren't having problems in 0.6.4, I suspect this is related to the SB shrinker changes. For the 2.6 EL kernels, 0.6.4 used
@tjikkun It might be interesting to
@dweeezil In our production environment we had to roll back to 0.6.4 because of this issue. Thus far I haven't been able to reproduce it in a test setup.
@tjikkun The large directory issue should be fixed by 5592404, which will be in the next tag. The space accounting bug should not cause any crashes; however, a3000f9 has been reverted in current master code, which will restore the proper space accounting. #3874 has been proposed to fix the underlying problem that a3000f9 attempted to solve, without breaking the space accounting. The hang condition you've described in this issue appears to be something completely separate. Were you ever able to get the "pruning, nr_to_scan" debugging messages while this problem was happening?
@dweeezil
I also had
@muzyk10 This issue should have been resolved in the more recent 0.6.5.x tags. I'm closing this for now, but I'm happy to reopen it if you're still having issues.
Just wanted to add that we are seeing a similar issue in 0.6.5.7-1 as well, with the following trace:
I got this too with kernel 4.9.15-040915-generic and zfs 0.6.5.9-1. Running "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Possibly related to #3834
We are seeing regular hangs. Sometimes the system recovers, sometimes we need to reboot.
ZFS version is zfs-0.6.5-1.el6.x86_64
I am 90% sure that this only happens when a zfs send is in progress (but in our case, more often than not a send is in progress).
Example 1:
Example 2:
If you need anything more please let me know!