list_del corruption with latest Centos 7 kernel and 0.7.13-1 #9068
Comments
After a reboot we were able to repopulate all of the data without a crash, so this is not easily reproducible. It also turns out that when we wiped and reinstalled everything on this machine last week, the system administrator decided to try enabling hyperthreading. It had not been enabled previously (confirmed by the monitoring history), nor is it enabled on the new machine, so we suspect that hyperthreading was somehow involved in the crash. We do not have logs going back far enough to see whether hyperthreading was enabled last year when we had crashes; it's possible, but we have neither a memory of it nor proof. We are now disabling hyperthreading. The machines each have 16 physical cores.
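For anyone who wants to verify the hyperthreading state on their own machine rather than rely on memory or monitoring history, here is a minimal sketch in Python. It assumes the standard `lscpu` tool from util-linux is installed and that its output contains a `Thread(s) per core:` line; the function names are made up for this example and are not part of any zfs tooling. With hyperthreading off, that value should be 1.

```python
import re
import subprocess


def threads_per_core(lscpu_output: str) -> int:
    """Extract the 'Thread(s) per core' value from lscpu text output."""
    match = re.search(
        r"^Thread\(s\) per core:\s*(\d+)\s*$", lscpu_output, re.MULTILINE
    )
    if match is None:
        raise ValueError("no 'Thread(s) per core' line found in lscpu output")
    return int(match.group(1))


def hyperthreading_enabled(lscpu_output: str) -> bool:
    """Treat more than one hardware thread per core as hyperthreading on."""
    return threads_per_core(lscpu_output) > 1


def read_lscpu() -> str:
    """Run lscpu and return its text output (assumes lscpu is on PATH)."""
    result = subprocess.run(
        ["lscpu"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `hyperthreading_enabled(read_lscpu())` after each reboot or kernel upgrade, and logging the result, would leave a record of the setting to check against the next time the problem shows up.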
Since we are no longer having the problem, I am closing this issue.
This has now happened again on the same machine.
I've experienced the same. Will the #8005 fix go into 0.7.14?
According to a comment in #8005, that fix was already in 0.7.12. It must not have completely fixed the problem, however.
Since this occurs only rarely (but not rarely enough), I note that it happened again today. It wasn't a crash, but it froze all zfs accesses until a reboot. I confirm that hyperthreading is still disabled. The zfs and spl versions haven't changed. The kernel had been upgraded to 3.10.0-957.27.2.el7.x86_64.
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
System information
Describe the problem you're observing
kernel panic with message
This is the same message I reported with a previous kernel and 0.7.11-1 in #7933. It was supposed to have been fixed by #8005 in 0.7.12-1. This time the system didn't crash completely, but it emitted the same message over and over, which made systemd-journald compute-bound, slowed interactive response, and stopped all activity to the zfs volume.
Describe how to reproduce the problem
Unfortunately I don't have a recipe to reproduce. We were doing lots of write activity to the volume from parallel processes. It ran for several hours before getting the panic. I haven't seen it fail again after several hours with reduced write activity, so I'm ramping it back up again.
We had been running this system since last October on the older kernel and 0.7.9 without a problem. We recently got a new identical system and went through the same process of repopulating all the data (22T of it, mostly small files, which takes about a day and a half) on a slightly older kernel, 3.10.0-957.21.2.el7, and the same zfs/spl version, 0.7.13-1.el7, without a problem.
Include any warning/errors/backtraces from the system logs