most I/O hang #610
Thanks for the additional debugging. It appears the following is happening (why, I'm not sure):
It would be interesting to see both.
Thanks Brian. It could be a regression introduced by a recent change (I'm on the daily PPA), because it never happened before and, after a reboot, the system goes back into that state. The other change (apart from the upgrade from PPA 51 to PPA 53, now 54) is me changing zfs_arc_max from 2 GiB to 4 GiB (on a system with 16 GiB of RAM).
cetautomatix# zpool events
errors: No known data errors
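For reference, the ARC cap change mentioned above is usually made along these lines; the 4 GiB value matches the description above, while the paths are the standard module-parameter locations rather than anything taken from this system:

```sh
# Current ARC size cap in bytes (0 means the built-in default is used).
cat /sys/module/zfs/parameters/zfs_arc_max

# Raise the cap to 4 GiB at runtime (as root).
echo 4294967296 > /sys/module/zfs/parameters/zfs_arc_max

# Make the setting persist across reboots as a module option.
echo "options zfs zfs_arc_max=4294967296" > /etc/modprobe.d/zfs.conf
```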
This may be different this time: the zvol tasks are in D state as well, and so is zfs_iput_taskq (811).
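A quick, generic way to list which tasks are in D state at a given moment (a sketch, not taken from this report):

```sh
# Every task currently in uninterruptible sleep (state D), with the kernel
# function it is waiting in; kernel threads such as the zvol and taskq
# workers show up here too.
ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'
```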
Sorry, I'm all confused now. There were disk events; I'm not sure how I missed them. They came from a number of disks, things like:
(same for sdc, sdd and sdh at different times), followed by:
Even though the disks don't match (SAMSUNG_HD203WI_S1UYJ1MZ102100 is sdg, not sdj). Today, for that new occurrence, there's no disk error though.
The first batch of errors are clearly read errors, which would go a long way toward explaining why zfs was unable to read back the required space maps, or perhaps why it was just taking so long, since the second zfs ereport.fs.zfs.delay indicates that the read I/Os were completing successfully (zio_err=0x0) after at least 30 seconds. Today's occurrence may be more of the same: the zvol threads are all stuck on a txg again. You'll want to check what is blocking the txg_sync_thread. As for a regression, nothing which has recently been merged into the zfs code looks to me like it could cause this sort of problem. I would strongly suspect you may be having some hardware trouble.
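One way to see where txg_sync is stuck is to dump its kernel stack directly; a minimal sketch, assuming the kernel exposes /proc/&lt;pid&gt;/stack:

```sh
# Find the txg_sync kernel thread and print its current kernel stack (as root).
pid=$(pgrep txg_sync | head -n 1)
cat /proc/"$pid"/stack
```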
For today's occurrence, no disk error. I've reverted to 51; we'll see if I still get the issue. If so, I'll revert the zfs_arc_max change and see how it goes.
Reverted to 51. Now those nightly backups went a lot further (this is a backup server backing up data from about 15 other machines; 80% of them went through, whereas for the last two days we got stuck with none going through). We're stuck again, but not completely. No disk error. Most applications (mostly rsync) are "stuck"; there's a lot of reading from the disks (5 MB/s), mostly from z_rd_int's. Some applications occasionally manage to read some data. Only one application (ntfsclone onto a zvol, patched to only write modified clusters) manages to occasionally write data. It should be noted that it's the one application writing to an area that is not deduped (a zvol with volblocksize=4k). Here's an iotop -a output run over a few minutes, sorted by accumulated read:
The zpool is getting pretty full; maybe that's part of the problem.
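For anyone reproducing this, a capture like the one described above can be approximated with something along these lines (standard iotop options; the sampling interval is arbitrary):

```sh
# Accumulated per-process I/O (-a), only tasks doing I/O (-o), batch mode (-b),
# one sample every 10 s for ~2 minutes; run as root, sort the output afterwards.
iotop -a -o -b -d 10 -n 12
```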
Can you try 0.6.0.54 with the following patch applied: behlendorf/zfs@1813cc0. This fixes a slight defect in the zio handling which was accidentally introduced in 0.6.0.53. I expect to be merging it into master shortly, and it may fix your issue. Oh, and it would be more likely to cause trouble on a full pool, when logical IOs need to be constructed due to a lack of free space.
Sorry Brian, I'm just back from vacation. I've updated to ppa-daily 55, which seems to have that patch, and will keep you posted.
Nope, no luck with 55 either. All rsync processes are blocked; only the ntfsclone ones (writing to non-dedup zvols) occasionally manage to get anything written to the pool. One processor (not always the same) has some iowait time:
Another thing I notice (but it's the same with 51) is that there are a lot of zil_clean processes:
Which is not far from the number of mounted zfs datasets:
There still is a lot of reading from the disks.
I'll boot back into 51.
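A rough way to compare the two counts mentioned above (a sketch; thread naming may differ slightly between versions):

```sh
# Kernel threads whose name starts with zil_clean...
ps -e -o comm= | grep -c '^zil_clean'

# ...versus the number of currently mounted zfs filesystems.
zfs mount | wc -l
```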
Forget about the "regression" aspect. The same happened this morning with 51.
Could it have something to do with the size of the dedup table or something like that?
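If the dedup table is a suspect, its size can be checked directly; a sketch, with "tank" standing in for the actual pool name:

```sh
# Per-pool summary of the dedup table (DDT): entry count, on-disk and in-core size.
zpool status -D tank

# Full DDT histogram; this can itself be slow when the table is large.
zdb -DD tank
```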
Now I've removed 2 old snapshots (of a 6 TB sparse, compressed but not deduped zvol), which freed 600 GB (going from 330 GB free to 988 GB free on that 10 TB zpool), and it seems to have kicked it out of that stuck state.
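For completeness, this is roughly how that kind of cleanup is done; the dataset and snapshot names below are hypothetical:

```sh
# Show how much space each snapshot would free if destroyed
# (the USED column is the space unique to that snapshot).
zfs list -t snapshot -o name,used,referenced -s used

# Destroy an old snapshot to reclaim its unique space.
zfs destroy tank/images@old-backup-1
```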
Thanks for the update; perhaps it's simply related to available free space. Once in that state you will be exercising different aspects of the code base more, such as gang blocks, which are no doubt less well tested. I haven't made any changes in this regard for the Linux port, but certainly even the upstream code is known to behave badly when your pool is very full.
@stephane-chazelas It looks like this might be a preemption glitch. Is your kernel compiled with CONFIG_PREEMPT_VOLUNTARY=y?
Nope. Note that it only happened when the zpool usage was over 90%.
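The preemption setting of the running kernel can be checked without recompiling anything; a sketch assuming a distro kernel that installs its config under /boot:

```sh
# Show how the running kernel was configured for preemption.
grep -E '^CONFIG_PREEMPT' /boot/config-"$(uname -r)"

# Some kernels expose the config directly instead.
zcat /proc/config.gz 2>/dev/null | grep '^CONFIG_PREEMPT'
```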
I'm also seeing this on the stock Ubuntu kernel. CONFIG_PREEMPT_VOLUNTARY is set to y, and I do have a pool over 90% full. I'll try to move some data around and see if the problem stops happening if I can unload the pool a bit.
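Since several reports here involve pools over 90% full, the capacity figures are worth keeping an eye on; a minimal sketch:

```sh
# Size, allocated, free and percent-used per pool; allocation gets noticeably
# more expensive as CAP climbs toward 90% and beyond.
zpool list -o name,size,allocated,free,capacity
```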
I just had a lockup with the 12.04 stock Ubuntu kernel with what looks like this issue. CONFIG_PREEMPT_VOLUNTARY=y. I had to back out to using my old OpenSolaris VM (this is a virtualized SAN/NAS) to avoid hassles. The trace is:
Oct 18 09:30:07 ubuntusan1 kernel: [124637.468979] INFO: task txg_sync:2241 blo
Closing as stale. This hasn't been observed in quite a while, and there have been numerous fixes since 0.6.0.53 which could have addressed the root cause. If you're able to reproduce the issue with the latest code, we'll take another hard look at this.
I think I'm seeing something very similar with 0.6.1-1~wheezy (kernel 3.7.2). I have a hardware RAID1 array with LUKS (dm-crypt) on it, LVM in that, and zfs in that (yes, I know this is bad). The pool has had dedup enabled for some filesystems, but I turned it off a while ago. Performance used to be fair if not stellar, but since yesterday I'm seeing huge load spikes (100+). I once had a pool corrupted on this box; my suspect was the RAID controller, so I disabled write-back caching on it a few months ago. I tried both the cfq and the deadline I/O scheduler on the RAID to see if it made a difference, but it didn't really (currently it's set to deadline). The box only has 4G of RAM, but the pool is tiny (only 80G, with less than 60G used). iostat -x reports:
iotop (in accumulated mode, sorted by reads):
I also have exactly the same number of zil_clean threads as mounted zfs filesystems (39), fwiw. At first I thought the problem was maybe that I had too many snapshots (I noticed that the zfs list command zfs-auto-snapshot issues would take more than 30 minutes to complete during these load spikes), so I reduced the number of snapshots from 3200 to about 650 (see the sketch after the sysrq-w output below), but this didn't really help. At first the pool was pretty full (around 3G free of 60G), but I gave it an additional 20G and it didn't seem to have any effect. top, fwiw:
zpool list:
zpool status:
zpool events:
A typical zfs.delay event looks like this:
sysrq-w, part 1:
sysrq-w, part 2:
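Regarding the snapshot pruning mentioned above, a quick way to count snapshots and see how long the listing itself takes (a sketch; with thousands of snapshots this command alone can run for a long time):

```sh
# Count all snapshots and time the listing.
time zfs list -t snapshot -H -o name | wc -l
```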
The assert() related definitions in glibc 2.25 were altered to warn about assert(X=Y) when -Wparentheses is used. See https://abi-laboratory.pro/tracker/changelog/glibc/2.25/log.html
lib/list.c used this construct to set the value of a magic field which is defined only when debugging. Replaced the assert()s with #ifndef/#endifs.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Olaf Faaland <[email protected]>
Closes openzfs#610
v0.6.0.53 on debian 3.0.0-16 amd64
First time occurrence of this kind of problem.
Today, I've got a bunch of processes hung in "D" state (though the kernel doesn't report the tasks as hung).
Most I/Os to zfs (raidz1, compress, dedup) hang (system calls don't return), except for a few files (possibly cached).
There is disk activity: dstat reports heavy read access to all 6 drives in the zpool, though no end-user application seems to be doing any I/O. No write I/O to the disks has been spotted.
The above is sorted by cumulative read I/O over a period of time. I've been stracing 27852 all along, and no system call ever returned (it was sent SIGTERM long before that but hasn't terminated yet).
There's no special zfs operation (like removal of volumes) ongoing.
A sysrq-w shows:
for all the z_wr_iss threads.
and for blocked userspace applications:
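For anyone unfamiliar with it, the sysrq-w dump referred to above is typically captured like this (needs root and sysrq enabled):

```sh
# Ask the kernel to dump the stacks of all blocked (D-state) tasks
# into the ring buffer, then read them back.
echo 1 > /proc/sys/kernel/sysrq      # enable sysrq if it isn't already
echo w > /proc/sysrq-trigger
dmesg | tail -n 200
```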