TXG syncing non-stop #3811
Comments
@DeHackEd are you saying this is a new issue with 0.6.5.1? I noticed similar behavior on my desktop, which is running 0.6.5.1, but I was able to determine that the I/O was legitimately caused by a particularly busy Firefox tab. In my case it wasn't a regression, just behavior I hadn't noticed before. However, one thing I noticed in the process of investigating was that …
I don't think it's a new issue, but I'm now able to reproduce it pretty much on demand. It's as if this system's native load just brings it to the surface. I've seen this happen before but could never pin down a cause or a reproducer. I don't think it's caused by load; these sites are not particularly busy. Once I saw a system spinning through TXGs at ludicrous speed: no disk I/O whatsoever, but /proc/spl/kstat/zfs/$POOL/txgs was just going like nobody's business. Sadly I have no proof of that. Now I have a system that is kept busy, but it also seems to be jamming on I/O outright even when I relieve the load. For experimentation I lxc-freeze'd the container and SIGSTOP'd the host nginx process. That should bring all I/O to a complete standstill, but the contents of …
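For anyone trying to catch this in the act, here is a minimal sketch (my own, not from the original report) that samples /proc/spl/kstat/zfs/<pool>/txgs and prints how quickly txg numbers advance. The pool name and the 5-second sampling interval are placeholders.

```python
#!/usr/bin/env python3
# Rough sketch: sample the txgs kstat and report how fast txg numbers advance.
# An idle pool advances txgs slowly; a "spinning" pool races through them even
# with no disk I/O.
import sys
import time

POOL = sys.argv[1] if len(sys.argv) > 1 else "tank"   # placeholder pool name
KSTAT = f"/proc/spl/kstat/zfs/{POOL}/txgs"

def latest_txg():
    with open(KSTAT) as f:
        lines = f.read().splitlines()
    # line 0 is the kstat header, line 1 the column names, the rest data rows
    return max(int(row.split()[0]) for row in lines[2:] if row.strip())

prev = latest_txg()
while True:
    time.sleep(5)
    cur = latest_txg()
    print(f"{(cur - prev) / 5:.1f} txg/s (latest txg {cur})")
    prev = cur
```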
I seem to have found a way to reproduce this problem on demand. While one web hosting container is running, fire up another. nginx may be the culprit... which is strange. But flipping it on and off several times has consistently reproduced the issue. During a test run under strace it did no file I/O on any ZFS filesystem. A few unusual system calls were executed, including accept4, readv/writev on sockets, the epoll_* family, …
Grabbed a debug dump from the system while it was spinning. Download at http://dehacked.2y.net/zfs-spin-debug.txt. I started the trouble shortly after timestamp 1443209435 and it persists for a little while.
I was able to reproduce the spinning-txg problem on a system with no I/O whatsoever:

# cat /proc/spl/kstat/zfs/aurora2new/txgs
9 0 0x01 50 5600 274570222798398 9351395002199088
txg birth state ndirty nread nwritten reads writes otime qtime wtime stime
7853674 9351394998774304 C 0 0 0 0 0 56650 5843 46243 55814
7853675 9351394998830954 C 0 0 0 0 0 59590 5676 45771 55814
7853676 9351394998890544 C 0 0 0 0 0 57250 5673 47542 54931
7853677 9351394998947794 C 0 0 0 0 0 61030 6403 44139 55614
7853678 9351394999008824 C 0 0 0 0 0 57806 5907 45529 55736
7853679 9351394999066630 C 0 0 0 0 0 59590 6034 44783 55846
7853680 9351394999126220 C 0 0 0 0 0 58274 5553 45916 54930
7853681 9351394999184494 C 0 0 0 0 0 58943 5763 44897 55272
7853682 9351394999243437 C 0 0 0 0 0 57443 5424 46082 54802
7853683 9351394999300880 C 0 0 0 0 0 60644 5723 43330 55239
7853684 9351394999361524 C 0 0 0 0 0 57526 7357 42959 54363
7853685 9351394999419050 C 0 0 0 0 0 72296 7050 28625 54434
7853686 9351394999491346 C 0 0 0 0 0 42653 5987 44246 62312
7853687 9351394999533999 C 0 0 0 0 0 58747 5893 51297 55783
7853688 9351394999592746 C 0 0 0 0 0 62700 5900 47750 54453
7853689 9351394999655446 C 0 0 0 0 0 62120 6920 42446 55634
7853690 9351394999717566 C 0 0 0 0 0 56380 5897 45820 56546
7853691 9351394999773946 C 0 0 0 0 0 59210 5663 46467 55024
7853692 9351394999833156 C 0 0 0 0 0 60133 7297 42987 55794
7853693 9351394999893289 C 0 0 0 0 0 57687 5933 45243 56008
7853694 9351394999950976 C 0 0 0 0 0 59173 6034 44933 106928
7853695 9351395000010149 C 0 0 0 0 0 59314 5860 95877 55912
7853696 9351395000069463 C 0 0 0 0 0 110257 7126 43691 55630
7853697 9351395000179720 C 0 0 0 0 0 58850 6233 44577 56412
7853698 9351395000238570 C 0 0 0 0 0 57937 5780 46746 55859
7853699 9351395000296507 C 0 0 0 0 0 60400 5823 45374 56050
7853700 9351395000356907 C 0 0 0 0 0 58856 5557 45763 56699
7853702 9351395000483429 C 0 0 0 0 0 51360 6210 43634 57357
7853703 9351395000534789 C 0 0 0 0 0 58474 5960 46196 55961
7853704 9351395000593263 C 0 0 0 0 0 60586 6857 43893 545268
7853705 9351395000653849 C 0 0 0 0 0 59337 7503 532187 53169
7853706 9351395000713186 C 0 0 0 0 0 550193 6467 39475 51553
7853707 9351395001263379 C 0 0 0 0 0 54520 5907 40227 51938
7853708 9351395001317899 C 0 0 0 0 0 54354 6933 40042 52286
7853709 9351395001372253 C 0 0 0 0 0 54293 5527 42666 51062
7853710 9351395001426546 C 0 0 0 0 0 63022 5944 33497 51528
7853711 9351395001489568 C 0 0 0 0 0 48044 5760 40366 61483
7853712 9351395001537612 C 0 0 0 0 0 54990 6990 48811 52955
7853713 9351395001592602 C 0 0 0 0 0 63466 5717 42833 57574
7853714 9351395001656068 C 0 0 0 0 0 56270 6134 46131 52875
7853715 9351395001712338 C 0 0 0 0 0 58240 6030 43950 52893
7853716 9351395001770578 C 0 0 0 0 0 59080 7194 40088 51701
7853717 9351395001829658 C 0 0 0 0 0 56210 7270 38882 51744
7853718 9351395001885868 C 0 0 0 0 0 53534 5770 41637 53282
7853719 9351395001939402 C 0 0 0 0 0 56293 5830 41790 51109
7853720 9351395001995695 C 0 0 0 0 0 54670 6600 40807 52394
7853721 9351395002050365 C 0 0 0 0 0 56073 6854 40462 125774
7853722 9351395002106438 C 0 0 0 0 0 55150 5787 115443 52760
7853723 9351395002161588 S 0 0 0 0 0 129560 5897 42047 0
7853724 9351395002291148 W 0 0 0 0 0 56584 7120 0 0

The system had been run pretty deep into swap before I killed the process responsible. That would seem to have been the trigger. The job was a mysqldump, but the tables were being memory buffered... bad idea.
This system was running kernel 2.6.32-504.23.4.el6.x86_64, ZFS version 65037d9 plus ABD plus a few of my own patches.
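As a quick way to quantify the dump above, a small parser (my own sketch, not part of the report) can count how many committed txgs moved no data at all; it expects the kstat text saved to a file.

```python
#!/usr/bin/env python3
# Sketch: parse a saved copy of the txgs kstat output above and count committed
# ("C") txgs that had zero ndirty/nread/nwritten, i.e. empty txgs being synced.
import sys

with open(sys.argv[1]) as f:        # argv[1]: file holding the kstat dump
    lines = f.read().splitlines()

cols = lines[1].split()             # txg birth state ndirty nread nwritten ...
rows = [dict(zip(cols, l.split())) for l in lines[2:] if l.strip()]

committed = [r for r in rows if r["state"] == "C"]
empty = [r for r in committed
         if r["ndirty"] == r["nread"] == r["nwritten"] == "0"]

print(f"{len(empty)} of {len(committed)} committed txgs moved no data")
```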
I think we need to do something like extend the …
@DeHackEd Could you please find out what object 28 in the MOS is (…)? The reason I find object 28 interesting is that 1 block is being freed from it fairly repeatedly in your debugging output.
It's …
@DeHackEd That's what I suspected; it's the list of deferred frees, which are processed in the sync task. I find it particularly interesting that we also have #3870 (involving freeing space). I think the next best step, as @behlendorf suggested, would be to add additional dprintf telemetry to txg_sync_thread for when it doesn't sleep.
Just looking at the changes between 0.6.4 and 0.6.5, commit 4bda3bd is the most relevant, but it's not immediately clear how it would cause either of these issues. Still, it would be an easy test to revert it, along with adding some debugging, and see what impact that has.
I think I found an alternative reproducer. I use a tool called Torrus as a mass RRDtool-based grapher. I noticed my desktop doing a single cycle of the spin every ~5 minutes, so I recompiled rrdtool with --disable-mmap and it seems to have improved. I haven't tested your above suggestions; I'll give them a spin tomorrow. With a reproducible test case I can migrate it into a VM and try it there.
I was not able to reproduce this problem. My suspicion is that something's wrong with the processing of the deferred free list and that the sync task is staying awake trying to process it. Normally there's a handful of entries left in the list, which are typically picked up in later sync passes, but I wonder if in this case it's trying too hard, for some reason, to process the list. It might be interesting to fiddle with the setting of …
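The sentence above is cut off, so the exact tunable isn't recorded here; my assumption, given the deferred-free discussion, is that it refers to zfs_sync_pass_deferred_free (the sync pass at which frees start being deferred). Purely as an illustrative sketch for checking its current value:

```python
#!/usr/bin/env python3
# Illustrative only: read the module parameter I *assume* is being referred to
# (zfs_sync_pass_deferred_free). Adjusting it is left to the administrator;
# the path follows the usual /sys/module layout.
from pathlib import Path

param = Path("/sys/module/zfs/parameters/zfs_sync_pass_deferred_free")
if param.exists():
    print(f"{param.name} = {param.read_text().strip()}")
else:
    print("parameter not found (different module version or name)")
```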
A random conversation on IRC clued me in to what might be the issue. There's a quota on the filesystem, which is nearly full, so ZFS is aggressive about flushing out data to prevent the quota from being exceeded. Unlike user quotas, where ZFS doesn't care about going over by a bit, dataset quotas are strictly enforced. Closing and grabbing a brown paper bag.
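For anyone wanting to check for the same condition on their own system, here is a small sketch (not from the thread; the dataset name is a placeholder) that reports how close a dataset is to its quota using zfs get:

```python
#!/usr/bin/env python3
# Sketch: report quota headroom for a dataset, since a nearly-full dataset
# quota is the suspected trigger here. Dataset name is a placeholder.
import subprocess

DATASET = "tank/containers/web1"   # hypothetical dataset name

out = subprocess.check_output(
    ["zfs", "get", "-Hp", "-o", "property,value", "quota,used", DATASET],
    text=True,
)
props = dict(line.split("\t") for line in out.strip().splitlines())
quota = int(props["quota"]) if props["quota"].isdigit() else 0
used = int(props["used"])

if quota:
    print(f"{used / quota:.1%} of quota used, {quota - used} bytes of headroom")
else:
    print("no quota set on this dataset")
```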
I have a system that has, for reasons not fully understood, found itself caught in an infinite transaction commit loop. There is never a transaction not flushing. E.g.:
Note that wait time is basically the same as sync time, indicating it's always flushing.
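A quick way to check that observation on a live system (the pool name below is a placeholder, not the reporter's pool) is to average wtime and stime over the committed txgs in the kstat:

```python
#!/usr/bin/env python3
# Sketch: average wtime (time waiting for sync to start) and stime (time spent
# syncing) across committed txgs; comparable values suggest the sync thread is
# never idle.
POOL = "tank"                       # placeholder pool name

with open(f"/proc/spl/kstat/zfs/{POOL}/txgs") as f:
    lines = f.read().splitlines()

cols = lines[1].split()
rows = [dict(zip(cols, l.split())) for l in lines[2:] if l.strip()]
done = [r for r in rows if r["state"] == "C"]

mean = lambda key: sum(int(r[key]) for r in done) / len(done)
print(f"avg wtime {mean('wtime'):.0f} ns vs avg stime {mean('stime'):.0f} ns "
      f"over {len(done)} committed txgs")
```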
== Hardware ==
Mobo: Supermicro X8SIE
CPU: Xeon X3430
RAM: 8 GB, ECC enabled
Hard drives: 3x WD Reds, partitioned /boot (mdadm RAID-1), LVM for system (mdadm RAID-5), ZFS in RAID-Z1
== Software ==
SPL: Loaded module v0.6.5-1
ZFS: Loaded module v0.6.4-240_g31f76e2, ZFS pool version 5000, ZFS filesystem version 5
ZFS version is master commit c938adbdbedc44b05eaf862ee0099561c6aafa6b + ABD merge
The server runs 2 web hosting platforms via LXC with Apache + MySQL; InnoDB has been configured to disable cache flushes for debugging purposes. The container's root is on ZFS; the host's root is on ext4.
Even if I freeze the container and shut down hosting on the host, ZFS is still stuck in a commit cycle.