soft lockup errors writing to zvol #922
This might be more meaningful: [ 208.481594] BUG: soft lockup - CPU#9 stuck for 23s! [z_wr_iss/9:2337] |
That's a new one. What version of the SPL/ZFS were you using? |
commit 2b28613. Should I git pull and retest? It's as simple as creating a pool with a single (in this case 1TB SATA) disk, creating a zvol, then copying data to it. |
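A minimal reproduction along those lines might look like the following; the pool name, device, and volume size are placeholders, not the exact values used above:

    # single-disk pool, one zvol, then write until the volume (or memory) gives out
    zpool create tank /dev/sdb                      # placeholder device
    zfs create -V 100G tank/testvol                 # placeholder size
    dd if=/dev/zero of=/dev/zvol/tank/testvol bs=1M # fills the zvol with write pressure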
I just wanted to make sure it was something current. This may be an issue with Linux 3.5.3; we'll have to investigate further, but this code has been working well with older kernels for a while now. |
Should I retest with 3.4.x or older still? |
Yes 3.4 would be a good start. I regularly test with zvols and this kernel so if it doesn't work then we'll have to figure out what's going on in your environment. |
3.4.10 works flawlessly under the same conditions. I guess this is a 3.5.x issue :-( |
That at least narrows down where to look. There must have been another API change in the 3.5 kernel. |
On 3.4.10 this happens as well:
As before, it's so bad I have to reset the machine hard. |
@behlendorf it seems this exists in 3.4.x as well, though 3.5.x triggers it more easily. I'm guessing it's a low memory / memory reclaim issue; to trigger this you need to write more data than there is memory in the system (1.5x suffices in my case).
if you avoid memory pressure by doing:
or
if the ARC size is limited to under half the total memory of the system, things behave much better. The suggestion by @byteharmony in issue #944 (#944 (comment)) seems to help greatly here. I wonder if the logic that the ARC can be 1/2 the system memory is a bit aggressive on a dual socket (NUMA) system, because it's possible (for whatever reason) to exhaust all the memory in a node; it might be more reasonable to limit the ARC size to some fraction of the smallest node. If you're testing this on a large system and can't make it show up, try allocating a bunch of large (2M) pages in each node to artificially constrain the amount of generically available memory, for example (see the sketch below):
(adjust as needed) |
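A hedged sketch of that kind of per-node hugepage reservation; the node numbers and page counts are illustrative and need adjusting to the machine:

    # reserve 2M hugepages on each NUMA node to shrink the generically available memory
    echo 4096 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
    echo 4096 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages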
@cwedgwood Interesting. Have you looked at the maximum number of emergency objects kicked in? You can check this in the last few columns of |
There is a lot of info in that proc file. What are we looking for there? My KVM test / devel environment has yet to fail on anything (Hyper-V didn't make it out of the gates and physical servers appear to be second best). I've now got serial configured for physical boxes. I have one I'm detailing crash info for you on right now. (Need to do it again, the 200 line limit isn't quite long enough ;). BK |
In particular I'm interested in the last three columns of the zio_buf_131072 and zio_data_buf_131072 rows. They will show the worst case usage for these large allocations. I'm wondering if the need for them spiked at some point and we have several hundred or several thousand outstanding. |
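A simple way to keep an eye on those rows is something like the following; this assumes the SPL slab statistics are exposed at their usual location, /proc/spl/kmem/slab:

    # sample the large zio slab rows once a second
    watch -n1 "grep -E 'zio_(data_)?buf_131072' /proc/spl/kmem/slab"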
@behlendorf updated to rc11 and it still happens. I have:
I get 1000s of traces like:
[ 529.026866] [] ? kmalloc_nofail+0x2c/0x3e [spl]
that crush the machine and require a reset. Watching the spl kmem stats until this happens (one-second intervals):
zio_buf_131072 0x00040 4194304 131072 4194304 131072 1 1 97 31 1 3001 0 0
zio_buf_131072 0x00040 4194304 131072 4194304 131072 1 1 97 31 1 3001 0 0
zio_buf_131072 0x00040 4194304 131072 4194304 131072 1 1 97 31 1 3001 0 0
zio_buf_131072 0x00040 4194304 131072 4194304 131072 1 1 97 31 1 3001 0 0
zio_buf_131072 0x00040 4194304 131072 4194304 131072 1 1 97 31 1 3001 0 0
zio_buf_131072 0x00040 4194304 131072 4194304 131072 1 1 97 31 1 3001 0 0
zio_buf_131072 0x00040 4194304 131072 4194304 131072 1 1 97 31 1 3001 0 0
zio_buf_131072 0x00040 4194304 131072 4194304 131072 1 1 97 31 1 3001 0 0
zio_buf_131072 0x00040 4194304 131072 4194304 131072 1 1 97 31 1 3001 0 0
zio_buf_131072 0x00040 4194304 131072 4194304 131072 1 1 97 31 1 3001 0 0 |
Adding to this. It's trivial to break things with a local XFS filesystem on a ZVOL and some moderate write pressure. Typically things break in seconds, minutes at most. This actually isn't even a contrived situation; I use ZVOLs for testing/development often and have basically had to abandon this as it's unusable. |
@cwedgwood Have you tried explicitly setting both the ZVOL block size and XFS block size to 4k? This should result in the easiest workload for the Linux VM and VFS. If the block sizes are mismatched it will result in lots of extra read+modify+write operations which might cause performance problems. |
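For reference, matching the two block sizes explicitly would look roughly like this; the names, sizes, and mount point are placeholders:

    # 4K zvol blocks and a 4K XFS block size so the I/O sizes line up
    zfs create -V 100G -o volblocksize=4K tank/testvol
    mkfs.xfs -b size=4096 /dev/zvol/tank/testvol
    mount /dev/zvol/tank/testvol /mnt/test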
@behlendorf yes, I explicitly use 4K blocks in all cases |
testing with NUMA disabled (that is, disabled in the chipset, so the RAM is interleaved between nodes on cacheline boundaries, there is no SRAT, etc.; everything is flat) it still breaks, hard, in exactly the same way: when memory is low it wedges up solid. Watching /proc/meminfo, the last output I get before it dies
with most stack traces in the form of:
@behlendorf given how easy this is to break, would it be useful to reproduce this in a KVM VM/instance which I can attach to the nearest pigeon heading your way? |
similar/related issue on the list: https://groups.google.com/a/zfsonlinux.org/forum/?fromgroups=#!topic/zfs-discuss/o26bGiNQ2z0 |
@cwedgwood OK, I give. I'll see if I can recreate it. If I can do that, it'll be a lot easier to kill. Can you post a trivial recipe for recreating this including:
|
Try echo 3 >/proc/sys/vm/drop_caches. |
I'm also having this problem (I believe); you can see my conversation with Fajar in zfs-discuss about it here: https://groups.google.com/a/zfsonlinux.org/forum/?fromgroups=#!topic/zfs-discuss/o26bGiNQ2z0 I see this problem in Gentoo (running Linux 3.4.9) and CentOS 6.3 (running Linux 2.6.32-279.9.1.el6) in (at least) versions 0.6.0-rc10 and 0.6.0-rc11 of spl/zfs. With any level of zfs_arc_max restriction (as much as 50% of RAM; 4GB in one case, 16GB in another, and down to 512MB), persistent writing to zvols drives the memory usage up until the system softlocks. The CentOS machine managed to kill arc_adapt and blkid at one point; there's a log here: Let me know what else I can give you. |
@drukargin The most useful thing you could post would be a trivial reproducer for CentOS 6.3 which I can run in a VM. If I can reproduce the issue with a ZVOL in a VM then there's a good chance we can make short work of it. |
As an update: I'm having trouble reproducing the failure in a CentOS VM. It's possible the failure on my CentOS box was something similar to #1036, as it has been greatly helped since I tweaked zfs_arc_max down to about 1/3 of the RAM remaining after my VM commitments (32 GB total, 8 GB VMs, 8 GB zfs_arc_max). The Gentoo box still has the problem; I'm going to try a few things on it today, and rebuild my VM with Gentoo to see if I can reproduce it there. |
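For anyone following along, capping the ARC as described above is typically done through the zfs_arc_max module parameter; the value here (8 GB, in bytes) is just the figure from this report:

    # persistent: /etc/modprobe.d/zfs.conf
    #   options zfs zfs_arc_max=8589934592
    # or at runtime, if the module exposes the parameter via sysfs
    echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max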
Further update: I tried downgrading the Gentoo box from 3.4.9 to 3.3.8 to 3.2.12; all continued to exhibit the same problem. I booted up a CentOS LiveCD, installed zfs and ran my test case with no problems. I'll retask my VM to work with Gentoo, and see if I can make you a test case (I'll give you the root OS so you don't need to build it yourself). |
Rereading this whole thread I see this is an issue with kernel 3.x. I'm only on 2.6 so I'm not of much help here. When CentOS moves to 3.x I'll be in the game! I have no issues using dd to move data onto a zvol with 2.6. BK |
@cwedgwood OK, I think I've been able to reproduce this issue and I'm starting to dig in to it. It certainly looks like some sort of thrashing in the VM as you described. However, I'm only able to recreate the problem when I use a small zvol block size, such as the default 8K; when I increase the block size to 128K it's much better behaved. Does this match your experience? |
I have identified the same issue, https://groups.google.com/a/zfsonlinux.org/forum/?fromgroups=#!searchin/zfs-discuss/lockups/zfs-discuss/yxi45H3kywQ/mws1rqdOicwJ |
@mauricev I'm actually looking at this issue now. And despite @cwedgwood's ability to hit it easily I'm not having much luck reproducing it. I can certainly make the VM work hard and get relatively poor I/O performance but nothing resembling an actual failure. Can you describe your hardware (cpus, memory, disk), exact test case (zvol block size, workload), and what it looks like when you get a failure? |
You can try setting up a Windows system with a hard drive filled with a few hundred gigs. I set up Gentoo Linux (given 4 cores, 8192MB of RAM and kernel 3.4.10, although you can also use 3.6.1) in VMware Fusion and I created three 200 GB vmdks, split into 2 GB files, and I created a raidz pool with them. Then I created a zvol with the most space, 375G. I then exported this out using the built-in iSCSI (with targetcli). Then on the Windows (W2K8 SP1) system, I copy the filled hard drive to the zvol using a program called HDClone. It runs for a short while and then HDClone begins reporting errors and soft lockups start. Unfortunately, the kernel doesn't give any more debugging info and the ssh console is locked up. My Mac is a Mac Pro with dual quad-cores at 2.8 GHz and the W2K8 is a VM running on ESXi 4.1 (dual quad core 2.93 GHz). It's assigned 2 cores and 4.5 GB RAM. Oddly, I just tried a pool of one drive (with volblocksize=128k to avoid the lockup) and the performance was even worse, barely making 1MB/second. |
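The pool/zvol side of that setup would be roughly the following sketch; device names are placeholders and the iSCSI export via targetcli is omitted:

    # three-vdev raidz pool and one large zvol, as described above
    zpool create tank raidz /dev/sdb /dev/sdc /dev/sdd
    zfs create -V 375G tank/vol0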
So to make sure I understand, you are using ZOL to share out an iSCSI LUN which is having data pushed onto it via the HDClone program, which is writing to the iSCSI-mounted ZOL LUN from the Windows machine. We do something very similar with Windows backups sent to an iSCSI LUN on CentOS 6.3 using ZOL. No stability problems. It's possible your problem lies between your iSCSI target software and ZOL. My .02 |
Yep, that's it. If I use volblocksize=128K, it runs fine, but very slowly. I don't yet know whether the slowness is due to zfs, iSCSI or somewhere else in the path. |
I've attempted to work with the volblocksize in Windows to see what is fastest with different speed tests, but the layers of software don't allow for effective testing (iSCSI and Windows). I may have a new way to do better testing: I connect the ZVOL directly to a KVM Linux-based guest OS. Haven't done the tests yet though... Will post results when I get around to that. A few things to consider: NTFS block size: a 64K allocation unit would make sense on a 128K volblocksize. Perhaps 64K and 64K? BK |
For those chasing this issue you could try running with the following branch. It includes the following patches:
https://github.com/behlendorf/spl/branches/stats
At the moment I'm unable to recreate anything more serious than sluggish performance on my test system. |
After several hours of making copies of the kernel source tree in an xfs file system layered on top of a zvol with 4k block size I was able to make my test system quite laggy. Performance to the pool was very bad and I was seeing kswapd and arc_adapt take a large amount of cpu time. I was never able to produce any "blocked thread" warnings, but I did profile the system (using oprofile) and determined that all the cpu time was going to
The profile points to an issue which I was originally concerned about when writing the emergency slab object code, commit e2dcc6e. However, at the time I wasn't able to come up with a real workload which caused the problem, so I opted against prematurely optimizing for this case. The issue is that all the emergency objects which get created are tracked on a single list protected by a spin lock. As long as we never create too many of these objects it's not an issue, and really these should only be created under unlikely circumstances. However, in this test case that's just not true. My system racked up ~150,000 of these objects as arc buf headers, which are quite long lived. That dragged performance way down and could easily lead to starvation the more CPUs you have contending on this lock.
The good news is that I put a patch together months ago to improve this by moving the emergency objects to a hash. As I said, I just never merged it because it made the code more complicated and I wasn't 100% sure we needed it. https://github.com/behlendorf/spl/tree/slab_hash I've merged that patch into the previous "stats" branches and I expect it will help, but I haven't had a chance to seriously test it just yet. If someone gets a chance to, that would be great. What you want to look for is that the cpu usage drops considerably and the max depth shown in |
You might want to hold off testing the slab hash patch. It looks like it needs a little more polish; I managed to crash my system with it twice. I'll try and run down any lingering issues this weekend. |
@cwedgwood I have a set of patches which should significantly improve your use case. They resolve a number of performance issues in the slab implementation which were causing the system to thrash. Please try the code on the following branches and let me know. https://github.com/behlendorf/spl/branches/stats |
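One way to build and install the SPL branch for testing might look like the sketch below; the clone URL and build steps are my assumption of a typical source build (a matching ZFS build is also needed, and the exact configure flags may differ on your system):

    git clone https://github.com/behlendorf/spl.git
    cd spl
    git checkout stats
    ./autogen.sh && ./configure && make && sudo make install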
@behlendorf it seems a lot better, but with enough effort on the serial console:
at which point I had to reboot to get the machine back |
Well that's some progress at least. What kernel version were you using? |
That last test was 3.6.4 but the problem exists in 3.4.x and 3.5.x as well. I had earlier switched to 3.6.x in the hope it would help. |
OK, I only ask because there was a serious vmalloc() performance regression between 2.6.31-2.6.38 which I suspect is causing some of the slowness in my testing. I'm patching the RHEL kernel now with the upstream fix to see how much it improves things. Before your system hangs, do you notice the [events/x] threads starting to take a majority of the available CPU? |
for reference
log and cache devices don't seem to affect things
ashift=12
sde...sdi are 2TiB AF (4k sector) 'green' devices (i.e. slow) |
a tiny amount of IO (500MB of small files) ... worked fine, then about 5s later (guessing) the machine wedged up with:
busy processes
wedged processes
meminfo
spl slab
arc stats
|
with the latest 'stats' spl I get further but still things explode
recent slab details (1s apart)
|
@cwedgwood I was able to reproduce a failure in my Fedora 17 VM with the 3.6.3-1 debug kernel. While different from your previous debug messages, it would explain the hard lockup you're seeing. The following GPF, when hit multiple times in kernel context, would absolutely cause real hardware to hang as you've described. I suspect the GPF may be caused by the condition variable in the range lock being destroyed and the memory being freed before the various waiters can be notified. We had a similar issue with this exact lock some time ago and I thought I resolved it, but perhaps there's still a tiny race. This is actually an upstream bug; it just happens that the Solaris condition variable implementation isn't particularly sensitive to this kind of misuse, typically because the condition variable is embedded in a fairly long lived structure. No patch yet, but I'll chew on it now that I've found a concrete issue.
general protection fault: 0000 [#1] SMP
CPU 6
Pid: 1388, comm: zvol/14 Tainted: G O 3.6.3-1.fc17.x86_64.debug #1 Bochs Bochs
RIP: 0010:[] [] __wake_up_common+0x2b/0x90
RSP: 0018:ffff880051809ab0 EFLAGS: 00010082
RAX: 0000000000000286 RBX: ffff88004c423298 RCX: 0000000000000000
RDX: 6b6b6b6b6b6b6b6b RSI: 0000000000000003 RDI: ffff88004c423298
RBP: ffff880051809af0 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88004c4232e0
R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000003
FS: 0000000000000000(0000) GS:ffff88007d200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f9e38112000 CR3: 000000001c077000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process zvol/14 (pid: 1388, threadinfo ffff880051808000, task ffff880051800000)
Stack:
 0000000151809af0 0000000000000000 ffffffff810a1172 ffff88004c423298
 0000000000000286 0000000000000003 0000000000000001 0000000000000000
 ffff880051809b30 ffffffff810a1188 ffff880051a8a5b0 ffff88004c423238
Call Trace:
 [] ? __wake_up+0x32/0x70
 [] __wake_up+0x48/0x70
 [] cv_wait_common+0x1b8/0x3d0 [spl]
 [] ? wake_up_bit+0x40/0x40
 [] __cv_wait+0x13/0x20 [spl]
 [] zfs_range_lock+0x4d6/0x620 [zfs]
 [] zvol_get_data+0x89/0x150 [zfs]
 [] zil_commit+0x5a2/0x770 [zfs]
 [] zvol_write+0x1b2/0x480 [zfs]
 [] taskq_thread+0x250/0x820 [spl]
 [] ? finish_task_switch+0x3f/0x120
 [] ? try_to_wake_up+0x340/0x340
 [] ? __taskq_create+0x6e0/0x6e0 [spl]
 [] kthread+0xb7/0xc0
 [] kernel_thread_helper+0x4/0x10
 [] ? retint_restore_args+0x13/0x13
 [] ? __init_kthread_worker+0x70/0x70
 [] ? gs_change+0x13/0x13
Code: 55 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 18 e8 2a 3c 64 00 89 55 c4 48 8b 57 48 4c 8d 67 48 41 89 f7 41 89 ce 4c 89 45 c8 <4c> 8b 2a 48 8d 42 e8 49 83 ed 18 49 39 d4
RIP [] __wake_up_common+0x2b/0x90
RSP
BUG: scheduling while atomic: zvol/14/1388/0x10000002
INFO: lockdep is turned off.
Modules linked in: xfs zfs(O) zcommon(O) zunicode(O) znvpair(O) zavl(O) splat(O) spl(O) zlib_deflate lockd sunrpc bnep bluetooth rfkill ip6t_REJECT nf_conntrack_ipv6 nf_conntrack
Pid: 1388, comm: zvol/14 Tainted: G D O 3.6.3-1.fc17.x86_64.debug #1
Call Trace:
 [] __schedule_bug+0x67/0x75
 [] __schedule+0x98b/0x9f0
 [] __cond_resched+0x2a/0x40
 [] _cond_resched+0x30/0x40
 [] mutex_lock_nested+0x33/0x390
 [] ? exit_fs+0x47/0xa0
 [] perf_event_exit_task+0x30/0x220
 [] do_exit+0x1d5/0xb00
 [] ? kmsg_dump+0x1b8/0x240
 [] ? kmsg_dump+0x25/0x240
 [] oops_end+0x9d/0xe0
 [] die+0x58/0x90
 [] do_general_protection+0x162/0x170
 [] ? restore_args+0x30/0x30
 [] general_protection+0x25/0x30
 [] ? __wake_up_common+0x2b/0x90
 [] ? __wake_up+0x32/0x70
 [] __wake_up+0x48/0x70
 [] cv_wait_common+0x1b8/0x3d0 [spl]
 [] ? wake_up_bit+0x40/0x40
 [] __cv_wait+0x13/0x20 [spl]
 [] zfs_range_lock+0x4d6/0x620 [zfs]
 [] zvol_get_data+0x89/0x150 [zfs]
 [] zil_commit+0x5a2/0x770 [zfs]
 [] zvol_write+0x1b2/0x480 [zfs]
 [] taskq_thread+0x250/0x820 [spl]
 [] ? finish_task_switch+0x3f/0x120
 [] ? try_to_wake_up+0x340/0x340
 [] ? __taskq_create+0x6e0/0x6e0 [spl]
 [] kthread+0xb7/0xc0
 [] kernel_thread_helper+0x4/0x10
 [] ? retint_restore_args+0x13/0x13
 [] ? __init_kthread_worker+0x70/0x70
 [] ? gs_change+0x13/0x13 |
@cwedgwood Another possibility occurred to me which I think fits all the facts. I'm still going to verify the CVs are 100% solid, but the crash might also be the result of a stack overflow. In the Linux kernel we only have 8k of stack space, which isn't a lot of elbow room. Under Solaris/FreeBSD the default is 24k and ZFS was originally written with that limit in mind. Now I've gone through a lot of effort to bring the ZFS stack usage down, and for all the workloads I'm aware of stack overruns never occur. The only remaining overrun I'm positive about occurs when running ZFS over multipath devices (#675). The biggest consumer is usually the stack needed to recursively traverse an object's block pointers. However, your zvol+xfs workload may just push things over the edge too. Running xfs on a zvol is probably close to the worst case: you have a single object and you're allocating small blocks all over the entire virtual device. Add to that the usual xfs stack overhead (which historically has been substantial) and you may trash the stack. We can test if this is the problem by increasing the default Linux stack size to 16k. This can be done safely by recompiling the kernel with
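A hedged sketch of one common way to get 16K kernel stacks on x86_64 kernels of this era; the header location and values are my assumption, not necessarily the exact change intended here:

    # Assumption: 3.x-era x86_64 kernels define the stack size in
    # arch/x86/include/asm/page_64_types.h as:
    #     #define THREAD_ORDER 1
    #     #define THREAD_SIZE  (PAGE_SIZE << THREAD_ORDER)
    # Edit THREAD_ORDER from 1 to 2 (4 x 4K pages = 16K stacks), then rebuild:
    make -j"$(nproc)" && make modules_install && make install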
|
all recent 'stats' stuff, 16K kernel stacks, wedged hard
|
process use when it died
|
This branch contains kmem cache optimizations designed to resolve the lockups reported in openzfs/zfs#922. The lockups were largely the result of spin lock contention in the slab under low memory conditions. Fundamentally, these changes are all designed to minimize that contention through a variety of methods:
* Improved vmem cached deadlock detection
* Track emergency objects in rbtree
* Optimize spl_kmem_cache_free()
* Never spin in kmem_cache_alloc()
Signed-off-by: Brian Behlendorf <[email protected]> openzfs/zfs#922
The kmem cache improvements which prevent the system from thrashing and hanging have been merged into master. For those impacted by this issue, if you could test the latest code it would be appreciated. |
create a zvol, dd data into it and fairly quickly we get:
[ 622.083345] INFO: task flush-230:0:4232 blocked for more than 20 seconds.
[ 622.090558] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 622.098746] flush-230:0 D ffff8803318503a0 4224 4232 2 0x00000000
[ 622.106315] ffff88060bda37d0 0000000000000046 ffff880331850000 ffff88060bda3fd8
[ 622.114453] ffff88060bda3fd8 0000000000012780 ffffffff817633f0 ffff880331850000
[ 622.122625] ffff88060bda37a0 ffff880333c12780 ffff880331850000 ffff8802c88aec00
[ 622.136105] Call Trace:
[ 622.138728] [] schedule+0x65/0x67
[ 622.144026] [] io_schedule+0x60/0x7a
[ 622.149601] [] get_request_wait+0xbd/0x166
[ 622.155723] [] ? cfq_merge+0x72/0xa1
[ 622.161295] [] ? abort_exclusive_wait+0x8f/0x8f
[ 622.167828] [] blk_queue_bio+0x193/0x2d6
[ 622.173722] [] generic_make_request+0x9c/0xdd
[ 622.179999] [] submit_bio+0xbb/0xd4
[ 622.185405] [] ? inc_zone_page_state+0x27/0x29
[ 622.191903] [] submit_bh+0xf6/0x116
[ 622.197283] [] __block_write_full_page+0x200/0x2fd
[ 622.204185] [] ? blkdev_get_blocks+0x93/0x93
[ 622.210460] [] ? drop_buffers+0x96/0x96
[ 622.216355] [] ? blkdev_get_blocks+0x93/0x93
[ 622.222542] [] ? drop_buffers+0x96/0x96
[ 622.228452] [] block_write_full_page_endio+0x89/0x95
[ 622.235593] [] block_write_full_page+0x15/0x17
[ 622.242034] [] blkdev_writepage+0x18/0x1a
[ 622.248060] [] __writepage+0x14/0x2d
[ 622.253732] [] ? page_index+0x1a/0x1a
[ 622.259396] [] write_cache_pages+0x22e/0x366
[ 622.265710] [] ? page_index+0x1a/0x1a
[ 622.271408] [] generic_writepages+0x3e/0x58
[ 622.277606] [] do_writepages+0x1e/0x2b
[ 622.283350] [] __writeback_single_inode.isra.31+0x4c/0x123
[ 622.290891] [] writeback_sb_inodes+0x1d3/0x310
[ 622.297388] [] __writeback_inodes_wb+0x74/0xb9
[ 622.303833] [] wb_writeback+0x136/0x26c
[ 622.309663] [] ? global_dirty_limits+0x2f/0x10e
[ 622.316218] [] wb_do_writeback+0x185/0x1bb
[ 622.322283] [] bdi_writeback_thread+0xa5/0x1ce
[ 622.328859] [] ? wb_do_writeback+0x1bb/0x1bb
[ 622.335219] [] kthread+0x8b/0x93
[ 622.340398] [] kernel_thread_helper+0x4/0x10
[ 622.346784] [] ? retint_restore_args+0x13/0x13
[ 622.353261] [] ? kthread_worker_fn+0x149/0x149
[ 622.359759] [] ? gs_change+0x13/0x13
[ 622.365390] INFO: task blkid:4280 blocked for more than 20 seconds.
[ 622.372076] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 622.380445] blkid D ffff8806313f03a0 5720 4280 4221 0x00000000
[ 622.387989] ffff8804dcd37ac8 0000000000000082 ffff8806313f0000 ffff8804dcd37fd8
[ 622.396128] ffff8804dcd37fd8 0000000000012780 ffff880630e38000 ffff8806313f0000
[ 622.404338] ffffffff810608f2 ffff88062f038918 ffff8806313f0000 ffff88062f03891c
[ 622.412436] Call Trace:
[ 622.415116] [] ? need_resched+0x11/0x1d
[ 622.420913] [] schedule+0x65/0x67
[ 622.426238] [] schedule_preempt_disabled+0xe/0x10
[ 622.433118] [] __mutex_lock_common.isra.7+0x14a/0x166
[ 622.440210] [] __mutex_lock_slowpath+0x13/0x15
[ 622.446826] [] mutex_lock+0x18/0x29
[ 622.452350] [] __blkdev_get+0x9c/0x3da
[ 622.458013] [] ? blkdev_get+0x2ce/0x2ce
[ 622.463767] [] blkdev_get+0x189/0x2ce
[ 622.469289] [] ? find_get_page+0x4a/0x6a
[ 622.475095] [] ? __d_lookup_rcu+0xa2/0xc9
[ 622.480979] [] ? blkdev_get+0x2ce/0x2ce
[ 622.486742] [] blkdev_open+0x64/0x70
[ 622.492188] [] do_dentry_open.isra.17+0x16e/0x21d
[ 622.498880] [] nameidata_to_filp+0x42/0x84
[ 622.504952] [] do_last.isra.47+0x625/0x64b
[ 622.510996] [] path_openat+0xc5/0x2f4
[ 622.516569] [] do_filp_open+0x38/0x86
[ 622.522157] [] ? getname_flags+0x2a/0xa2
[ 622.528120] [] ? alloc_fd+0xe5/0xf7
[ 622.533606] [] do_sys_open+0x6e/0x102
[ 622.539253] [] sys_open+0x21/0x23
[ 622.544665] [] system_call_fastpath+0x16/0x1b
[ 642.511953] INFO: task flush-230:0:4232 blocked for more than 20 seconds.
[ 642.519226] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 642.527644] flush-230:0 D ffff8803318503a0 4224 4232 2 0x00000000
[ 642.535269] ffff88060bda37d0 0000000000000046 ffff880331850000 ffff88060bda3fd8
[ 642.543417] ffff88060bda3fd8 0000000000012780 ffffffff817633f0 ffff880331850000
[ 642.551527] ffff88060bda37a0 ffff880333c12780 ffff880331850000 ffff8802c88aec00
[ 642.559693] Call Trace:
[ 642.562439] [] schedule+0x65/0x67
[ 642.567711] [] io_schedule+0x60/0x7a
[ 642.573354] [] get_request_wait+0xbd/0x166
[ 642.579492] [] ? cfq_merge+0x72/0xa1
[ 642.585158] [] ? abort_exclusive_wait+0x8f/0x8f
[ 642.591849] [] blk_queue_bio+0x193/0x2d6
[ 642.597774] [] generic_make_request+0x9c/0xdd
[ 642.604279] [] submit_bio+0xbb/0xd4
[ 642.609739] [] ? inc_zone_page_state+0x27/0x29
[ 642.616262] [] submit_bh+0xf6/0x116
[ 642.621806] [] __block_write_full_page+0x200/0x2fd
[ 642.628676] [] ? blkdev_get_blocks+0x93/0x93
[ 642.635033] [] ? drop_buffers+0x96/0x96
[ 642.640850] [] ? blkdev_get_blocks+0x93/0x93
[ 642.647224] [] ? drop_buffers+0x96/0x96
[ 642.653218] [] block_write_full_page_endio+0x89/0x95
[ 642.660238] [] block_write_full_page+0x15/0x17
[ 642.666827] [] blkdev_writepage+0x18/0x1a
[ 642.672929] [] __writepage+0x14/0x2d
[ 642.678471] [] ? page_index+0x1a/0x1a
[ 642.684215] [] write_cache_pages+0x22e/0x366
[ 642.690480] [] ? page_index+0x1a/0x1a
[ 642.696224] [] generic_writepages+0x3e/0x58
[ 642.702510] [] do_writepages+0x1e/0x2b
[ 642.708236] [] __writeback_single_inode.isra.31+0x4c/0x123
[ 642.715938] [] writeback_sb_inodes+0x1d3/0x310
[ 642.722505] [] __writeback_inodes_wb+0x74/0xb9
[ 642.728979] [] wb_writeback+0x136/0x26c
[ 642.734905] [] ? global_dirty_limits+0x2f/0x10e
[ 642.741572] [] wb_do_writeback+0x185/0x1bb
[ 642.747656] [] bdi_writeback_thread+0xa5/0x1ce
[ 642.754191] [] ? wb_do_writeback+0x1bb/0x1bb
[ 642.760487] [] kthread+0x8b/0x93
[ 642.765725] [] kernel_thread_helper+0x4/0x10
[ 642.772058] [] ? retint_restore_args+0x13/0x13
[ 642.778559] [] ? kthread_worker_fn+0x149/0x149
[ 642.784956] [] ? gs_change+0x13/0x13
[ 642.790429] INFO: task blkid:4280 blocked for more than 20 seconds.
[ 642.796923] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 642.805005] blkid D ffff8806313f03a0 5720 4280 4221 0x00000000
[ 642.812464] ffff8804dcd37ac8 0000000000000082 ffff8806313f0000 ffff8804dcd37fd8
[ 642.820356] ffff8804dcd37fd8 0000000000012780 ffff880630e38000 ffff8806313f0000
[ 642.828250] ffffffff810608f2 ffff88062f038918 ffff8806313f0000 ffff88062f03891c
[ 642.836148] Call Trace:
[ 642.838712] [] ? need_resched+0x11/0x1d
[ 642.844313] [] schedule+0x65/0x67
[ 642.849405] [] schedule_preempt_disabled+0xe/0x10
[ 642.855865] [] __mutex_lock_common.isra.7+0x14a/0x166
[ 642.862884] [] __mutex_lock_slowpath+0x13/0x15
[ 642.869304] [] mutex_lock+0x18/0x29
[ 642.874805] [] __blkdev_get+0x9c/0x3da
[ 642.880574] [] ? blkdev_get+0x2ce/0x2ce
[ 642.886406] [] blkdev_get+0x189/0x2ce
[ 642.892103] [] ? find_get_page+0x4a/0x6a
[ 642.898052] [] ? __d_lookup_rcu+0xa2/0xc9
[ 642.904063] [] ? blkdev_get+0x2ce/0x2ce
[ 642.909908] [] blkdev_open+0x64/0x70
[ 642.915489] [] do_dentry_open.isra.17+0x16e/0x21d
[ 642.922308] [] nameidata_to_filp+0x42/0x84
[ 642.928422] [] do_last.isra.47+0x625/0x64b
[ 642.934490] [] path_openat+0xc5/0x2f4
[ 642.940187] [] do_filp_open+0x38/0x86
[ 642.945894] [] ? getname_flags+0x2a/0xa2
[ 642.951825] [] ? alloc_fd+0xe5/0xf7
[ 642.957334] [] do_sys_open+0x6e/0x102
[ 642.962969] [] sys_open+0x21/0x23
[ 642.968315] [] system_call_fastpath+0x16/0x1b