Page fault in uiomove() causes deadlock #1689

Closed
ryao opened this issue Sep 2, 2013 · 4 comments
ryao (Contributor) commented Sep 2, 2013

I can reproducibly deadlock my desktop by running multiple instances of python -c 'print 2**100**100'. I used magic SysRq + t to get backtraces of my system's threads. A few things stand out in these backtraces:

  1. Most threads are blocked in wait_for_discard(), which is reached via swap operations.
  2. txg_sync() and txg_quiesce() are both waiting to be signaled.
  3. A page fault occurred in uiomove() while we were holding a transaction, and the fault then
    blocked on page-in.
[ 1876.448014] Chrome_CacheThr D ffff88022fd92f80     0  4852   4288 0x00000000
[ 1876.448014]  ffff880094c4b848 0000000000000002 ffff88022601de40 ffff880094c4bfd8
[ 1876.448014]  ffff880094c4bfd8 ffff880094c4bfd8 ffff88022601de40 ffff88022fd937e8
[ 1876.448014]  ffff88022ffce458 0000000000000082 ffffffff810f1370 ffff880094c4b8c0
[ 1876.448014] Call Trace:
[ 1876.448014]  [<ffffffff810f1370>] ? sleep_on_page+0x10/0x10
[ 1876.448014]  [<ffffffff8144a2a8>] io_schedule+0x98/0x120
[ 1876.448014]  [<ffffffff810f1379>] sleep_on_page_killable+0x9/0x40
[ 1876.448014]  [<ffffffff814465d8>] __wait_on_bit+0x58/0x90
[ 1876.448014]  [<ffffffff810f3b6f>] wait_on_page_bit_killable+0x7f/0x90
[ 1876.448014]  [<ffffffff810656a0>] ? wake_atomic_t_function+0x30/0x30
[ 1876.448014]  [<ffffffff810f3c2a>] __lock_page_or_retry+0xaa/0xb0
[ 1876.448014]  [<ffffffff811166c8>] handle_pte_fault+0x6f8/0x740
[ 1876.448014]  [<ffffffffa02fd784>] ? dbuf_dirty+0x674/0xac0 [zfs]
[ 1876.448014]  [<ffffffff81117310>] handle_mm_fault+0x1f0/0x2e0
[ 1876.448014]  [<ffffffff8144e954>] __do_page_fault+0x154/0x5b0
[ 1876.448014]  [<ffffffffa0308b25>] ? dmu_objset_userquota_get_ids+0x35/0x470 [zfs]
[ 1876.448014]  [<ffffffff81448492>] ? __mutex_lock_slowpath+0x242/0x350
[ 1876.448014]  [<ffffffffa0308b25>] ? dmu_objset_userquota_get_ids+0x35/0x470 [zfs]
[ 1876.448014]  [<ffffffff8144edd7>] do_page_fault+0x27/0x50
[ 1876.448014]  [<ffffffff8144bde2>] page_fault+0x22/0x30
[ 1876.448014]  [<ffffffff81290030>] ? copy_user_generic_string+0x30/0x40
[ 1876.448014]  [<ffffffffa029147c>] ? uiomove+0x11c/0x130 [zcommon]
[ 1876.448014]  [<ffffffffa0304be0>] dmu_read+0x210/0x2d0 [zfs]
[ 1876.448014]  [<ffffffffa0305d24>] dmu_write_uio_dbuf+0x54/0x70 [zfs]
[ 1876.448014]  [<ffffffffa038002d>] zfs_write+0xc9d/0xd50 [zfs]
[ 1876.448014]  [<ffffffffa0393b2d>] zpl_write_common+0x4d/0x110 [zfs]
[ 1876.448014]  [<ffffffffa0393bbb>] zpl_write_common+0xdb/0x110 [zfs]
[ 1876.448014]  [<ffffffff8113db38>] vfs_write+0xb8/0x1e0
[ 1876.448014]  [<ffffffff8113e6ca>] SyS_pwrite64+0x6a/0xa0
[ 1876.448014]  [<ffffffff8145168b>] tracesys+0xdd/0xe2

This could be fixed by reading from userland before we open the transaction.
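Roughly, the reordering would look like the following sketch (hypothetical, not the actual zfs_write() code; names such as kbuf, woff, and nbytes are illustrative):

    /*
     * Copy the user data into a kernel buffer before the transaction is
     * assigned, so any page fault on the user pages is serviced while no
     * txg is held.
     */
    char *kbuf = kmem_alloc(nbytes, KM_SLEEP);

    /* May fault and sleep on page-in here, but no transaction is open yet. */
    error = uiomove(kbuf, nbytes, UIO_WRITE, uio);

    if (error == 0) {
            dmu_tx_t *tx = dmu_tx_create(os);
            dmu_tx_hold_write(tx, zp->z_id, woff, nbytes);
            error = dmu_tx_assign(tx, TXG_WAIT);
            if (error == 0) {
                    /* Only the kernel buffer is touched while the tx is held. */
                    dmu_write(os, zp->z_id, woff, nbytes, kbuf, tx);
                    dmu_tx_commit(tx);
            } else {
                    dmu_tx_abort(tx);
            }
    }
    kmem_free(kbuf, nbytes);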

ryao (Contributor, Author) commented Sep 2, 2013

It looks like reading from userland before we open the transaction will require a fair amount of refactoring. Solaris avoided the refactoring by implementing a hack that prefaults the pages before opening the transaction. ZFSOnLinux has that code commented out. Prefaulting the pages creates a race between zfs_write() and pageout, where zfs_write() must read the pages before pageout removes them. It is quite likely that zfs_write() always wins that race on Solaris, but direct reclaim likely makes the race easier to lose on Linux.

Anyway, refactoring this to read from userland before opening the transaction is likely the right solution.
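For reference, the Solaris-style workaround amounts to something like this sketch (assuming uio_prefaultpages() is compiled in; the prefault happens before the transaction is opened, but the race with pageout described above remains between it and the later copy):

    /* Touch the user pages before any transaction is opened. */
    uio_prefaultpages(MIN(n, max_blksz), uio);

    tx = dmu_tx_create(os);
    dmu_tx_hold_write(tx, zp->z_id, woff, MIN(n, max_blksz));
    error = dmu_tx_assign(tx, TXG_WAIT);
    ...
    /* If pageout evicted the pages again, this copy faults with the tx held. */
    error = dmu_write_uio_dbuf(sa_get_db(zp->z_sa_hdl), uio, nbytes, tx);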

behlendorf (Contributor) commented

This looks like yet another good reason to abandon using the virtual address space. If these buffers were mapped just as pages and not as part of the virtual address space, we'd never take a page fault. @ryao have you tried just pulling the uio_prefaultpages() call out of the HAVE_UIO_ZEROCOPY block? The function should be fully functional; it was just a victim of some overzealous #ifdefs.
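Something like the following sketch, i.e. hoisting the call out of the zero-copy guard so it always runs before the transaction is opened (illustrative only, assuming the call currently sits inside that ifdef):

    /* Before: the prefault is only compiled in with zero-copy support. */
    #ifdef HAVE_UIO_ZEROCOPY
            uio_prefaultpages(MIN(n, max_blksz), uio);
            ...
    #endif

    /* After: prefault unconditionally. */
    uio_prefaultpages(MIN(n, max_blksz), uio);
    #ifdef HAVE_UIO_ZEROCOPY
            ...
    #endif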

ryao (Contributor, Author) commented Sep 4, 2013

@behlendorf I have not yet had a chance to examine the backtraces from that test to verify that it hit the same lockup, but I can say that I already tried that and the system deadlocked anyway.

What we have here is a kernel fault because a userland program's pages had been paged out. That is by design. We happen to be holding a lock that needs to be free in order to handle the fault, which is what causes the problem. Our use of the kernel virtual address space has no effect on this issue. In addition, we cannot wire down pages used in syscalls. I have been trying to get in touch with you on IRC to discuss this.

ryao mentioned this issue Sep 7, 2013
minaco2 pushed a commit to git-portage/git-portage that referenced this issue Nov 21, 2013
…IO writes; Apply locking fixes from Illumos; Reintroduce uio_prefaultpages() to minimize the possibility of hitting openzfs/zfs#1689; Add Linux 3.12 Support

(Portage version: 2.2.7/cvs/Linux x86_64, signed Manifest commit with key 0xBEE84C64)
behlendorf modified the milestones: 0.7.0, 0.6.4 Oct 16, 2014
behlendorf (Contributor) commented

Closing as stale.
