Page fault in uiomove() causes deadlock #1689

Closed
ryao opened this issue Sep 2, 2013 · 4 comments
ryao (Contributor) commented Sep 2, 2013

I can reproducibly deadlock my desktop by running multiple instances of python -c 'print 2**100**100'. I used magic SysRq + t to get backtraces of my system's threads. A few things stand out in these backtraces:

  1. Most threads are blocked in wait_for_discard(), which is reached via swap operations.
  2. txg_sync() and txg_quiesce() are both waiting to be signaled.
  3. A page fault occurred in uiomove() while we were holding a transaction, and the fault then
    blocked on page-in.
[ 1876.448014] Chrome_CacheThr D ffff88022fd92f80     0  4852   4288 0x00000000
[ 1876.448014]  ffff880094c4b848 0000000000000002 ffff88022601de40 ffff880094c4bfd8
[ 1876.448014]  ffff880094c4bfd8 ffff880094c4bfd8 ffff88022601de40 ffff88022fd937e8
[ 1876.448014]  ffff88022ffce458 0000000000000082 ffffffff810f1370 ffff880094c4b8c0
[ 1876.448014] Call Trace:
[ 1876.448014]  [<ffffffff810f1370>] ? sleep_on_page+0x10/0x10
[ 1876.448014]  [<ffffffff8144a2a8>] io_schedule+0x98/0x120
[ 1876.448014]  [<ffffffff810f1379>] sleep_on_page_killable+0x9/0x40
[ 1876.448014]  [<ffffffff814465d8>] __wait_on_bit+0x58/0x90
[ 1876.448014]  [<ffffffff810f3b6f>] wait_on_page_bit_killable+0x7f/0x90
[ 1876.448014]  [<ffffffff810656a0>] ? wake_atomic_t_function+0x30/0x30
[ 1876.448014]  [<ffffffff810f3c2a>] __lock_page_or_retry+0xaa/0xb0
[ 1876.448014]  [<ffffffff811166c8>] handle_pte_fault+0x6f8/0x740
[ 1876.448014]  [<ffffffffa02fd784>] ? dbuf_dirty+0x674/0xac0 [zfs]
[ 1876.448014]  [<ffffffff81117310>] handle_mm_fault+0x1f0/0x2e0
[ 1876.448014]  [<ffffffff8144e954>] __do_page_fault+0x154/0x5b0
[ 1876.448014]  [<ffffffffa0308b25>] ? dmu_objset_userquota_get_ids+0x35/0x470 [zfs]
[ 1876.448014]  [<ffffffff81448492>] ? __mutex_lock_slowpath+0x242/0x350
[ 1876.448014]  [<ffffffffa0308b25>] ? dmu_objset_userquota_get_ids+0x35/0x470 [zfs]
[ 1876.448014]  [<ffffffff8144edd7>] do_page_fault+0x27/0x50
[ 1876.448014]  [<ffffffff8144bde2>] page_fault+0x22/0x30
[ 1876.448014]  [<ffffffff81290030>] ? copy_user_generic_string+0x30/0x40
[ 1876.448014]  [<ffffffffa029147c>] ? uiomove+0x11c/0x130 [zcommon]
[ 1876.448014]  [<ffffffffa0304be0>] dmu_read+0x210/0x2d0 [zfs]
[ 1876.448014]  [<ffffffffa0305d24>] dmu_write_uio_dbuf+0x54/0x70 [zfs]
[ 1876.448014]  [<ffffffffa038002d>] zfs_write+0xc9d/0xd50 [zfs]
[ 1876.448014]  [<ffffffffa0393b2d>] zpl_write_common+0x4d/0x110 [zfs]
[ 1876.448014]  [<ffffffffa0393bbb>] zpl_write_common+0xdb/0x110 [zfs]
[ 1876.448014]  [<ffffffff8113db38>] vfs_write+0xb8/0x1e0
[ 1876.448014]  [<ffffffff8113e6ca>] SyS_pwrite64+0x6a/0xa0
[ 1876.448014]  [<ffffffff8145168b>] tracesys+0xdd/0xe2

This could be fixed by reading from userland before we open the transaction.
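Roughly, the reordering would look like the following sketch (hypothetical, not the actual zfs_write() code; names such as kbuf, woff, and nbytes are illustrative):

    /*
     * Copy the user data into a kernel buffer before the transaction is
     * assigned, so any page fault on the user pages is serviced while no
     * txg is held.
     */
    char *kbuf = kmem_alloc(nbytes, KM_SLEEP);

    /* May fault and sleep on page-in here, but no transaction is open yet. */
    error = uiomove(kbuf, nbytes, UIO_WRITE, uio);

    if (error == 0) {
            dmu_tx_t *tx = dmu_tx_create(os);
            dmu_tx_hold_write(tx, zp->z_id, woff, nbytes);
            error = dmu_tx_assign(tx, TXG_WAIT);
            if (error == 0) {
                    /* Only the kernel buffer is touched while the tx is held. */
                    dmu_write(os, zp->z_id, woff, nbytes, kbuf, tx);
                    dmu_tx_commit(tx);
            } else {
                    dmu_tx_abort(tx);
            }
    }
    kmem_free(kbuf, nbytes);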

ryao (Contributor, Author) commented Sep 2, 2013

It looks like reading from userland before we open the transaction will require a fair amount of refactoring. Solaris avoided the refactoring by implementing a hack that prefaults the pages before opening the transaction. ZFSOnLinux has that code commented out. Prefaulting the pages creates a race between zfs_write() and pageout, where zfs_write() must read the pages before pageout removes them. It is quite likely that zfs_write() always wins that race on Solaris, but direct reclaim likely makes the race easier to lose on Linux.

Anyway, refactoring this to read from userland before opening the transaction is likely the right solution.
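For reference, the Solaris-style workaround amounts to something like this sketch (assuming uio_prefaultpages() is compiled in; the prefault happens before the transaction is opened, but the race with pageout described above remains between it and the later copy):

    /* Touch the user pages before any transaction is opened. */
    uio_prefaultpages(MIN(n, max_blksz), uio);

    tx = dmu_tx_create(os);
    dmu_tx_hold_write(tx, zp->z_id, woff, MIN(n, max_blksz));
    error = dmu_tx_assign(tx, TXG_WAIT);
    ...
    /* If pageout evicted the pages again, this copy faults with the tx held. */
    error = dmu_write_uio_dbuf(sa_get_db(zp->z_sa_hdl), uio, nbytes, tx);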

behlendorf (Contributor) commented

This looks like yet another good reason to abandon using the virtual address space. If these buffers were mapped just as pages and not as part of the virtual address space, we'd never take a page fault. @ryao have you tried just pulling the uio_prefaultpages() call out of the HAVE_UIO_ZEROCOPY block? The function should be fully functional; it was just a victim of some overzealous #ifdefs.
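Something like the following sketch, i.e. hoisting the call out of the zero-copy guard so it always runs before the transaction is opened (illustrative only, assuming the call currently sits inside that ifdef):

    /* Before: the prefault is only compiled in with zero-copy support. */
    #ifdef HAVE_UIO_ZEROCOPY
            uio_prefaultpages(MIN(n, max_blksz), uio);
            ...
    #endif

    /* After: prefault unconditionally. */
    uio_prefaultpages(MIN(n, max_blksz), uio);
    #ifdef HAVE_UIO_ZEROCOPY
            ...
    #endif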

ryao (Contributor, Author) commented Sep 4, 2013

@behlendorf I have not yet had a chance to examine the backtraces from that test to verify that it hit the same lockup, but I can say that I already tried that and the system deadlocked anyway.

What we have here is a kernel fault because a userland program's pages had been paged out. That is by design. We happen to be holding a lock that needs to be free in order to handle the fault, which is what causes the problem. Our use of the kernel virtual address space has no effect on this issue. In addition, we cannot wire down pages used in syscalls. I have been trying to get in touch with you on IRC to discuss this.

ryao mentioned this issue Sep 7, 2013
minaco2 pushed a commit to git-portage/git-portage that referenced this issue Nov 21, 2013
…IO writes; Apply locking fixes from Illumos; Reintroduce uio_prefaultpages() to minimize the possibility of hitting openzfs/zfs#1689; Add Linux 3.12 Support

(Portage version: 2.2.7/cvs/Linux x86_64, signed Manifest commit with key 0xBEE84C64)
behlendorf modified the milestones: 0.7.0, 0.6.4 Oct 16, 2014
behlendorf (Contributor) commented

Closing as stale.
