Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kernel deadlock when zfs recv is interrupted by a signal #785

Open
jesboat opened this issue Feb 8, 2021 · 2 comments
Open

kernel deadlock when zfs recv is interrupted by a signal #785

jesboat opened this issue Feb 8, 2021 · 2 comments

Comments

@jesboat
Copy link

jesboat commented Feb 8, 2021

The user thread doing the ioctl for zfs recv eventually gets (on the kernel side) to dmu_recv_stream which calls bqueue_enqueue for each block. bqueue_enqueue has:

	while (q->bq_size + item_size > q->bq_maxsize) {
		cv_wait_sig(&q->bq_add_cv, &q->bq_lock);
	}

cv_wait_sig is defined by (debug ifdefs elided)

#define cv_wait_sig(cvp, mp)        \
	spl_cv_wait((cvp), (mp), PRIBIO|PCATCH, #cvp)

void
spl_cv_wait(kcondvar_t *cvp, kmutex_t *mp, int flags, const char *msg)
{
    if (msg != NULL && msg[0] == '&')
        ++msg;  /* skip over '&' prefixes */

	mp->m_owner = NULL;
    (void) msleep(cvp, (lck_mtx_t *)&mp->m_lock, flags, msg, 0);
    mp->m_owner = current_thread();
}

This percolates down through xnu until it reaches lck_mtx_sleep (e.g. https://opensource.apple.com/source/xnu/xnu-2050.18.24/osfmk/kern/locks.c.auto.html) which does

	res = assert_wait(event, interruptible);
	if (res == THREAD_WAITING) {
		lck_mtx_unlock(lck);
		res = thread_block(THREAD_CONTINUE_NULL);
		if (!(lck_sleep_action & LCK_SLEEP_UNLOCK)) {
			if ((lck_sleep_action & LCK_SLEEP_SPIN))
				lck_mtx_lock_spin(lck);
			else
				lck_mtx_lock(lck);
		}
	}
	else
	if (lck_sleep_action & LCK_SLEEP_UNLOCK)
		lck_mtx_unlock(lck);

	KERNEL_DEBUG(MACHDBG_CODE(DBG_MACH_LOCKS, LCK_MTX_SLEEP_CODE) | DBG_FUNC_END, (int)res, 0, 0, 0, 0);

	return res;

If the current thread has already received a signal, then assert_wait returns THREAD_INTERRUPTED, lck_mtx_sleep never releases the mutex and returns, eventually all the way back to bqueue_enqueue's call to cv_wait_sig. But, since the mutex was never released, no bqueue_dequeue calls will have been able to make progress, and bqueue_enqueue will repeat this process forever.

I think the correct fix here would be to modify bqueue_enqueue (and bqueue_dequeue, which has the same problem) to use the uninterruptible versions; since there's nothing they can do (without changing their API) except retry, they might as well use the uninterruptible versions.

I have not looked to see if there are other uses of cv_wait_interruptible/cv_timedwait_interruptible/cv_wait_sig/cv_timedwait_sig which have issues.

@lundman
Copy link
Contributor

lundman commented Feb 9, 2021

Thanks for the issue and the details of it. We pass PCATCH to msleep() so we can detect the user hitting ^C for the signal, and have the kernel parts undo. I don't recall specifically which part detects that, but I don't think it was bqueue_dequeue(), so potentially what you suggest could work.

Potentially we could also detect THREAD_INTERRUPTED and do a little manual mutex release for progress.

@lundman
Copy link
Contributor

lundman commented Mar 12, 2021

OK, so yes, you can definitely just change it to cv_wait() (without the _sig). But presumably upstream picked the _sig for a reason relevant to their platform.

I can also add, at the end of spl_cv_wait():

+    if (result == EINTR &&
+        (mp->m_waiters > 0 || mp->m_sleepers > 0)) {
+        printf("avoiding hang\n");
+        mutex_exit(mp);
+        (void) thread_block(THREAD_CONTINUE_NULL);
+        mutex_enter(mp);
+    }

    /*
     * 1 - condvar got cv_signal()/cv_broadcast()
     * 0 - received signal (kill -signal)
     */
    return (result == EINTR ? 0 : 1);

I added waiters, and sleepers just to avoid doing it everytime. Now to decide which fix to go with.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants