Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible txg locking improvements #4333

Closed
wants to merge 4 commits into from
Closed

Possible txg locking improvements #4333

wants to merge 4 commits into from

Conversation

ryao
Copy link
Contributor

@ryao ryao commented Feb 13, 2016

This contains a few ideas that I had when looking at code related to txg_quiesce().

  1. We can avoid some overhead by using atomic instructions on ->tc_count rather than protecting it with a mutex. We have fewer points in the code where a mutex would block us and use 1 less atomic instruction in each.
  2. We really do not need to hold ->tc_open_lock when processing a DMU transaction provided that we we have incremented ->tc_count, so we let it go early.
  3. Calling kpreempt_enable() defeats the purpose of using CPU_SEQID to index into a per-CPU structure because we are then preempted, allowing another thread to contend with us on the per-CPU structure we reference. This behavior was introduced by 15a9e03 as a means to work around scary warnings from Linux, but we can workaround them by mapping things to raw_smp_processor_id(), so lets do that to avoid unnecessary preemption.

@ryao ryao force-pushed the locking branch 6 times, most recently from 7c52fff to aafb5d6 Compare February 17, 2016 02:06
Using a mutex to protect increment and decrement operations is
inefficient. It uses two atomic instructions, will spin when a single
waiter is present (under Linux with typical configuration options) and
if two waiters occur, will actually deschedule them, which is far slower
than waiting. Using atomic instructions allows hardware to handle the
blocking, which is much faster than software.

Signed-off-by: Richard Yao <[email protected]>
There does not appear to be anything protected by ->tc_open_lock that
callers use while holding ->tc_open_lock means that anything that
preempts us would block on us when the only things that really need
protection are ->tx_open_txg and ->tc_count.

Signed-off-by: Richard Yao <[email protected]>
@ryao ryao force-pushed the locking branch 2 times, most recently from 16db03c to 70d70fe Compare February 17, 2016 21:20
@ryao
Copy link
Contributor Author

ryao commented Feb 17, 2016

@behlendorf Thinking some more about the "txg visibility code should not execute under tc_open_lock" patch some more, direct reclaim in the memory allocation in spa_txg_history_add() could deadlock when it tries to grab ->tc_open_lock in txg_hold_open() because the thread would likely already be holding it.

Moving it out of the critical section is an improvement, although I suspect it does not go far enough as blocking here means that the sync thread cannot get new transactions to sync. I suspect that the right thing to do might be to change the code to use KM_NOSLEEP and handle NULL return values by allowing the code to skip reporting.

ryao and others added 2 commits February 17, 2016 21:26
The memory allocation and locking in `spa_txg_history_*()` can
potentially block txg_hold_open for arbitrarily long periods of time.

Signed-off-by: Richard Yao <[email protected]>
…SEQID use in preemptible context"

This reverts commit 15a9e03 and commit
8878261.

Wrapping CPU_SEQID with kpreempt_disable() and kpreempt_enable() causes
us to call preempt_schedule(). This allows the thread to be rescheduled
on another CPU, allowing the vacated CPU to begin using the per-CPU
structure that we still reference on the CPU we just vacated. This is an
unnecessary source of contention that is a regression from the illumos
behavior. We can turn off the Linux warnings by calling
`raw_smp_processor_id()` instead of wrapping things with
kpreempt_disable() and kpreempt_enable(), so lets do that.

Requires-spl: refs/pull/532/head
Signed-off-by: Richard Yao <[email protected]>
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Jul 28, 2016
The memory allocation and locking in `spa_txg_history_*()` can
potentially block txg_hold_open for arbitrarily long periods of time.

Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#4333
@behlendorf
Copy link
Contributor

I've merged f26b4b3 which is straight forward but would prefer not merge the other changes until we have some evidence that they improve performance.

f26b4b3 txg visibility code should not execute under tc_open_lock

@behlendorf behlendorf closed this Jul 28, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants