Skip to content

Commit

Permalink
[DNM]: experiment for contending 1PC txns
Browse files Browse the repository at this point in the history
See cockroachdb#15797 for context.

The main problem here are `WriteTooOldError`s, which lay down an intent. If
this happens frequently, performance suffers dramatically since there is almost
always an intent on the key, forcing everyone into conflict resolution (which
is sure to lay down another intent thanks to `WriteTooOld`).

With this change we don't lay down intents for this workload, but the 1PC
transactions could be starved.

We could do even better (making the hack in cockroachdb#15863 correct): If we knew that
the 1PC transaction wasn't previously used for any reads (or it's SNAPSHOT), we
could in theory commit it directly again from the Store (saving the roundtrip
to the gateway/client). If it's SERIALIZABLE and there were previous reads, we
can't simulate a restart internally and the client must do it properly.

It also stands to reason that on a restart, the client shouldn't take the
existing timestamp but simply use `hlc.Now()` which is less stale.

Neither of the last two paragraphs are explored in this PR. Here, we should
discuss whether it's OK to expose 1PC txns to starvation in the first place.

```
go run $GOPATH/src/github.com/cockroachdb/loadgen/kv/main.go --cycle-length 1 --concurrency $c --duration 30s
// this PR, c=10
_elapsed___errors____________ops___ops/sec(cum)__p95(ms)__p99(ms)_pMax(ms)_____seq(begin/end)
   30.0s        0          14905          496.8     16.3     17.8     29.4              0 / 0
// this PR and master, c=1
_elapsed___errors____________ops___ops/sec(cum)__p95(ms)__p99(ms)_pMax(ms)_____seq(begin/end)
   30.0s        0          22204          740.1      1.6      3.5     15.7              0 / 0
// master, c=10
_elapsed___errors____________ops___ops/sec(cum)__p95(ms)__p99(ms)_pMax(ms)_____seq(begin/end)
   30.0s        0           3185          106.2    369.1    604.0   1811.9              0 / 0
```
  • Loading branch information
tbg committed Jul 19, 2017
1 parent 0d083fa commit c224e60
Showing 1 changed file with 20 additions and 1 deletion.
21 changes: 20 additions & 1 deletion pkg/storage/replica.go
Original file line number Diff line number Diff line change
Expand Up @@ -4563,7 +4563,22 @@ func (r *Replica) evaluateTxnWriteBatch(
}
rec := ReplicaEvalContext{r, spans}
br, result, pErr := evaluateBatch(ctx, idKey, batch, rec, &ms, strippedBa)
if pErr == nil && ba.Timestamp == br.Timestamp {
if pErr != nil {
log.Event(ctx, pErr.String())
if wtoErr, ok := pErr.GetDetail().(*roachpb.WriteTooOldError); ok {
rTxn := ba.Txn.Clone()
rTxn.Timestamp.Forward(wtoErr.ActualTimestamp)
log.Eventf(ctx, "eager WriteTooOld restart from %s to %s", ba.Txn.Timestamp, rTxn.Timestamp)

rErr := roachpb.NewError(roachpb.NewTransactionRetryError(roachpb.RETRY_SERIALIZABLE))
rErr.SetTxn(&rTxn)

// Don't actually lay down intents.
batch.Close()
batch = r.store.Engine().NewBatch()
return batch, enginepb.MVCCStats{}, nil, EvalResult{}, rErr
}
} else if ba.Timestamp == br.Timestamp {
clonedTxn := ba.Txn.Clone()
clonedTxn.Writing = true
clonedTxn.Status = roachpb.COMMITTED
Expand Down Expand Up @@ -4609,6 +4624,10 @@ func (r *Replica) evaluateTxnWriteBatch(
log.VEventf(ctx, 2, "1PC execution failed, reverting to regular execution for batch")
}

if ba.Summary() == "1 Put, 1 BeginTxn, 1 EndTxn" {
log.Fatalf(ctx, "Not using 1PC path: %t %s", isOnePhaseCommit(ba), ba.Txn)
}

batch := r.store.Engine().NewBatch()
if util.RaceEnabled && spans != nil {
batch = makeSpanSetBatch(batch, spans)
Expand Down

0 comments on commit c224e60

Please sign in to comment.