[WIP] storage: retry 1PC transactions on WriteTooOld errors #15863

petermattis · 2017-05-10T21:06:25Z

As @tschottdorf points out, this likely isn't safe. Seems like we'd want
to retry the 1PC at a higher level so that we go through the timestamp
cache again.

Fixes #15797

@tschottdorf

As @tschottdorf points out, this likely isn't safe. Seems like we'd want to retry the 1PC at a higher level so that we go through the timestamp cache again. Fixes cockroachdb#15797

cockroach-teamcity · 2017-05-10T21:06:31Z

This change is

tbg · 2017-07-17T15:49:30Z

Closing as not the right approach. PR is referenced from #15797, so we won't forget about it.

See cockroachdb#15797 for context. The main problem here are `WriteTooOldError`s, which lay down an intent. If this happens frequently, performance suffers dramatically since there is almost always an intent on the key, forcing everyone into conflict resolution (which is sure to lay down another intent thanks to `WriteTooOld`). With this change we don't lay down intents for this workload, but the 1PC transactions could be starved. We could do even better (making the hack in cockroachdb#15863 correct): If we knew that the 1PC transaction wasn't previously used for any reads (or it's SNAPSHOT), we could in theory commit it directly again from the Store (saving the roundtrip to the gateway/client). If it's SERIALIZABLE and there were previous reads, we can't simulate a restart internally and the client must do it properly. It also stands to reason that on a restart, the client shouldn't take the existing timestamp but simply use `hlc.Now()` which is less stale. Neither of the last two paragraphs are explored in this PR. Here, we should discuss whether it's OK to expose 1PC txns to starvation in the first place. ``` go run $GOPATH/src/github.com/cockroachdb/loadgen/kv/main.go --cycle-length 1 --concurrency $c --duration 30s // this PR, c=10 _elapsed___errors____________ops___ops/sec(cum)__p95(ms)__p99(ms)_pMax(ms)_____seq(begin/end) 30.0s 0 14905 496.8 16.3 17.8 29.4 0 / 0 // this PR and master, c=1 _elapsed___errors____________ops___ops/sec(cum)__p95(ms)__p99(ms)_pMax(ms)_____seq(begin/end) 30.0s 0 22204 740.1 1.6 3.5 15.7 0 / 0 // master, c=10 _elapsed___errors____________ops___ops/sec(cum)__p95(ms)__p99(ms)_pMax(ms)_____seq(begin/end) 30.0s 0 3185 106.2 369.1 604.0 1811.9 0 / 0 ```

[WIP] storage: retry 1PC transactions on WriteTooOld errors

5701511

As @tschottdorf points out, this likely isn't safe. Seems like we'd want to retry the 1PC at a higher level so that we go through the timestamp cache again. Fixes cockroachdb#15797

bdarnell mentioned this pull request May 14, 2017

perf: investigate loadgen/kv slowdown #15797

Closed

tbg closed this Jul 17, 2017

tbg mentioned this pull request Jul 19, 2017

[DNM]: experiment for contending 1PC txns #17121

Closed

petermattis deleted the pmattis/1pc-write-too-old-retry branch August 3, 2017 17:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] storage: retry 1PC transactions on WriteTooOld errors #15863

[WIP] storage: retry 1PC transactions on WriteTooOld errors #15863

petermattis commented May 10, 2017

cockroach-teamcity commented May 10, 2017

tbg commented Jul 17, 2017

[WIP] storage: retry 1PC transactions on WriteTooOld errors #15863

[WIP] storage: retry 1PC transactions on WriteTooOld errors #15863

Conversation

petermattis commented May 10, 2017

cockroach-teamcity commented May 10, 2017

tbg commented Jul 17, 2017