supply local time to InternalPushTxn when initiating the request #1102
Conversation
- expiry := args.Timestamp
+ // Compute heartbeat expiration (all replicas must see the same result).
+ expiry := args.Now // caller can set this to his wall time.
+ expiry.Forward(args.Timestamp) // if Now is not set, fallback
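For context, Forward ratchets a timestamp up to the given value, so the new lines compute the maximum of args.Now and args.Timestamp (a zero Now simply falls back to Timestamp). Below is a minimal, self-contained sketch of those semantics; the Timestamp type is a simplified stand-in for hlc.Timestamp, not the real implementation:

```go
package main

import "fmt"

// Timestamp is a simplified stand-in for hlc.Timestamp, here only to
// illustrate the Forward semantics used in the diff above.
type Timestamp struct {
	WallTime int64
	Logical  int32
}

// Less reports whether t orders strictly before s.
func (t Timestamp) Less(s Timestamp) bool {
	return t.WallTime < s.WallTime ||
		(t.WallTime == s.WallTime && t.Logical < s.Logical)
}

// Forward ratchets t up to s if s is larger, i.e. t becomes max(t, s).
func (t *Timestamp) Forward(s Timestamp) {
	if t.Less(s) {
		*t = s
	}
}

func main() {
	now := Timestamp{WallTime: 200}   // args.Now: the pusher's wall clock
	reqTS := Timestamp{WallTime: 150} // args.Timestamp

	// Mirrors the new code path: expiry := args.Now; expiry.Forward(args.Timestamp).
	expiry := now
	expiry.Forward(reqTS)
	fmt.Println(expiry.WallTime) // 200; with a zero Now, this would print 150 instead
}
```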
Is there any reason to send a push txn request without setting Now? It might be simpler to require that Now be set in all cases. (On the other hand, this does ensure that we use max(Now, Timestamp) if there is any risk that the pusher's clock could be behind.)
LGTM. Do we also need to set Now when pushing from gc_queue.go? (which looks like the only other non-test use of InternalPushTxnRequest)
Previously args.Timestamp was used, but there are situations in which it never changes, meaning that infinite retry loops on an abandoned Txn descriptor would ensue in storage/store, for instance in cockroachdb#877 and the bank example.
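To spell out the failure mode: the pushee only counts as abandoned once its last heartbeat falls outside the expiration window, so an expiry pinned to the original request timestamp can re-run the same check forever without crossing that threshold, while an expiry derived from the pusher's wall clock advances on every retry. A hedged sketch of that idea; the names (isAbandoned, lastHeartbeat, ttl) are hypothetical and do not come from the actual storage/store code:

```go
package main

import (
	"fmt"
	"time"
)

// isAbandoned is an illustrative version of a heartbeat-expiration check:
// the pushee counts as abandoned only if its last heartbeat is older than
// the expiry minus the heartbeat TTL. Hypothetical names, not store code.
func isAbandoned(lastHeartbeat, expiry time.Time, ttl time.Duration) bool {
	return lastHeartbeat.Add(ttl).Before(expiry)
}

func main() {
	ttl := 5 * time.Second
	lastHeartbeat := time.Now().Add(-time.Minute) // pushee stopped heartbeating a minute ago

	// Expiry pinned to the original request timestamp: every retry re-evaluates
	// the same instant, so the check may never trip and the loop spins forever.
	staticExpiry := lastHeartbeat.Add(time.Second)
	fmt.Println(isAbandoned(lastHeartbeat, staticExpiry, ttl)) // false, on every retry

	// Expiry derived from the pusher's wall clock: it advances with each retry
	// and eventually clears lastHeartbeat+ttl, letting the push succeed.
	wallClockExpiry := time.Now()
	fmt.Println(isAbandoned(lastHeartbeat, wallClockExpiry, ttl)) // true
}
```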
I've made
{nil, 0, proto.PUSH_TIMESTAMP, false},
{nil, 0, proto.ABORT_TXN, false},
{nil, 0, proto.CLEANUP_TXN, false},
{nil, 1, proto.PUSH_TIMESTAMP, false}, // using 0 is awkward
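For readers skimming the diff, the rows above read most naturally as (pushee heartbeat, pusher's current time, push type, expected success). The struct below is only a guessed shape for illustration; the field names and the placeholder types standing in for the proto package are not the actual test code:

```go
package main

import "fmt"

// Local placeholders for the proto identifiers referenced in the rows above.
type PushTxnType int

const (
	PUSH_TIMESTAMP PushTxnType = iota
	ABORT_TXN
	CLEANUP_TXN
)

type Timestamp struct{ WallTime int64 }

// pushTxnTestCase is a guess at what each row encodes: the pushee's last
// heartbeat (nil for none), the pusher's current time, the kind of push,
// and whether the push is expected to succeed.
type pushTxnTestCase struct {
	heartbeat   *Timestamp
	currentTime int64
	pushType    PushTxnType
	expSuccess  bool
}

func main() {
	cases := []pushTxnTestCase{
		{nil, 0, PUSH_TIMESTAMP, false},
		{nil, 1, PUSH_TIMESTAMP, false}, // 1 rather than 0, since 0 reads as "unset"
	}
	fmt.Println(len(cases)) // 2
}
```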
This comment won't make sense in the future: using 0 for what?
Added to the comment.
LGTM
supply local time to InternalPushTxn when initiating the request
This commit deprecates PushTxnRequest.Now and gives its responsibility to the batch header timestamp. The biggest reason to do this is so that PushTxn requests properly update their receiver's clock. This is critical because a PushTxn request can result in a timestamp cache entry being created with a value up to this time, so for safety, we need to ensure that the leaseholder updates its clock at least to this time _before_ evaluating the request. Otherwise, a lease transfer could miss the request's effect on the timestamp cache and result in a lost push/abort.

The comment on PushTxnRequest.Now mentioned that the header timestamp couldn't be used because it "does not necessarily advance with the node clock across retries and hence cannot detect abandoned transactions." This dates back all the way to cockroachdb#1102. I haven't been able to piece together what kind of retries this refers to, but I'm almost positive that they no longer apply.

Release note: None
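The safety argument above reduces to an ordering invariant: the leaseholder must ratchet its clock up to the request's timestamp before evaluation, so that any timestamp cache entry the push creates is covered by the local clock and cannot be missed by a subsequent lease transfer. A hedged sketch of that ordering with made-up names (clock, tsCache, receiveRequest), not the actual CockroachDB code:

```go
package main

import "fmt"

// Minimal stand-ins used only to illustrate the ordering invariant.
type clock struct{ wall int64 }

// Update ratchets the local clock forward to an observed timestamp.
func (c *clock) Update(ts int64) {
	if ts > c.wall {
		c.wall = ts
	}
}

type tsCache struct{ high int64 }

// Add records activity up to ts in the timestamp cache.
func (t *tsCache) Add(ts int64) {
	if ts > t.high {
		t.high = ts
	}
}

// receiveRequest shows the required ordering: update the clock first, then
// evaluate, which may create a timestamp cache entry up to the request time.
// Because step 1 precedes step 2, the cache entry is always <= the clock, so
// a lease transfer that starts the new timestamp cache at the old clock
// reading does not lose the push's effect.
func receiveRequest(c *clock, tc *tsCache, reqTS int64) {
	c.Update(reqTS) // step 1: clock now covers reqTS
	tc.Add(reqTS)   // step 2: entry is guaranteed to be <= the local clock
	fmt.Printf("clock=%d tsCache=%d\n", c.wall, tc.high)
}

func main() {
	c, tc := &clock{wall: 100}, &tsCache{}
	receiveRequest(c, tc, 150) // clock=150 tsCache=150
}
```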
previously args.Timestamp was used, but there are situations
in which that never changes, meaning that infinite retry loops
on an abandoned Txn descriptor would ensue in storage/store,
for instance in #877 and the bank example.
This needs a test. It should be fairly straightforward, so if anybody wants to help out with the debugging bonanza, that would be appreciated (but you probably want to wait for feedback to see whether this will actually land).
After this change, the following hums along relatively nicely on a three node cluster:
/bank --num-accounts=2 --num-parallel-transfers=2 --db-name=http://root@localhost:8001
Every couple of minutes, one of the writers' transactions will still deadlock, but it times out after 10 seconds (after which the process resumes). When that happens, the busy retry loop runs against it approximately 14,000 times until the 10s are over (yeah, it's really busy: 0s backoff, I think). I've checked all the pushes in such an example, and none of them go through, since the pushee's priority is pretty high:
So we only have to figure out why the occasional transaction in the bank example doesn't finish; it clearly seems to be pretty high priority-wise. I've checked the logs, and it doesn't seem to have ever pushed anyone unsuccessfully during my run: