Shard clocks to avoid write bottlenecks #1243

Closed
adamcfraser opened this issue Oct 27, 2015 · 1 comment

@adamcfraser
Collaborator

Based on the latest perf test runs against distributed_index_1140, between 55% and 60% of batch processing time is spent updating channel clocks and the stable sequence clock:

[image]

The CAS retry rate is also high for clock updates: for 8195 attempted updates of the stable sequence, there were 6991 retries (roughly 0.85 retries per attempted update).
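To make the retry cost concrete, here is a minimal sketch of a CAS update loop that counts its own retries. It uses an in-memory `uint64` and `sync/atomic` as a stand-in for the clock document; the function name `casUpdateMax` and the writer/update counts are made up for illustration and are not taken from the Sync Gateway code.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// casUpdateMax raises *clock to at least seq using a compare-and-swap loop.
// It returns the number of retries (failed CAS attempts) it needed.
// Hypothetical helper: an in-memory stand-in for a CAS write to a clock doc.
func casUpdateMax(clock *uint64, seq uint64) (retries int) {
	for {
		cur := atomic.LoadUint64(clock)
		if seq <= cur {
			return retries // another writer already advanced the clock past seq
		}
		if atomic.CompareAndSwapUint64(clock, cur, seq) {
			return retries
		}
		retries++ // lost the race to another writer; reload and try again
	}
}

func main() {
	var clock uint64
	var totalRetries int64
	var wg sync.WaitGroup
	const writers, updatesPerWriter = 3, 10000 // assumed values for the demo

	for w := 0; w < writers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := 1; i <= updatesPerWriter; i++ {
				seq := uint64(i*writers + w)
				atomic.AddInt64(&totalRetries, int64(casUpdateMax(&clock, seq)))
			}
		}(w)
	}
	wg.Wait()

	fmt.Printf("final=%d attempts=%d retries=%d\n",
		atomic.LoadUint64(&clock), writers*updatesPerWriter, totalRetries)
}
```

Every writer targets the same value, so each failed swap forces a reload and another attempt. The retry count varies from run to run with scheduling.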

Sharding the channel clocks will improve write throughput, but with some cost on the read side to rebuild the clocks. Given the current write contention (measured with only three SG nodes), we should see an overall benefit when the clocks are sharded.
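The write/read trade-off can be sketched as follows. This is an in-memory illustration only: the type and method names (`ShardedClock`, `Update`, `Rebuild`) and the modulo shard assignment are assumptions, and a real implementation would store each shard as its own document rather than a map behind a mutex.

```go
package main

import (
	"fmt"
	"sync"
)

// clockShard holds the sequences for a subset of vbuckets.
type clockShard struct {
	mu   sync.Mutex
	seqs map[uint16]uint64 // vbucket number -> highest sequence seen
}

// ShardedClock splits a per-vbucket sequence clock across several shards.
// Hypothetical type, used here only to illustrate the idea.
type ShardedClock struct {
	shards []clockShard
}

func NewShardedClock(numShards int) *ShardedClock {
	c := &ShardedClock{shards: make([]clockShard, numShards)}
	for i := range c.shards {
		c.shards[i].seqs = make(map[uint16]uint64)
	}
	return c
}

// shardFor maps a vbucket to its shard (assumed scheme: simple modulo).
func (c *ShardedClock) shardFor(vb uint16) *clockShard {
	return &c.shards[int(vb)%len(c.shards)]
}

// Update is the write path: it touches exactly one shard, so writers
// working on different shards do not contend with each other.
func (c *ShardedClock) Update(vb uint16, seq uint64) {
	s := c.shardFor(vb)
	s.mu.Lock()
	defer s.mu.Unlock()
	if seq > s.seqs[vb] {
		s.seqs[vb] = seq
	}
}

// Rebuild is the read path: it must visit every shard and merge them
// back into one clock, which is the extra read-side cost.
func (c *ShardedClock) Rebuild() map[uint16]uint64 {
	full := make(map[uint16]uint64)
	for i := range c.shards {
		s := &c.shards[i]
		s.mu.Lock()
		for vb, seq := range s.seqs {
			full[vb] = seq
		}
		s.mu.Unlock()
	}
	return full
}

func main() {
	clock := NewShardedClock(4)
	clock.Update(0, 10)
	clock.Update(5, 7)
	clock.Update(0, 12)
	fmt.Println(clock.Rebuild())
}
```

A write costs one shard update however many shards exist, while a read costs one fetch per shard.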

@adamcfraser
Collaborator Author

Completed for stable clock on feature/distributed_index_1243.

With the change, P95 TimeToSubscriberInteractive for a 20-minute 1K/1K run is reduced from ~19s to ~11s.

Based on expvars, the average time taken to update the stable clock doesn't seem to have been reduced significantly, but the variance has been brought down, as we avoid the intermittent long CAS retry loops. This seems reasonable: we're updating multiple KV entries for the clock now (instead of one), so the average time without contention is going to be higher. Under high write contention, though, updating multiple docs becomes more efficient.

We should see even more benefit as we scale out the number of SG writer nodes. Each node will be updating fewer clock shards, so we won't be impacted by the increased writer contention.
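As a rough illustration of why each node would touch fewer shards as nodes are added, the sketch below counts the shards one node writes to. It assumes 1024 vbuckets split into contiguous, near-equal ranges per node, and 16 consecutive vbuckets per clock shard. The partitioning scheme, the shard size, and the function name `shardsTouched` are assumptions for the example, not details confirmed in this thread.

```go
package main

import "fmt"

// shardsTouched returns how many distinct clock shards a node writes to,
// assuming (hypothetically) that vbuckets are split into contiguous,
// near-equal ranges per node and that each shard covers vbsPerShard
// consecutive vbuckets.
func shardsTouched(node, numNodes, numVbuckets, vbsPerShard int) int {
	first := node * numVbuckets / numNodes    // first vbucket owned by this node
	last := (node+1)*numVbuckets/numNodes - 1 // last vbucket owned by this node
	if last < first {
		return 0 // more nodes than vbuckets: this node owns nothing
	}
	return last/vbsPerShard - first/vbsPerShard + 1
}

func main() {
	const numVbuckets, vbsPerShard = 1024, 16 // vbsPerShard is an assumed value
	for _, n := range []int{1, 3, 8} {
		fmt.Printf("%d nodes: node 0 writes to %d shards\n",
			n, shardsTouched(0, n, numVbuckets, vbsPerShard))
	}
}
```

Under these assumptions the per-node shard count falls roughly in proportion to the number of nodes. When a node's range boundary lands inside a shard, that shard is shared with the neighbouring node, so a small amount of cross-node contention can remain.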

Putting up a PR for this change, and then will file a new ticket to evaluate uptake by the channel clocks.
