
Performance degradation in distributed index testing post-access work #1423

Closed

adamcfraser opened this issue Jan 9, 2016 · 3 comments

@adamcfraser (Collaborator)

After making the changes to add access support and merging the distributed index branch down to master, performance results degraded significantly.

It looks like this may be caused by a change in sequence hashing:

Pre-access expvars (for SG writer):

seqHashCacheCache_hits: 3.3M,
seqHashCache_misses: 330K,
seqHash_getHash_hits: 3.3M,
seqHash_getHash_misses: 268K,

Post-access expvars:

seqHashCacheCache_hits: 26K,
seqHashCache_misses: 833K,
seqHash_getHash_hits: 6K,
seqHash_getHash_misses: 854K,

Post-access, we're very rarely getting cache or index matches on the hash values.

Investigating to determine whether this is a regression, or a correctness fix that has caused a performance degradation.
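For context, the hit/miss counters quoted above are Go expvars. A minimal sketch of how counters like these are typically published and bumped (the map name, cache shape, and function here are illustrative, not the actual Sync Gateway code):

```go
package example

import "expvar"

// Illustrative only: the real Sync Gateway expvar names and registration live
// in the index/hash code. This just shows the pattern behind the counters above.
var indexExpvars = expvar.NewMap("example_seqHash")

// getCachedHash looks up a previously computed hash for a clock key and bumps
// the hit/miss counters that show up in the expvar output.
func getCachedHash(cache map[string]string, clockKey string) (string, bool) {
	if hash, ok := cache[clockKey]; ok {
		indexExpvars.Add("seqHashCache_hits", 1)
		return hash, true
	}
	indexExpvars.Add("seqHashCache_misses", 1)
	return "", false
}
```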

@adamcfraser (Collaborator, Author)

The addition of the user doc to the clock (as a result of the access changes) would explain some additional initial cache misses. However, we'd expect this effect to disappear over time, as channeled docs get written to the same vbuckets as the user docs. This doesn't seem to be happening.

Running a simple gateload test against master locally, I'm getting the expected hit/miss ratios:

seqHashCacheCache_hits: 283927,
seqHashCache_misses: 30559,
seqHash_getHash_hits: 267923,
seqHash_getHash_misses: 46563,
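To make the "user doc in the clock" effect concrete, here's a rough sketch of a per-vbucket since clock and why two users in the same channels start out with clocks that don't match. The types, method, and vbucket numbers are illustrative, not the actual Sync Gateway implementation:

```go
package example

// SequenceClock is a simplified stand-in for the per-vbucket since clock used
// by the distributed index: vbucket number -> highest sequence seen.
type SequenceClock map[uint16]uint64

// Equals reports whether two clocks cover exactly the same vbucket sequences.
// Clocks that differ anywhere hash differently, so they can't share cached hashes.
func (c SequenceClock) Equals(other SequenceClock) bool {
	if len(c) != len(other) {
		return false
	}
	for vb, seq := range c {
		if other[vb] != seq {
			return false
		}
	}
	return true
}

// Two users in the same channels share the channel sequences, but each user's
// clock also includes the vbucket/sequence of their own user doc, so the clocks
// (and their hashes) differ until later channel writes advance those vbuckets.
func clocksMatchExample() bool {
	alice := SequenceClock{12: 100, 512: 250, 7: 18}  // alice's user doc landed in vb 7
	bob := SequenceClock{12: 100, 512: 250, 640: 42}  // bob's user doc landed in vb 640
	return alice.Equals(bob) // false: different clocks, different hashes, no shared cache hits
}
```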

@adamcfraser (Collaborator, Author)

It looks like there are three main factors contributing to the performance degradation:

  1. Inclusion of the user record in the _changes response (required as part of access handling) results in each user initially having a different since clock, even when they are in the same channels. This continues until the user docs are no longer the most recent documents in the channel's vbuckets.
  2. The different since clocks result in different sequence hashes, which means fewer hits on the hash cache and generally more work being done for each user's changes feed. The additional changes feed work results in users not completing changes feed processing within a single polling window.
  3. Because changes processing spans more than one polling window, the user's feed isn't able to use the cached most recent poll results, so each user does their own DB lookup for changes. This has a cascading effect: user changes feed processing gets even slower, and is even more likely to span multiple polling windows (see the sketch after this list).
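Building on the SequenceClock sketch above, here's a rough illustration of the fast/slow path described in item 3. The names (pollResults, changesSince, indexLookup) are hypothetical, not the actual writer code:

```go
package example

// pollResults is a hypothetical snapshot of the writer's most recent polling window.
type pollResults struct {
	startClock SequenceClock // clock at the start of the poll
	entries    []string      // changes found by the poll (doc IDs, simplified)
}

// covers reports whether since is at or beyond pollStart for every vbucket.
func covers(since, pollStart SequenceClock) bool {
	for vb, seq := range pollStart {
		if since[vb] < seq {
			return false
		}
	}
	return true
}

// changesSince sketches the decision in item 3: users whose since clock is
// still covered by the latest poll can be served from the shared cached
// results; users that have fallen behind the polling window drop back to a
// per-user index/DB lookup, which is the slow path.
func changesSince(userSince SequenceClock, latest pollResults, indexLookup func(SequenceClock) []string) []string {
	if covers(userSince, latest.startClock) {
		return latest.entries // fast path: shared cached poll results
	}
	return indexLookup(userSince) // slow path: per-user lookup against the index
}
```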

The fix for this will be in three parts:

  1. Don't hash every sequence in a continuous changes response. Use the 'low' sequence approach to ensure that any sequence can still be used as a since value in a subsequent changes request (see the sketch after this list). Will be done for 1.2, tracked with this ticket.
  2. Cache polling results for a longer period of time (i.e. over multiple polling windows), to avoid the cascading performance degradation when changes processing takes longer than one polling window. Probably a 1.3 fix, assuming that the previous fix addresses the issue for now.
  3. Expose the polling frequency as an index writer configuration option, to give users the ability to address these kinds of performance issues.
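A rough sketch of part 1, assuming the 'low' sequence approach means hashing one low clock per batch and reusing that hash in every entry's sequence value; the function names and the sequence string format are illustrative, not the final implementation:

```go
package example

import "fmt"

// changeEntry is a simplified changes-feed entry.
type changeEntry struct {
	DocID string
	Seq   string // sequence value returned to the client, reusable as a later since value
}

// buildChangesBatch hashes the batch's low clock once and attaches that single
// hash to every entry, instead of hashing each entry's full clock. hashClock
// stands in for whatever produces the cached clock hash in the real code.
func buildChangesBatch(lowClock SequenceClock, docSeqs map[string]uint64, hashClock func(SequenceClock) string) []changeEntry {
	lowHash := hashClock(lowClock) // one hash per batch instead of one per entry
	entries := make([]changeEntry, 0, len(docSeqs))
	for docID, seq := range docSeqs {
		// "<low hash>::<per-doc sequence>": any entry can still be handed back
		// as a since value, because it carries the shared low hash.
		entries = append(entries, changeEntry{DocID: docID, Seq: fmt.Sprintf("%s::%d", lowHash, seq)})
	}
	return entries
}
```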

@adamcfraser (Collaborator, Author)

Latest fixes appear to have addressed the issue:

8 readers, 2 writers:

[attached screenshot: gateload_expvars]

[attached screenshot: sync_gateway_machine_stats]
