
Performance degradation in distributed index testing post-access work #1423

Closed

adamcfraser opened this issue Jan 9, 2016 · 3 comments

@adamcfraser (Collaborator)

After making the changes to add access support and merging the distributed index branch down to master, performance results degraded significantly.

It looks like this may be caused by a change in sequence hashing:

Pre-access expvars (for SG writer):

seqHashCacheCache_hits: 3.3M,
seqHashCache_misses: 330K,
seqHash_getHash_hits: 3.3M,
seqHash_getHash_misses: 268K,

Post-access expvars:

seqHashCacheCache_hits: 26K,
seqHashCache_misses: 833K,
seqHash_getHash_hits: 6K,
seqHash_getHash_misses: 854K,

Post-access, we're very rarely getting cache or index matches on the hash values.

Investigating to determine whether this is a regression, or a correctness fix that has caused a performance degradation.
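For context, the hit/miss counters quoted above are Go expvars. A minimal sketch of how counters like these are typically published and bumped (the map name, cache shape, and function here are illustrative, not the actual Sync Gateway code):

```go
package example

import "expvar"

// Illustrative only: the real Sync Gateway expvar names and registration live
// in the index/hash code. This just shows the pattern behind the counters above.
var indexExpvars = expvar.NewMap("example_seqHash")

// getCachedHash looks up a previously computed hash for a clock key and bumps
// the hit/miss counters that show up in the expvar output.
func getCachedHash(cache map[string]string, clockKey string) (string, bool) {
	if hash, ok := cache[clockKey]; ok {
		indexExpvars.Add("seqHashCache_hits", 1)
		return hash, true
	}
	indexExpvars.Add("seqHashCache_misses", 1)
	return "", false
}
```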

@adamcfraser (Collaborator, Author)

The addition of the user doc to the clock (as a result of the access changes) would explain some additional initial cache misses. However, we'd expect this effect to disappear over time, as channeled docs get written to the same vbuckets as the user docs. This doesn't seem to be happening.

Running a simple gateload test against master locally, I'm getting the expected hit/miss ratios:

seqHashCacheCache_hits: 283927,
seqHashCache_misses: 30559,
seqHash_getHash_hits: 267923,
seqHash_getHash_misses: 46563,
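To make the "user doc in the clock" effect concrete, here's a rough sketch of a per-vbucket since clock and why two users in the same channels start out with clocks that don't match. The types, method, and vbucket numbers are illustrative, not the actual Sync Gateway implementation:

```go
package example

// SequenceClock is a simplified stand-in for the per-vbucket since clock used
// by the distributed index: vbucket number -> highest sequence seen.
type SequenceClock map[uint16]uint64

// Equals reports whether two clocks cover exactly the same vbucket sequences.
// Clocks that differ anywhere hash differently, so they can't share cached hashes.
func (c SequenceClock) Equals(other SequenceClock) bool {
	if len(c) != len(other) {
		return false
	}
	for vb, seq := range c {
		if other[vb] != seq {
			return false
		}
	}
	return true
}

// Two users in the same channels share the channel sequences, but each user's
// clock also includes the vbucket/sequence of their own user doc, so the clocks
// (and their hashes) differ until later channel writes advance those vbuckets.
func clocksMatchExample() bool {
	alice := SequenceClock{12: 100, 512: 250, 7: 18}  // alice's user doc landed in vb 7
	bob := SequenceClock{12: 100, 512: 250, 640: 42}  // bob's user doc landed in vb 640
	return alice.Equals(bob) // false: different clocks, different hashes, no shared cache hits
}
```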

@adamcfraser (Collaborator, Author)

It looks like there are three main factors contributing to the performance degradation:

  1. Inclusion of the user record in the _changes response (required as part of access handling) results in each user initially having a different since clock, even when they are in the same channels. This continues until the user docs are no longer the most recent documents in the channel's vbuckets.
  2. The different since clocks result in different sequence hashes, which means fewer hits on the hash cache and generally more work being done for each user's changes feed. The additional changes feed work results in users not completing changes feed processing within a single polling window.
  3. Because changes processing spans more than one polling window, the user's feed isn't able to use the cached most recent poll results, so each user does their own DB lookup for changes. This has a cascading effect: user changes feed processing gets even slower, and is even more likely to span multiple polling windows (see the sketch after this list).
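Building on the SequenceClock sketch above, here's a rough illustration of the fast/slow path described in item 3. The names (pollResults, changesSince, indexLookup) are hypothetical, not the actual writer code:

```go
package example

// pollResults is a hypothetical snapshot of the writer's most recent polling window.
type pollResults struct {
	startClock SequenceClock // clock at the start of the poll
	entries    []string      // changes found by the poll (doc IDs, simplified)
}

// covers reports whether since is at or beyond pollStart for every vbucket.
func covers(since, pollStart SequenceClock) bool {
	for vb, seq := range pollStart {
		if since[vb] < seq {
			return false
		}
	}
	return true
}

// changesSince sketches the decision in item 3: users whose since clock is
// still covered by the latest poll can be served from the shared cached
// results; users that have fallen behind the polling window drop back to a
// per-user index/DB lookup, which is the slow path.
func changesSince(userSince SequenceClock, latest pollResults, indexLookup func(SequenceClock) []string) []string {
	if covers(userSince, latest.startClock) {
		return latest.entries // fast path: shared cached poll results
	}
	return indexLookup(userSince) // slow path: per-user lookup against the index
}
```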

The fix for this will be in three parts:

  1. Don't hash every sequence in a continuous changes response. Use the 'low' sequence approach to ensure that any sequence can still be used as a since value in a subsequent changes request (see the sketch after this list). Will be done for 1.2, tracked with this ticket.
  2. Cache polling results for a longer period of time (i.e. over multiple polling windows), to avoid the cascading performance degradation when changes processing takes longer than one polling window. Probably a 1.3 fix, assuming that the previous fix addresses the issue for now.
  3. Expose the polling frequency as an index writer configuration option, to give users the ability to address these kinds of performance issues.
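A rough sketch of part 1, assuming the 'low' sequence approach means hashing one low clock per batch and reusing that hash in every entry's sequence value; the function names and the sequence string format are illustrative, not the final implementation:

```go
package example

import "fmt"

// changeEntry is a simplified changes-feed entry.
type changeEntry struct {
	DocID string
	Seq   string // sequence value returned to the client, reusable as a later since value
}

// buildChangesBatch hashes the batch's low clock once and attaches that single
// hash to every entry, instead of hashing each entry's full clock. hashClock
// stands in for whatever produces the cached clock hash in the real code.
func buildChangesBatch(lowClock SequenceClock, docSeqs map[string]uint64, hashClock func(SequenceClock) string) []changeEntry {
	lowHash := hashClock(lowClock) // one hash per batch instead of one per entry
	entries := make([]changeEntry, 0, len(docSeqs))
	for docID, seq := range docSeqs {
		// "<low hash>::<per-doc sequence>": any entry can still be handed back
		// as a since value, because it carries the shared low hash.
		entries = append(entries, changeEntry{DocID: docID, Seq: fmt.Sprintf("%s::%d", lowHash, seq)})
	}
	return entries
}
```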

@adamcfraser (Collaborator, Author)

Latest fixes appear to have addressed the issue:

8 readers, 2 writers:

[attached screenshot: gateload_expvars]

[attached screenshot: sync_gateway_machine_stats]
