Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Segment Replication] - Remove redundant replica doc parsing on writes. #7279

Merged
merged 1 commit into from
Apr 24, 2023

Conversation

mch2
Copy link
Member

@mch2 mch2 commented Apr 23, 2023

Description

This change removes unnecessary doc parsing currently performed on replicas.
Updates applyIndexOperationOnReplicas and applyDeleteOperationOnReplica to pass a doc id from the primary and construct a ParsedDocument manually with only the fields required to write to translog from NRTReplicationEngine.

The following benchmark was run on a 3 node cluster, SO dataset 60 shards 1 replica. Contender is with this patch.

------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------

|                                                        Metric |                     Task |    Baseline |   Contender |     Diff |   Unit |
|--------------------------------------------------------------:|-------------------------:|------------:|------------:|---------:|-------:|
|                    Cumulative indexing time of primary shards |                          |     122.171 |     128.365 |  6.19367 |    min |
|             Min cumulative indexing time across primary shard |                          |      1.6275 |     1.86793 |  0.24043 |    min |
|          Median cumulative indexing time across primary shard |                          |     2.04355 |     2.16063 |  0.11708 |    min |
|             Max cumulative indexing time across primary shard |                          |     2.33748 |     2.41813 |  0.08065 |    min |
|           Cumulative indexing throttle time of primary shards |                          |           0 |           0 |        0 |    min |
|    Min cumulative indexing throttle time across primary shard |                          |           0 |           0 |        0 |    min |
| Median cumulative indexing throttle time across primary shard |                          |           0 |           0 |        0 |    min |
|    Max cumulative indexing throttle time across primary shard |                          |           0 |           0 |        0 |    min |
|                       Cumulative merge time of primary shards |                          |     66.6873 |     64.0442 | -2.64312 |    min |
|                      Cumulative merge count of primary shards |                          |        1861 |        1871 |       10 |        |
|                Min cumulative merge time across primary shard |                          |    0.858683 |    0.847383 |  -0.0113 |    min |
|             Median cumulative merge time across primary shard |                          |     1.08508 |     1.06556 | -0.01953 |    min |
|                Max cumulative merge time across primary shard |                          |     1.46848 |     1.34543 | -0.12305 |    min |
|              Cumulative merge throttle time of primary shards |                          |     15.7166 |      14.726 | -0.99063 |    min |
|       Min cumulative merge throttle time across primary shard |                          |     0.14765 |    0.105083 | -0.04257 |    min |
|    Median cumulative merge throttle time across primary shard |                          |    0.237058 |    0.246575 |  0.00952 |    min |
|       Max cumulative merge throttle time across primary shard |                          |    0.506233 |     0.41145 | -0.09478 |    min |
|                     Cumulative refresh time of primary shards |                          |     38.6064 |      38.211 | -0.39538 |    min |
|                    Cumulative refresh count of primary shards |                          |       13308 |       12796 |     -512 |        |
|              Min cumulative refresh time across primary shard |                          |    0.538567 |    0.553567 |    0.015 |    min |
|           Median cumulative refresh time across primary shard |                          |    0.648858 |    0.625217 | -0.02364 |    min |
|              Max cumulative refresh time across primary shard |                          |    0.706917 |      0.7212 |  0.01428 |    min |
|                       Cumulative flush time of primary shards |                          |     1.25247 |     1.15738 | -0.09508 |    min |
|                      Cumulative flush count of primary shards |                          |         118 |         120 |        2 |        |
|                Min cumulative flush time across primary shard |                          | 3.33333e-05 |      0.0035 |  0.00347 |    min |
|             Median cumulative flush time across primary shard |                          |      0.0131 |   0.0179833 |  0.00488 |    min |
|                Max cumulative flush time across primary shard |                          |   0.0817667 |   0.0505333 | -0.03123 |    min |
|                                       Total Young Gen GC time |                          |     118.824 |     114.978 |   -3.846 |      s |
|                                      Total Young Gen GC count |                          |       13779 |       13021 |     -758 |        |
|                                         Total Old Gen GC time |                          |           0 |           0 |        0 |      s |
|                                        Total Old Gen GC count |                          |           0 |           0 |        0 |        |
|                                                    Store size |                          |     64.2993 |     65.0226 |   0.7233 |     GB |
|                                                 Translog size |                          |     30.3503 |     31.5609 |  1.21064 |     GB |
|                                        Heap used for segments |                          |           0 |           0 |        0 |     MB |
|                                      Heap used for doc values |                          |           0 |           0 |        0 |     MB |
|                                           Heap used for terms |                          |           0 |           0 |        0 |     MB |
|                                           Heap used for norms |                          |           0 |           0 |        0 |     MB |
|                                          Heap used for points |                          |           0 |           0 |        0 |     MB |
|                                   Heap used for stored fields |                          |           0 |           0 |        0 |     MB |
|                                                 Segment count |                          |        1046 |        1081 |       35 |        |
|                                                Min Throughput |             index-append |     36218.1 |     37977.5 |  1759.33 | docs/s |
|                                               Mean Throughput |             index-append |     38850.3 |     40693.5 |   1843.2 | docs/s |
|                                             Median Throughput |             index-append |     39062.1 |     40889.4 |   1827.3 | docs/s |
|                                                Max Throughput |             index-append |     41627.8 |     44211.4 |  2583.61 | docs/s |
|                                       50th percentile latency |             index-append |     824.462 |     782.951 | -41.5111 |     ms |
|                                       90th percentile latency |             index-append |     1759.59 |     1749.22 | -10.3688 |     ms |
|                                       99th percentile latency |             index-append |     2583.54 |     2550.07 | -33.4742 |     ms |
|                                     99.9th percentile latency |             index-append |     2862.65 |     2923.76 |  61.1034 |     ms |
|                                      100th percentile latency |             index-append |     3348.12 |      3202.7 | -145.422 |     ms |
|                                  50th percentile service time |             index-append |     824.462 |     782.951 | -41.5111 |     ms |
|                                  90th percentile service time |             index-append |     1759.59 |     1749.22 | -10.3688 |     ms |
|                                  99th percentile service time |             index-append |     2583.54 |     2550.07 | -33.4742 |     ms |
|                                99.9th percentile service time |             index-append |     2862.65 |     2923.76 |  61.1034 |     ms |
|                                 100th percentile service time |             index-append |     3348.12 |      3202.7 | -145.422 |     ms |
|                                                    error rate |             index-append |     33.9717 |     27.5655 | -6.40622 |      % |
|                                                Min Throughput | wait-until-merges-finish |   0.0865927 |     4.94769 |   4.8611 |  ops/s |
|                                               Mean Throughput | wait-until-merges-finish |   0.0865927 |     4.94769 |   4.8611 |  ops/s |
|                                             Median Throughput | wait-until-merges-finish |   0.0865927 |     4.94769 |   4.8611 |  ops/s |
|                                                Max Throughput | wait-until-merges-finish |   0.0865927 |     4.94769 |   4.8611 |  ops/s |
|                                      100th percentile latency | wait-until-merges-finish |     11547.9 |     201.474 | -11346.4 |     ms |
|                                 100th percentile service time | wait-until-merges-finish |     11547.9 |     201.474 | -11346.4 |     ms |
|                                                    error rate | wait-until-merges-finish |           0 |           0 |        0 |      % |

Issues Resolved

closes #7164

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

This change removes unnecessary doc parsing currently performed on replicas by
updating applyIndexOperationOnReplicas to pass a doc id from the primary.

Signed-off-by: Marc Handalian <[email protected]>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.cluster.allocation.AwarenessAllocationIT.testThreeZoneOneReplicaWithForceZoneValueAndLoadAwareness

@codecov-commenter
Copy link

Codecov Report

Merging #7279 (2e2ba83) into main (115bb30) will decrease coverage by 0.06%.
The diff coverage is 100.00%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@             Coverage Diff              @@
##               main    #7279      +/-   ##
============================================
- Coverage     70.58%   70.53%   -0.06%     
+ Complexity    59503    59475      -28     
============================================
  Files          4862     4862              
  Lines        285531   285544      +13     
  Branches      41153    41153              
============================================
- Hits         201555   201414     -141     
- Misses        67322    67560     +238     
+ Partials      16654    16570      -84     
Impacted Files Coverage Δ
...ensearch/action/bulk/TransportShardBulkAction.java 76.76% <100.00%> (+0.36%) ⬆️
...in/java/org/opensearch/index/shard/IndexShard.java 69.66% <100.00%> (-0.04%) ⬇️
...org/opensearch/index/shard/IndexShardTestCase.java 93.84% <100.00%> (+0.01%) ⬆️

... and 468 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Copy link
Collaborator

@nknize nknize left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Nice throughput boost

opensearch-trigger-bot bot pushed a commit that referenced this pull request Apr 24, 2023
…#7279)

This change removes unnecessary doc parsing currently performed on replicas by
updating applyIndexOperationOnReplicas to pass a doc id from the primary.

Signed-off-by: Marc Handalian <[email protected]>
(cherry picked from commit 66e49a6)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
opensearch-trigger-bot bot pushed a commit that referenced this pull request Apr 24, 2023
…#7279)

This change removes unnecessary doc parsing currently performed on replicas by
updating applyIndexOperationOnReplicas to pass a doc id from the primary.

Signed-off-by: Marc Handalian <[email protected]>
(cherry picked from commit 66e49a6)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
nknize pushed a commit that referenced this pull request Apr 24, 2023
…#7279) (#7282)

This change removes unnecessary doc parsing currently performed on replicas by
updating applyIndexOperationOnReplicas to pass a doc id from the primary.


(cherry picked from commit 66e49a6)

Signed-off-by: Marc Handalian <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
nknize pushed a commit that referenced this pull request Apr 24, 2023
…#7279) (#7281)

This change removes unnecessary doc parsing currently performed on replicas by
updating applyIndexOperationOnReplicas to pass a doc id from the primary.


(cherry picked from commit 66e49a6)

Signed-off-by: Marc Handalian <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
austintlee pushed a commit to austintlee/OpenSearch that referenced this pull request Apr 28, 2023
…opensearch-project#7279)

This change removes unnecessary doc parsing currently performed on replicas by
updating applyIndexOperationOnReplicas to pass a doc id from the primary.

Signed-off-by: Marc Handalian <[email protected]>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…opensearch-project#7279)

This change removes unnecessary doc parsing currently performed on replicas by
updating applyIndexOperationOnReplicas to pass a doc id from the primary.

Signed-off-by: Marc Handalian <[email protected]>
Signed-off-by: Shivansh Arora <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Segment Replication] Reduce/remove document parsing on replicas
3 participants