[Remote Store] Add Remote Store backpressure rejection stats to _nodes/stats #10524

Merged: 13 commits merged into opensearch-project:main on Oct 14, 2023

@BhumikaSaini-Amazon BhumikaSaini-Amazon commented Oct 10, 2023

Description

Add Remote Store backpressure rejection stats to _nodes/stats

Related Issues

#10501

#10365

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions bot added the bug and Storage:Remote labels Oct 10, 2023
github-actions bot commented Oct 10, 2023

Compatibility status:

Checks if related components are compatible with change cdbe14d

Incompatible components:

  • https://github.com/opensearch-project/opensearch-oci-object-storage.git
  • https://github.com/opensearch-project/performance-analyzer-rca.git

Skipped components

Compatible components:

  • https://github.com/opensearch-project/security.git
  • https://github.com/opensearch-project/alerting.git
  • https://github.com/opensearch-project/index-management.git
  • https://github.com/opensearch-project/anomaly-detection.git
  • https://github.com/opensearch-project/sql.git
  • https://github.com/opensearch-project/asynchronous-search.git
  • https://github.com/opensearch-project/job-scheduler.git
  • https://github.com/opensearch-project/observability.git
  • https://github.com/opensearch-project/common-utils.git
  • https://github.com/opensearch-project/k-nn.git
  • https://github.com/opensearch-project/reporting.git
  • https://github.com/opensearch-project/cross-cluster-replication.git
  • https://github.com/opensearch-project/security-analytics.git
  • https://github.com/opensearch-project/custom-codecs.git
  • https://github.com/opensearch-project/performance-analyzer.git
  • https://github.com/opensearch-project/ml-commons.git

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

BhumikaSaini-Amazon commented Oct 10, 2023

@gbbafna instead of keeping the new metrics as top-level blobs, should we add them under the indices.segments blob? A sample response from the _nodes/stats API would then look like this (showing only the segments blob):

        "segments" : {
          "count" : 0,
          "memory_in_bytes" : 0,
          "terms_memory_in_bytes" : 0,
          "stored_fields_memory_in_bytes" : 0,
          "term_vectors_memory_in_bytes" : 0,
          "norms_memory_in_bytes" : 0,
          "points_memory_in_bytes" : 0,
          "doc_values_memory_in_bytes" : 0,
          "index_writer_memory_in_bytes" : 0,
          "version_map_memory_in_bytes" : 0,
          "fixed_bit_set_memory_in_bytes" : 0,
          "max_unsafe_auto_id_timestamp" : -9223372036854775808,
          "remote_store" : {
            "upload" : {
              "total_upload_size" : {
                "started_bytes" : 0,
                "succeeded_bytes" : 0,
                "failed_bytes" : 0
              },
              "refresh_size_lag" : {
                "total_bytes" : 0,
                "max_bytes" : 0
              },
              "max_refresh_time_lag_in_millis" : 0,
              "total_time_spent_in_millis" : 0,
              "pressure" : {
                "total_rejections" : <count>
              }
            },
            "download" : {
              "total_download_size" : {
                "started_bytes" : 0,
                "succeeded_bytes" : 0,
                "failed_bytes" : 0
              },
              "total_time_spent_in_millis" : 0
            }
          },
          "segment_replication" : {
            "max_bytes_behind" : "0b",
            "total_bytes_behind" : "0b",
            "max_replication_lag" : "0s",
            "pressure" : {
              "total_rejections" : <count>
            }
          },
          "file_sizes" : { }
        },

Proposing this so that we get shard-level visibility as well.
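
For readers who want to consume the field, here is a minimal Python sketch of reading the count from a _nodes/stats response body. It assumes the layout proposed in the sample above (indices.segments.remote_store.upload.pressure.total_rejections); the layout that was finally merged may differ. The helper names `dig` and `rejections_by_node` and the sample values are illustrative, not part of OpenSearch.

```python
from typing import Any, Mapping

# Path proposed in the comment above; an assumption, the shipped layout may differ.
PRESSURE_PATH = (
    "indices", "segments", "remote_store", "upload", "pressure", "total_rejections",
)


def dig(doc: Mapping[str, Any], path: tuple, default: int = 0) -> Any:
    """Walk nested dicts along `path`; return `default` if any key is missing."""
    node: Any = doc
    for key in path:
        if not isinstance(node, Mapping) or key not in node:
            return default
        node = node[key]
    return node


def rejections_by_node(stats: Mapping[str, Any]) -> dict:
    """Map node id -> remote store upload rejection count."""
    return {
        node_id: int(dig(node, PRESSURE_PATH))
        for node_id, node in stats.get("nodes", {}).items()
    }


# Hand-written sample in the proposed shape (illustrative values, not real output).
sample = {
    "nodes": {
        "node-a": {
            "indices": {
                "segments": {
                    "remote_store": {
                        "upload": {"pressure": {"total_rejections": 3}}
                    }
                }
            }
        },
        # A node that reports no remote store stats falls back to the default.
        "node-b": {"indices": {"segments": {}}},
    }
}

print(rejections_by_node(sample))
```

Falling back to 0 for a missing path keeps the reader tolerant of nodes that do not report the blob at all.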

Additionally, since remote store backpressure rejections can be split further by rejection reason (code pointer), I propose naming the field total_rejections rather than rejections. That leaves room to add per-reason rejection metrics in the future without making the current metric ambiguous.
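
To illustrate the naming argument, here is a rough Python sketch of how per-reason counters could later sit beside total_rejections in the same pressure blob. The class `RejectionCounters`, the `<reason>_rejections` key pattern, and the placeholder reason names are all hypothetical; none of them come from this PR or from OpenSearch.

```python
from collections import Counter


class RejectionCounters:
    """Hypothetical counter set: a total plus an optional per-reason breakdown."""

    def __init__(self) -> None:
        self._by_reason = Counter()

    def record(self, reason: str) -> None:
        """Count one rejection attributed to `reason`."""
        self._by_reason[reason] += 1

    def to_stats(self) -> dict:
        """Render the pressure blob; per-reason keys sit beside total_rejections."""
        stats = {"total_rejections": sum(self._by_reason.values())}
        for reason, count in sorted(self._by_reason.items()):
            stats[f"{reason}_rejections"] = count  # hypothetical key pattern
        return stats


counters = RejectionCounters()
for reason in ("reason_a", "reason_a", "reason_b"):  # placeholder reason names
    counters.record(reason)

pressure = counters.to_stats()
print(pressure)
```

Because the total is named explicitly, adding the breakdown later does not change the meaning of the existing key.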

What do you think?

Thanks!

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.index.shard.RemoteIndexShardTests.testRepicaCleansUpOldCommitsWhenReceivingNew

@codecov

codecov bot commented Oct 10, 2023

Codecov Report

Merging #10524 (a0323f8) into main (6c02261) will increase coverage by 0.10%.
Report is 2 commits behind head on main.
The diff coverage is 79.48%.

❗ Current head a0323f8 differs from pull request most recent head cdbe14d. Consider uploading reports for the commit cdbe14d to get more accurate results

@@             Coverage Diff              @@
##               main   #10524      +/-   ##
============================================
+ Coverage     71.12%   71.22%   +0.10%     
- Complexity    58391    58428      +37     
============================================
  Files          4845     4844       -1     
  Lines        275335   275315      -20     
  Branches      40088    40088              
============================================
+ Hits         195827   196091     +264     
+ Misses        63147    62820     -327     
- Partials      16361    16404      +43     

| Files | Coverage Δ |
|---|---|
| ...c/main/java/org/opensearch/index/IndexService.java | 75.48% <ø> (-0.22%) ⬇️ |
| ...earch/index/SegmentReplicationPressureService.java | 77.88% <100.00%> (+1.18%) ⬆️ |
| ...rch/index/remote/RemoteSegmentTransferTracker.java | 81.10% <ø> (ø) |
| ...in/java/org/opensearch/index/shard/IndexShard.java | 69.88% <100.00%> (+0.77%) ⬆️ |
| ...in/java/org/opensearch/indices/IndicesService.java | 70.40% <ø> (+0.14%) ⬆️ |
| ...ch/indices/cluster/IndicesClusterStateService.java | 73.82% <100.00%> (+6.97%) ⬆️ |
| ...in/java/org/opensearch/index/ReplicationStats.java | 88.88% <80.00%> (+11.46%) ⬆️ |
| ...rg/opensearch/index/remote/RemoteSegmentStats.java | 84.50% <72.22%> (-1.65%) ⬇️ |

... and 449 files with indirect coverage changes

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.search.SearchWeightedRoutingIT.testMultiGetWithNetworkDisruption_FailOpenEnabled

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.index.shard.RemoteIndexShardTests.testNRTReplicaWithRemoteStorePromotedAsPrimaryRefreshCommit

Bhumika Saini added 7 commits October 13, 2023 16:04
BhumikaSaini-Amazon changed the title from "[Remote Store] Add Remote Store backpressure and Segment Replication backpressure stats to _nodes/stats" to "[Remote Store] Add Remote Store backpressure rejection stats to _nodes/stats" Oct 13, 2023
@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT:
  • URL:
  • CommitID: e0cf278
    Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green.
    Is the failure a flaky test unrelated to your change?

@BhumikaSaini-Amazon
Contributor Author

@mch2, @gbbafna I have updated this PR to include only the code changes for remote store backpressure rejection stats. @Rishikesh1159 has in-flight changes to add the segment replication (segrep) backpressure rejection stats, which we will track in a separate PR. Both this PR and @Rishikesh1159's PR need to be merged to resolve #10501.

Signed-off-by: Bhumika Saini <[email protected]>
@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT:
  • URL:
  • CommitID: 1365951
    Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green.
    Is the failure a flaky test unrelated to your change?

Signed-off-by: Bhumika Saini <[email protected]>
Signed-off-by: Bhumika Saini <[email protected]>
@BhumikaSaini-Amazon
Contributor Author

Summary of failed tests:

  1. org.opensearch.indices.replication.SegmentReplicationIT.testSendCorruptBytesToReplica => known issue: [BUG] SegmentReplicationIT.testSendCorruptBytesToReplica fails with specific seed #10542
  2. org.opensearch.search.SearchWeightedRoutingIT.testSearchAggregationWithNetworkDisruption_FailOpenEnabled => known flaky test: SearchWeightedRoutingIT.testSearchAggregationWithNetworkDisruption_FailOpenEnabled is flaky #5957
  3. org.opensearch.common.util.concurrent.QueueResizableOpenSearchThreadPoolExecutorTests.testResizeQueueSameSize => needs checking, but seems unrelated to this change
  4. org.opensearch.common.util.concurrent.QueueResizableOpenSearchThreadPoolExecutorTests.classMethod => probably reported because of failure 3 above

@gbbafna gbbafna merged commit 6c1bd48 into opensearch-project:main Oct 14, 2023
13 of 14 checks passed
gbbafna added the backport 2.x label Oct 14, 2023
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-10524-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 6c1bd487da8ca1794f29302d7a6c6a713f9c6a01
# Push it to GitHub
git push --set-upstream origin backport/backport-10524-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-10524-to-2.x.

BhumikaSaini-Amazon pushed a commit to BhumikaSaini-Amazon/OpenSearch that referenced this pull request Oct 16, 2023
gbbafna pushed a commit that referenced this pull request Oct 16, 2023
…s/stats (#10524) (#10629)

* [Remote Store] Add Remote Store backpressure rejection stats to _nodes/stats (#10524)

Signed-off-by: Bhumika Saini <[email protected]>

* Update version check

Signed-off-by: Bhumika Saini <[email protected]>

---------

Signed-off-by: Bhumika Saini <[email protected]>
deshsidd pushed a commit to deshsidd/OpenSearch that referenced this pull request Oct 19, 2023
austintlee pushed a commit to austintlee/OpenSearch that referenced this pull request Oct 23, 2023
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
Labels: backport 2.x, backport-failed, bug, Storage:Remote