
Batch translog sync/upload per x ms for remote-backed indexes #5854

Merged: 17 commits merged into opensearch-project:main on Jan 29, 2023

@ashking94 (Member) commented Jan 13, 2023

Description

Translog sync takes care of the local fsync and of uploading the translog to the remote store. Some buffering already happens implicitly: because the remote store upload is time-consuming, sync requests pile up while an upload is in flight and are then handled together. However, every upload adds the cost of a network interaction on top of the actual file upload. If we buffer for a pareto-optimal duration, we save on these additional network interaction costs and overall achieve lower latencies and higher indexing throughput than the non-buffered approach. There is also a delay optimisation in place: if an upload took considerable time, the next run is scheduled with a correspondingly shorter interval, which keeps the overall buffer interval in check.
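The buffering and delay adjustment described above can be sketched roughly as follows. This is a minimal, hypothetical Java sketch, not the code merged in this PR: the class name `BufferedUploadSketch`, the `uploader` callback, and the rule `max(0, bufferInterval - lastRunTook)` for the next delay are assumptions chosen to illustrate the idea.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch, not OpenSearch's BufferedAsyncIOProcessor: sync requests
// are buffered and uploaded as one batch per interval, so many requests share
// a single remote round trip.
public class BufferedUploadSketch<T> {
    private final ConcurrentLinkedQueue<T> pending = new ConcurrentLinkedQueue<>();
    private final Consumer<List<T>> uploader; // assumed: one network call per batch
    private final long bufferIntervalMs;

    public BufferedUploadSketch(long bufferIntervalMs, Consumer<List<T>> uploader) {
        this.bufferIntervalMs = bufferIntervalMs;
        this.uploader = uploader;
    }

    // Callers only enqueue; the upload happens on the next scheduled run.
    public void submit(T item) {
        pending.add(item);
    }

    // Drains everything buffered so far and uploads it in a single call.
    // Returns the number of items uploaded (0 means no network call was made).
    public int flush() {
        List<T> batch = new ArrayList<>();
        T item;
        while ((item = pending.poll()) != null) {
            batch.add(item);
        }
        if (!batch.isEmpty()) {
            uploader.accept(batch);
        }
        return batch.size();
    }

    // Delay optimisation (assumed rule): if the last run took a while, wait
    // correspondingly less, so consecutive runs start about bufferIntervalMs apart.
    public static long nextDelayMs(long bufferIntervalMs, long lastRunTookMs) {
        return Math.max(0L, bufferIntervalMs - lastRunTookMs);
    }

    // One scheduler tick: flush, measure how long it took, reschedule with the adjusted delay.
    public void runAndReschedule(ScheduledExecutorService scheduler) {
        long start = System.nanoTime();
        flush();
        long tookMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        try {
            scheduler.schedule(() -> runAndReschedule(scheduler),
                    nextDelayMs(bufferIntervalMs, tookMs), TimeUnit.MILLISECONDS);
        } catch (RejectedExecutionException e) {
            // The scheduler was shut down; stop rescheduling.
        }
    }

    public static void main(String[] args) {
        List<List<String>> uploads = new ArrayList<>();
        BufferedUploadSketch<String> sketch = new BufferedUploadSketch<>(100, uploads::add);
        for (int i = 0; i < 5; i++) {
            sketch.submit("op-" + i);
        }
        int uploaded = sketch.flush();
        System.out.println(uploaded + " ops in " + uploads.size() + " upload(s)"); // 5 ops in 1 upload(s)
        System.out.println(nextDelayMs(100, 30));  // 70
        System.out.println(nextDelayMs(100, 250)); // 0
    }
}
```

The trade-off the sketch makes visible: a longer buffer interval means fewer network round trips per indexed operation, at the cost of each sync request waiting up to one interval before it is acknowledged.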

Credit for some of the code: Ashwin Pankaj, Laxman Muttineni

Issues Resolved

This solves #5692

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.

@github-actions (Contributor)

Gradle Check (Jenkins) Run Completed with:

@ashking94 self-assigned this Jan 13, 2023
@ashking94 added the Storage:Durability, Performance, v2.6.0, distributed framework, and skip-changelog labels Jan 13, 2023
@github-actions (Contributor)

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.indices.replication.SegmentReplicationIT.testCancellation

@github-actions (Contributor)

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.indices.replication.SegmentReplicationRelocationIT.testRelocateWhileContinuouslyIndexingAndWaitingForRefresh
      1 org.opensearch.cluster.allocation.AwarenessAllocationIT.testThreeZoneOneReplicaWithForceZoneValueAndLoadAwareness

@codecov-commenter commented Jan 16, 2023

Codecov Report

Merging #5854 (105cbc5) into main (715ff72) will decrease coverage by 0.04%.
The diff coverage is 75.60%.

@@             Coverage Diff              @@
##               main    #5854      +/-   ##
============================================
- Coverage     70.75%   70.72%   -0.04%     
+ Complexity    58720    58704      -16     
============================================
  Files          4771     4772       +1     
  Lines        280818   280887      +69     
  Branches      40568    40572       +4     
============================================
- Hits         198704   198663      -41     
- Misses        65824    65860      +36     
- Partials      16290    16364      +74     
Impacted Files Coverage Δ
...pensearch/common/settings/IndexScopedSettings.java 100.00% <ø> (ø)
...earch/common/util/concurrent/AsyncIOProcessor.java 92.98% <66.66%> (-1.02%) ⬇️
...mmon/util/concurrent/BufferedAsyncIOProcessor.java 68.96% <68.96%> (ø)
...org/opensearch/cluster/metadata/IndexMetadata.java 84.36% <76.92%> (-0.12%) ⬇️
...in/java/org/opensearch/index/shard/IndexShard.java 69.95% <78.94%> (-0.16%) ⬇️
...n/java/org/opensearch/common/settings/Setting.java 89.92% <100.00%> (+0.06%) ⬆️
.../main/java/org/opensearch/index/IndexSettings.java 86.20% <100.00%> (+0.11%) ⬆️
...ain/java/org/opensearch/threadpool/ThreadPool.java 83.48% <100.00%> (+0.69%) ⬆️
...n/admin/cluster/node/tasks/get/GetTaskRequest.java 30.30% <0.00%> (-63.64%) ⬇️
...port/ResponseHandlerFailureTransportException.java 0.00% <0.00%> (-60.00%) ⬇️
... and 506 more

@ashking94 changed the title from "Batch translog upload per x ms to allow high index throughput" to "Batch translog upload per x ms for remote-backed indexes to allow high index throughput" Jan 18, 2023
@ashking94 changed the title from "Batch translog upload per x ms for remote-backed indexes to allow high index throughput" to "Batch translog upload per x ms for remote-backed indexes" Jan 18, 2023
@github-actions (Contributor)

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.indices.replication.SegmentReplicationIT.testDeleteOperations

@github-actions (Contributor)

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests.testWriteRead
      1 org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests.testSnapshotWithLargeSegmentFiles
      1 org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests.testSnapshotAndRestore
      1 org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests.testReadNonExistingPath
      1 org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests.testMultipleSnapshotAndRollback
      1 org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests.testList
      1 org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests.testIndicesDeletedFromRepository
      1 org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests.testContainerCreationAndDeletion
      1 org.opensearch.indices.replication.SegmentReplicationRelocationIT.testDeleteOperations
      1 org.opensearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT.testIndexCreateBlockIsRemovedWhenAnyNodesNotExceedHighWatermark

ashking94 and others added 8 commits January 25, 2023 13:54
Co-authored-by: Ashwin Pankaj <[email protected]>
Co-authored-by: Laxman Muttineni <[email protected]>
Signed-off-by: Ashish Singh <[email protected]>
@github-actions (Contributor)

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT.testIndexCreateBlockWithAReadOnlyBlock
      1 org.opensearch.action.admin.cluster.tasks.PendingTasksBlocksIT.testPendingTasksWithClusterNotRecoveredBlock

@andrross (Member) left a comment

A couple of minor comments, otherwise looks good.

@gbbafna merged commit af566e1 into opensearch-project:main Jan 29, 2023
@gbbafna added the backport 2.x label Jan 29, 2023
@opensearch-trigger-bot (Contributor)

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-5854-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 af566e156fefdba192343ba8f7fce84f17d2a07a
# Push it to GitHub
git push --set-upstream origin backport/backport-5854-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-5854-to-2.x.

@ashking94 added the backport 2.x label and removed the backport 2.x label Jan 30, 2023
@opensearch-trigger-bot (Contributor)

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

ashking94 added a commit to ashking94/OpenSearch that referenced this pull request Jan 30, 2023
…arch-project#5854)

* Batch translog upload per x ms to allow high index throughput

Signed-off-by: Ashish Singh <[email protected]>
Co-authored-by: Ashwin Pankaj <[email protected]>
Co-authored-by: Laxman Muttineni <[email protected]>
Signed-off-by: Ashish Singh <[email protected]>
gbbafna pushed a commit that referenced this pull request Jan 30, 2023
…indexes (#5854) (#6066)

* Batch translog sync/upload per x ms for remote-backed indexes (#5854)

Signed-off-by: Ashish Singh <[email protected]>
Co-authored-by: Ashwin Pankaj <[email protected]>
Co-authored-by: Laxman Muttineni <[email protected]>
Labels: backport 2.x, distributed framework, Performance, skip-changelog, Storage:Durability, v2.6.0

6 participants