[Segment Replication] Fix for AlreadyClosedException for engine #4743
I'm not quite following some of the test logic. Could you describe a bit more in your commit message when/why the exception occurs and how this fixes it? From reading this and the original issue, it seems this happens when a shard/node is shut down while a replica is currently fetching new segments.
client().prepareIndex(INDEX_NAME).setId("1").setSource("foo", "bar").setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE).get();
refresh(INDEX_NAME);

final int initialDocCount = scaledRandomIntBetween(10000, 200000);
This seems like a huge number of docs for an integ test. Given the test waits until replicas catch up to proceed with assertions, can we get by with 100-200?
) {
indexer.start(initialDocCount);
waitForDocs(initialDocCount, indexer);
refresh(INDEX_NAME);
Can we flush here and avoid the flush & refresh on line 266?
logger.info("--> Closing the index ");
client().admin().indices().prepareClose(INDEX_NAME).get();

// Add another node to kick off TransportNodesListGatewayStartedShards which fetches latestReplicationCheckpoint for SegRep enabled
Not quite following why we need to start a 3rd node for this test. Before this step you have 2 nodes with 1 primary and 1 replica shard, and you close the index. Are you wanting to test whether the 3rd node is allocated one of the shards after the index is opened?
final long incomingGeneration = infos.getGeneration();
readerManager.updateSegments(infos);

// Commit and roll the xlog when we receive a different generation than what was last received.
Nit: can we use translog instead of xlog?
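The code comment in the snippet above describes committing and rolling the translog only when the incoming segment generation differs from the last one received. A minimal, self-contained sketch of that bookkeeping follows; `GenerationRollSketch` and `onSegmentsReceived` are hypothetical names rather than OpenSearch APIs, and the actual commit and translog roll are reduced to recording the generation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: roll only when the incoming generation differs from the last one received. */
public class GenerationRollSketch {
    private long lastReceivedGen = -1; // no generation received yet
    private final List<Long> rolledAt = new ArrayList<>();

    /** Applies an incoming checkpoint; returns true if a commit + translog roll was triggered. */
    public boolean onSegmentsReceived(long incomingGeneration) {
        boolean roll = incomingGeneration != lastReceivedGen;
        if (roll) {
            // A real engine would commit the SegmentInfos and roll the translog generation here.
            rolledAt.add(incomingGeneration);
        }
        lastReceivedGen = incomingGeneration;
        return roll;
    }

    public List<Long> rolledAt() {
        return rolledAt;
    }

    public static void main(String[] args) {
        GenerationRollSketch replica = new GenerationRollSketch();
        replica.onSegmentsReceived(3); // new generation: roll
        replica.onSegmentsReceived(3); // same generation: no roll
        replica.onSegmentsReceived(2); // different (lower) generation: roll
        System.out.println(replica.rolledAt()); // [3, 2]
    }
}
```

Because the check is a plain inequality, any generation change, lower or higher, triggers a roll; repeated checkpoints with the same generation only update the reader.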
Codecov Report
@@ Coverage Diff @@
## main #4743 +/- ##
============================================
- Coverage 70.97% 70.91% -0.06%
+ Complexity 58172 58127 -45
============================================
Files 4708 4708
Lines 277556 277559 +3
Branches 40189 40189
============================================
- Hits 196992 196836 -156
- Misses 64462 64608 +146
- Partials 16102 16115 +13
client().admin().indices().prepareClose(INDEX_NAME).get();

logger.info("--> Opening the index");
client().admin().indices().prepareOpen(INDEX_NAME).get();
Can we add an assertion on the doc count after reopening the index?
CHANGELOG.md
@@ -210,6 +210,7 @@ Inspired from [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
- Fix version check for 2.x release for awareness attribute decommission([#5034](https://github.com/opensearch-project/OpenSearch/pull/5034))
- Fix flaky test ResourceAwareTasksTests on Windows ([#5077](https://github.com/opensearch-project/OpenSearch/pull/5077))
- Length calculation for block based fetching ([#5055](https://github.com/opensearch-project/OpenSearch/pull/5055))
- Fix for AlreadyClosedException for engine ([#4743](https://github.com/opensearch-project/OpenSearch/pull/4743))
Nit: please specify that this is a segrep-only issue.
… (#5116)

* alreadyClosedExceptionFix

Signed-off-by: Poojita Raj <[email protected]>

* adding changelog entry

Signed-off-by: Poojita Raj <[email protected]>
Signed-off-by: Poojita Raj <[email protected]>
(cherry picked from commit 37d1eba)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

# Conflicts:
# CHANGELOG.md

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Signed-off-by: Poojita Raj [email protected]
Description
The engine throws an AlreadyClosedException when replication is being finalized on a replica shard and a translog generation rollover takes place after the engine has already been closed.
This change makes sure segments are only updated while the engine is open, and adds an integration test for this case.
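The description states that segments should only be updated while the engine is open. A minimal sketch of that idea follows, assuming a simplified replica engine with an `ensureOpen()` guard and a caller that checks `isOpen()` first. All class and method names (`GuardedSegmentUpdateSketch`, `ReplicaEngine`, `finalizeReplication`) are hypothetical; this is not the actual change in this PR, and the exception class is a local stand-in for Lucene's.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch: apply replicated segments only while the replica engine is open. */
public class GuardedSegmentUpdateSketch {

    /** Local stand-in for Lucene's AlreadyClosedException, which also extends IllegalStateException. */
    static final class AlreadyClosedException extends IllegalStateException {
        AlreadyClosedException(String message) {
            super(message);
        }
    }

    /** Simplified replica engine; not the real OpenSearch NRTReplicationEngine. */
    static final class ReplicaEngine {
        private final AtomicBoolean closed = new AtomicBoolean(false);
        private int appliedUpdates = 0;

        boolean isOpen() {
            return closed.get() == false;
        }

        private void ensureOpen() {
            if (closed.get()) {
                throw new AlreadyClosedException("engine is closed");
            }
        }

        // updateSegments() and close() share the engine's monitor, so close() waits for an in-flight update.
        synchronized void updateSegments() {
            ensureOpen(); // fail before touching the reader, the commit, or the translog
            appliedUpdates++; // stands in for swapping segments, committing and rolling the translog
        }

        synchronized void close() {
            closed.set(true);
        }

        synchronized int appliedUpdates() {
            return appliedUpdates;
        }
    }

    /** Replica-side finalize step: skip the update if the engine is closed or closes concurrently. */
    static boolean finalizeReplication(ReplicaEngine engine) {
        if (engine.isOpen() == false) {
            return false; // shard is shutting down; nothing to apply
        }
        try {
            engine.updateSegments();
            return true;
        } catch (AlreadyClosedException e) {
            // close() ran between the isOpen() check and the update; treat the update as skipped
            return false;
        }
    }

    public static void main(String[] args) {
        ReplicaEngine engine = new ReplicaEngine();
        System.out.println(finalizeReplication(engine)); // true
        engine.close();
        System.out.println(finalizeReplication(engine)); // false
        System.out.println(engine.appliedUpdates()); // 1
    }
}
```

The `isOpen()` check alone would leave a window where the engine closes right after the check returns; the guard inside the synchronized `updateSegments()`, paired with a synchronized `close()`, covers that window.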
Issues Resolved
Resolves #4530
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.