Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Segment Replication] Fix flaky tests testSegmentReplicationStatsResponse() and testSegmentReplicationStatsWithTimeout() #6268
[Segment Replication] Fix flaky tests testSegmentReplicationStatsResponse() and testSegmentReplicationStatsWithTimeout() #6268
Changes from 5 commits
138c56b
d15da00
07fc9c3
a18d05f
95f2501
043d886
e9b91d0
bb0f624
6950d71
ba41b87
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Rishikesh1159 : I also fixed this flaky test as part of #6370. Please rebase your changes against
main
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure let me rebase
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, I see one run where this is still flaky #6366 (comment).
Previously, I root caused the problem happening due to race condition and fixed by asserting on replication state to be done which fixed it from 20% failing -> 0%. The assertBusy block by default waits for
10s
not sure if segment replication needs more time to complete. May be we can increase the assertBusy timeout to 60secs and check if this test is still flaky.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes testSegmentReplicationStatsResponse() was flaky previously because of stage not reaching done within certain time. But now I have changed the test completely. As we are only testing API response and we are concerned only about API response but not about segment replication stage. So I changed the test and we no more assert on segrep stage which was causing flakiness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With changes in testSegmentReplicationStatsResponse() we only assert if we get a successful or failure response from API call. We are not concerned about if segment replication event has finished processing or not, as purpose of API is only to give response irrespective of state of segrep event.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is still value in asserting the replication state as it can uncover bugs where replication never completes for various reasons and it is also part of API response. I will leave it for you to decide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I can add asserts on replication state. It doesn't break anything but as you said there is a chance it can uncover some bugs. I have added that in latest commit. Thanks @dreamer-89
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use
assertEquals
instead ofassertThat
here. This is applicable for other usages ofassertThat
as wellThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure I will update all usages
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change is needed ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If shard is primary we don't even need to make a call to get segment replication state, as segment replication only happens on replica shards. If we don't have this primary check here, it will make a call to get segment replication state and check some collections and eventually return null on a primary shard. So with this addition of condition check we are just returning null early if it is a primary sahrd