[ISSUE-451][Improvement] Read HDFS data files with random sequence to distribute pressure #452
Codecov Report
@@             Coverage Diff              @@
##             master     #452      +/-   ##
============================================
- Coverage     58.72%   58.67%   -0.05%
- Complexity     1652     1654       +2
============================================
  Files           199      199
  Lines         11214    11217       +3
  Branches        996      997       +1
============================================
- Hits           6585     6582       -3
- Misses         4237     4243       +6
  Partials        392      392
@@ -143,7 +143,7 @@ protected void init(String fullShufflePath) {
         LOG.warn("Can't create ShuffleReaderHandler for " + filePrefix, e);
       }
     }
-    readHandlers.sort(Comparator.comparing(HdfsShuffleReadHandler::getFilePrefix));
+    Collections.shuffle(readHandlers);
Could we print the file names? It's important for us to know the order.
Done
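The request above, shuffling the handlers and then printing the resulting order, could look roughly like the sketch below. This is not the code merged in the PR: `ReadHandler` is a hypothetical stand-in for `HdfsShuffleReadHandler`, the file prefixes are made up, and the message text is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

public class ShuffledReadOrder {
    // Hypothetical stand-in for HdfsShuffleReadHandler; only the prefix matters here.
    static final class ReadHandler {
        private final String filePrefix;

        ReadHandler(String filePrefix) {
            this.filePrefix = filePrefix;
        }

        String getFilePrefix() {
            return filePrefix;
        }
    }

    // Shuffles the handlers in place and returns a one-line description of the
    // resulting order, suitable for passing to a logger.
    static String shuffleAndDescribe(List<ReadHandler> readHandlers) {
        Collections.shuffle(readHandlers);
        return readHandlers.stream()
            .map(ReadHandler::getFilePrefix)
            .collect(Collectors.joining(", ", "Reading order of HDFS files: [", "]"));
    }

    public static void main(String[] args) {
        List<ReadHandler> handlers = new ArrayList<>();
        // Made-up prefixes, one per data file of the same partition.
        for (String prefix : new String[] {"server-a_0", "server-b_0", "server-c_0"}) {
            handlers.add(new ReadHandler(prefix));
        }
        // The order differs from run to run, which is why logging it helps debugging.
        System.out.println(shuffleAndDescribe(handlers));
    }
}
```

Because the order is no longer deterministic, a log line like this is the only record of which file a given reader visited first.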
LGTM, thanks @zuston
LGTM
What changes were proposed in this pull request?
[Improvement] Read HDFS data files with random sequence to distribute pressure #452
Why are the changes needed?
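PR #396 added support for concurrently writing a single partition's data into multiple HDFS files. With several files per partition, it is better for clients to read the HDFS data files in random order instead of sorted order, so that the read pressure is distributed across the files rather than every client starting with the same one.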
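To illustrate the motivation, here is a small simulation, not taken from the PR, that counts which file each reader opens first under sorted and shuffled ordering. The file names, the reader count, and the fixed seed are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

public class FirstReadDistribution {
    // Simulates `readers` clients reading the same set of files and counts
    // how often each file is the first one a client opens.
    static Map<String, Integer> firstReadCounts(
            List<String> files, int readers, boolean shuffle, long seed) {
        Random random = new Random(seed);
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < readers; i++) {
            List<String> order = new ArrayList<>(files);
            if (shuffle) {
                // New behaviour: every client picks its own random order.
                Collections.shuffle(order, random);
            } else {
                // Old behaviour: every client sorts by file prefix.
                Collections.sort(order);
            }
            counts.merge(order.get(0), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical file prefixes for one partition.
        List<String> files = Arrays.asList("part-0_a", "part-0_b", "part-0_c");
        System.out.println("sorted:   " + firstReadCounts(files, 1000, false, 42L));
        System.out.println("shuffled: " + firstReadCounts(files, 1000, true, 42L));
    }
}
```

With sorted ordering every simulated reader starts on the same file, while with shuffling the first reads are spread over all three files.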
Does this PR introduce any user-facing change?
No
How was this patch tested?
Existing UTs