-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-18231. Fixes failing tests & drain stream async. #4386
HADOOP-18231. Fixes failing tests & drain stream async. #4386
Conversation
made some comments. looked at the EOF swallowing code...i think we need to be happy there that all is good in terms of handling recoverable failures but not retrying on nonrecoverable ones. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
made some minor suggestions, created a jira on "need to make sure retry code is good", which is very much manual review work, relying on @retries annotations for guidance
@@ -64,6 +67,12 @@ public class S3File implements Closeable { | |||
// That allows us to close the object when closing the stream. | |||
private Map<InputStream, S3Object> s3Objects; | |||
|
|||
// uri of the object being read |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can you make these all javadocs, so they show up in IDEs.
@@ -98,6 +107,7 @@ public S3File( | |||
this.streamStatistics = streamStatistics; | |||
this.changeTracker = changeTracker; | |||
this.s3Objects = new IdentityHashMap<InputStream, S3Object>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while you are there, can you change to a simple <> here
Io.closeIgnoringIoException(inputStream); | ||
Io.closeIgnoringIoException(obj); | ||
Io.closeIgnoringIoException(requestObject); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure which library this is. switch to our IOUtils.cleanupWithLogger(lOG, inputStream, requestObject)
this will log any exceptions at debug, if ever needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a new Io
class added as part of the initial prefetching PR. I've removed this class, and updated usages with cleanupWithLogger
|
||
Validate.checkNotNull(context, "context"); | ||
Validate.checkNotNull(s3Attributes, "s3Attributes"); | ||
Validate.checkNotNull(client, "client"); | ||
Validate.checkNotNull(streamStatistics, "streamStatistics"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is one of those things we'd inevitably want to clean up. i can see a "steve does a cleanup" PR coming soon....
If you use Objects.requireNonNull()
you can add the check on L119 so no need for extra work...its where new code should be going, even if older code still uses the guava checkNotNull
private static final int _1K = 1024; | ||
// Path for file which should have length > block size so S3CachingInputStream is used | ||
private Path largeFile; | ||
private FileSystem fs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you name this largeFileFS to make clear why it is being used
* Test the prefetching input stream, validates that the underlying S3CachingInputStream and | ||
* S3InMemoryInputStream are working as expected. | ||
*/ | ||
public class ITestS3PrefetchingInputStream extends AbstractS3ACostTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you call this ITestS3PrefetchingInputStream? I want to rename the other classes too, but let's start with the new ones
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
guessing you meant something else as it's already called ITestS3PrefetchingInputStream?
super(true); | ||
} | ||
|
||
private static final int _1K = 1024; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think hadoop common test should have a version of
org.apache.hadoop.fs.azure.integration.Sizes which we can reference here and elsewhere. it uses the S_ prefix to keep checkstyle quiet.
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/read/S3File.java
Show resolved
Hide resolved
* @param inputStream stream to close. | ||
* @return was the stream aborted? | ||
*/ | ||
private boolean drainOrAbortHttpStream(boolean shouldAbort, final String reason, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style nit: can you split one to a line, all tagged final. there's enough params to justify the effort
try (FSDataInputStream in = fs.open(largeFile)) { | ||
IOStatistics ioStats = in.getIOStatistics(); | ||
|
||
byte[] buffer = new byte[(int) largeFileSize]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is trouble if that test file is really big. prefer to create a smaller buffer and iterate through with readFully calls.
now, one thing we don't check reliably, even in AbstractSTestS3AHugeFiles, is are the contents of large files constant even if you read them in different ways?
i think we need to worry about this, though also think AbstractSTestS3AHugeFiles might be the place...we can build an md5 checksum on the upload and verify on the reads.
@steveloughran thanks for reviewing, I've updated as per your comments |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. pending one minor change
the new test still needs a trailing newline to be added after the final "}" ... I think you added it above (which is also good)
add that and it's ready to merge. I actually want to test this out this week with the cloudstore's bandwidth command
https://github.com/steveloughran/cloudstore#command-bandwidth
@@ -284,8 +308,8 @@ private boolean drainOrAbortHttpStream(boolean shouldAbort, final String reason, | |||
shouldAbort = true; | |||
} | |||
} | |||
Io.closeIgnoringIoException(inputStream); | |||
Io.closeIgnoringIoException(requestObject); | |||
cleanupWithLogger(LOG, inputStream); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, you can pass any number of Closeables in; not worth updating the PR for, but useful to know
} | ||
} | ||
|
||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still needs a trailing newline after the closing bracket;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I look into adding this as a CheckStyle rule? I can open a JIRA if so.
Looks like there's one to fit this: https://checkstyle.sourceforge.io/config_misc.html#NewlineAtEndOfFile
thanks @steveloughran, have added in the new line |
🎊 +1 overall
This message was automatically generated. |
Hey @steveloughran, we have a few prefetching PRs starting to get blocked on this one which fixes some unit tests. Can you take a look next week? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, merging. thanks!
apache#4386) * adds in new test for prefetching input stream * creates streamStats before opening stream * updates numBlocks calculation method * fixes ITestS3AOpenCost.testOpenFileLongerLength * drains stream async * fixes failing unit test Contributed by Ahmar Suhail
This is the the a rollup patch of the HADOOP-18028 S3A performance input stream feature branch. Contains HADOOP-18028. High performance S3A input stream (apache#4109) This is the the merge of the HADOOP-18028 S3A performance input stream. This patch on its own is incomplete and must be accompanied by all other commits with HADOOP-18028 in their git commit message. Consult the JIRA for that list Contributed by Bhalchandra Pandit. HADOOP-18180. Replace use of twitter util-core with java futures in S3A prefetching stream (apache#4115) Contributed by PJ Fanning. HADOOP-18177. Document prefetching architecture. (apache#4205) Contributed by Ahmar Suhail HADOOP-18175. fix test failures with prefetching s3a input stream (apache#4212) Contributed by Monthon Klongklaew HADOOP-18231. S3A prefetching: fix failing tests & drain stream async. (apache#4386) * adds in new test for prefetching input stream * creates streamStats before opening stream * updates numBlocks calculation method * fixes ITestS3AOpenCost.testOpenFileLongerLength * drains stream async * fixes failing unit test Contributed by Ahmar Suhail HADOOP-18254. Disable S3A prefetching by default. (apache#4469) Contributed by Ahmar Suhail HADOOP-18190. Collect IOStatistics during S3A prefetching (apache#4458) This adds iOStatisticsConnection to the S3PrefetchingInputStream class, with new statistic names in StreamStatistics. This stream is not (yet) IOStatisticsContext aware. Contributed by Ahmar Suhail. Change-Id: I48f217086531c12d6e2f0f91e39f17054a74d20f
This is the the preview release of the HADOOP-18028 S3A performance input stream. It is still stabilizing, but ready to test. Contains HADOOP-18028. High performance S3A input stream (apache#4109) Contributed by Bhalchandra Pandit. HADOOP-18180. Replace use of twitter util-core with java futures (apache#4115) Contributed by PJ Fanning. HADOOP-18177. Document prefetching architecture. (apache#4205) Contributed by Ahmar Suhail HADOOP-18175. fix test failures with prefetching s3a input stream (apache#4212) Contributed by Monthon Klongklaew HADOOP-18231. S3A prefetching: fix failing tests & drain stream async. (apache#4386) * adds in new test for prefetching input stream * creates streamStats before opening stream * updates numBlocks calculation method * fixes ITestS3AOpenCost.testOpenFileLongerLength * drains stream async * fixes failing unit test Contributed by Ahmar Suhail HADOOP-18254. Disable S3A prefetching by default. (apache#4469) Contributed by Ahmar Suhail HADOOP-18190. Collect IOStatistics during S3A prefetching (apache#4458) This adds iOStatisticsConnection to the S3PrefetchingInputStream class, with new statistic names in StreamStatistics. This stream is not (yet) IOStatisticsContext aware. Contributed by Ahmar Suhail HADOOP-18379 rebase feature/HADOOP-18028-s3a-prefetch to trunk HADOOP-18187. Convert s3a prefetching to use JavaDoc for fields and enums. HADOOP-18318. Update class names to be clear they belong to S3A prefetching Contributed by Steve Loughran Change-Id: I6511c51c3580c57eb72e8ea686c88e3917d12a06
This is the the preview release of the HADOOP-18028 S3A performance input stream. It is still stabilizing, but ready to test. Contains HADOOP-18028. High performance S3A input stream (#4109) Contributed by Bhalchandra Pandit. HADOOP-18180. Replace use of twitter util-core with java futures (#4115) Contributed by PJ Fanning. HADOOP-18177. Document prefetching architecture. (#4205) Contributed by Ahmar Suhail HADOOP-18175. fix test failures with prefetching s3a input stream (#4212) Contributed by Monthon Klongklaew HADOOP-18231. S3A prefetching: fix failing tests & drain stream async. (#4386) * adds in new test for prefetching input stream * creates streamStats before opening stream * updates numBlocks calculation method * fixes ITestS3AOpenCost.testOpenFileLongerLength * drains stream async * fixes failing unit test Contributed by Ahmar Suhail HADOOP-18254. Disable S3A prefetching by default. (#4469) Contributed by Ahmar Suhail HADOOP-18190. Collect IOStatistics during S3A prefetching (#4458) This adds iOStatisticsConnection to the S3PrefetchingInputStream class, with new statistic names in StreamStatistics. This stream is not (yet) IOStatisticsContext aware. Contributed by Ahmar Suhail HADOOP-18379 rebase feature/HADOOP-18028-s3a-prefetch to trunk HADOOP-18187. Convert s3a prefetching to use JavaDoc for fields and enums. HADOOP-18318. Update class names to be clear they belong to S3A prefetching Contributed by Steve Loughran
This is the the preview release of the HADOOP-18028 S3A performance input stream. It is still stabilizing, but ready to test. Contains HADOOP-18028. High performance S3A input stream (apache#4109) Contributed by Bhalchandra Pandit. HADOOP-18180. Replace use of twitter util-core with java futures (apache#4115) Contributed by PJ Fanning. HADOOP-18177. Document prefetching architecture. (apache#4205) Contributed by Ahmar Suhail HADOOP-18175. fix test failures with prefetching s3a input stream (apache#4212) Contributed by Monthon Klongklaew HADOOP-18231. S3A prefetching: fix failing tests & drain stream async. (apache#4386) * adds in new test for prefetching input stream * creates streamStats before opening stream * updates numBlocks calculation method * fixes ITestS3AOpenCost.testOpenFileLongerLength * drains stream async * fixes failing unit test Contributed by Ahmar Suhail HADOOP-18254. Disable S3A prefetching by default. (apache#4469) Contributed by Ahmar Suhail HADOOP-18190. Collect IOStatistics during S3A prefetching (apache#4458) This adds iOStatisticsConnection to the S3PrefetchingInputStream class, with new statistic names in StreamStatistics. This stream is not (yet) IOStatisticsContext aware. Contributed by Ahmar Suhail HADOOP-18379 rebase feature/HADOOP-18028-s3a-prefetch to trunk HADOOP-18187. Convert s3a prefetching to use JavaDoc for fields and enums. HADOOP-18318. Update class names to be clear they belong to S3A prefetching Contributed by Steve Loughran
This is the the preview release of the HADOOP-18028 S3A performance input stream. It is still stabilizing, but ready to test. Contains HADOOP-18028. High performance S3A input stream (apache#4109) Contributed by Bhalchandra Pandit. HADOOP-18180. Replace use of twitter util-core with java futures (apache#4115) Contributed by PJ Fanning. HADOOP-18177. Document prefetching architecture. (apache#4205) Contributed by Ahmar Suhail HADOOP-18175. fix test failures with prefetching s3a input stream (apache#4212) Contributed by Monthon Klongklaew HADOOP-18231. S3A prefetching: fix failing tests & drain stream async. (apache#4386) * adds in new test for prefetching input stream * creates streamStats before opening stream * updates numBlocks calculation method * fixes ITestS3AOpenCost.testOpenFileLongerLength * drains stream async * fixes failing unit test Contributed by Ahmar Suhail HADOOP-18254. Disable S3A prefetching by default. (apache#4469) Contributed by Ahmar Suhail HADOOP-18190. Collect IOStatistics during S3A prefetching (apache#4458) This adds iOStatisticsConnection to the S3PrefetchingInputStream class, with new statistic names in StreamStatistics. This stream is not (yet) IOStatisticsContext aware. Contributed by Ahmar Suhail HADOOP-18379 rebase feature/HADOOP-18028-s3a-prefetch to trunk HADOOP-18187. Convert s3a prefetching to use JavaDoc for fields and enums. HADOOP-18318. Update class names to be clear they belong to S3A prefetching Contributed by Steve Loughran Change-Id: I3eca19564dc0c0cb83184f4a42605dbafd908937
This is the the preview release of the HADOOP-18028 S3A performance input stream. It is still stabilizing, but ready to test. Contains HADOOP-18028. High performance S3A input stream (#4109) Contributed by Bhalchandra Pandit. HADOOP-18180. Replace use of twitter util-core with java futures (#4115) Contributed by PJ Fanning. HADOOP-18177. Document prefetching architecture. (#4205) Contributed by Ahmar Suhail HADOOP-18175. fix test failures with prefetching s3a input stream (#4212) Contributed by Monthon Klongklaew HADOOP-18231. S3A prefetching: fix failing tests & drain stream async. (#4386) * adds in new test for prefetching input stream * creates streamStats before opening stream * updates numBlocks calculation method * fixes ITestS3AOpenCost.testOpenFileLongerLength * drains stream async * fixes failing unit test Contributed by Ahmar Suhail HADOOP-18254. Disable S3A prefetching by default. (#4469) Contributed by Ahmar Suhail HADOOP-18190. Collect IOStatistics during S3A prefetching (#4458) This adds iOStatisticsConnection to the S3PrefetchingInputStream class, with new statistic names in StreamStatistics. This stream is not (yet) IOStatisticsContext aware. Contributed by Ahmar Suhail HADOOP-18379 rebase feature/HADOOP-18028-s3a-prefetch to trunk HADOOP-18187. Convert s3a prefetching to use JavaDoc for fields and enums. HADOOP-18318. Update class names to be clear they belong to S3A prefetching Contributed by Steve Loughran
This is the the preview release of the HADOOP-18028 S3A performance input stream. It is still stabilizing, but ready to test. Contains HADOOP-18028. High performance S3A input stream (#4109) Contributed by Bhalchandra Pandit. HADOOP-18180. Replace use of twitter util-core with java futures (#4115) Contributed by PJ Fanning. HADOOP-18177. Document prefetching architecture. (#4205) Contributed by Ahmar Suhail HADOOP-18175. fix test failures with prefetching s3a input stream (#4212) Contributed by Monthon Klongklaew HADOOP-18231. S3A prefetching: fix failing tests & drain stream async. (#4386) * adds in new test for prefetching input stream * creates streamStats before opening stream * updates numBlocks calculation method * fixes ITestS3AOpenCost.testOpenFileLongerLength * drains stream async * fixes failing unit test Contributed by Ahmar Suhail HADOOP-18254. Disable S3A prefetching by default. (#4469) Contributed by Ahmar Suhail HADOOP-18190. Collect IOStatistics during S3A prefetching (#4458) This adds iOStatisticsConnection to the S3PrefetchingInputStream class, with new statistic names in StreamStatistics. This stream is not (yet) IOStatisticsContext aware. Contributed by Ahmar Suhail HADOOP-18379 rebase feature/HADOOP-18028-s3a-prefetch to trunk HADOOP-18187. Convert s3a prefetching to use JavaDoc for fields and enums. HADOOP-18318. Update class names to be clear they belong to S3A prefetching Contributed by Steve Loughran
This is the the preview release of the HADOOP-18028 S3A performance input stream. It is still stabilizing, but ready to test. Contains HADOOP-18028. High performance S3A input stream (#4109) Contributed by Bhalchandra Pandit. HADOOP-18180. Replace use of twitter util-core with java futures (#4115) Contributed by PJ Fanning. HADOOP-18177. Document prefetching architecture. (#4205) Contributed by Ahmar Suhail HADOOP-18175. fix test failures with prefetching s3a input stream (#4212) Contributed by Monthon Klongklaew HADOOP-18231. S3A prefetching: fix failing tests & drain stream async. (#4386) * adds in new test for prefetching input stream * creates streamStats before opening stream * updates numBlocks calculation method * fixes ITestS3AOpenCost.testOpenFileLongerLength * drains stream async * fixes failing unit test Contributed by Ahmar Suhail HADOOP-18254. Disable S3A prefetching by default. (#4469) Contributed by Ahmar Suhail HADOOP-18190. Collect IOStatistics during S3A prefetching (#4458) This adds iOStatisticsConnection to the S3PrefetchingInputStream class, with new statistic names in StreamStatistics. This stream is not (yet) IOStatisticsContext aware. Contributed by Ahmar Suhail HADOOP-18379 rebase feature/HADOOP-18028-s3a-prefetch to trunk HADOOP-18187. Convert s3a prefetching to use JavaDoc for fields and enums. HADOOP-18318. Update class names to be clear they belong to S3A prefetching Contributed by Steve Loughran
This is the the preview release of the HADOOP-18028 S3A performance input stream. It is still stabilizing, but ready to test. Contains HADOOP-18028. High performance S3A input stream (#4109) Contributed by Bhalchandra Pandit. HADOOP-18180. Replace use of twitter util-core with java futures (#4115) Contributed by PJ Fanning. HADOOP-18177. Document prefetching architecture. (#4205) Contributed by Ahmar Suhail HADOOP-18175. fix test failures with prefetching s3a input stream (#4212) Contributed by Monthon Klongklaew HADOOP-18231. S3A prefetching: fix failing tests & drain stream async. (#4386) * adds in new test for prefetching input stream * creates streamStats before opening stream * updates numBlocks calculation method * fixes ITestS3AOpenCost.testOpenFileLongerLength * drains stream async * fixes failing unit test Contributed by Ahmar Suhail HADOOP-18254. Disable S3A prefetching by default. (#4469) Contributed by Ahmar Suhail HADOOP-18190. Collect IOStatistics during S3A prefetching (#4458) This adds iOStatisticsConnection to the S3PrefetchingInputStream class, with new statistic names in StreamStatistics. This stream is not (yet) IOStatisticsContext aware. Contributed by Ahmar Suhail HADOOP-18379 rebase feature/HADOOP-18028-s3a-prefetch to trunk HADOOP-18187. Convert s3a prefetching to use JavaDoc for fields and enums. HADOOP-18318. Update class names to be clear they belong to S3A prefetching Contributed by Steve Loughran
Description of PR
This is a combined PR for:
Add new test for Prefetching:
JIRA. Previous PR before rebase was opened here.
ITestS3AInputStreamPerformance
was failing when prefetching is enabled. This PR disables prefetching when running ITestS3AInputStreamPerformance and adds in a new test classITestS3PrefetchingInputStream
with tests specific for prefetching.Drain stream async:
JIRA. Previous PR before rebase was opened here.
If we close the prefetching input stream before prefetched blocks have finished reading the S3 input stream, the sdk repeatedly complains "Not all bytes were read from the S3ObjectInputStream". This happened on S3AInputStream as well, see this issue for more details. Closing the stream before draining it will abort the connection, so to allow for connection reuse we drain it asynchronously when remaining bytes is > async drain threshold.
Fix ITestS3AOpenCost:
JIRA
This test was failing after the rebase.
testOpenFileShorterLength
fails because inputStreamStats need to be created before opening the stream so that the time to prepare to open the file is included.testOpenFileLongerLength
fails because in this read method when for eg: file length is 8 and length to read is 18, it swallows the EOF exception and returns the num of successful bytes read which will be 8. So readFully doesn't throw an EOF immediately but then tries to read bytes 8-18 again, and then fails.How was this patch tested?
Tested in eu-west-1 by running
mvn -Dparallel-tests -DtestsThreadCount=16 clean verify