-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-18231. Adds in new test for S3PrefetchingInputStream #4305
HADOOP-18231. Adds in new test for S3PrefetchingInputStream #4305
Conversation
This is the the initial merge of the HADOOP-18028 S3A performance input stream. This patch on its own is incomplete and must be accompanied by all other commits with HADOOP-18028 in their git commit message. Consult the JIRA for that list Contributed by Bhalchandra Pandit.
…3A prefetching stream (apache#4115) Contributed by PJ Fanning.
Contributed by Ahmar Suhail
…ache#4212) Contributed by Monthon Klongklaew
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, suggested some minor changes.
...p-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3PrefetchingInputStream.java
Show resolved
Hide resolved
largeFileSize = fileStatus.getLen(); | ||
numBlocks = (largeFileSize == 0) ? | ||
0 : | ||
((int) (largeFileSize / blockSize)) + (largeFileSize % blockSize > 0 ? 1 : 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use a constant instead of this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This depends on the size of the file being used (landsat-pds/scene_list.gz), so needs to be calculated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth putting it into a private method in case we want to move numBlocks
around
blockSize = conf.getInt(PREFETCH_BLOCK_SIZE_KEY, PREFETCH_BLOCK_DEFAULT_SIZE); | ||
fs = largeFile.getFileSystem(getConfiguration()); | ||
FileStatus fileStatus = fs.getFileStatus(largeFile); | ||
largeFileSize = fileStatus.getLen(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How large is this file? I think we can use smaller block size to keep the cost at minimum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is currently using the landsat file landsat-pds/scene_list.gz
which has a size of 42MB
...p-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3PrefetchingInputStream.java
Show resolved
Hide resolved
Thanks @monthonk. As discussed, instead of using |
Thanks for clarifying @ahmarsuhail, then we probably have to test with this big file for now. |
just FYI, |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to create/use a new fs instance. just set up the config in a subclassed `createConfiguration()' call and let the superclass do the work
public void setup() throws Exception { | ||
super.setup(); | ||
|
||
Configuration conf = getConfiguration(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comes in too late as the superclass will have created the fs already.
override createConfiguration()
which is where the config to use is created
use 'S3ATestUtils.removeBaseAndBucketOverrides()` to clear any per bucket option before setting it
private static final int _1K = 1024; | ||
// Path for file which should have length > block size so S3CachingInputStream is used | ||
private Path largeFile; | ||
private FileSystem fs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
superclass already creates an fs in setup, shuts it safely in teardown, uses it to clean up dest paths. no need to create a new one
Path smallFile = path("randomReadSmallFile"); | ||
ContractTestUtils.writeDataset(getFileSystem(), smallFile, data, data.length, 16, true); | ||
|
||
try (FSDataInputStream in = getFileSystem().open(smallFile)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try using the openFile call for consistency, as that's where we can add the option for switching to this on a per stream basis...primarily for testing
Path dataFile = path("testReadOverBuffer.bin"); | ||
bindS3aFS(dataFile); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, just set up filesystem config in createConfiguration()
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm happy with the changes; one little checkstyle issue and danny's comment are all that we need.
I'm not too worried about the time for the tests provided it runs in the parallel tests; then it won't hold up work. a larger source file is more rigorous, and if there's a regression which kills performance, these tests will find it
private int blockSize; | ||
private long largeFileSize; | ||
// Size should be < block size so S3InMemoryInputStream is used | ||
private static final int smallFileSize = _1K * 16; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs to be capitalised to keep checkstyle quiet
f38bbe2
to
b75b72b
Compare
Description of PR
ITestS3AInputStreamPerformance
was failing when prefetching is enabled. This PR disables prefetching when runningITestS3AInputStreamPerformance
.It also adds in a new test class
ITestS3PrefetchingInputStream
with tests specific for prefetching. Once more stats are added to prefetching iostats, we can add in more tests + assertions. For eg it would be good to assert on if a file has been read via the cache or not on a backward seek.How was this patch tested?
Tested in eu-west-1 by running
mvn -Dparallel-tests -DtestsThreadCount=16 clean verify