HADOOP-18231. Adds in new test for S3PrefetchingInputStream #4305

@ahmarsuhail (Contributor) commented May 12, 2022

Description of PR

ITestS3AInputStreamPerformance was failing when prefetching is enabled. This PR disables prefetching when running ITestS3AInputStreamPerformance.

It also adds a new test class, ITestS3PrefetchingInputStream, with tests specific to prefetching. Once more stats are added to the prefetching iostats, we can add more tests and assertions. For example, it would be good to assert whether a file has been read via the cache on a backward seek.

How was this patch tested?

Tested in eu-west-1 by running

mvn -Dparallel-tests -DtestsThreadCount=16 clean verify

steveloughran and others added 7 commits March 28, 2022 11:05
This is the initial merge of the HADOOP-18028 S3A performance input stream.
This patch on its own is incomplete and must be accompanied by all other commits
with HADOOP-18028 in their git commit message. Consult the JIRA for that list.

Contributed by Bhalchandra Pandit.
…3A prefetching stream (apache#4115)


Contributed by PJ Fanning.

@monthonk (Contributor) left a comment

LGTM, suggested some minor changes.

largeFileSize = fileStatus.getLen();
numBlocks = (largeFileSize == 0) ?
    0 :
    ((int) (largeFileSize / blockSize)) + (largeFileSize % blockSize > 0 ? 1 : 0);
Contributor

Can you use a constant instead of this?

Contributor Author

This depends on the size of the file being used (landsat-pds/scene_list.gz), so it needs to be calculated.

Contributor

It might be worth putting it into a private method in case we want to move numBlocks around
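
As a rough sketch of that suggestion, the block-count calculation could live in a small helper. The class name `BlockCount`, the method name `numBlocks`, the argument check, and the sizes used in `main` are assumptions for illustration; only the arithmetic comes from the snippet under review.

```java
// Hypothetical helper; not the code that was merged in this PR.
final class BlockCount {

  private BlockCount() {
  }

  /**
   * Number of blocks needed to cover a file: the ceiling of
   * fileSize / blockSize, with an empty file needing zero blocks.
   */
  static int numBlocks(long fileSize, int blockSize) {
    if (blockSize <= 0) {
      // Assumption: a non-positive block size is treated as a caller error.
      throw new IllegalArgumentException("blockSize must be positive: " + blockSize);
    }
    if (fileSize == 0) {
      return 0;
    }
    // Same expression as the reviewed snippet: whole blocks plus one
    // extra block when there is a partial block at the end.
    return ((int) (fileSize / blockSize)) + (fileSize % blockSize > 0 ? 1 : 0);
  }

  public static void main(String[] args) {
    // Illustrative sizes only: a file of exactly 42 MiB with 8 MiB blocks.
    long fileSize = 42L * 1024 * 1024;
    int blockSize = 8 * 1024 * 1024;
    System.out.println(numBlocks(fileSize, blockSize)); // prints 6
  }
}
```

Keeping the calculation in one method means the test only has to call it wherever `numBlocks` ends up being initialised.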

blockSize = conf.getInt(PREFETCH_BLOCK_SIZE_KEY, PREFETCH_BLOCK_DEFAULT_SIZE);
fs = largeFile.getFileSystem(getConfiguration());
FileStatus fileStatus = fs.getFileStatus(largeFile);
largeFileSize = fileStatus.getLen();
Contributor

How large is this file? I think we can use a smaller block size to keep the cost to a minimum.

@ahmarsuhail (Contributor Author), May 17, 2022

It is currently using the landsat file landsat-pds/scene_list.gz, which has a size of 42MB.

@ahmarsuhail (Contributor Author)

Thanks @monthonk. As discussed, instead of using landsat-pds/scene_list.gz, I tried creating a smaller file (size 16K) and setting the block size to 4K, which would make the tests faster. But currently, if you try to set the block size below the default size (8M), validation fails here. The minimum allowed block size is currently PREFETCH_BLOCK_DEFAULT_SIZE; I'm not sure if this is something we want to update.

@monthonk (Contributor)

Thanks for clarifying, @ahmarsuhail. Then we probably have to test with this big file for now.

@ahmarsuhail (Contributor Author)

Just FYI, testRandomReadLargeFile takes around 23 seconds to finish and testReadLargeFileFully takes 26 seconds. I'm wondering if that's too long and whether we should consider using a smaller file and updating the block size validation.

@hadoop-yetus

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|------|-----------|---------|---------|---------|
| +0 🆗 | reexec | 0m 52s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. |
| | | | | _ feature-HADOOP-18028-s3a-prefetch Compile Tests _ |
| +1 💚 | mvninstall | 41m 7s | | feature-HADOOP-18028-s3a-prefetch passed |
| +1 💚 | compile | 1m 0s | | feature-HADOOP-18028-s3a-prefetch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 💚 | compile | 0m 53s | | feature-HADOOP-18028-s3a-prefetch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | checkstyle | 0m 43s | | feature-HADOOP-18028-s3a-prefetch passed |
| +1 💚 | mvnsite | 0m 55s | | feature-HADOOP-18028-s3a-prefetch passed |
| +1 💚 | javadoc | 0m 34s | | feature-HADOOP-18028-s3a-prefetch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 💚 | javadoc | 0m 39s | | feature-HADOOP-18028-s3a-prefetch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | spotbugs | 1m 34s | | feature-HADOOP-18028-s3a-prefetch passed |
| +1 💚 | shadedclient | 26m 12s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 0m 45s | | the patch passed |
| +1 💚 | compile | 0m 51s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 💚 | javac | 0m 51s | | the patch passed |
| +1 💚 | compile | 0m 36s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | javac | 0m 36s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 ⚠️ | checkstyle | 0m 23s | /results-checkstyle-hadoop-tools_hadoop-aws.txt | hadoop-tools/hadoop-aws: The patch generated 2 new + 11 unchanged - 0 fixed = 13 total (was 11) |
| +1 💚 | mvnsite | 0m 42s | | the patch passed |
| +1 💚 | javadoc | 0m 21s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 💚 | javadoc | 0m 29s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | spotbugs | 1m 23s | | the patch passed |
| +1 💚 | shadedclient | 25m 39s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 3m 14s | | hadoop-aws in the patch passed. |
| +1 💚 | asflicense | 0m 43s | | The patch does not generate ASF License warnings. |
| | | 110m 8s | | |

| Subsystem | Report/Notes |
|-----------|--------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4305/4/artifact/out/Dockerfile |
| GITHUB PR | #4305 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux 2dd2aa4928de 4.15.0-175-generic #184-Ubuntu SMP Thu Mar 24 17:48:36 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | feature-HADOOP-18028-s3a-prefetch / 9026fab |
| Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4305/4/testReport/ |
| Max. process+thread count | 594 (vs. ulimit of 5500) |
| modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4305/4/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |

This message was automatically generated.

@steveloughran (Contributor) left a comment

No need to create/use a new fs instance. Just set up the config in a subclassed `createConfiguration()` call and let the superclass do the work.

public void setup() throws Exception {
  super.setup();

  Configuration conf = getConfiguration();
Contributor

This comes in too late, as the superclass will have created the fs already.

Override createConfiguration(), which is where the config to use is created.

Use `S3ATestUtils.removeBaseAndBucketOverrides()` to clear any per-bucket option before setting it.
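
The ordering problem described here can be illustrated with a small, self-contained sketch. Everything in it is a hypothetical stand-in: `FakeFs`, `BaseTest`, the `Map`-based configuration and the key `example.prefetch.enabled` are assumptions made for illustration and are not the Hadoop test classes or configuration keys. It only shows why a setting applied after `super.setup()` is not seen by a filesystem that reads its configuration when it is created.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins only; none of these are Hadoop classes.
final class ConfigOrderingSketch {

  private ConfigOrderingSketch() {
  }

  /** Stand-in filesystem that reads its configuration once, at creation. */
  static final class FakeFs {
    private final boolean prefetch;

    FakeFs(Map<String, String> conf) {
      this.prefetch = Boolean.parseBoolean(
          conf.getOrDefault("example.prefetch.enabled", "false"));
    }

    boolean prefetchEnabled() {
      return prefetch;
    }
  }

  /** Stand-in base test: setup() builds the config, then the filesystem. */
  static class BaseTest {
    private Map<String, String> conf;
    private FakeFs fs;

    protected Map<String, String> createConfiguration() {
      return new HashMap<>();
    }

    public void setup() {
      conf = createConfiguration();
      fs = new FakeFs(conf); // the configuration is consumed here
    }

    Map<String, String> getConfiguration() {
      return conf;
    }

    FakeFs getFileSystem() {
      return fs;
    }
  }

  /** Sets the option after super.setup(): the filesystem already exists. */
  static class LateConfigTest extends BaseTest {
    @Override
    public void setup() {
      super.setup();
      getConfiguration().put("example.prefetch.enabled", "true");
    }
  }

  /** Sets the option in createConfiguration(): the filesystem sees it. */
  static class EarlyConfigTest extends BaseTest {
    @Override
    protected Map<String, String> createConfiguration() {
      Map<String, String> conf = super.createConfiguration();
      conf.put("example.prefetch.enabled", "true");
      return conf;
    }
  }

  static boolean prefetchAfterSetup(BaseTest test) {
    test.setup();
    return test.getFileSystem().prefetchEnabled();
  }

  public static void main(String[] args) {
    System.out.println("late:  " + prefetchAfterSetup(new LateConfigTest()));  // late:  false
    System.out.println("early: " + prefetchAfterSetup(new EarlyConfigTest())); // early: true
  }
}
```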

private static final int _1K = 1024;
// Path for file which should have length > block size so S3CachingInputStream is used
private Path largeFile;
private FileSystem fs;
Contributor

The superclass already creates an fs in setup, shuts it down safely in teardown, and uses it to clean up dest paths. No need to create a new one.

Path smallFile = path("randomReadSmallFile");
ContractTestUtils.writeDataset(getFileSystem(), smallFile, data, data.length, 16, true);

try (FSDataInputStream in = getFileSystem().open(smallFile)) {
Contributor

Try using the openFile call for consistency, as that's where we can add the option for switching to this on a per-stream basis, primarily for testing.

Path dataFile = path("testReadOverBuffer.bin");
bindS3aFS(dataFile);
Contributor

No, just set up the filesystem config in createConfiguration().

@hadoop-yetus

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|------|-----------|---------|---------|---------|
| +0 🆗 | reexec | 0m 43s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 1s | | codespell was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. |
| | | | | _ feature-HADOOP-18028-s3a-prefetch Compile Tests _ |
| +1 💚 | mvninstall | 37m 22s | | feature-HADOOP-18028-s3a-prefetch passed |
| +1 💚 | compile | 1m 3s | | feature-HADOOP-18028-s3a-prefetch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 💚 | compile | 0m 57s | | feature-HADOOP-18028-s3a-prefetch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | checkstyle | 0m 46s | | feature-HADOOP-18028-s3a-prefetch passed |
| +1 💚 | mvnsite | 1m 1s | | feature-HADOOP-18028-s3a-prefetch passed |
| +1 💚 | javadoc | 0m 50s | | feature-HADOOP-18028-s3a-prefetch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 💚 | javadoc | 0m 53s | | feature-HADOOP-18028-s3a-prefetch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | spotbugs | 1m 41s | | feature-HADOOP-18028-s3a-prefetch passed |
| +1 💚 | shadedclient | 21m 58s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 0m 43s | | the patch passed |
| +1 💚 | compile | 0m 47s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 💚 | javac | 0m 47s | | the patch passed |
| +1 💚 | compile | 0m 40s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | javac | 0m 40s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 ⚠️ | checkstyle | 0m 28s | /results-checkstyle-hadoop-tools_hadoop-aws.txt | hadoop-tools/hadoop-aws: The patch generated 2 new + 11 unchanged - 0 fixed = 13 total (was 11) |
| +1 💚 | mvnsite | 0m 45s | | the patch passed |
| +1 💚 | javadoc | 0m 24s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 💚 | javadoc | 0m 34s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | spotbugs | 1m 18s | | the patch passed |
| +1 💚 | shadedclient | 20m 47s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 2m 45s | | hadoop-aws in the patch passed. |
| +1 💚 | asflicense | 0m 52s | | The patch does not generate ASF License warnings. |
| | | 98m 8s | | |

| Subsystem | Report/Notes |
|-----------|--------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4305/5/artifact/out/Dockerfile |
| GITHUB PR | #4305 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux 4fbcbfa9c7c5 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | feature-HADOOP-18028-s3a-prefetch / b170718 |
| Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4305/5/testReport/ |
| Max. process+thread count | 666 (vs. ulimit of 5500) |
| modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4305/5/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |

This message was automatically generated.

@apache deleted 3 comments from hadoop-yetus on May 30, 2022

@steveloughran (Contributor) left a comment

I'm happy with the changes; one little checkstyle issue and Danny's comment are all that we need.

I'm not too worried about the time for the tests, provided they run in the parallel tests; then they won't hold up work. A larger source file is more rigorous, and if there's a regression which kills performance, these tests will find it.

private int blockSize;
private long largeFileSize;
// Size should be < block size so S3InMemoryInputStream is used
private static final int smallFileSize = _1K * 16;
Contributor

needs to be capitalised to keep checkstyle quiet
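
A minimal sketch of the rename being asked for, assuming the upper-case name `SMALL_FILE_SIZE` (the thread does not state the final name) and a hypothetical holder class `PrefetchTestConstants`. The `_1K` constant is copied from the reviewed test class.

```java
// Hypothetical holder class, used only to make the example self-contained.
final class PrefetchTestConstants {

  private PrefetchTestConstants() {
  }

  // Copied from the reviewed test class.
  static final int _1K = 1024;

  // Size should be < block size so S3InMemoryInputStream is used.
  // Assumed new name: static final fields are conventionally upper case.
  static final int SMALL_FILE_SIZE = _1K * 16;

  public static void main(String[] args) {
    System.out.println(SMALL_FILE_SIZE); // prints 16384
  }
}
```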
