Hadoop-18184. Adds support for unbuffer #4298

@ahmarsuhail (Contributor) commented May 10, 2022

Description of PR

This PR adds support for unbuffer.

Unbuffer is used by certain applications (e.g. Impala) that want to hold on to an input stream but free the resources it is using. When the application later reads from the stream again, it does not have to reopen the stream, which saves HEAD calls.

For prefetching, unbuffer needs to free up the buffer pool, delete any local files, clear state about the blocks in the file, and so on. When reading after an unbuffer, the input stream should reinitialise all of this state and resume reading from the position that was active before the unbuffer.

How was this patch tested?

Tested in eu-west-1 by running

mvn -Dparallel-tests -DtestsThreadCount=16 clean verify

ITestS3AInputStreamPerformance is failing, unrelated to this PR. Created issue: https://issues.apache.org/jira/browse/HADOOP-18231

ITestS3AUnbuffer fails: both its instance-of assertion and its isObjectStreamOpen() check fail. As with the issue above, there are a few different ways to fix this test, and I'm not sure which is best. Options include parameterized tests with different assertions depending on whether prefetching is enabled, or separate new tests. I've left it failing for now.

All unbuffer contract tests are passing now.

Also tested a few different read sequences. For example, seek should work after an unbuffer:

in.read(buffer, 0, _1MB * 3);
in.unbuffer();
in.seek(_10MB);
in.read(buffer, 0, _1MB * 3);
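
As a rough illustration of the behaviour described above, here is a minimal sketch in plain Java. It is not the S3A implementation: `ToyPrefetchingStream`, its block cache, and the `remoteReads` counter are hypothetical stand-ins, and an in-memory byte array plays the role of the remote object. The sketch only shows the intended contract: `unbuffer()` drops cached blocks but keeps the stream usable, and the next `read` lazily rebuilds state and continues from the current position (or from wherever `seek` moved it).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Toy model only; every name here is hypothetical, not an S3A class. */
class ToyPrefetchingStream {
  private final byte[] remote;             // stands in for the remote object
  private final int blockSize;
  private Map<Integer, byte[]> blockCache; // null while the stream is unbuffered
  private long pos;
  private int remoteReads;                 // counts simulated remote block fetches

  ToyPrefetchingStream(byte[] remote, int blockSize) {
    this.remote = remote;
    this.blockSize = blockSize;
    this.blockCache = new HashMap<>();
  }

  void seek(long newPos) {
    if (newPos < 0 || newPos > remote.length) {
      throw new IllegalArgumentException("position out of range: " + newPos);
    }
    pos = newPos; // seeking never touches the cache
  }

  /** Releases cached blocks but keeps the position, so reads can resume later. */
  void unbuffer() {
    blockCache = null;
  }

  int read(byte[] dst, int off, int len) {
    if (len == 0) {
      return 0;
    }
    if (pos >= remote.length) {
      return -1;
    }
    if (blockCache == null) {
      blockCache = new HashMap<>(); // lazily reinitialise after an unbuffer
    }
    int copied = 0;
    while (copied < len && pos < remote.length) {
      int blockNum = (int) (pos / blockSize);
      byte[] block = blockCache.computeIfAbsent(blockNum, this::fetchBlock);
      int offsetInBlock = (int) (pos % blockSize);
      int toCopy = Math.min(len - copied, block.length - offsetInBlock);
      System.arraycopy(block, offsetInBlock, dst, off + copied, toCopy);
      copied += toCopy;
      pos += toCopy;
    }
    return copied;
  }

  private byte[] fetchBlock(int blockNum) {
    remoteReads++;
    int start = blockNum * blockSize;
    int end = Math.min(start + blockSize, remote.length);
    return Arrays.copyOfRange(remote, start, end);
  }

  long getPos() { return pos; }
  boolean isBuffered() { return blockCache != null; }
  int getRemoteReads() { return remoteReads; }
}

class UnbufferSketch {
  public static void main(String[] args) {
    byte[] data = new byte[64];
    for (int i = 0; i < data.length; i++) {
      data[i] = (byte) i;
    }
    ToyPrefetchingStream in = new ToyPrefetchingStream(data, 8);
    byte[] buffer = new byte[12];

    // Same shape as the sequence above: read, unbuffer, seek, read.
    in.read(buffer, 0, 12);
    in.unbuffer();
    in.seek(40);
    in.read(buffer, 0, 12);
    System.out.println(in.getPos() + " " + buffer[0]); // prints "52 40"
  }
}
```

In this toy model the second read fetches its blocks again because the cache was dropped, which is the cost an application accepts in exchange for releasing resources while the stream is idle.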

steveloughran and others added 10 commits March 28, 2022 11:05
This is the initial merge of the HADOOP-18028 S3A performance input stream.
This patch on its own is incomplete and must be accompanied by all other commits
with HADOOP-18028 in their git commit message. Consult the JIRA for that list.

Contributed by Bhalchandra Pandit.
…3A prefetching stream (apache#4115)


Contributed by PJ Fanning.
@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|:----------|--------:|:--------|:--------|
| +0 🆗 | reexec | 1m 10s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 ❌ | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | | _ feature-HADOOP-18028-s3a-prefetch Compile Tests _ |
| +1 💚 | mvninstall | 45m 26s | | feature-HADOOP-18028-s3a-prefetch passed |
| +1 💚 | compile | 0m 56s | | feature-HADOOP-18028-s3a-prefetch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 💚 | compile | 0m 47s | | feature-HADOOP-18028-s3a-prefetch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | checkstyle | 0m 42s | | feature-HADOOP-18028-s3a-prefetch passed |
| +1 💚 | mvnsite | 0m 55s | | feature-HADOOP-18028-s3a-prefetch passed |
| +1 💚 | javadoc | 0m 39s | | feature-HADOOP-18028-s3a-prefetch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 💚 | javadoc | 0m 42s | | feature-HADOOP-18028-s3a-prefetch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | spotbugs | 1m 36s | | feature-HADOOP-18028-s3a-prefetch passed |
| +1 💚 | shadedclient | 26m 31s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 0m 45s | | the patch passed |
| +1 💚 | compile | 0m 50s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 💚 | javac | 0m 50s | | the patch passed |
| +1 💚 | compile | 0m 38s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | javac | 0m 38s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 0m 27s | | the patch passed |
| +1 💚 | mvnsite | 0m 45s | | the patch passed |
| +1 💚 | javadoc | 0m 25s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 💚 | javadoc | 0m 34s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| -1 ❌ | spotbugs | 1m 22s | /new-spotbugs-hadoop-tools_hadoop-aws.html | hadoop-tools/hadoop-aws generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 💚 | shadedclient | 24m 37s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 2m 56s | | hadoop-aws in the patch passed. |
| +1 💚 | asflicense | 0m 42s | | The patch does not generate ASF License warnings. |
| | | 114m 33s | | |

| Reason | Tests |
|:-------|:------|
| SpotBugs | module:hadoop-tools/hadoop-aws |
| | numBlocksToPrefetch isn't initialized in org.apache.hadoop.fs.s3a.read.S3CachingInputStream.initializeUnderlyingResources() when invoked from constructor for superclass At S3CachingInputStream.java:[line 81] |

| Subsystem | Report/Notes |
|:----------|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4298/1/artifact/out/Dockerfile |
| GITHUB PR | #4298 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux 073a5278cea8 4.15.0-175-generic #184-Ubuntu SMP Thu Mar 24 17:48:36 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | feature-HADOOP-18028-s3a-prefetch / 2e455d0 |
| Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4298/1/testReport/ |
| Max. process+thread count | 606 (vs. ulimit of 5500) |
| modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4298/1/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |

This message was automatically generated.
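
For readers unfamiliar with the SpotBugs finding above, the following is a minimal, self-contained sketch of the general pattern it warns about. The real S3CachingInputStream code is not shown in this conversation, so `BaseStream` and `CachingStream` are hypothetical classes that only borrow the field and method names from the report. The point is that Java runs the superclass constructor before the subclass assigns its fields, so an overridden method invoked from that constructor sees the default value.

```java
/** Hypothetical base class whose constructor calls an overridable method. */
abstract class BaseStream {
  BaseStream() {
    // Runs before any subclass field has been assigned.
    initializeUnderlyingResources();
  }

  protected abstract void initializeUnderlyingResources();
}

/** Hypothetical subclass; names mirror the SpotBugs report for illustration only. */
class CachingStream extends BaseStream {
  private final int numBlocksToPrefetch;
  // Deliberately no initialiser: one would run after super() and overwrite
  // the value recorded during the superclass constructor call.
  private int valueSeenDuringInit;

  CachingStream(int numBlocksToPrefetch) {
    super(); // invokes initializeUnderlyingResources() first
    this.numBlocksToPrefetch = numBlocksToPrefetch;
  }

  @Override
  protected void initializeUnderlyingResources() {
    // At this point numBlocksToPrefetch still holds its default value, 0.
    valueSeenDuringInit = numBlocksToPrefetch;
  }

  int getValueSeenDuringInit() { return valueSeenDuringInit; }
  int getNumBlocksToPrefetch() { return numBlocksToPrefetch; }
}

class InitOrderSketch {
  public static void main(String[] args) {
    CachingStream stream = new CachingStream(4);
    System.out.println(stream.getValueSeenDuringInit() + " "
        + stream.getNumBlocksToPrefetch()); // prints "0 4"
  }
}
```

One common way to avoid this pattern is to defer such initialisation until after construction completes, for example by calling the initialisation method from the subclass constructor once its fields are set. Whether that fits this PR is a decision for the author.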

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|:----------|--------:|:--------|:--------|
| +0 🆗 | reexec | 0m 51s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 ❌ | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | | _ feature-HADOOP-18028-s3a-prefetch Compile Tests _ |
| +1 💚 | mvninstall | 41m 45s | | feature-HADOOP-18028-s3a-prefetch passed |
| +1 💚 | compile | 1m 5s | | feature-HADOOP-18028-s3a-prefetch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 💚 | compile | 0m 57s | | feature-HADOOP-18028-s3a-prefetch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | checkstyle | 0m 51s | | feature-HADOOP-18028-s3a-prefetch passed |
| +1 💚 | mvnsite | 1m 5s | | feature-HADOOP-18028-s3a-prefetch passed |
| +1 💚 | javadoc | 0m 46s | | feature-HADOOP-18028-s3a-prefetch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 💚 | javadoc | 0m 52s | | feature-HADOOP-18028-s3a-prefetch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | spotbugs | 1m 43s | | feature-HADOOP-18028-s3a-prefetch passed |
| +1 💚 | shadedclient | 25m 33s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 0m 45s | | the patch passed |
| +1 💚 | compile | 0m 51s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 💚 | javac | 0m 51s | | the patch passed |
| +1 💚 | compile | 0m 42s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | javac | 0m 42s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 0m 26s | | the patch passed |
| +1 💚 | mvnsite | 0m 46s | | the patch passed |
| +1 💚 | javadoc | 0m 24s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 💚 | javadoc | 0m 34s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| -1 ❌ | spotbugs | 1m 24s | /new-spotbugs-hadoop-tools_hadoop-aws.html | hadoop-tools/hadoop-aws generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 💚 | shadedclient | 24m 19s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 2m 45s | | hadoop-aws in the patch passed. |
| +1 💚 | asflicense | 0m 48s | | The patch does not generate ASF License warnings. |
| | | 110m 1s | | |

| Reason | Tests |
|:-------|:------|
| SpotBugs | module:hadoop-tools/hadoop-aws |
| | numBlocksToPrefetch isn't initialized in org.apache.hadoop.fs.s3a.read.S3CachingInputStream.initializeUnderlyingResources() when invoked from constructor for superclass At S3CachingInputStream.java:[line 81] |

| Subsystem | Report/Notes |
|:----------|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4298/2/artifact/out/Dockerfile |
| GITHUB PR | #4298 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux 8f58b81d4856 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | feature-HADOOP-18028-s3a-prefetch / 34c5e5d |
| Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4298/2/testReport/ |
| Max. process+thread count | 692 (vs. ulimit of 5500) |
| modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4298/2/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |

This message was automatically generated.

@dannycjones (Contributor)

@ahmarsuhail for JIRA to pick up the PR, can you update the title to "HADOOP-18184. Add support for unbuffer"?

@ahmarsuhail changed the title from "Hadoop 18184. Adds support for unbuffer" to "Hadoop-18184. Adds support for unbuffer" on May 18, 2022
@asfgit force-pushed the feature-HADOOP-18028-s3a-prefetch branch from f38bbe2 to b75b72b on May 30, 2022 16:50
6 participants