HADOOP-18340. deleteOnExit does not work with S3AFileSystem #4608
Conversation
Will address the checkstyle issues.
Hi @steveloughran, can you take a look at the patch? Thanks.
The code in the S3A FS and the tests all look good.
But I think the changes should be restricted to the S3A FS, even if it duplicates a bit of the superclass.
The whole section in the javadocs at the top of the FileSystem class explains why; you've already seen some of that in the tests, but it's also the external implementations which we don't have control of. Adding anything is making a commitment to preserve a new public API forever.
I'd be happier if you just did it all in S3AFileSystem:
Override deleteOnExit(Path f) and:
- skip the exists() check, because it doesn't do the right thing if you call the method while writing a file (the file isn't visible until close()); this saves HEAD/LIST probes too
- save the list to a set local to the S3A FS; you could make its getter protected so that mock tests can access it.
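A rough standalone sketch of that pattern follows; the class and method names here are illustrative stand-ins (a plain map models the object store), not the actual Hadoop classes:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified model of the proposed change: deleteOnExit() records the path
// without an exists() probe, and the set of recorded paths lives in this
// class rather than in a superclass.
class SketchS3AFileSystem implements AutoCloseable {

  // stand-in for the object store
  private final Map<String, String> store = new HashMap<>();

  // paths registered for deletion when the filesystem is closed
  private final Set<String> deleteOnExitPaths = new HashSet<>();

  void put(String path, String data) {
    store.put(path, data);
  }

  boolean exists(String path) {
    return store.containsKey(path);
  }

  // No exists() check: the file may still be being written and therefore
  // not yet visible; skipping the probe also saves HEAD/LIST requests.
  boolean deleteOnExit(String path) {
    return deleteOnExitPaths.add(path);
  }

  // getter kept accessible so mock tests can inspect the set
  Set<String> getDeleteOnExitPaths() {
    return Collections.unmodifiableSet(deleteOnExitPaths);
  }

  @Override
  public void close() {
    // best effort: delete registered paths, silently ignoring absent ones
    for (String path : deleteOnExitPaths) {
      store.remove(path);
    }
    deleteOnExitPaths.clear();
  }
}
```

Registering a path that does not exist yet is deliberately allowed here, which is exactly why the exists() check has to go.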
Now, what about an integration test too? Add a test case which creates a new FS in the test case (keeping the normal getFileSystem() FS for assertions) and:
- adds a file which doesn't exist
- adds a file which doesn't exist, then creates it
- adds a file which does exist
- adds a directory path
Then close the FS and use ContractTestUtils methods to assert the files and dirs are gone.
Seem good?
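The four cases above can be walked through in a standalone sketch; a plain map stands in for the bucket, all names are illustrative, and in the real ITest the ContractTestUtils assertions would replace the final containment checks:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class DeleteOnExitCases {

  // Runs the four registration cases, processes them as close() would,
  // and returns the resulting "store" so callers can assert on it.
  static Map<String, String> run() {
    Map<String, String> store = new HashMap<>();  // stand-in for the bucket
    Set<String> deleteOnExit = new HashSet<>();   // paths registered for deletion

    // 1. add a file which doesn't exist (no exists() probe, so this is allowed)
    deleteOnExit.add("/test/file1");

    // 2. add a file which doesn't exist yet, then create it
    deleteOnExit.add("/test/file2");
    store.put("/test/file2", "data");

    // 3. add a file which does exist
    store.put("/test/file3", "data");
    deleteOnExit.add("/test/file3");

    // 4. add a directory path (modelled here as a plain key)
    store.put("/test/dir/", "");
    deleteOnExit.add("/test/dir/");

    // "close the FS": process the registered paths, ignoring absent ones
    for (String p : deleteOnExit) {
      store.remove(p);
    }
    deleteOnExit.clear();
    return store;
  }
}
```

After close, none of the registered paths should remain, including the one that never existed and the one created after registration.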
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3ADeleteOnExit.java
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
Thanks! Points well taken.
Got it; for the S3A FS, the deleteOnExit set is going to live in S3AFileSystem.
Will try to add an integration test.
Sounds good to me, thanks for the feedback. I may come back to you about the test case; going to check it out first by myself.
Yes; look at some of the other ITest cases in the S3A code. If you aren't set up to run the hadoop-aws test suite, get set up to do that, as we require anyone submitting a PR to run the tests themselves first. Getting them to work in your IDE is best for debugging.
Thanks for the info on the ITests for S3A. I did run the integration tests for S3A before submitting the patch, but have not tried running them in the IDE yet. Let me try.
Force-pushed db475ed to 74f042d.
Thanks @steveloughran for the help. I pushed a new patch, which I hope addresses your comments.
I have run all the integration tests under hadoop-aws (Version: 3.4.0-SNAPSHOT, last published 2022-08-02); among the reported results were testCommonCrawlLookup[1] (0.234s) and testJobSubmissionCollectsTokens[1] (0.218s).
I also ran all the unit test cases under hadoop-aws. They passed.
Force-pushed 74f042d to 38cd6ad.
@steveloughran Sorry to bother you while you are busy; just a gentle reminder about the patch I posted a couple of days ago. I'd appreciate it if you could review it when you get a chance, thanks.
Yes, sorry, been neglecting this.
All the production code is good; some minor changes to the tests and it's ready for the next Hadoop release.
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3ADeleteOnExit.java
I also ran the new integration test without the change in S3AFileSystem; it failed as expected.
🎊 +1 overall
This message was automatically generated.
+1. Merging to trunk; will backport with a retest. Patches which only touch hadoop-aws are always less risky than anything which changes hadoop-common, but I should test the changes before pushing to branch-3.3.
Thanks @steveloughran for the review, really nice comments!
Description of PR
processDeleteOnExit() is overridden in S3AFileSystem; it skips the exists() check and deletes objects without checking whether the FileSystem is closed.
How was this patch tested?
A new unit test case is added, and all unit test cases under hadoop-tools/hadoop-aws passed.
mvn -Dparallel-tests clean test
I ran the S3A integration tests against the us-west-2 region and there were a few failures/errors. Running the trunk code without the patch produced the same errors/failures, so they are not caused by the patch; they are probably due to misconfiguration (I could not figure it out).
mvn -Dparallel-tests clean verify
The errors are:

```
org.apache.hadoop.fs.s3a.auth.delegation.ITestDelegatedMRJob#testCommonCrawlLookup[1]  0.324s
  s3a://hbase-test-data/fork-0001/test: getFileStatus on s3a://hbase-test-data/fork-0001/test:
  com.amazonaws.services.s3.model.AmazonS3Exception: The AWS Access Key Id you provided does not
  exist in our records. (Service: Amazon S3; Status Code: 403; Error Code: InvalidAccessKeyId;
  Request ID: XJ3DCCR6Q7SXTJDW; S3 Extended Request ID:
  fRmP3m1lThWxhj3s9VkSNEtuBz1JeBWYw65aRajrSg/H7IN+muB7d8PavSeqJ2urvLZtguTbnlc=; Proxy: null)
  :InvalidAccessKeyId

org.apache.hadoop.fs.s3a.auth.delegation.ITestDelegatedMRJob#testJobSubmissionCollectsTokens[1]  0.329s
  s3a://hbase-test-data/fork-0001/test: getFileStatus on s3a://hbase-test-data/fork-0001/test:
  com.amazonaws.services.s3.model.AmazonS3Exception: The AWS Access Key Id you provided does not
  exist in our records. (Service: Amazon S3; Status Code: 403; Error Code: InvalidAccessKeyId;
  Request ID: XJ35WHK7X6EMP9B6; S3 Extended Request ID:
  rtWEsDYcGqNiaoKy2D5EQQqN+O7MbYe1bYbiSmkF+FOz9/wb6+t+dQooqj7ppCSCZMBgC3PeEw4=; Proxy: null)
  :InvalidAccessKeyId

org.apache.hadoop.fs.s3a.ITestS3AEndpointRegion#testBlankRegionTriggersSDKResolution  2.817s
  [Client region name] expected:<"[mars-north]-2"> but was:<"[us-west]-2">

org.apache.hadoop.fs.s3a.ITestS3ATemporaryCredentials#testSTS
  request session credentials:
  com.amazonaws.services.securitytoken.model.AWSSecurityTokenServiceException:
  Cannot call GetSessionToken with session credentials (Service: AWSSecurityTokenService;
  Status Code: 403; Error Code: AccessDenied; Request ID: d935696d-7d98-4a1c-825f-22bf3d28ed9d;
  Proxy: null):AccessDenied
```
For code changes: if applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?