-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-17409. Remove s3guard from S3A module #3534
HADOOP-17409. Remove s3guard from S3A module #3534
Conversation
@mehakmeet @mukund-thakur @bogthe @bgaborg going to be a big patch, but it removes everything. Current PR turns s3guard off, but doesn't yet remove the (no always no-op API calls) from the listing/delete/rename code. |
617c02c
to
2d6644a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great effort in doing the removal 👏 🙇
Added a few small comments. Also ran the tests + integration tests and they finished in ~5 mins 🎊 (on ubuntu).
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/InconsistentS3Object.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/S3Guard.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/s3guard.md
Outdated
Show resolved
Hide resolved
@@ -422,7 +315,7 @@ public void checkBasicFileOperations() throws Throwable { | |||
|
|||
readonlyFS.getFileStatus(emptyDir); | |||
// now look at a file; the outcome depends on the mode. | |||
accessDeniedIf(!guardedInAuthMode, () -> | |||
accessDeniedIf(!false, () -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: !false
=> true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
came from me fixing the calls to be true/false and then inlining; can make clause non-conditional
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixedl and updated the comment above
...-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/ITestRestrictedReadAccess.java
Outdated
Show resolved
Hide resolved
...-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/ITestRestrictedReadAccess.java
Outdated
Show resolved
Hide resolved
...-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/ITestRestrictedReadAccess.java
Outdated
Show resolved
Hide resolved
2dd5c19
to
ce7e4ac
Compare
testing s3 london, all good. removed ITestPartialRenamesDeletes test which was failing as the output of the processing of the multi object delete exception didn't match expectations. We don't use that feature in production any more, once the need to update the s3guard tables goes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing an initial review.
I see that you have removed MultiObjectDeleteSupport
class. Was class used solely for S3Guard? I still see some references in S3AUtils
which breaks the compilation of these changes when trying to build.
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AFailureHandling.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/InconsistentAmazonS3Client.java
Show resolved
Hide resolved
in RenameOperation & DeleteOperatoin, move off guava preconditions at the same time; fix imports. |
2eced9d
to
d174cba
Compare
d174cba
to
adaa03e
Compare
spotbugs is #3630; |
minor spotbugs to fix up |
1490e29
to
85eb597
Compare
HBoss contains references to local metastore & ability to set it in maybe we should make that a no-op. or at least fix up hboss to never use it |
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/introduction.md
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did some initial review. Will get back to this again. `
@@ -343,7 +343,7 @@ stores pretend that they are a FileSystem, a FileSystem with the same | |||
features and operations as HDFS. This is —ultimately—a pretence: | |||
they have different characteristics and occasionally the illusion fails. | |||
|
|||
1. **Consistency**. Object stores are generally *Eventually Consistent*: it | |||
1. **Consistency**. Object mqy be *Eventually Consistent*: it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit : typo mqy -> may
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Listing.java
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
Show resolved
Hide resolved
hboss is just using the config options to switch to local s3guard in tests. we leave the options alone but: should we treat asking for local as a no op? because they call it to get a consistent s3 store...there's no risk of ddb tables corrupting the view of other processes |
test run s3 london.
|
+1 pending the yetus checks, tested on non-CSE and CSE test suite, everything went fine and getting the correct "not supported" error when turning s3guard on. |
a94e0e1
to
ac185db
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this patch @steveloughran, LGTM.
Well, it's really the end of an era here - I remember we worked a lot to stabilize this. No more support needed for "out-of-band" operations :)
thanks. realised there's one thing left -an entry in troubleshooting about the new error message. building a release, generating the real output and adding. this is doc only change so shouldn't force a new review. time to "knife the baby" , as bill gates said to steve jobs |
Change-Id: I773848d0386d44f78dcca2126372e1ec734fce5e HADOOP-17409. Aggressive Cull of S3Guard from test code * obsolete tests cut completely * cost tests which are parameterized have had their scope cut down to keep/delete markers only. * some tests have had the guarded flags fixed to true/false, then inlined, then removed the branches which are no longer taken. Change-Id: I871a41b009a8abf1335211556c48ca220ae35b44 HADOOP-17409. remove inconsistency expectations in code * now unused classes * references to exceptions, in text * assumed role tests don't ask for DDB perms * All cost tests only test raw values Scale test down to 20 min Change-Id: Ia86ace7e25206f4885fcbb7943d925f2da31a3fa HADOOP-17409. remove S3Guard from docs Retaining some references. Change-Id: I97b68273ce84d440ebbac355d2dc2646b617ff33 HADOOP-17409. s3guard. continue purge most classes removed completely source and text scanned for s3guard and text "consistent"; removed where needed. a lot of javadoc fixup here. probably lots of unused import failures. Change-Id: Ibe94b11666870bd53c75bc6502ba4c6a9f4e99a7 HADOOP-17409. s3guard removal * remove references to dynamodb, metastore from code * core-default.xml stripped of s3guard revs * javadocs and comments updated * MultiObjectDeleteSupport cut MultiObjectDeleteSupport was there to ensure that the metastore was correctly updated after partial failure of delete. no metastore: no need for class, or for tests, which were a bit flaky, especially in ITestPartialRenamesDeletes. Change-Id: Ie88442268bb2fd39f0c8e3587eac20c55a222d8c HADOOP-17409. fix build, some imports. cut a lot more of the InconsistentClient; all it does now is simulate throttling. reinstate MultiObjectDeleteSupport, but only the exception translation. Change-Id: I55a94780ec9e27136ad6ca567f917feca6446609 HADOOP-17409. checkstyles Change-Id: I4a4800b789671598fe979bab3de00bde165976dd HADOOP-17409. more doc/comment purging + update all bucket-info examples in docs. + scan for references to consistent/consistency across project Change-Id: Icc43dbaf7d55c2afb697f2155cd940a98684c793
Change-Id: I33eced2a8635c9743a939c73274c048fea49e3ad
review comments, inc reverting one change purge text/comment references to auth and 'mode' Change-Id: I8b2969299b66bfae0526c37bb1a8c4c13f773e66
Change-Id: Ib13bc9b3320fd62935aa1fd3dddfc0038d0f6150
if an application (such as a test suite) explicitly requested the local metadata store, a warning is printed and the FS initialization continues without a metastore. This is for compatibility with applications/libraries which set this to be guaranteed a consistent store view within their test process. It is no longer needed -and so downgraded to a no-op. updated docs, including discussion on increased S3 IO Change-Id: Ifacb2d9688d1852cfded17cd1ff0eb0a3200425c
…UCCESS file Change-Id: I45001a6bdb0e59e5a37da21f31b033bb922b5a0a
getting rid of the tombstone reconciliation really simplifies list processing Change-Id: I2cb64e9a710f846e0c451dbcc241e924ce9189cc
Change-Id: I5d667b669890d9c150636b397e897aede3372d5e
b7fed11
to
4cfc07e
Compare
rebased to trunk to line everything up for a merge and to retest locally first |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM +1
done. will do the backporting next |
Completely removes S3Guard support from the S3A codebase. If the connector is configured to use any metastore other than the null and local stores (i.e. DynamoDB is selected) the s3a client will raise an exception and refuse to initialize. This is to ensure that there is no mix of S3Guard enabled and disabled deployments with the same configuration but different hadoop releases -it must be turned off completely. The "hadoop s3guard" command has been retained -but the supported subcommands have been reduced to those which are not purely S3Guard related: "bucket-info" and "uploads". This is major change in terms of the number of files changed; before cherry picking subsequent s3a patches into older releases, this patch will probably need backporting first. Goodbye S3Guard, your work is done. Time to die. Contributed by Steve Loughran. Change-Id: I4b8429640d6debd3928f991ef5fbc6d0aa1cab55
Completely removes S3Guard support from the S3A codebase. If the connector is configured to use any metastore other than the null and local stores (i.e. DynamoDB is selected) the s3a client will raise an exception and refuse to initialize. This is to ensure that there is no mix of S3Guard enabled and disabled deployments with the same configuration but different hadoop releases -it must be turned off completely. The "hadoop s3guard" command has been retained -but the supported subcommands have been reduced to those which are not purely S3Guard related: "bucket-info" and "uploads". This is major change in terms of the number of files changed; before cherry picking subsequent s3a patches into older releases, this patch will probably need backporting first. Goodbye S3Guard, your work is done. Time to die. Contributed by Steve Loughran.
Completely removes S3Guard support from the S3A codebase. If the connector is configured to use any metastore other than the null and local stores (i.e. DynamoDB is selected) the s3a client will raise an exception and refuse to initialize. This is to ensure that there is no mix of S3Guard enabled and disabled deployments with the same configuration but different hadoop releases -it must be turned off completely. The "hadoop s3guard" command has been retained -but the supported subcommands have been reduced to those which are not purely S3Guard related: "bucket-info" and "uploads". This is major change in terms of the number of files changed; before cherry picking subsequent s3a patches into older releases, this patch will probably need backporting first. Goodbye S3Guard, your work is done. Time to die. Contributed by Steve Loughran.
Completely removes S3Guard support from the S3A codebase. If the connector is configured to use any metastore other than the null and local stores (i.e. DynamoDB is selected) the s3a client will raise an exception and refuse to initialize. This is to ensure that there is no mix of S3Guard enabled and disabled deployments with the same configuration but different hadoop releases -it must be turned off completely. The "hadoop s3guard" command has been retained -but the supported subcommands have been reduced to those which are not purely S3Guard related: "bucket-info" and "uploads". This is major change in terms of the number of files changed; before cherry picking subsequent s3a patches into older releases, this patch will probably need backporting first. Goodbye S3Guard, your work is done. Time to die. Contributed by Steve Loughran.
Description of PR
Note: this patch retains the hadoop s3guard command.
This is because it is a broadly documented, used in scripts and tests. We will have to retain it.
We can add another entry point "s3a" which would invoke the same operations.
However, we still get to maintain the s3guard command and will then be left with the
issue that you can only use that new command on more recent versions of Hadoop.
How was this patch tested?
deleting tests until everything worked again
For code changes:
TODO
whenGuarded()
clauses from cost tests;withWhenRaw()
to become simplywith()