HADOOP-16599. Allow a SignerInitializer to be specified along with a #1516

Merged: 3 commits merged into apache:trunk on Oct 2, 2019


@sidseth (Contributor) commented Sep 24, 2019

Patch is missing some unit tests, which will get added soon. Posting early to solicit feedback on the interface additions - @steveloughran

@steveloughran (Contributor) left a comment

initial patch looks OK

  • I've noted where I want classes moved into the .auth or .impl packages; consider the base fs.s3a module a place where things shouldn't be dropped any more

  • and what scope/stability tags we should have on a new feature

  • Have you thought about how to do an integration test here? I could imagine a custom signer which just forwards to the AWS signer

  • and what about collecting metrics on this, e.g. the number of signing requests made? We could have another callback under org.apache.hadoop.fs.s3a.S3AInstrumentation which the signers could use to pass this info back
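The integration-test idea in the third point (a custom signer that just forwards to the AWS signer) could be sketched as below; counting invocations also touches on the metrics point. This is a minimal sketch under stated assumptions: `Credentials`, `Signer`, `FakeAwsSigner` and `ForwardingSigner` are hypothetical stand-ins defined here so the example compiles on its own. They are loosely modelled on the AWS SDK for Java (v1) `com.amazonaws.auth.Signer` idea, not on its actual signatures.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-ins for the SDK types, so the sketch compiles on its own.
interface Credentials {
  String accessKey();
}

interface Signer {
  /** Adds authentication headers to the request headers in place. */
  void sign(Map<String, String> requestHeaders, Credentials credentials);
}

/** Demo "real" signer: writes a fake Authorization header (not a real signature). */
class FakeAwsSigner implements Signer {
  @Override
  public void sign(Map<String, String> requestHeaders, Credentials credentials) {
    requestHeaders.put("Authorization", "FAKE " + credentials.accessKey());
  }
}

/** Forwards every call to a delegate signer and counts the invocations. */
class ForwardingSigner implements Signer {
  private final Signer delegate;
  private final AtomicLong signCount = new AtomicLong();

  ForwardingSigner(Signer delegate) {
    this.delegate = delegate;
  }

  @Override
  public void sign(Map<String, String> requestHeaders, Credentials credentials) {
    signCount.incrementAndGet();          // record the call first
    delegate.sign(requestHeaders, credentials);  // then do the real work
  }

  long getSignCount() {
    return signCount.get();
  }
}
```

In an integration test, the forwarding signer would be used in place of the real one, a few S3 operations run, and the count asserted to be non-zero; that checks the custom signer was actually invoked without having to reimplement signing.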

    * limitations under the License.
    */

    package org.apache.hadoop.fs.s3a;

Contributor

to go into fs.s3a.impl.auth

@InterfaceAudience.LimitedPrivate("authorization-subsystems")
@InterfaceStability.Unstable

Contributor Author

Have moved it under fs.s3a.auth (not fs.s3a.auth.impl). This is an interface which is meant to be implemented by others.
Removed the @Interface* annotations in favor of package-info.

    *
    */
    @Public
    @Evolving

Contributor

    * CustomSigners -> 'CSigner1:CustomSignerClass1,CSigner2:CustomerSignerClass2
    * name will be associated with this signer class in the S3 SDK.
    * Examples
    * CustomSigner -> 'CustomSigner:org.apache...CustomSignerClass'

Contributor

I've just had to merge https://issues.apache.org/jira/browse/HADOOP-16602 in to fix the javadoc here

  1. merge and copy the fix
  2. do a test run with mvn package to see all is well (I do mvn javadoc:javadoc when I remember)

Contributor Author

Ack. Surprised the pre-commit didn't catch this.
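The javadoc in this hunk describes the value as a comma-separated list of `name:className` pairs. A minimal parser for that format might look like the following. The class and method names, and the choice to reject malformed entries with `IllegalArgumentException`, are assumptions for illustration, not the patch's actual code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: parses "Name1:Class1,Name2:Class2" into name -> class name. */
final class CustomSignerConfig {
  private CustomSignerConfig() {
  }

  static Map<String, String> parse(String value) {
    // LinkedHashMap keeps the signers in the order they were configured.
    Map<String, String> signers = new LinkedHashMap<>();
    if (value == null || value.trim().isEmpty()) {
      return signers;
    }
    for (String entry : value.split(",")) {
      String trimmed = entry.trim();
      if (trimmed.isEmpty()) {
        continue; // tolerate "A:x,,B:y" and trailing commas
      }
      String[] parts = trimmed.split(":");
      // Exactly one ':' with a non-empty name and class name on either side.
      if (parts.length != 2 || parts[0].trim().isEmpty() || parts[1].trim().isEmpty()) {
        throw new IllegalArgumentException("Invalid custom signer entry: " + trimmed);
      }
      signers.put(parts[0].trim(), parts[1].trim());
    }
    return signers;
  }
}
```

Failing fast on a malformed entry surfaces configuration mistakes when the filesystem is initialized, rather than later as a confusing signing error.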

    * limitations under the License.
    */

    package org.apache.hadoop.fs.s3a;

Contributor

org.apache.hadoop.fs.s3a.auth.delegation;

Contributor Author

Changed.
On a related note, what are your thoughts on moving some of these delegation and auth interfaces to a new module, something like s3a-plugins? That would make it easier for downstream projects to take a limited dependency which doesn't pull in all of S3AFileSystem, the AWS SDK, etc. It would be a separate JIRA, of course.

Contributor

I worry it'd force even more version brittleness: "you are trying to use the hadoop-aws-plugins-3.2.1 with hadoop-aws-3.2.2". If you ever search for Spark + S3 on Stack Overflow, you can see that the #1 recurring complaint is "I added hadoop-aws-3.1 to Hadoop and now I get a class not found exception."

This is why the S3A troubleshooting docs start by telling people not to mix jars, as all that does is move stack traces around: https://hadoop.apache.org/docs/current3/hadoop-aws/tools/hadoop-aws/troubleshooting_s3a.html#Classpath_Setup

I don't want to make things worse. If you don't want the AWS SDK, don't include it.

Contributor Author

aws-plugins would help break some circular dependencies.
Users would of course need to make sure the versions of aws-plugins and aws are the same. In fact, users would only import 'aws', which would automatically pull in 'aws-plugins'; in that sense this would not be an incompatible change.
The intent of switching 'aws-plugins' to a separate module is to allow people writing plugins to not depend on everything, and potentially help with circular dependencies.

In any case, that's unrelated to this JIRA and PR. It can be taken up later if it makes sense.

    /**
    * Interface for S3A Delegation Token access.
    */
    @Public

Contributor

The auth.delegation package is declared as
@InterfaceAudience.LimitedPrivate("authorization-subsystems")
@InterfaceStability.Unstable

These are exactly the guarantees we should be making here. We don't know whether this is going to work, and should not make any commitments about stability.
You can just cut these lines and rely on package-info.

Contributor Author

Done.

    import java.io.Closeable;
    import java.io.IOException;
    import java.util.LinkedList;

Contributor

This class has only just gone in. I thought it was going into .impl, but clearly not.

  1. Move in this patch to a subdir; s3a.auth is the most appropriate
  2. Sort out the imports

I don't want any new implementation classes to go into fs.s3a; I have a goal of marking the impl as private in Java 11 modules. Please avoid adding things there unless it's unavoidable, and if you have to, explain why. Thx

Contributor Author

Moved to s3a.auth.

What needs to change in the imports? It's using these groups, each separated by a blank line:

  1. java.*
  2. everything other than org.apache.*
  3. org.apache.*
  4. static imports
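To make the grouping concrete, here is a small checker that assigns each import line to one of the four groups listed above and verifies the groups never go backwards. It is a hypothetical illustration of the convention described in this comment, not part of any Hadoop tooling; treating `javax.*` as "everything other than org.apache.*" is an assumption based on a literal reading of the list.

```java
import java.util.List;

/** Hypothetical checker for the four import groups described above. */
final class ImportOrderCheck {
  private ImportOrderCheck() {
  }

  /** Returns 0 for java.*, 1 for other non-static, 2 for org.apache.*, 3 for static imports. */
  static int group(String importLine) {
    String s = importLine.trim();
    if (!s.startsWith("import ")) {
      throw new IllegalArgumentException("Not an import: " + importLine);
    }
    if (s.startsWith("import static ")) {
      return 3;
    }
    String name = s.substring("import ".length()).trim();
    if (name.startsWith("java.")) {
      return 0;
    }
    if (name.startsWith("org.apache.")) {
      return 2;
    }
    return 1; // everything else, including javax.* and com.amazonaws.*
  }

  /** True if the group index never decreases; blank-line separation is not checked. */
  static boolean isOrdered(List<String> importLines) {
    int last = 0;
    for (String line : importLines) {
      int g = group(line);
      if (g < last) {
        return false;
      }
      last = g;
    }
    return true;
  }
}
```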

    @@ -1879,3 +1879,61 @@ To disable checksum verification in `distcp`, use the `-skipcrccheck` option:
    hadoop distcp -update -skipcrccheck -numListstatusThreads 40 /user/alice/datasets s3a://alice-backup/datasets
    ```

    ### <a name="customsigners"></a> Advanced - Custom Signers

Contributor

It's probably time to add a whole new authentication.md file, linked off the index.md file; index.md is a bit too big, and we actually need to document things like the standard set of signers.

A new file actually makes merging easier...

Contributor Author

Explicitly keeping the signer documentation vague. This is not a feature that's going to be used by a lot of people. The default signers will change with the SDK version, and are already mentioned in the documentation.
I'd prefer not to have a page which talks only about signing (the new auth page); again, I don't want to call this out, since it's not something the majority of users will want to touch. From what I can tell, delegation is already a separate page.

@sidseth (Contributor Author) commented Sep 26, 2019

> Have you thought about how to do an integration test here? I could imagine a custom signer which just forwards to the AWS signer

Wasn't planning on adding any integration tests. Most of this can be tested quite easily with unit tests.

> and what about collecting metrics on this, e.g. #of signing requests made. We could have another callback under org.apache.hadoop.fs.s3a.S3AInstrumentation which the signers could use to pass this info back

The default usage (no custom signers) will not be able to use any instrumentation, and I don't think we want to force a wrapper signer just for instrumentation. That may not even be possible, given that signers cannot access configs and a wrapper would not know the real signer. Instrumentation could be passed as a parameter to the SignerInitializer that is being added as part of this patch. I'll defer to you on whether adding instrumentation to the interface makes sense; I don't know enough about S3AInstrumentation and its usage.

@sidseth (Contributor Author) commented Oct 1, 2019

Updated PR based on the review comments, except for what is called out further up. Also adds unit tests and an integration test.

@sidseth requested a review from @steveloughran on October 1, 2019 01:29
@hadoop-yetus

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
|---|---|---|---|
| 0 | reexec | 1788 | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 | dupname | 0 | No case conflicting files found. |
| +1 | @author | 0 | The patch does not contain any @author tags. |
| +1 | test4tests | 0 | The patch appears to include 3 new or modified test files. |
| | | | _ trunk Compile Tests _ |
| +1 | mvninstall | 1062 | trunk passed |
| +1 | compile | 35 | trunk passed |
| +1 | checkstyle | 27 | trunk passed |
| +1 | mvnsite | 41 | trunk passed |
| +1 | shadedclient | 793 | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 31 | trunk passed |
| 0 | spotbugs | 59 | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 57 | trunk passed |
| -0 | patch | 84 | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
| | | | _ Patch Compile Tests _ |
| +1 | mvninstall | 33 | the patch passed |
| +1 | compile | 29 | the patch passed |
| +1 | javac | 29 | the patch passed |
| -0 | checkstyle | 20 | hadoop-tools/hadoop-aws: The patch generated 11 new + 10 unchanged - 2 fixed = 21 total (was 12) |
| +1 | mvnsite | 33 | the patch passed |
| +1 | whitespace | 0 | The patch has no whitespace issues. |
| +1 | shadedclient | 777 | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 27 | the patch passed |
| +1 | findbugs | 62 | the patch passed |
| | | | _ Other Tests _ |
| +1 | unit | 71 | hadoop-aws in the patch passed. |
| +1 | asflicense | 32 | The patch does not generate ASF License warnings. |
| | | 5020 | |

| Subsystem | Report/Notes |
|---|---|
| Docker | Client=19.03.1 Server=19.03.1 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1516/2/artifact/out/Dockerfile |
| GITHUB PR | #1516 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux d6ba1c17f3d8 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / b3275ab |
| Default Java | 1.8.0_222 |
| checkstyle | https://builds.apache.org/job/hadoop-multibranch/job/PR-1516/2/artifact/out/diff-checkstyle-hadoop-tools_hadoop-aws.txt |
| Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1516/2/testReport/ |
| Max. process+thread count | 446 (vs. ulimit of 5500) |
| modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
| Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1516/2/console |
| versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
| Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |

This message was automatically generated.

@sidseth (Contributor Author) commented Oct 1, 2019

Tests run against a bucket in us-east-2:

    mvn -T 1C clean verify -Dparallel-tests -DtestsThreadCount=12 -Ds3guard -Dauth -Ddynamo

Passed, except for the usual flaky tests (ITestDynamoDBMetadataStore.setUp:159 » IllegalArgument Table s3guard-sseth-in-...)

@hadoop-yetus

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
|---|---|---|---|
| 0 | reexec | 2097 | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 | dupname | 0 | No case conflicting files found. |
| +1 | @author | 0 | The patch does not contain any @author tags. |
| +1 | test4tests | 0 | The patch appears to include 3 new or modified test files. |
| | | | _ trunk Compile Tests _ |
| +1 | mvninstall | 1223 | trunk passed |
| +1 | compile | 32 | trunk passed |
| +1 | checkstyle | 25 | trunk passed |
| +1 | mvnsite | 35 | trunk passed |
| +1 | shadedclient | 851 | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 27 | trunk passed |
| 0 | spotbugs | 58 | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 56 | trunk passed |
| -0 | patch | 79 | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
| | | | _ Patch Compile Tests _ |
| +1 | mvninstall | 32 | the patch passed |
| +1 | compile | 26 | the patch passed |
| +1 | javac | 26 | the patch passed |
| -0 | checkstyle | 18 | hadoop-tools/hadoop-aws: The patch generated 11 new + 10 unchanged - 2 fixed = 21 total (was 12) |
| +1 | mvnsite | 31 | the patch passed |
| +1 | whitespace | 0 | The patch has no whitespace issues. |
| +1 | shadedclient | 850 | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 23 | the patch passed |
| +1 | findbugs | 62 | the patch passed |
| | | | _ Other Tests _ |
| +1 | unit | 70 | hadoop-aws in the patch passed. |
| +1 | asflicense | 29 | The patch does not generate ASF License warnings. |
| | | 5582 | |

| Subsystem | Report/Notes |
|---|---|
| Docker | Client=19.03.2 Server=19.03.2 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1516/3/artifact/out/Dockerfile |
| GITHUB PR | #1516 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 03eee379ce5d 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / b3275ab |
| Default Java | 1.8.0_222 |
| checkstyle | https://builds.apache.org/job/hadoop-multibranch/job/PR-1516/3/artifact/out/diff-checkstyle-hadoop-tools_hadoop-aws.txt |
| Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1516/3/testReport/ |
| Max. process+thread count | 357 (vs. ulimit of 5500) |
| modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
| Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1516/3/console |
| versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
| Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |

This message was automatically generated.

@apache deleted a comment from hadoop-yetus on Oct 2, 2019
@steveloughran (Contributor)

patch LGTM, +1 once you fix whatever merge conflicts have crept in (Constants, inevitably)

Regarding instrumentation, it'd make sense to have some interface for the signers to invoke with signed/rejected counters; we'd have an implementation in S3AInstrumentation which would be the one normally passed down.
Now, if we also wanted to track signing latency, that would be fun, and it might be something we'd always want to track, given the various extension points for auth which exist (AWS IAM stuff, our DT plugins, etc.)
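A rough sketch of the callback suggested above, with hypothetical names throughout (`SignerStatistics`, `InMemorySignerStatistics`, `TimedSigning`): signers report outcomes and elapsed time through a small interface, and an implementation such as S3AInstrumentation would normally be the one passed down. Here an in-memory class stands in for it, and "rejected" is assumed to mean the signing call threw an exception.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical callback a signer could use to report what it did. */
interface SignerStatistics {
  void signed(long durationNanos);

  void rejected(long durationNanos);
}

/** In-memory stand-in for an implementation such as S3AInstrumentation. */
class InMemorySignerStatistics implements SignerStatistics {
  private final AtomicLong signedCount = new AtomicLong();
  private final AtomicLong rejectedCount = new AtomicLong();
  private final AtomicLong totalNanos = new AtomicLong();

  @Override
  public void signed(long durationNanos) {
    signedCount.incrementAndGet();
    totalNanos.addAndGet(durationNanos);
  }

  @Override
  public void rejected(long durationNanos) {
    rejectedCount.incrementAndGet();
    totalNanos.addAndGet(durationNanos);
  }

  long getSignedCount() {
    return signedCount.get();
  }

  long getRejectedCount() {
    return rejectedCount.get();
  }

  /** Mean latency over all reported calls, in milliseconds (0 if none were reported). */
  double meanLatencyMillis() {
    long calls = signedCount.get() + rejectedCount.get();
    return calls == 0 ? 0.0 : (double) totalNanos.get() / calls / TimeUnit.MILLISECONDS.toNanos(1);
  }
}

/** Hypothetical helper showing how a signer would time and report one signing call. */
final class TimedSigning {
  private TimedSigning() {
  }

  static boolean signAndReport(Runnable signing, SignerStatistics stats) {
    long start = System.nanoTime();
    try {
      signing.run();
      stats.signed(System.nanoTime() - start);
      return true;
    } catch (RuntimeException e) {
      stats.rejected(System.nanoTime() - start);
      return false;
    }
  }
}
```

Keeping the interface this narrow means the default signers, a wrapper, or a SignerInitializer-supplied signer could all report through the same callback without depending on S3AInstrumentation directly.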

@hadoop-yetus

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
|---|---|---|---|
| 0 | reexec | 42 | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 | dupname | 1 | No case conflicting files found. |
| +1 | @author | 0 | The patch does not contain any @author tags. |
| +1 | test4tests | 0 | The patch appears to include 3 new or modified test files. |
| | | | _ trunk Compile Tests _ |
| +1 | mvninstall | 1088 | trunk passed |
| +1 | compile | 35 | trunk passed |
| +1 | checkstyle | 29 | trunk passed |
| +1 | mvnsite | 40 | trunk passed |
| +1 | shadedclient | 793 | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 30 | trunk passed |
| 0 | spotbugs | 60 | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 57 | trunk passed |
| -0 | patch | 85 | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
| | | | _ Patch Compile Tests _ |
| +1 | mvninstall | 33 | the patch passed |
| +1 | compile | 28 | the patch passed |
| +1 | javac | 28 | the patch passed |
| -0 | checkstyle | 20 | hadoop-tools/hadoop-aws: The patch generated 11 new + 10 unchanged - 2 fixed = 21 total (was 12) |
| +1 | mvnsite | 33 | the patch passed |
| +1 | whitespace | 0 | The patch has no whitespace issues. |
| +1 | shadedclient | 778 | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 26 | the patch passed |
| +1 | findbugs | 63 | the patch passed |
| | | | _ Other Tests _ |
| +1 | unit | 72 | hadoop-aws in the patch passed. |
| +1 | asflicense | 34 | The patch does not generate ASF License warnings. |
| | | 3294 | |

| Subsystem | Report/Notes |
|---|---|
| Docker | Client=19.03.1 Server=19.03.1 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1516/4/artifact/out/Dockerfile |
| GITHUB PR | #1516 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 230d3f683ca4 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 685918e |
| Default Java | 1.8.0_222 |
| checkstyle | https://builds.apache.org/job/hadoop-multibranch/job/PR-1516/4/artifact/out/diff-checkstyle-hadoop-tools_hadoop-aws.txt |
| Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1516/4/testReport/ |
| Max. process+thread count | 412 (vs. ulimit of 5500) |
| modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
| Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1516/4/console |
| versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
| Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |

This message was automatically generated.

@sidseth (Contributor Author) commented Oct 2, 2019

Fixed the merge conflicts, and have run mvn javadoc:javadoc successfully.
Tests run again against us-east-2.
Usual failures + ITestRestrictedReadAccess (which fails with and without the patch). Filed HADOOP-16626.

Thanks for the review. Merging the changes.

@sidseth merged commit 559ee27 into apache:trunk on Oct 2, 2019

3 participants