
HADOOP-17511. Add audit/telemetry logging to S3A connector #2675

Closed

Conversation

steveloughran
Contributor

@steveloughran steveloughran commented Feb 2, 2021

Superseded by #2807

Notion of AuditSpan which is created for a given operation; goal is
to pass it along everywhere.

It's thread local per FS-instance; store operations pick this up in their
constructor from the StoreContext.

The entryPoint() method in S3A FS has been enhanced to initiate the spans.
For this to work, internal code SHALL NOT call those entry points (Done)
and all public API points MUST be declared as entry points. Tricky given
the lack of layering in this branch.

The auditing is intended to be a plugin point; currently there is one
which logs at Info (yes, noisy but good for development)

Change-Id: Idddbbd3138e52b40e97f9c2383453190a02cd8ec
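
In outline, that thread-local design looks something like the sketch below; all class and method names here are hypothetical stand-ins, not the actual S3A code.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of "thread-local span per FS instance".
final class AuditSpan {
  private static final AtomicLong COUNTER = new AtomicLong();
  static final AuditSpan UNAUDITED = new AuditSpan("unaudited");
  final long id = COUNTER.incrementAndGet();
  final String operation;

  AuditSpan(String operation) {
    this.operation = operation;
  }
}

final class SpanHolder {
  // One holder per FS instance, so spans don't leak across stores.
  private final ThreadLocal<AuditSpan> activeSpan =
      ThreadLocal.withInitial(() -> AuditSpan.UNAUDITED);

  // What an entryPoint() would call: create and activate a span.
  AuditSpan createSpan(String operation) {
    AuditSpan span = new AuditSpan(operation);
    activeSpan.set(span);
    return span;
  }

  // What a store operation would read through its StoreContext.
  AuditSpan getActiveSpan() {
    return activeSpan.get();
  }
}
```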


@steveloughran steveloughran requested a review from bgaborg February 2, 2021 16:25
@steveloughran steveloughran marked this pull request as draft February 2, 2021 16:26
@steveloughran steveloughran added enhancement fs/s3 changes related to hadoop-aws; submitter must declare test endpoint labels Feb 2, 2021
@steveloughran
Contributor Author

Testing? Not yet. It compiles, and I've done a few of the ITests.

@steveloughran steveloughran force-pushed the s3/HADOOP-17511-auditing branch from 00c3ce7 to 445754f Compare February 2, 2021 18:04
@steveloughran
Contributor Author

Latest PR has mkdir under files working, by the look of things:
[ERROR] testMkdirsFailsForSubdirectoryOfExistingFile(org.apache.hadoop.fs.s3a.fileContext.ITestS3AFileContextURI)  Time elapsed: 6.385 s  <<< FAILURE!
java.lang.AssertionError
	at org.junit.Assert.fail(Assert.java:86)
	at org.junit.Assert.assertTrue(Assert.java:41)
	at org.junit.Assert.assertFalse(Assert.java:64)
	at org.junit.Assert.assertFalse(Assert.java:74)
	at org.apache.hadoop.fs.FileContextURIBase.testMkdirsFailsForSubdirectoryOfExistingFile(FileContextURIBase.java:260)

@steveloughran
Contributor Author

Latest revision adds operation context info to the S3 referrer logs, so we can generate a trace mapping S3 operations to the FS operations and the S3A FS instance making the call. No tie-in to OpenTelemetry though; future work.

183c9826b45486e485693808f38e2c4071004bf5dfd4c3ab210f0a21a4235ef8 stevel-london [04/Feb/2021:17:10:40 +0000] 109.157.193.10 arn:aws:iam::152813717728:user/stevel-dev 0E574923013D27E3 REST.DELETE.OBJECT fork-0003/test/ITestS3AContractDistCp/testUpdateDeepDirectoryStructureToRemote/remote/DELAY_LISTING_ME/outputDir/inputDir/.distcp.tmp.attempt_local1771928054_0008_m_000001_0.1612458639648 "DELETE /fork-0003/test/ITestS3AContractDistCp/testUpdateDeepDirectoryStructureToRemote/remote/DELAY_LISTING_ME/outputDir/inputDir/.distcp.tmp.attempt_local1771928054_0008_m_000001_0.1612458639648 HTTP/1.1" 204 - - - 14 - "http://hadoop.apache.org/audit/0ce4df25-02cc-43d0-98a2-6b13f12662dd/span/17.08.39#343/operation/op_rename" "Hadoop 3.4.0-SNAPSHOT, aws-sdk-java/1.11.901 Mac_OS_X/10.15.7 OpenJDK_64-Bit_Server_VM/25.252-b09 java/1.8.0_252 vendor/AdoptOpenJDK" - EE5czDIkLDNEPLxZ8XDADyRrZQ5KvstPk1E4Hei1qceDWnG+nAg7CrutshO9eEhmRhBuviommqY= SigV4 ECDHE-RSA-AES128-GCM-SHA256 AuthHeader stevel-london.s3.eu-west-2.amazonaws.com TLSv1.2
183c9826b45486e485693808f38e2c4071004bf5dfd4c3ab210f0a21a4235ef8 stevel-london [04/Feb/2021:17:10:40 +0000] 109.157.193.10 arn:aws:iam::152813717728:user/stevel-dev 1CF7A2A5E1AE304E REST.HEAD.OBJECT fork-0003/test/ITestS3AContractDistCp/testUpdateDeepDirectoryStructureToRemote/remote/DELAY_LISTING_ME/outputDir/inputDir/subDir2/subDir2/newfile1 "HEAD /fork-0003/test/ITestS3AContractDistCp/testUpdateDeepDirectoryStructureToRemote/remote/DELAY_LISTING_ME/outputDir/inputDir/subDir2/subDir2/newfile1 HTTP/1.1" 404 NoSuchKey 417 - 8 - "http://hadoop.apache.org/audit/0ce4df25-02cc-43d0-98a2-6b13f12662dd/span/17.08.39#341/operation/op_exists" "Hadoop 3.4.0-SNAPSHOT, aws-sdk-java/1.11.901 Mac_OS_X/10.15.7 OpenJDK_64-Bit_Server_VM/25.252-b09 java/1.8.0_252 vendor/AdoptOpenJDK" - 1FGjqpSb90398PGXq7yUMEWrJnWb5Gk8tGHXCj5adoFJxVleU0ffo93JQBJ/6kEL8+glLrpS0Ik= SigV4 ECDHE-RSA-AES128-GCM-SHA256 AuthHeader stevel-london.s3.eu-west-2.amazonaws.com TLSv1.2
183c9826b45486e485693808f38e2c4071004bf5dfd4c3ab210f0a21a4235ef8 stevel-london [04/Feb/2021:17:10:40 +0000] 109.157.193.10 arn:aws:iam::152813717728:user/stevel-dev 243F9FE15CB25A9E REST.HEAD.OBJECT fork-0003/test/ITestS3AContractDistCp/testUpdateDeepDirectoryStructureToRemote/remote/DELAY_LISTING_ME/outputDir/inputDir/.distcp.tmp.attempt_local1771928054_0008_m_000001_0.1612458639648 "HEAD /fork-0003/test/ITestS3AContractDistCp/testUpdateDeepDirectoryStructureToRemote/remote/DELAY_LISTING_ME/outputDir/inputDir/.distcp.tmp.attempt_local1771928054_0008_m_000001_0.1612458639648 HTTP/1.1" 200 - - 0 5 - "-" "Hadoop 3.4.0-SNAPSHOT, aws-sdk-java/1.11.901 Mac_OS_X/10.15.7 OpenJDK_64-Bit_Server_VM/25.252-b09 java/1.8.0_252 vendor/AdoptOpenJDK" - h4p6Og/ROqiWR7THNlvl7jK8mWSNbWbne2HfUvi+HO88IsW/seDQ1o8KvTJ6+FxgBKbH2JgCW24= SigV4 ECDHE-RSA-AES128-GCM-SHA256 AuthHeader stevel-london.s3.eu-west-2.amazonaws.com TLSv1.2
183c9826b45486e485693808f38e2c4071004bf5dfd4c3ab210f0a21a4235ef8 stevel-london [04/Feb/2021:17:10:40 +0000] 109.157.193.10 arn:aws:iam::152813717728:user/stevel-dev 470666E02E16F8B6 REST.GET.BUCKET - "GET /?list-type=2&delimiter=%2F&max-keys=2&prefix=fork-0003%2Ftest%2FITestS3AContractDistCp%2FtestUpdateDeepDirectoryStructureToRemote%2Fremote%2FDELAY_LISTING_ME%2FoutputDir%2FinputDir%2FsubDir2%2FsubDir2%2Fnewfile1%2F&fetch-owner=false HTTP/1.1" 200 - 412 - 13 12 "http://hadoop.apache.org/audit/0ce4df25-02cc-43d0-98a2-6b13f12662dd/span/17.08.39#343/operation/op_rename" "Hadoop 3.4.0-SNAPSHOT, aws-sdk-java/1.11.901 Mac_OS_X/10.15.7 OpenJDK_64-Bit_Server_VM/25.252-b09 java/1.8.0_252 vendor/AdoptOpenJDK" - p6rYqjQGRRT7jzz1tpdyshDJ673ul1A4YYHb/CFLOTv8++McNJYMxev+98vjEU1Kute5ggU3sFM= SigV4 ECDHE-RSA-AES128-GCM-SHA256 AuthHeader stevel-london.s3.eu-west-2.amazonaws.com TLSv1.2
183c9826b45486e485693808f38e2c4071004bf5dfd4c3ab210f0a21a4235ef8 stevel-london [04/Feb/2021:17:10:40 +0000] 109.157.193.10 arn:aws:iam::152813717728:user/stevel-dev 5F2E21B0F0071218 REST.COPY.OBJECT fork-0003/test/ITestS3AContractDistCp/testUpdateDeepDirectoryStructureToRemote/remote/DELAY_LISTING_ME/outputDir/inputDir/subDir2/subDir2/newfile1 "PUT /fork-0003/test/ITestS3AContractDistCp/testUpdateDeepDirectoryStructureToRemote/remote/DELAY_LISTING_ME/outputDir/inputDir/subDir2/subDir2/newfile1 HTTP/1.1" 200 - 234 0 41 19 "http://hadoop.apache.org/audit/0ce4df25-02cc-43d0-98a2-6b13f12662dd/span/17.08.39#343/operation/op_rename" "Hadoop 3.4.0-SNAPSHOT, aws-sdk-java/1.11.901 Mac_OS_X/10.15.7 OpenJDK_64-Bit_Server_VM/25.252-b09 java/1.8.0_252 vendor/AdoptOpenJDK com.amazonaws.services.s3.transfer.TransferManager/1.11.901" - bkgJI8wkqzT6Cx+jsbLFB6Mra/qLJiwa0VxvginE5nk3xBT15dBp0Tn8m3HLom2oChaybsbUSFc= SigV4 ECDHE-RSA-AES128-GCM-SHA256 AuthHeader stevel-london.s3.eu-west-2.amazonaws.com TLSv1.2
183c9826b45486e485693808f38e2c4071004bf5dfd4c3ab210f0a21a4235ef8 stevel-london [04/Feb/2021:17:10:40 +0000] 109.157.193.10 arn:aws:iam::152813717728:user/stevel-dev 5F2E21B0F0071218 REST.COPY.OBJECT_GET fork-0003/test/ITestS3AContractDistCp/testUpdateDeepDirectoryStructureToRemote/remote/DELAY_LISTING_ME/outputDir/inputDir/.distcp.tmp.attempt_local1771928054_0008_m_000001_0.1612458639648 - 200 - - 0 - - - - - bkgJI8wkqzT6Cx+jsbLFB6Mra/qLJiwa0VxvginE5nk3xBT15dBp0Tn8m3HLom2oChaybsbUSFc= SigV4 ECDHE-RSA-AES128-GCM-SHA256 AuthHeader stevel-london.s3.eu-west-2.amazonaws.com TLSv1.2
183c9826b45486e485693808f38e2c4071004bf5dfd4c3ab210f0a21a4235ef8 stevel-london [04/Feb/2021:17:10:40 +0000] 109.157.193.10 arn:aws:iam::152813717728:user/stevel-dev 82F908B5DE7BCF3D REST.HEAD.OBJECT fork-0003/test/ITestS3AContractDistCp/testUpdateDeepDirectoryStructureToRemote/remote/DELAY_LISTING_ME/outputDir/inputDir/.distcp.tmp.attempt_local1771928054_0008_m_000001_0.1612458639648 "HEAD /fork-0003/test/ITestS3AContractDistCp/testUpdateDeepDirectoryStructureToRemote/remote/DELAY_LISTING_ME/outputDir/inputDir/.distcp.tmp.attempt_local1771928054_0008_m_000001_0.1612458639648 HTTP/1.1" 200 - - 0 6 - "http://hadoop.apache.org/audit/0ce4df25-02cc-43d0-98a2-6b13f12662dd/span/17.08.39#343/operation/op_rename" "Hadoop 3.4.0-SNAPSHOT, aws-sdk-java/1.11.901 Mac_OS_X/10.15.7 OpenJDK_64-Bit_Server_VM/25.252-b09 java/1.8.0_252 vendor/AdoptOpenJDK" - 4czEsj1JWzHooqkzqfCW2W/U0ofSs3KUpDUWD9HeYE2POs7AtUP9vG/wDWOCG/vMRZmogEOxMZs= SigV4 ECDHE-RSA-AES128-GCM-SHA256 AuthHeader stevel-london.s3.eu-west-2.amazonaws.com TLSv1.2
183c9826b45486e485693808f38e2c4071004bf5dfd4c3ab210f0a21a4235ef8 stevel-london [04/Feb/2021:17:10:40 +0000] 109.157.193.10 arn:aws:iam::152813717728:user/stevel-dev 9774E0C9B42FB919 BATCH.DELETE.OBJECT fork-0003/ - 204 - - - - - - - - pbNOdYmZ24OIqstCyMnVcnT8PQlxJlYSnFSIiZrGLFjy6/UHFcySp78w0gaB0TgxRO2CJTakx+A= SigV4 ECDHE-RSA-AES128-GCM-SHA256 AuthHeader stevel-london.s3.eu-west-2.amazonaws.com TLSv1.2
183c9826b45486e485693808f38e2c4071004bf5dfd4c3ab210f0a21a4235ef8 stevel-london [04/Feb/2021:17:10:40 +0000] 109.157.193.10 arn:aws:iam::152813717728:user/stevel-dev 9774E0C9B42FB919 BATCH.DELETE.OBJECT fork-0003/test/ - 204 - - - - - - - - pbNOdYmZ24OIqstCyMnVcnT8PQlxJlYSnFSIiZrGLFjy6/UHFcySp78w0gaB0TgxRO2CJTakx+A= SigV4 ECDHE-RSA-AES128-GCM-SHA256 AuthHeader stevel-london.s3.eu-west-2.amazonaws.com TLSv1.2
183c9826b45486e485693808f38e2c4071004bf5dfd4c3ab210f0a21a4235ef8 stevel-london [04/Feb/2021:17:10:40 +0000] 109.157.193.10 arn:aws:iam::152813717728:user/stevel-dev 9774E0C9B42FB919 BATCH.DELETE.OBJECT fork-0003/test/ITestS3AContractDistCp/ - 204 - - - - - - - - pbNOdYmZ24OIqstCyMnVcnT8PQlxJlYSnFSIiZrGLFjy6/UHFcySp78w0gaB0TgxRO2CJTakx+A= SigV4 ECDHE-RSA-AES128-GCM-SHA256 AuthHeader stevel-london.s3.eu-west-
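
The Referer field in each entry carries a pseudo-URL with the FS instance ID, span ID and operation name, which is what lets a log processor map S3 server entries back to FS operations. A minimal parsing sketch, assuming only the path layout visible in these logs:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AuditReferrerParser {
  // Path layout as seen in the logs above:
  // http://hadoop.apache.org/audit/<fs-instance>/span/<span-id>/operation/<op>
  private static final Pattern AUDIT = Pattern.compile(
      "audit/([^/]+)/span/([^/]+)/operation/(\\S+)");

  public static void main(String[] args) {
    String referrer = "http://hadoop.apache.org/audit/"
        + "0ce4df25-02cc-43d0-98a2-6b13f12662dd/span/17.08.39#343/operation/op_rename";
    Matcher m = AUDIT.matcher(referrer);
    if (m.find()) {
      System.out.println("fs instance = " + m.group(1));
      System.out.println("span        = " + m.group(2));
      System.out.println("operation   = " + m.group(3));
    }
  }
}
```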

@steveloughran
Contributor Author

HADOOP-17511. Auditing: scope/coverage

Running all the tests with auditing set to fail if any call is made
unaudited highlights the scope for improvement.

Production

  • Critical: nested scopes
  • Committers: scoping
  • S3GuardTool spans around lower-level operations.
  • Multipart listing to become a spanned entry point.

Test

  • create spans before all tests doing low-level things

Still a WiP, needs

  • new tests for new code
  • full coverage
  • stability of the HTTP referrer format. Even if it doesn't yet use OpenTelemetry, I'd like compatible names/structure

Also thinking of how to push the telemetry prepareRequest() into the AWS SDK logic, so all SDK calls get span checking/header addition; production code would actually be slightly leaner.

Needs me to understand the SDK a bit better, then the audit manager can use an appropriate plugin point in the
SDK to hand off to the active auditor for preparation.
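
For the v1 SDK, one plausible plugin point is a RequestHandler2 registered on the client builder: its beforeExecution() sees every request, including those the client issues itself (the stack traces later in this thread show the audit manager hooked in at exactly this point). A minimal sketch, with the hand-off to the active auditor reduced to a print:

```java
import com.amazonaws.AmazonWebServiceRequest;
import com.amazonaws.handlers.RequestHandler2;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class AuditRequestHandler extends RequestHandler2 {

  @Override
  public AmazonWebServiceRequest beforeExecution(
      AmazonWebServiceRequest request) {
    // A real auditor would fetch the active span here and let it
    // prepare the request (headers etc.) or flag an unaudited call.
    System.out.println("preparing " + request.getClass().getSimpleName());
    return request;
  }

  public static AmazonS3 createClient() {
    // Registering the handler guarantees coverage of every SDK call,
    // including ones the client makes itself during startup.
    return AmazonS3ClientBuilder.standard()
        .withRequestHandlers(new AuditRequestHandler())
        .build();
  }
}
```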

@steveloughran
Contributor Author

Example log from a bit of a test run:
logs2021-02-05-15-25-40-BD60AB2786B3EBED.txt

@steveloughran steveloughran changed the title HADOOP-17511. Add audit logging to S3A connector HADOOP-17511. Add audit/telemetry logging to S3A connector Feb 5, 2021

@bgaborg bgaborg left a comment


Thanks for adding the audit support, I think it's really useful.
I added some comments with my review.

Path path = qualify(f);
// The span is picked up by the returned iterator; unless that
// iterator is closed, the span won't be closed here.
entryPoint(INVOCATION_LIST_LOCATED_STATUS, path);

You are not using try-with-resources here. Maybe it's justified; I'm just pointing it out in case it's needed.

Contributor Author


I try to explain it in the comment above. The iterator picks the span up. We do the same for all operations which return something (listing, input stream, output stream...) that outlives the API call: the audit span goes with the returned object and so is left open. Listings are a trouble spot, as they are unlikely ever to be closed.

Contributor Author


Latest PR is moving back to try-with-resources/some closure mechanism; close() only deactivates the span, it does not end it.
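
A sketch of that close-is-only-deactivate lifecycle, with illustrative names rather than the real S3A classes:

```java
// Illustrative, not the S3A implementation.
final class DemoSpan implements AutoCloseable {
  private static final ThreadLocal<DemoSpan> ACTIVE = new ThreadLocal<>();

  DemoSpan activate() {
    ACTIVE.set(this);
    return this;
  }

  @Override
  public void close() {
    // Deactivate only: the span object stays valid for any
    // iterator/stream which captured it before this point.
    ACTIVE.remove();
  }

  static DemoSpan current() {
    return ACTIVE.get();
  }
}

class RenameDemo {
  void rename() throws Exception {
    try (DemoSpan span = new DemoSpan().activate()) {
      // S3 calls issued here are attributed to the active span.
    }
    // Here the thread has no active span, but the span itself was not ended.
  }
}
```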

@steveloughran
Contributor Author

@bgaborg: thanks for the comments, will take them on. I've done another iteration of this which won't currently compile (it pulls in the request factory from my refactoring, so prepareRequest() is done there rather than throughout the source). Also I'm trying to make that context for the logs really useful by wiring up Spark/MR jobs and including that info. We will be able to see which job is doing the IO, rather than just the user. Sweet, eh?

@steveloughran steveloughran force-pushed the s3/HADOOP-17511-auditing branch from 1a7729c to 0399008 Compare February 16, 2021 19:32
@steveloughran
Contributor Author

Testing: all the tests which talk to the landsat or common crawl buckets fail; that's what happens when you switch to the new S3 client constructor. Which I didn't think I had done, not on this branch.

@steveloughran steveloughran force-pushed the s3/HADOOP-17511-auditing branch from 0399008 to 2e7e4e9 Compare February 17, 2021 15:55
@steveloughran
Contributor Author

Latest PR moves the prepareRequest() callback of the active span into the AWS SDK.

HEAD^ does it in the request factory construction. That's still supported, but if we can keep the AWS SDK wiring, that will be better:

  • guarantee of being invoked on all S3 operations, even those done in the client during startup
  • means we get audit/log of calls like doesBucketExistV2 without us reimplementing them
  • means we can simplify the work I've just done regarding marking operations as OnceRaw and walking up/down the callstack

But: this does mean we need to solve the problem of endpoints and regions

@steveloughran steveloughran force-pushed the s3/HADOOP-17511-auditing branch 2 times, most recently from dfa4985 to 15cf403 Compare February 17, 2021 20:34
@steveloughran steveloughran force-pushed the s3/HADOOP-17511-auditing branch from 84162d6 to 480ec15 Compare March 17, 2021 14:23
@steveloughran
Contributor Author

This PR is (and will repeatedly be) rebased on top of #2778; as that gets revised, this will be force pushed.

Latest PR pulls out getContentSummary, as it turns out some apps use it too much. The stats and logs will now make clear how much IO it uses.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 4s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 markdownlint 0m 1s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 37 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 14m 0s Maven dependency ordering for branch
+1 💚 mvninstall 23m 14s trunk passed
+1 💚 compile 22m 29s trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 compile 18m 47s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 checkstyle 4m 4s trunk passed
+1 💚 mvnsite 2m 17s trunk passed
-1 ❌ javadoc 0m 58s /branch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04.txt hadoop-common in trunk failed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04.
+1 💚 javadoc 2m 7s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 spotbugs 3m 32s trunk passed
+1 💚 shadedclient 16m 50s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 20s Maven dependency ordering for patch
+1 💚 mvninstall 1m 26s the patch passed
+1 💚 compile 21m 40s the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
-1 ❌ javac 21m 40s /results-compile-javac-root-jdkUbuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04.txt root-jdkUbuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 generated 1 new + 1955 unchanged - 1 fixed = 1956 total (was 1956)
+1 💚 compile 18m 52s the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
-1 ❌ javac 18m 52s /results-compile-javac-root-jdkPrivateBuild-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08.txt root-jdkPrivateBuild-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 generated 1 new + 1850 unchanged - 1 fixed = 1851 total (was 1851)
-1 ❌ blanks 0m 0s /blanks-eol.txt The patch has 3 line(s) that end in blanks. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
-0 ⚠️ checkstyle 3m 52s /results-checkstyle-root.txt root: The patch generated 48 new + 192 unchanged - 7 fixed = 240 total (was 199)
+1 💚 mvnsite 2m 14s the patch passed
+1 💚 xml 0m 1s The patch has no ill-formed XML file.
-1 ❌ javadoc 0m 57s /patch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04.txt hadoop-common in the patch failed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04.
-1 ❌ javadoc 0m 37s /patch-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08.txt hadoop-aws in the patch failed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08.
-1 ❌ spotbugs 1m 26s /new-spotbugs-hadoop-tools_hadoop-aws.html hadoop-tools/hadoop-aws generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
+1 💚 shadedclient 17m 35s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 17m 45s hadoop-common in the patch passed.
+1 💚 unit 2m 31s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 50s The patch does not generate ASF License warnings.
207m 36s
Reason Tests
SpotBugs module:hadoop-tools/hadoop-aws
Dead store to awsResponse in org.apache.hadoop.fs.s3a.audit.LoggingAuditor$LoggingAuditSpan.afterResponse(Request, Response) At LoggingAuditor.java:org.apache.hadoop.fs.s3a.audit.LoggingAuditor$LoggingAuditSpan.afterResponse(Request, Response) At LoggingAuditor.java:[line 268]
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2675/28/artifact/out/Dockerfile
GITHUB PR #2675
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell markdownlint xml
uname Linux c905bc15640e 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 0403406d6e1e192a0a39e5b91c7f466ded845512
Default Java Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2675/28/testReport/
Max. process+thread count 2604 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2675/28/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@steveloughran steveloughran force-pushed the s3/HADOOP-17511-auditing branch 2 times, most recently from da731e2 to dba6ea2 Compare March 19, 2021 18:51
@liuml07 liuml07 self-requested a review March 20, 2021 03:25
@steveloughran
Contributor Author

Will have to rebase. (sighs)

The last version removed some of the callback from the AWS SDK related to copy, as I didn't think it was needed/actually worked. It is. The transfer manager initiates its multipart copy in an executor; somehow we need to add the relevant trace data there.


[ERROR] Tests run: 9, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 9.034 s <<< FAILURE! - in org.apache.hadoop.fs.s3a.scale.ITestS3AHugeFilesDiskBlocks
[ERROR] test_100_renameHugeFile(org.apache.hadoop.fs.s3a.scale.ITestS3AHugeFilesDiskBlocks)  Time elapsed: 1.194 s  <<< ERROR!
java.nio.file.AccessDeniedException: tests3ascale/disk/hugefile: org.apache.hadoop.fs.s3a.audit.AuditFailureException: 49-00000001 unaudited operation Executing a request outside an audit span {multipart_upload_started 'tests3ascale/disk/hugefileRenamed' size=0, mutating=true}
	at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:208)
	at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:119)
	at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$4(Invoker.java:322)
	at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:414)
	at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:318)
	at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:293)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.copyFile(S3AFileSystem.java:4100)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.access$1300(S3AFileSystem.java:241)
	at org.apache.hadoop.fs.s3a.S3AFileSystem$OperationCallbacksImpl.copyFile(S3AFileSystem.java:1943)
	at org.apache.hadoop.fs.s3a.impl.RenameOperation.copySourceAndUpdateTracker(RenameOperation.java:636)
	at org.apache.hadoop.fs.s3a.impl.RenameOperation.renameFileToDest(RenameOperation.java:364)
	at org.apache.hadoop.fs.s3a.impl.RenameOperation.execute(RenameOperation.java:309)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.innerRename(S3AFileSystem.java:1870)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$rename$7(S3AFileSystem.java:1717)
	at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:499)
	at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:444)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2186)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.rename(S3AFileSystem.java:1715)
	at org.apache.hadoop.fs.s3a.scale.AbstractSTestS3AHugeFiles.test_100_renameHugeFile(AbstractSTestS3AHugeFiles.java:511)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.fs.s3a.audit.AuditFailureException: 49-00000001 unaudited operation Executing a request outside an audit span {multipart_upload_started 'tests3ascale/disk/hugefileRenamed' size=0, mutating=true}
	at org.apache.hadoop.fs.s3a.audit.impl.LoggingAuditor$WarningSpan.beforeExecution(LoggingAuditor.java:386)
	at org.apache.hadoop.fs.s3a.audit.impl.ActiveAuditManager$WrappingAuditSpan.beforeExecution(ActiveAuditManager.java:394)
	at org.apache.hadoop.fs.s3a.audit.impl.ActiveAuditManager.beforeExecution(ActiveAuditManager.java:238)
	at org.apache.hadoop.fs.s3a.audit.impl.ActiveAuditManager$SdkRequestHandler.beforeExecution(ActiveAuditManager.java:277)
	at com.amazonaws.AmazonWebServiceClient.beforeClientExecution(AmazonWebServiceClient.java:787)
	at com.amazonaws.services.s3.AmazonS3Client.initiateMultipartUpload(AmazonS3Client.java:3528)
	at com.amazonaws.services.s3.transfer.internal.CopyCallable.initiateMultipartUpload(CopyCallable.java:250)
	at com.amazonaws.services.s3.transfer.internal.CopyCallable.copyInParts(CopyCallable.java:168)
	at com.amazonaws.services.s3.transfer.internal.CopyCallable.call(CopyCallable.java:130)
	at com.amazonaws.services.s3.transfer.internal.CopyMonitor.call(CopyMonitor.java:132)
	at com.amazonaws.services.s3.transfer.internal.CopyMonitor.call(CopyMonitor.java:43)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	... 1 more
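
This is the classic thread-handoff problem: the copy runs in pool threads where no span is active. For pools S3A controls, one fix is to capture the span at submit time and activate/deactivate it around the task; a sketch, with a plain string standing in for the span:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: a plain String stands in for the audit span.
public class SpanPropagationDemo {
  private static final ThreadLocal<String> ACTIVE_SPAN = new ThreadLocal<>();

  // Capture the span at submit time; activate/deactivate around the task.
  static <T> Callable<T> inSpan(String span, Callable<T> inner) {
    return () -> {
      ACTIVE_SPAN.set(span);
      try {
        return inner.call();
      } finally {
        ACTIVE_SPAN.remove();
      }
    };
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    Future<String> result = pool.submit(
        inSpan("op_rename", () -> "ran in span " + ACTIVE_SPAN.get()));
    System.out.println(result.get());
    pool.shutdown();
  }
}
```

The catch, as noted above, is that the SDK transfer manager owns its own executor, so this wrapping cannot be applied to its internal copy tasks.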

@steveloughran steveloughran force-pushed the s3/HADOOP-17511-auditing branch from dba6ea2 to 8468a7f Compare March 22, 2021 10:17
Moves to the builder API for AWS S3 client creation, and
offers a similar-style API to the S3AFS and tests, hiding
the details of which options are client options and which are AWS
configuration, and doing the wiring up of S3A stats interfaces to the AWS
internals.

This will break HBase's HBoss test suite, but I couldn't see a way to
cleanly address this. I'll need to work on that code separately.

Change-Id: I4fb05f4a54eece8fac96abf8d6af9c889392a6c1
Change-Id: I9ea831f5fec48ebcc7d08fe0e5d6918f8dfa2786
Notion of AuditSpan which is created for a given operation; goal is
to pass it along everywhere.

It's thread local per FS-instance; store operations pick this up in their
constructor from the StoreContext.

The entryPoint() method in S3A FS has been enhanced to initiate the spans.
For this to work, internal code SHALL NOT call those entry points (Done)
and all public API points MUST be declared as entry points. Tricky given
the lack of layering in this branch.

The auditing is intended to be a plugin point; currently there is one
which logs at Info (yes, noisy but good for development)

Change-Id: Idddbbd3138e52b40e97f9c2383453190a02cd8ec

HADOOP-17511. Audit Log.

- compiles
- activate() is a no-op if the current thread's audit span is the one to activate.
- input streams pick up context
- some audit-driven tuning of test teardown!

Running tests: the key omitted coverage is commit operations.

One thing to (seriously) consider is creating a WriteOperationsHelper
for each writing op, storing the context in it, then activating on each
operation. Need to consider the full lifecycle.

Change-Id: I2998c4980fce3c65984fd728ec98e3299d4c329d

HADOOP-17511. Audit Log

* mkdir working again
* review changed code and tune slightly
* strip out some surplus getFileStatus calls from contract test utils.

Change-Id: I5124d6e9579abbe650fe53e2531aff50dc2fedcd

HADOOP-17511. Auditing

Listing is (probably) wired up
IOStats integration too

Change-Id: I4293c99d7a67a05d659ec3c690daeab5c883a5fb

HADOOP-17511. Auditing

* Dynamic loading of audit class
* IOStatistics integration is better
* Fewer (no?) activations of the active span in the current thread
* log audit uses timestamp of creation as partial ID, plus a
  unique counter per instance (for better multi-fs logging)
* thread pools for s3 buckets include bucket name (gets into log4j logs)

Change-Id: Id2f977354851f3dd96c005450d46b8181dbf567e
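
That ID scheme matches the span/17.08.39#343 form visible in the referrer logs earlier in this thread; an illustrative sketch:

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative: IDs like "17.08.39#343", unique within an auditor instance.
final class SpanIdSource {
  private final String creationTime =
      LocalTime.now().format(DateTimeFormatter.ofPattern("HH.mm.ss"));
  private final AtomicLong counter = new AtomicLong();

  String nextSpanId() {
    return creationTime + "#" + counter.incrementAndGet();
  }
}
```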

HADOOP-17511. S3A Auditing

All FS entry points instrumented for auditing

* WriteOperationHelper is no longer a singleton and picks up the
  active span when created. This lets us use it in places like
  commit logic without needing changes everywhere
* various helper classes of S3A now extend AbstractStoreOperation
  so pick up their span automatically. With the WriteOperationHelper
  code it's *less* critical, but...
* before almost every S3 API call, the active span gets to "prepare"
  the request. That's where headers can be set, etc.
  Even without header manipulation, this is where checks for calls without
  a valid span can go.
* which I'm trying to do in the logging code, including incrementing the .failed
  counter. Doesn't seem to be live though.

Yes, this is a fairly large patch, but not through complexity, just that
every entry point needs to become a span.

Life would be much, much easier with an MVC-style structure, as it's
near-impossible to be confident that various public S3AFS API calls aren't
being made externally.

Change-Id: I22dfe6f5bdb7bae4c4ba14f2a77e460f12ca1f14

HADOOP-17511. Auditing

* adds referer header to AWS requests; this gets into the logs
  (todo: optional)
* can declare that lack of a span for an s3 request is a failure
  (this doesn't get picked up in copy ops BTW)
* passes span through submitted callbacks
* moves most of the span management logic into ActiveAuditManager
* getWriteOperationHelper() now dynamically creates a helper within
  the active span, rather than a long-lived static instance.

Change-Id: Iecb9911a84421cec0e33011db60159c7c6ee79f3

HADOOP-17511. Auditing: scope/coverage

Running all the tests with auditing set to fail if any call is made
unaudited highlights the scope for improvement.

Production
* Critical: nested scopes
* Committers: scoping
* S3GuardTool spans around lower-level operations.
* Multipart listing to become a spanned entry point.

Test
* create spans before all tests doing low-level things

Change-Id: Ic3577681181942af1a0807dc78ff6393bb3d325f

HADOOP-17511. audit elements as ?k=v params; MR/job ID.

* multiple params go in as key=value
* includes UGI principal
* and the MR or spark job which created the FS instances/auditor

For MR/DistCp, the JobID will work. For Spark, it's not enough: assuming
the FS instances are cached across tasks (they should be, for performance
reasons), the scope is incorrect.

We'd really need task setup patching in a Context scope somewhere

Change-Id: Idc4e7eec29e0137e002be4ae56efed8798b8e337
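
A sketch of how such a referrer might be assembled, with the attributes appended as key=value query parameters; the attribute names and job ID below are illustrative, not a stable format:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class ReferrerHeaderDemo {

  static String build(String fsId, String spanId, String op,
      Map<String, String> attrs) throws UnsupportedEncodingException {
    StringBuilder sb = new StringBuilder("http://hadoop.apache.org/audit/")
        .append(fsId).append("/span/").append(spanId)
        .append("/operation/").append(op);
    char sep = '?';
    for (Map.Entry<String, String> e : attrs.entrySet()) {
      sb.append(sep).append(e.getKey()).append('=')
          .append(URLEncoder.encode(e.getValue(), "UTF-8"));
      sep = '&';
    }
    return sb.toString();
  }

  public static void main(String[] args) throws Exception {
    Map<String, String> attrs = new LinkedHashMap<>();
    attrs.put("principal", "stevel");          // UGI principal
    attrs.put("job", "job_00000000_0000");     // hypothetical job ID
    System.out.println(
        build("0ce4df25", "17.08.39#343", "op_rename", attrs));
  }
}
```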

HADOOP-17511. auditing ongoing

* CommonContext for JobID from spark/MR jobs
* RequestFactory from the HADOOP-16848 refactoring; preparation takes
  place there
* new s3a.api package for the new API stuff

Change-Id: Ib11e53304506897f2366672b268257e668a403ed

HADOOP-17511. Auditing: use RequestFactory everywhere and wrap invocation
as appropriate

Change-Id: Ibabb5828c9db018cd9262ae814d37a1a1a15fb02

HADOOP-17511. Auditing: request preparation now happens in AWS SDK

Change-Id: Ib2be23510db72b6152658523ae7c79117f0e4391

HADOOP-17511. Auditing; with HADOOP-13551 metrics.

* Get the new builder working, with metric wireup in the tests
* AuditSpan prepareRequest() ops are called through S3 SDK
* List iterators are cleaned up in s3a code and FS Shell; this
  closes the spans.
* Starting on a document

Change-Id: I728fa071f81da5dbc52fd517b537004107cdf65b

HADOOP-17511. Auditing: trying to get COPY to audit

Wiring up through request header to try to attach a context to
all requests.
But: the context list is *not* passed through the xfer manager

Next attempt: state change callbacks. This may work for
one thread, but as many blocks are copied across different threads,
it lacks complete coverage.

Plan: give up on trying to get an audit log entry for copy/rename.
The only next step would be to move off xfer manager and do it ourselves,
which isn't necessarily bad, but it is extra work.

Or: somehow the thread pool/executor could be made "clever", with
every thread submitted through the transfer manager (created uniquely for
each copy) to pick up the span

Change-Id: I424675228a0fd9b45c0b1ce8de24b3c20d9cfdfe

HADOOP-17511. Auditing.

* lots more logging, and useful stuff at that
* Input stream correctly creating requests through the system.

This includes moving the input stream away from direct use of the S3 client
to going through an interface; the S3AFS implementation includes the audit
span and sets it up.

Change-Id: Ic8969e4444333cf557247aff8c1bcaca64ed9ebf

HADOOP-17511. Move ~all setup into AuditManager

* AuditManager is CompositeService; wraps active audit Service
* Unit tests for span lifecycle showed more work was needed there
  in order to have the "reset span" (naming?) always available, active...
* logging auditor moves very low level logs to trace; preparation -> debug

Change-Id: I0ef81dd3d26fe2b8eee943c73808e4791c1a1dd7

HADOOP-17511. Auditing: mkdir working; listing getting complex

All the mkdir logic is moved into MkdirOperation with callbacks for required
FS calls; tests isolate this and verify cost. All within the same audit span.

Listing:

* the span to use is passed down and cached in the object listing iterator,
where it is passed in to the listing callbacks.
* the close() of the span is no longer performed when the object listing completes.

The complication here is that the delete and rename calls also use
the same operations; they don't want their spans closed prematurely.

Change-Id: I773e98694c4403857b48ddaa529303498c8d752b
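
The operation-plus-callbacks shape described here, in miniature (interfaces illustrative, not the actual S3A ones): the operation owns the logic, the FS supplies the raw store calls, and tests can mock the callbacks to verify cost:

```java
// Illustrative only.
interface MkdirCallbacks {
  boolean probeForFile(String path);            // e.g. a HEAD request
  void createFakeDirectoryMarker(String path);  // e.g. a zero-byte PUT
}

final class MkdirOperation {
  private final MkdirCallbacks callbacks;
  private final String path;

  MkdirOperation(MkdirCallbacks callbacks, String path) {
    this.callbacks = callbacks;
    this.path = path;
  }

  boolean execute() {
    // All calls here happen within the span the operation was created in.
    if (callbacks.probeForFile(path)) {
      throw new IllegalStateException("file exists at " + path);
    }
    callbacks.createFakeDirectoryMarker(path);
    return true;
  }
}
```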

HADOOP-17511. Auditing for multipart listing; other failures

Find/fix failures triggered by ops happening outside spans, either
genuine (multipart, getBucketLocation()) or from test cases which
bypass the public entry points.

Change-Id: I88aba852161320c9391541522eaba73e4a5f708d
Proposed: test suite teardown to reset any span

HADOOP-17511. Auditing: redefine AuditSpan lifecycle

Declare that AuditSpans last until GC'd, moving to making sure
that every use deactivates afterwards. Lets me move all of the span knowledge
from S3ABlockOutputStream into WriteOperationHelper... should see if
the same can be done for other bits.

Change-Id: I28627438d2ba9f0a1118f05f3a4a4d7191830db2

HADOOP-17511. Auditing: invocation coverage and enhancement

- mkdirs and getFileStatus are duration-tracked; that tracking is integrated
  with the auditing.
- StoreContext provides access to the RequestFactory for anything which wants
  to make AWS requests.
- ExecutingStoreOperation extends CallableRaisingIOE<T>. This allows an
  operation to be passed straight into a closure/executor!

Change-Id: Id24c7d3b8025a6d058d0195fea6def45bcfbe9a7
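
Why that helps, in miniature: the operation object is itself a callable, so it can be handed straight to an executor. The interface below mirrors the shape of org.apache.hadoop.util.functional.CallableRaisingIOE; the rest is illustrative:

```java
import java.io.IOException;

@FunctionalInterface
interface CallableRaisingIOE<R> {
  R apply() throws IOException;
}

abstract class ExecutingStoreOperation<T> implements CallableRaisingIOE<T> {
  abstract T execute() throws IOException;

  @Override
  public final T apply() throws IOException {
    return execute();  // an operation instance is now directly submittable
  }
}
```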

HADOOP-17511. Auditing: S3AFS integrate audit and duration invocations

Makes for cleaner code which also measures API duration.

Change-Id: I7586b39630972007491d4229d9e28cbc7e39d074
* added getContentSummary as a single span, with some minor speedups
* still trying to find the best design to wire up dynamically evaluated attributes

Currently logs should include the current thread ID; we don't yet pick up and
include the thread where the span was created, which is equally important

Change-Id: Ieea88e4228da0ac4761d8c006051cd1095c5fce8
* Select is spanned properly
* HttpReferrer adds dynamic eval of current thread.
* Every entry point in S3A code which creates a span is tagged
  @AuditSpan.AuditEntryPoint. This is to hint what is/isn't spanned.

Public code MUST go through an AuditEntryPoint; internal code
within an existing span MUST NOT, as doing so will lose the first
span.

+revert test fix of ITestAssumeRole as it is covered in HADOOP-17476/apache#2777

Change-Id: I2cfdba907df0dcc76629e5af8effeed8d25d5933
* move audit impl classes into the private package fs.s3a.audit.impl;
  logging + no-op auditors; ActiveAuditManager
* AuditIntegration provides indirect access to these as needed.
* HeaderProcessing has its own callbacks class; reverted recent addition
  of StoreContext getObjectMetadata call.
* fixed latest round of style errors
* updated docs with updated PNG
* tuning referrer entry: moving all args towards ? params as they are logged.
* log audit manager lifecycle events
+ also: S3AFS logs stats @ debug in close

Change-Id: If3785aff94489dc2867396c0c14ebe277f1fb48b
Change-Id: Ifd1b217861c646a9c17b56614bd1f6da570e3d15
+ an interface ActiveThreadSpanSource to give the current thread's span.

This is to allow integrations with the AWS SDK etc. to query
the active span for an FS and immediately be able to identify the span by ID,
for logging etc.

Adding a unique ID to all audit spans and supporting an active thread span
(with activate/deactivate)
causes major changes in the no-op code, as suddenly there's a lot more state
there across manager, auditor and span.

Will be testable though.

Change-Id: Id4dddcab7b735bd01f3d3b8a8236ff6da8f97671
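
The interface itself would be tiny; a sketch of its likely shape (signature illustrative):

```java
// Illustrative shape: lets SDK-side code ask "what span is active
// in this thread for this FS?" and log its ID.
public interface ActiveThreadSpanSource<T> {
  /** @return the span active in the current thread; never null. */
  T getActiveThreadSpan();
}
```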
@steveloughran steveloughran force-pushed the s3/HADOOP-17511-auditing branch from af0e305 to 4c41298 Compare March 23, 2021 14:21
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 0s Docker mode activated.
-1 ❌ patch 0m 18s #2675 does not apply to trunk. Rebase required? Wrong Branch? See https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute for help.
Subsystem Report/Notes
GITHUB PR #2675
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2675/34/console
versions git=2.17.1
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@steveloughran
Contributor Author

PR no longer applies, even with rebase. Will have to close and resubmit.

@steveloughran
Contributor Author

See #2807 for the successor PR.

Labels
enhancement fs/s3 changes related to hadoop-aws; submitter must declare test endpoint