Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HADOOP-18752. Change fs.s3a.directory.marker.retention to "keep" -everything but the switch #5859

Conversation

steveloughran
Copy link
Contributor

This change has all of PR #5689 except for changing the default value of marker retention from keep to delete.

  1. leaves the default value of fs.s3a.directory.marker.retention at "delete"
  2. no longer prints a message when an S3A FS instances is instantiated with any option other than delete.
  3. Updates the directory marker documentation

Switching to marker retention improves performance on any S3 bucket as there are no needless marker DELETE requests -leading to a reduction in write IOPS and and any delays waiting for the DELETE call to finish.

There are very significant improvements on versioned buckets, where tombstone markers slow down LIST operations: the more tombstones there are, the worse query planning gets.

Having versioning enabled on production stores is the foundation of any data protection strategy, so this has tangible benefits in production.

Marker deletion is not compatible with older hadoop releases; specifically

  • Hadoop branch 2 < 2.10.2
  • Any release of Hadoop 3.0.x and Hadoop 3.1.x
  • Hadoop 3.2.0 and 3.2.1
  • Hadoop 3.3.0 Incompatible releases have no problems reading data in stores where markers are retained, but can get confused when deleting or renaming directories.

Contributed by Steve Loughran

Change-Id: Ic9a05357a4b1b1ff6dfecf8b0f30e1eeedb2fe75

Description of PR

How was this patch tested?

For code changes:

  • Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@steveloughran steveloughran force-pushed the s3/HADOOP-18752-marker-retention-keep-branch-3.3 branch from 3be8800 to 6a2b198 Compare July 19, 2023 17:56
@steveloughran
Copy link
Contributor Author

testing: s3 london. this is a backport and as it doesn't include the contentious issue "actually changing the switch" then I'm happy to cherrypick as is. lets see what the tests say now I've rolled back the default

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 55s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ branch-3.3 Compile Tests _
+1 💚 mvninstall 49m 15s branch-3.3 passed
+1 💚 compile 0m 42s branch-3.3 passed
+1 💚 checkstyle 0m 37s branch-3.3 passed
+1 💚 mvnsite 0m 50s branch-3.3 passed
+1 💚 javadoc 0m 42s branch-3.3 passed
+1 💚 spotbugs 1m 16s branch-3.3 passed
+1 💚 shadedclient 37m 32s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 43s the patch passed
+1 💚 compile 0m 32s the patch passed
+1 💚 javac 0m 32s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 22s the patch passed
+1 💚 mvnsite 0m 38s the patch passed
+1 💚 javadoc 0m 27s the patch passed
+1 💚 spotbugs 1m 11s the patch passed
+1 💚 shadedclient 37m 19s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 42s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 41s The patch does not generate ASF License warnings.
139m 44s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5859/1/artifact/out/Dockerfile
GITHUB PR #5859
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint
uname Linux d6e4a71d159d 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision branch-3.3 / 6a2b198
Default Java Private Build-1.8.0_362-8u372-gaus1-0ubuntu118.04-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5859/1/testReport/
Max. process+thread count 585 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5859/1/console
versions git=2.17.1 maven=3.6.0 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 11m 4s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ branch-3.3 Compile Tests _
+1 💚 mvninstall 53m 22s branch-3.3 passed
+1 💚 compile 0m 38s branch-3.3 passed
+1 💚 checkstyle 0m 30s branch-3.3 passed
+1 💚 mvnsite 0m 45s branch-3.3 passed
+1 💚 javadoc 0m 34s branch-3.3 passed
+1 💚 spotbugs 1m 12s branch-3.3 passed
+1 💚 shadedclient 42m 50s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 39s the patch passed
+1 💚 compile 0m 31s the patch passed
+1 💚 javac 0m 31s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 19s the patch passed
+1 💚 mvnsite 0m 36s the patch passed
+1 💚 javadoc 0m 25s the patch passed
+1 💚 spotbugs 1m 14s the patch passed
+1 💚 shadedclient 40m 14s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 31s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 35s The patch does not generate ASF License warnings.
160m 52s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5859/2/artifact/out/Dockerfile
GITHUB PR #5859
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint
uname Linux 4b8cb5861852 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision branch-3.3 / 6a2b198
Default Java Private Build-1.8.0_362-8u372-gaus1-0ubuntu118.04-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5859/2/testReport/
Max. process+thread count 527 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5859/2/console
versions git=2.17.1 maven=3.6.0 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

```

Example: test with `markers=keep`
This is the default and does not need to be explicitly set.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops; wrong

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 7m 18s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ branch-3.3 Compile Tests _
+1 💚 mvninstall 52m 43s branch-3.3 passed
+1 💚 compile 0m 43s branch-3.3 passed
+1 💚 checkstyle 0m 36s branch-3.3 passed
+1 💚 mvnsite 0m 50s branch-3.3 passed
+1 💚 javadoc 0m 42s branch-3.3 passed
+1 💚 spotbugs 1m 17s branch-3.3 passed
+1 💚 shadedclient 37m 48s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 40s the patch passed
+1 💚 compile 0m 33s the patch passed
+1 💚 javac 0m 33s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 22s the patch passed
+1 💚 mvnsite 0m 38s the patch passed
+1 💚 javadoc 0m 27s the patch passed
+1 💚 spotbugs 1m 10s the patch passed
+1 💚 shadedclient 37m 9s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 47s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 40s The patch does not generate ASF License warnings.
149m 39s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5859/3/artifact/out/Dockerfile
GITHUB PR #5859
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint
uname Linux 7e5380e1a46b 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision branch-3.3 / b986515
Default Java Private Build-1.8.0_362-8u372-gaus1-0ubuntu118.04-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5859/3/testReport/
Max. process+thread count 552 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5859/3/console
versions git=2.17.1 maven=3.6.0 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

…rything but the switch

This change has all of PR apache#5689 *except* for changing the
default value of marker retention from keep to delete.

1. leaves the default value of fs.s3a.directory.marker.retention
   at "delete"
2. no longer prints a message when an S3A FS instances is
   instantiated with any option other than delete.
3. Updates the directory marker documentation

Switching to marker retention improves performance
on any S3 bucket as there are no needless marker DELETE requests
-leading to a reduction in write IOPS and and any delays waiting
for the DELETE call to finish.

There are *very* significant improvements on versioned buckets,
where tombstone markers slow down LIST operations: the more
tombstones there are, the worse query planning gets.

Having versioning enabled on production stores is the foundation
of any data protection strategy, so this has tangible benefits
in production.

Marker deletion is *not* compatible with older hadoop releases;
specifically
- Hadoop branch 2 < 2.10.2
- Any release of Hadoop 3.0.x and Hadoop 3.1.x
- Hadoop 3.2.0 and 3.2.1
- Hadoop 3.3.0
Incompatible releases have no problems reading data in stores
where markers are retained, but can get confused when deleting
or renaming directories.

Contributed by Steve Loughran

Change-Id: Ic9a05357a4b1b1ff6dfecf8b0f30e1eeedb2fe75
…rything but the switch

update testing.md to remove statement that default is keep

Change-Id: Ic28ad7d9fe17566ee0603b3f6fc41f27df754222
@steveloughran steveloughran force-pushed the s3/HADOOP-18752-marker-retention-keep-branch-3.3 branch from b986515 to 25e9322 Compare July 20, 2023 15:31
@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 5m 39s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ branch-3.3 Compile Tests _
+1 💚 mvninstall 36m 5s branch-3.3 passed
+1 💚 compile 0m 30s branch-3.3 passed
+1 💚 checkstyle 0m 24s branch-3.3 passed
+1 💚 mvnsite 0m 37s branch-3.3 passed
+1 💚 javadoc 0m 30s branch-3.3 passed
+1 💚 spotbugs 0m 54s branch-3.3 passed
+1 💚 shadedclient 23m 48s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 30s the patch passed
+1 💚 compile 0m 24s the patch passed
+1 💚 javac 0m 24s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 15s the patch passed
+1 💚 mvnsite 0m 27s the patch passed
+1 💚 javadoc 0m 19s the patch passed
+1 💚 spotbugs 0m 50s the patch passed
+1 💚 shadedclient 23m 14s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 20s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 28s The patch does not generate ASF License warnings.
99m 11s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5859/4/artifact/out/Dockerfile
GITHUB PR #5859
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint
uname Linux ca5c4acd6d1b 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision branch-3.3 / 25e9322
Default Java Private Build-1.8.0_362-8u372-gaus1-0ubuntu118.04-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5859/4/testReport/
Max. process+thread count 681 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5859/4/console
versions git=2.17.1 maven=3.6.0 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@steveloughran
Copy link
Contributor Author

OK, backport has gone in cleanly. This does not change the default value, as discussed, even though I'd like to.

@steveloughran steveloughran merged commit 210104e into apache:branch-3.3 Jul 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants