HADOOP-18752. Change fs.s3a.directory.marker.retention to "keep" -everything but the switch #5859
Testing: S3 London. This is a backport, and as it doesn't include the contentious issue of actually changing the switch, I'm happy to cherry-pick it as is. Let's see what the tests say now that I've rolled back the default.
💔 -1 overall
This message was automatically generated. |
Example: test with `markers=keep`
This is the default and does not need to be explicitly set.
oops; wrong
…rything but the switch

This change has all of PR apache#5689 *except* for changing the default value of marker retention from delete to keep.

1. Leaves the default value of fs.s3a.directory.marker.retention at "delete".
2. No longer prints a message when an S3A FS instance is instantiated with any option other than delete.
3. Updates the directory marker documentation.

Switching to marker retention improves performance on any S3 bucket as there are no needless marker DELETE requests, which reduces write IOPS and removes any delay waiting for the DELETE call to finish.

There are *very* significant improvements on versioned buckets, where tombstone markers slow down LIST operations: the more tombstones there are, the worse query planning gets.

Having versioning enabled on production stores is the foundation of any data protection strategy, so this has tangible benefits in production.

Marker retention is *not* compatible with older Hadoop releases; specifically:
- Hadoop branch 2 < 2.10.2
- Any release of Hadoop 3.0.x and Hadoop 3.1.x
- Hadoop 3.2.0 and 3.2.1
- Hadoop 3.3.0

Incompatible releases have no problems reading data in stores where markers are retained, but can get confused when deleting or renaming directories.

Contributed by Steve Loughran

Change-Id: Ic9a05357a4b1b1ff6dfecf8b0f30e1eeedb2fe75
…rything but the switch

Update testing.md to remove the statement that the default is keep.

Change-Id: Ic28ad7d9fe17566ee0603b3f6fc41f27df754222
OK, the backport has gone in cleanly. This does not change the default value, as discussed, even though I'd like to.
This change has all of PR #5689 except for changing the default value of marker retention from delete to keep.
Switching to marker retention improves performance on any S3 bucket as there are no needless marker DELETE requests, which reduces write IOPS and removes any delay waiting for the DELETE call to finish.
There are very significant improvements on versioned buckets, where tombstone markers slow down LIST operations: the more tombstones there are, the worse query planning gets.
Having versioning enabled on production stores is the foundation of any data protection strategy, so this has tangible benefits in production.
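Marker retention is not compatible with older Hadoop releases; specifically:
- Hadoop branch 2 < 2.10.2
- Any release of Hadoop 3.0.x and Hadoop 3.1.x
- Hadoop 3.2.0 and 3.2.1
- Hadoop 3.3.0

Incompatible releases have no problems reading data in stores where markers are retained, but can get confused when deleting or renaming directories.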
Contributed by Steve Loughran
Change-Id: Ic9a05357a4b1b1ff6dfecf8b0f30e1eeedb2fe75
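For anyone auditing which clients touch a bucket before enabling marker retention, the compatibility list in the commit message can be turned into a small check. This is only a sketch, not part of the patch: the function names `parse_version` and `is_incompatible` are hypothetical, and any version that is not on the list is assumed to be marker-aware.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse the leading 'major.minor.patch' of a Hadoop version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version: {version!r}")
    major, minor, patch = (int(group) for group in match.groups())
    return major, minor, patch


def is_incompatible(version: str) -> bool:
    """Return True if the release is on the incompatible list quoted above.

    Assumption: releases not named in that list are treated as compatible.
    """
    parsed = parse_version(version)
    major, minor, _ = parsed
    if major == 2:
        # Hadoop branch 2 < 2.10.2
        return parsed < (2, 10, 2)
    if major == 3 and minor in (0, 1):
        # Any release of Hadoop 3.0.x and 3.1.x
        return True
    # Hadoop 3.2.0, 3.2.1 and 3.3.0
    return parsed in {(3, 2, 0), (3, 2, 1), (3, 3, 0)}


if __name__ == "__main__":
    for release in ("2.10.1", "2.10.2", "3.1.4", "3.2.1", "3.2.2", "3.3.0"):
        print(release, "incompatible" if is_incompatible(release) else "ok")
```

A suffix such as `-SNAPSHOT` is ignored by the parser, so a pre-release build is judged by its base version number.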