HBASE-26271: Cleanup the broken store files under data directory #3786
@Apache9, @wchevreuil This commit contains only the changes related to the CleanerChore, and exposes only what is absolutely necessary for its functionality.
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
In general I like the approach; it is almost the same as what I expected when I reviewed your first PR.
The problems here are all implementation details. We need to make sure that there are no races, so that we do not miss any corner cases.
 * This Chore, every time it runs, will clear the unused HFiles in the data
 * folder.
 */
@InterfaceAudience.Private public class FileBasedStoreFileCleaner extends ScheduledChore {
Let's not name it 'FileBased'. I think it should be more general, maybe 'BrokenStoreFileCleaner'.
For now, the only condition is whether we write to the data directory directly. Whatever the actual store file tracker implementation is, if it writes to the data directory directly then we need this cleaner; otherwise we do not.
Ok, I'm going with BrokenStoreFileCleaner then.
Just another name suggestion: "LeftoversStoreFileCleaner"... Up to you, @BukrosSzabolcs !
for (HRegion region : regionServer.getRegions()) {
  for (HStore store : region.getStores()) {
    // only do cleanup in stores using file based storefile tracking
    if (store.getStoreEngine().requireWritingToTmpDirFirst()) {
OK, good, this is exactly what I expected. So the only problem is the naming. Let's find a better name.
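The selection rule agreed on in this thread can be sketched roughly as follows. This is only an illustration, not the PR's code: `StoreView` and `StoreSelectionSketch` are hypothetical stand-ins for `HStore`/`StoreEngine`, and the sketch assumes that the truncated branch in the snippet above skips stores that write to a tmp directory first, so that only stores writing directly into the data directory are cleaned.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the parts of HStore/StoreEngine used by the check. */
interface StoreView {
  String name();

  boolean requireWritingToTmpDirFirst();
}

final class StoreSelectionSketch {
  private StoreSelectionSketch() {
  }

  /** Returns the names of the stores that write straight into the data directory. */
  static List<String> storesNeedingCleanup(List<StoreView> stores) {
    List<String> result = new ArrayList<>();
    for (StoreView store : stores) {
      // Assumption: stores that stage files in a tmp directory do not leave
      // half-written files under the data directory, so they are skipped.
      if (store.requireWritingToTmpDirFirst()) {
        continue;
      }
      result.add(store.name());
    }
    return result;
  }

  /** Small helper that builds a StoreView for the example. */
  static StoreView store(String name, boolean tmpFirst) {
    return new StoreView() {
      @Override
      public String name() {
        return name;
      }

      @Override
      public boolean requireWritingToTmpDirFirst() {
        return tmpFirst;
      }
    };
  }
}
```

The point of keying on `requireWritingToTmpDirFirst()` rather than on a concrete tracker class is the one made in the review: the cleaner depends on where files are written, not on which tracker implementation is configured.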
deleteFile(file, store, deletedFiles, failedDeletes);
}

private boolean isCompactingFile(FileStatus file, HStore store) {
I suppose we should move the isXXXFile methods below to HRegion or HStore, protected by some locks. Otherwise we may have a race, or at least we depend on a very flaky order in which each condition is tested.
And do we really need to test isCompactingFile here? The compactedFile will not be tracked but compacting files should always be tracked?
> I suppose we should move the below isXXXFile to HRegion or HStore, with the protection of some locks. Otherwise we may have a race, or at least, we depend on a very flaky order of the testing of each condition.
I guess the isOldEnough check would prevent that problem, as we would only care about files older than DEFAULT_FILEBASED_STOREFILE_CLEANER_TTL?
> And do we really need to test isCompactingFile here? The compactedFile will not be tracked but compacting files should always be tracked?
Yeah, this seems redundant with the isActiveStorefile check.
> And do we really need to test isCompactingFile here? The compactedFile will not be tracked but compacting files should always be tracked?
The confusion might come from suboptimal naming.
isActiveStorefile -> checks against the currently active storefiles
isCompactedFile -> hfiles that got compacted and are no longer active storefiles, but were not deleted yet (deletion is handled by a separate subsystem, so we should not touch them)
isCompactingFile -> file(s) a currently running compaction is writing into. This will become the new storefile when the compaction is done. It is checked to make sure we do not break stuck/long-running compactions even if they are stuck/idle for more than the configured TTL.
I think we do need all of these checks.
> I suppose we should move the below isXXXFile to HRegion or HStore, with the protection of some locks. Otherwise we may have a race, or at least, we depend on a very flaky order of the testing of each condition.
As Wellington mentioned, the TTL should make sure we filter out any hfile that is currently being handled. Any file we check afterwards was not touched in 12 hours. I think the only use case where this might be an issue is a stuck compaction where we check isCompactingFile, but that method (getCompactionTargets) is synchronised, so it should be safe. Do you see any other possible issue?
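To make the "synchronised so should be safe" argument concrete, here is a minimal sketch of the pattern being relied on. It is not the actual HBase implementation: `CompactionTargetsSketch`, `addTarget` and `removeTarget` are hypothetical names, and only `getCompactionTargets` is taken from the comment above. The idea is that writers and readers share one monitor, and readers receive a snapshot copy rather than the live collection.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical tracker for the files that running compactions are writing into. */
final class CompactionTargetsSketch {
  private final Set<String> targets = new HashSet<>();

  /** Called when a compaction opens a new output file. */
  synchronized void addTarget(String path) {
    targets.add(path);
  }

  /** Called when a compaction commits or abandons an output file. */
  synchronized void removeTarget(String path) {
    targets.remove(path);
  }

  /** Returns a snapshot copy, so the caller never iterates over the live set. */
  synchronized Set<String> getCompactionTargets() {
    return new HashSet<>(targets);
  }
}
```

A cleaner thread that calls `getCompactionTargets()` sees a consistent set even while a compaction thread adds or removes entries, although the snapshot can of course be stale by the time it is used, which is the residual race the TTL is meant to cover.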
> The confusion might come from suboptimal naming.
> isActiveStorefile -> lists the currently active storefiles
> isCompactedFile -> are hfiles that got compacted, no longer active storefiles, but were not deleted yet (deletion is handled by a separate subsystem, so we should not touch them)
> isCompactingFile -> file(s) a currently running compaction is writing into. This will become the new storefile when the compaction is done. It is checked to make sure we do not break stuck/longrunning compactions even if they are stuck/idle for more than the configured TTL.
> I think we do need all of these checks.
Thanks for clarifying. Thinking again, maybe it's worth having this extra check for the exceptional cases where a compaction lasts longer than the time threshold.
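The combined effect of the checks discussed in this thread could look roughly like the sketch below. It is an assumption-laden illustration rather than the PR's code: `BrokenFileDecisionSketch` and `shouldDelete` are hypothetical names, the three sets stand in for whatever bookkeeping the store really exposes, and the order of the checks is chosen for readability only.

```java
import java.util.Set;

/** Hypothetical decision helper; the sets stand in for the store's real bookkeeping. */
final class BrokenFileDecisionSketch {
  private BrokenFileDecisionSketch() {
  }

  static boolean shouldDelete(String file, long modTimeMs, long nowMs, long ttlMs,
      Set<String> activeStoreFiles, Set<String> compactedFiles,
      Set<String> compactionResultFiles) {
    // Files younger than the TTL may still be in the middle of a flush or compaction.
    if (nowMs - modTimeMs < ttlMs) {
      return false;
    }
    // Active store files hold live data.
    if (activeStoreFiles.contains(file)) {
      return false;
    }
    // Compacted files are removed by a separate subsystem, so they are left alone.
    if (compactedFiles.contains(file)) {
      return false;
    }
    // Output of a running compaction, possibly stuck for longer than the TTL.
    if (compactionResultFiles.contains(file)) {
      return false;
    }
    // Old and referenced by nothing: treated as a leftover.
    return true;
  }
}
```

With a 12 hour TTL, a file last modified 13 hours ago that appears in none of the three sets would be deleted, while the same file listed as a compaction result would be kept regardless of its age.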
The failing tests seem to be flaky. I was able to successfully run them locally.
Nit-picky things from me, and one request for an additional test method.
Otherwise, it looks fine to me.
Just to make my intent clear, I think with a test method to exercise the TTL effectiveness and the minor code-formatting cleanup, this is fine to commit. Not sure if @Apache9 has more opinions on this change before he'd like to see it merged to the feature branch.
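One way such a TTL test can avoid sleeping is to inject the clock. The sketch below is hypothetical and is not taken from TestBrokenStoreFileCleaner: `FileTtlCheckSketch` is an invented class, and the boundary rule (a file is old enough only once its age strictly exceeds the TTL) is an assumption made for the example.

```java
import java.util.function.LongSupplier;

/** Hypothetical TTL check with an injectable clock, so tests do not need to sleep. */
final class FileTtlCheckSketch {
  private final long ttlMs;
  private final LongSupplier clock;

  FileTtlCheckSketch(long ttlMs, LongSupplier clock) {
    this.ttlMs = ttlMs;
    this.clock = clock;
  }

  /** Assumption: a file counts as old enough once its age strictly exceeds the TTL. */
  boolean isOldEnough(long modificationTimeMs) {
    return clock.getAsLong() - modificationTimeMs > ttlMs;
  }
}
```

A test can then advance the fake clock past the TTL and assert that the same file flips from "keep" to "delete", which exercises the TTL without depending on wall-clock time.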
LGTM if QA comes back positively.
Will let this go a day or two for Duo to come back with feedback if he has some.
return this.enabled.get();
}

@InterfaceAudience.Private
Do we need this extra IA annotation? The class itself is IA.Private.
You are right, deleting it.
return;
}

if(isCompactingFile(file, store)){
The term 'compacting file' already has a meaning in HBase. There is a filesCompacting field in the HStore class, which holds the files currently being compacted, not the files being written out. So let's change the name here, otherwise it will confuse people.
I'm renaming it to "isCompactionResultFile"
#3700 has been merged and I rebased HBASE-26067 onto the newest master. Please rebase the PR too. Thanks for your patience. We are very close to merging this PR.
) Signed-off-by: Duo Zhang <[email protected]>
… tracker (apache#3681) Signed-off-by: Josh Elser <[email protected]>
Add new chore to delete leftover files in case file-based storefile handling is used. Expose the target files of currently running compactions for easier validation.
fixes based on feedback
added javadoc, fixed typos, additional fileTTL UT
improve naming and add some explanation in the comments
clean up after rebase
a86b9ca to ae0fa65
@Apache9 Rebased my changes
Overall LGTM. Just need to add more javadoc or comments to let later developers better understand the code.
And please fix the checkstyle issue before merging.
Thanks @BukrosSzabolcs
checkstyle fixes, additional comments
Signed-off-by: Duo Zhang <[email protected]> Signed-off-by: Josh Elser <[email protected]> Signed-off-by: Wellington Ramos Chevreuil <[email protected]>
…he#3786) Signed-off-by: Duo Zhang <[email protected]> Signed-off-by: Josh Elser <[email protected]> Signed-off-by: Wellington Ramos Chevreuil <[email protected]> Conflicts: hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java hbase-server/src/main/java/org/apache/hadoop/hbase/mob/DefaultMobStoreCompactor.java
…others) to branch-2.5
Previous cherry picks:
commit 6aaef89 HBASE-26064 Introduce a StoreFileTracker to abstract the store file tracking logic
commit 43b40e9 HBASE-25988 Store the store file list by a file apache#3578)
commit 6e05376 HBASE-26079 Use StoreFileTracker when splitting and merging apache#3617)
commit 090b2fe HBASE-26224 HBASE-26224 Introduce a MigrationStoreFileTracker to support migratin… apache#3656)
commit 0ee1689 HBASE-26246 Persist the StoreFileTracker configurations to TableDescriptor when creating table apache#3666)
commit 2052e80 HBASE-26248 Should find a suitable way to let users specify the store… apache#3665)
commit 5ff0f98 HBASE-26264 Add more checks to prevent misconfiguration on store file… apache#3681)
commit fc4f6d1 HBASE-26280 HBASE-26280 Use store file tracker when snapshoting apache#3685)
commit 06db852 HBASE-26326 CreateTableProcedure fails when FileBasedStoreFileTracker… apache#3721)
commit e4e7cf8 HBASE-26386 Refactor StoreFileTracker implementations to expose the s… apache#3774)
commit 08d1171 HBASE-26328 Clone snapshot doesn't load reference files into FILE SFT impl apache#3749)
commit 8bec26e HBASE-26263 [Rolling Upgrading] Persist the StoreFileTracker configur… apache#3700)
commit a288365 HBASE-26271: Cleanup the broken store files under data directory apache#3786)
commit d00b5fa HBASE-26454 CreateTableProcedure still relies on temp dir and renames… apache#3845)
commit 771e552 HBASE-26286: Add support for specifying store file tracker when restoring or cloning snapshot
commit f16b7b1 HBASE-26265 Update ref guide to mention the new store file tracker im… apache#3942)
commit 755b3b4 HBASE-26585 Add SFT configuration to META table descriptor when creating META apache#3998)
commit 39c42c7 HBASE-26639 The implementation of TestMergesSplitsAddToTracker is pro… apache#4010)
commit 6e1f5b7 HBASE-26586 Should not rely on the global config when setting SFT implementation for a table while upgrading apache#4006)
commit f1dd865 HBASE-26654 ModifyTableDescriptorProcedure shoud load TableDescriptor… apache#4034)
commit 8fbc9a2 HBASE-26674 Should modify filesCompacting under storeWriteLock apache#4040)
commit 5aa0fd2 HBASE-26675 Data race on Compactor.writer apache#4035)
commit 3021c58 HBASE-26700 The way we bypass broken track file is not enough in Stor… apache#4055)
commit a8b68c9 HBASE-26690 Modify FSTableDescriptors to not rely on renaming when wr… apache#4054)
commit dffeb8e HBASE-26587 Introduce a new Admin API to change SFT implementation (#… apache#4080)
commit b265fe5 HBASE-26673 Implement a shell command for change SFT implementation apache#4113)
commit 4cdb380 HBASE-26640 Reimplement master local region initialization to better … apache#4111)
commit 77bb153 HBASE-26707: Reduce number of renames during bulkload (apache#4066) apache#4122)
commit a4b192e HBASE-26611 Changing SFT implementation on disabled table is dangerous apache#4082)
commit d3629bb HBASE-26837 Set SFT config when creating TableDescriptor in TestClone… apache#4226)
commit 541d748 HBASE-26881 Backport HBASE-25368 to branch-2 (apache#4267)
Fixups for precommit error prone, checkstyle, and javadoc warnings after applying cherry picks.
Signed-off-by: Josh Elser <[email protected]>
Reviewed-by: Wellington Ramos Chevreuil <[email protected]>
…he#3786) Signed-off-by: Duo Zhang <[email protected]> Signed-off-by: Josh Elser <[email protected]> Signed-off-by: Wellington Ramos Chevreuil <[email protected]> Change-Id: If9825f5199385e8ccc65f4f7637461f3651a6836
Add new chore to delete leftover files in case file-based storefile handling is used.
Expose the target files of currently running compactions for easier validation.