HBASE-26938 Compaction failures after StoreFileTracker integration #4338
Conversation
🎊 +1 overall
💔 -1 overall
// This step is necessary for the correctness of BrokenStoreFileCleanerChore. It lets the
// CleanerChore know that compaction is done and the file can be cleaned up if compaction
// has failed.
storeEngine.resetCompactionWriter();
This is not needed any more?
We can leave it in the API, but the current implementation only sets the writer field to null, and the method bodies become empty once that field is converted into a parameter, so I removed it.
If we are going to keep it, we need to use a Map instead of a Set to track the writers, and we somehow need to pass a key as a parameter to abort and remove a StoreFileWriter instance. What should be the key? The CompactionRequestImpl? I do not think there is a requirement to abort compaction writers in this way. We abort compactions today by interrupting the thread.
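For illustration, a minimal sketch of the Map-vs-Set trade-off just described, using stand-in classes rather than the real HBase types: a Set of writers offers no handle for targeting one compaction's writer, while a Map keyed by the request does.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Stand-ins for the real HBase types; this is a sketch, not the actual API.
class StoreFileWriter {
  void abort() { /* close and delete the partially written file */ }
}
class CompactionRequestImpl { }

class WriterTracking {
  // A Set can answer "which writers exist?" but gives no handle for
  // aborting or removing the writer of one specific compaction:
  private final Set<StoreFileWriter> writerSet = ConcurrentHashMap.newKeySet();

  // A Map keyed by the request lets a caller target a single writer:
  private final Map<CompactionRequestImpl, StoreFileWriter> writerMap =
      new ConcurrentHashMap<>();

  void abortCompaction(CompactionRequestImpl request) {
    StoreFileWriter writer = writerMap.remove(request);
    if (writer != null) {
      writer.abort();
    }
  }
}
```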
If BrokenStoreFileCleanerChore will not function correctly without this, then it will need modification.
I think BrokenStoreFileCleanerChore sees the same results from getCompactionTargets after these changes. When the compaction is finished, the StoreFileWriter is removed from the set in the finally block of compact, so getCompactionTargets will not include the files being written by that writer after that point. That is the same effect as in the previous implementation, where resetCompactionWriter set the writer field to null and the files being written by that writer likewise stopped appearing in getCompactionTargets results. But the timing has changed, that is true.
Logically we need this for correctness. IIRC, the problem here is that we can only clean up the writer instance after we successfully commit the store files to the SFT, i.e., after the replaceStoreFile method. That's why we cannot simply remove the writer instance in commitWriter; otherwise there could be data loss, i.e., the BrokenStoreFileCleanerChore may delete store files which were just written but have not been added to the SFT yet...
Let me check again if the new implementation can solve the problem.
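To make the ordering constraint concrete, here is a hedged sketch with stub types (method names such as replaceStoreFiles on the facade are illustrative, not the actual HBase signatures): the writer stays tracked until the new files are committed to the SFT, closing the window in which BrokenStoreFileCleanerChore could delete freshly written but uncommitted files.

```java
import java.nio.file.Path;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Stub types standing in for the real HBase classes.
class StoreFileWriter {
  List<Path> close() { return List.of(); } // returns the files it produced
}
class CompactionRequestImpl { }
interface StoreFileTrackerFacade {
  void replaceStoreFiles(Collection<Path> compacted, Collection<Path> created);
}

class CompactionCommitSketch {
  private final Map<CompactionRequestImpl, StoreFileWriter> writerMap =
      new ConcurrentHashMap<>();

  List<Path> commit(CompactionRequestImpl request, Collection<Path> compactedFiles,
      StoreFileTrackerFacade sft) {
    StoreFileWriter writer = writerMap.get(request);
    if (writer == null) {
      return List.of(); // nothing tracked for this request
    }
    // 1. Close the writer: the new files now exist on disk, but are not
    //    yet recorded in the StoreFileTracker.
    List<Path> newFiles = writer.close();
    // 2. Commit to the SFT. Until this returns, the writer must remain in
    //    writerMap so the cleaner chore treats the files as in progress.
    sft.replaceStoreFiles(compactedFiles, newFiles);
    // 3. Only now is it safe to stop tracking the writer.
    writerMap.remove(request);
    return newFiles;
  }
}
```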
Thanks for that, you know BrokenStoreFileCleanerChore best.
Ah actually there is even a problem here in the current code, let me fix it...
Fixed, at least now we do not remove the writer from the set until after commit.
Pushed updates responding to review feedback.
Force-pushed from ce237a9 to 55206be.
🎊 +1 overall
FYI, we have some folks running this on a cluster against S3 now. I'll find their GitHub IDs to tag them here, so we can keep you up to date on real test runs :) Edit: hat-tip to @chrajeshbabu for now.
💔 -1 overall
return Collections.emptyList();
} else {
  // Finished, commit the writer's results.
  return commitWriter(writer, fd, request);
The commitWriter here just appends metadata and closes the writer; it does not record the files in the SFT...
We still need to add something after the replaceStoreFile call to remove the writer...
So maybe you are right: we need to use a Map instead of a Set, and the key can be the CompactionRequestImpl?
FWIW, we had a big …
Last set of changes broke tests so I will revert them. @Apache9 I think we need to go back to the drawing board here. We need to put the call out to SFT logic somewhere else, or we need to make Compactor something that is created per compaction, which was the assumption behind the changes that added the 'writer' field and which its semantics require. I will try the latter.
Creating a Compactor per compaction may be too big a change? FWIW, we have a Compactor instance in StoreEngine; if we want to make it per compaction, it will cause a very big refactoring. So I suggest that we add something like a compactionId to the CompactionRequest interface and use it as the key for our map. When calling StoreEngine.replaceStoreFile, we pass this id in; then we can use it to remove the writer in the Compactor. WDYT?
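A small, hypothetical sketch of that suggestion (the real CompactionRequest interface has no such id today, per this discussion; every name here is an assumption):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical: each request gets an id at creation time; the id is passed
// through the replace call so the Compactor can evict the matching writer.
class StoreFileWriter { }

class IdKeyedCompactor {
  private static final AtomicLong NEXT_ID = new AtomicLong();
  private final Map<Long, StoreFileWriter> writerMap = new ConcurrentHashMap<>();

  long register(StoreFileWriter writer) {
    long compactionId = NEXT_ID.incrementAndGet();
    writerMap.put(compactionId, writer);
    return compactionId;
  }

  // Called after the StoreEngine's SFT commit succeeds.
  void replaceStoreFilesCompleted(long compactionId) {
    writerMap.remove(compactionId);
  }
}
```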
Edit: I had a change of heart because the result is not bad and solves a couple of related problems. Anyway, my other idea wouldn't work because of resetCompactionWriter in StoreEngine, which assumes that Compactor is a singleton, even though the other SFT changes assume it is per compaction.
@Apache9 I have been testing with my original workaround for what it's worth, #4334. That change does not allow concurrent compaction against a given store, respecting that Compactor is not thread safe for now. It works well. The performance of the test scenario is unchanged from baseline without any SFT changes. As an option to unblock us, we could use it for now and come back to the implementation issues with SFT and compaction in a follow-up issue.
Maybe we do not call it resetCompactionWriter? We could just call it cleanupCompaction or something else, which indicates that the compaction is finally finished and the compactor should release all the resources related to this compaction.
I do not think this is a good way to solve the problem; we have allowed concurrent compactions on the same store at the same time in the past...
I still think it is messy.
I agree. It would be to unblock us and give more time to think about SFT design around compaction, not a permanent solution. Anyway I will update this PR as promised soon.
I agree with your feelings, Andrew. Correctness first and then optimization.
Acknowledging this too: yes, we should be able to compact two distinct subsets of the files in a store concurrently. And I could see value in doing so (e.g. compacting three larger files in a store while also compacting a few smaller hfiles created from memstore flushes -- we don't want to wait for the big compaction to finish before running the smaller one).
Posting an update soon. Making sure compaction unit tests all pass first.
@joshelser We have the other approach to fall back on but I think we can make the current code work with some polish.
Compactions might be concurrent against a given store and the Compactor is shared among them. Do not put mutable state into shared class fields. All Compactor class fields should be final or effectively final. 'keepSeqIdPeriod' is an exception to this rule because unit tests may set it.
Pushed an update that brings all of the above discussion together.
All compaction unit tests pass.
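A hedged illustration of the rule in that commit message (the writerMap and progressMap field names follow the diff fragments quoted later in this thread; the rest is an assumption): the shared Compactor keeps all state in final concurrent maps keyed by the request, rather than in mutable instance fields.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Stand-ins for the real types.
class CompactionRequestImpl { }
class StoreFileWriter { }
class CompactionProgress { }

class SharedCompactorSketch {
  // Shared across concurrent compactions of the same store, so every field
  // is final; per-compaction state lives in maps keyed by the request.
  private final Map<CompactionRequestImpl, StoreFileWriter> writerMap =
      new ConcurrentHashMap<>();
  private final Map<CompactionRequestImpl, CompactionProgress> progressMap =
      new ConcurrentHashMap<>();

  void start(CompactionRequestImpl request, StoreFileWriter writer) {
    writerMap.put(request, writer);
    progressMap.put(request, new CompactionProgress());
  }

  void cleanup(CompactionRequestImpl request) {
    // Release everything associated with this one compaction.
    writerMap.remove(request);
    progressMap.remove(request);
  }
}
```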
🎊 +1 overall
I think the key problem here is that the compaction targets are used during the whole compaction lifetime, but the state is only stored in the Compactor, which is only used for generating the new store files, so no matter what we do it is still very awkward. I have another idea: maybe we could just pass a StoreFileWriterTargetCollector to the compactor, and the compactor implementation will add a new target to the collector when creating a new store file writer. The collector is stored in StoreEngine or HStore, so we are free to clean it up at any time; the compactor does not need to do the cleanup work any more. I could implement a POC to see if it works. In general, I think we should also track the targets of flushed store files as well, but this is a missing part in the original patch of HBASE-26271. Thanks.
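A rough sketch of that collector idea (the thread later notes the tracker is "just a Consumer" in most places; class names and structure here are assumptions): the store owns the set of in-flight targets and hands a callback to whoever creates writers, so cleanup no longer lives in the compactor.

```java
import java.nio.file.Path;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

class StoreSketch {
  // The store, not the compactor, owns the in-flight writer targets.
  private final Set<Path> inFlightTargets = ConcurrentHashMap.newKeySet();

  Consumer<Path> newWriterCreationTracker() {
    return inFlightTargets::add;
  }

  void compactionOrFlushFinished(Path target) {
    // The store can clean up at any time, e.g. after the SFT commit.
    inFlightTargets.remove(target);
  }
}

class CompactorSketch {
  void createWriter(Path target, Consumer<Path> creationTracker) {
    creationTracker.accept(target); // record before any bytes are written
    // ... open the actual store file writer at 'target' ...
  }
}
```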
🎊 +1 overall
Please see if this POC could make things better... I've introduced a StoreFileWriterCreationTracker (actually in most places it is just a Consumer) for recording the creation of a StoreFileWriter, and this time we can also track the written files for flushes. The trackers are maintained in HStore, so the compactor does not need to store them any more.
Sorry for taking so long before really digging into the code. Your solution makes sense to me, Andrew. I also see why Duo doesn't think this is optimal.
Going over to his commit to look at that now.
// This signals that the target file is no longer written and can be cleaned up
completeCompaction(request);
Clarifying: we only need to call completeCompaction in the exceptional case (compaction didn't finish normally)? And commitWriter() down below is doing the same kind of cleanup work that completeCompaction() here is doing?
if (writer instanceof StoreFileWriter) {
  targets.add(((StoreFileWriter) writer).getPath());
} else {
  ((AbstractMultiFileWriter) writer).writers().stream()
Suggest: make this an else if (writer instanceof AbstractMultiFileWriter) to be defensive, and have the else branch fail loudly if something goes wrong.
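A minimal sketch of that suggestion using stand-in writer types: each known type gets an explicit branch, and anything unexpected fails loudly instead of being mis-cast.

```java
import java.nio.file.Path;
import java.util.Collection;
import java.util.List;

// Stand-ins for the real writer types.
class StoreFileWriter {
  private final Path path;
  StoreFileWriter(Path path) { this.path = path; }
  Path getPath() { return path; }
}
class AbstractMultiFileWriter {
  List<StoreFileWriter> writers() { return List.of(); }
}

class TargetCollector {
  static void addTargets(Object writer, Collection<Path> targets) {
    if (writer instanceof StoreFileWriter) {
      targets.add(((StoreFileWriter) writer).getPath());
    } else if (writer instanceof AbstractMultiFileWriter) {
      ((AbstractMultiFileWriter) writer).writers()
          .forEach(w -> targets.add(w.getPath()));
    } else {
      // Fail loudly on an unknown writer type instead of a blind cast.
      throw new IllegalStateException("Unexpected writer type: " + writer);
    }
  }
}
```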
writerMap.remove(request);
progressMap.remove(request);
I'm assuming that you're generally OK with having these two synchronized data structures updated not under the same lock? Given the current use of writerMap and progressMap, nothing is jumping out at me as bad as you currently have it. Just thinking out loud.
Do we care if either of the remove() calls returns null (i.e. the map doesn't have the mapping in it)?
Let @Apache9 propose a PR for his solution that moves the scope of where this state is tracked; that makes sense to me.
One thing I will note is I also tried to clean up progress reporting, which will need to be separately addressed.
Oh, sorry, I didn't mean to push my solution. The reason I implemented a POC is that we all seem to think the current approach in this PR is a bit messy, since we cannot clean up the state in the 'try...finally' style, so I wanted to give an impression of what the code would look like if we could do the cleanup that way. In fact I think the current solution in this PR is good enough to solve the problem for now, and all related classes are IA.Private, so we are free to apply this PR first and then change to another approach in the future. So I am neutral on these two approaches, especially if we want to fix the problem ASAP. Then we will have plenty of time to discuss what a better solution is, as it will not block the 2.5.0 release. Just my thoughts, sorry if I didn't say this clearly before posting the POC. Anyway, if you guys think the approach in the POC is generally good, let me convert it to a PR so you can review it better. Thanks~
Compactions might be concurrent against a given store and the Compactor is shared among them. Do not put mutable state into shared class fields. All Compactor class fields should be final. At the moment 'keepSeqIdPeriod' is an exception to this rule because some unit tests change it.
Compactor#getProgress and Compactor#getCompactionTargets now return union results of all compactions in progress against the store.
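A hedged sketch of what those union results could look like over the per-request maps (types simplified; the field names on CompactionProgress mirror HBase's, the rest is assumed): getCompactionTargets concatenates the targets of every in-progress compaction, and getProgress sums the per-request progress.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-ins for the real types.
class CompactionRequestImpl { }
class CompactionProgress {
  long totalCompactingKVs;
  long currentCompactedKVs;
}

class UnionViewSketch {
  private final Map<CompactionRequestImpl, List<Path>> targetsMap =
      new ConcurrentHashMap<>();
  private final Map<CompactionRequestImpl, CompactionProgress> progressMap =
      new ConcurrentHashMap<>();

  // Union of the file targets of every compaction currently in progress.
  List<Path> getCompactionTargets() {
    List<Path> all = new ArrayList<>();
    targetsMap.values().forEach(all::addAll);
    return all;
  }

  // Aggregate progress across every compaction currently in progress.
  CompactionProgress getProgress() {
    CompactionProgress total = new CompactionProgress();
    for (CompactionProgress p : progressMap.values()) {
      total.totalCompactingKVs += p.totalCompactingKVs;
      total.currentCompactedKVs += p.currentCompactedKVs;
    }
    return total;
  }
}
```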