-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Compaction validate, unschedule and repair #481
Conversation
@n3nash : Please review and let @vinothchandar know if the PR looks good. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments.. Most of it looks like you are moving out, we have internally..
import org.apache.spark.api.java.JavaSparkContext; | ||
|
||
/** | ||
* Client to perform admin operations related to compaction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a plan to use this class to schedule compaction internally as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of keeping only potentially unsafe (non MVCC ) compaction operation in this CompactionAdminClient (manually invoked with no compaction/write is happening).
|
||
private final Config cfg; | ||
|
||
public HoodieCompactorAdmin(Config cfg) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename to "HoodieCompactionDriver" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or HoodieCompactionTool
. Something that calls out this is an utility to run and not an internal class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this the class one should consider to be scheduled via a Workflow manager ? I am wondering if we need a separate class or we can just run a series of hoodie-cli commands instead passed into stdin?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will rename to HoodieCompactionAdminTool. This class only contain methods to unschedule and repair compactions - which requires manual triggering and guarantee that no concurrent compaction/ingestion is happening. The general compaction triggering methods - compaction schedule and compaction run are in the original HoodieCompactor class.
I kept the pattern is similar to other Hoodie-CLI commands (e:g - hdfsparquetimport) which involves running Spark-Launcher process.
There are 2 options to run - One to invoke from Hoodie-CLI and another to run directly as spark-submit. Agree that this class must not be scheduled as part of Workflow-Manager. But, for whatever reasons, if someone want to submit this as spark-submit job, its possible with this approach.
|
||
Validating a compaction plan : Check if all the files necessary for compactions are present and are valid | ||
|
||
``` | ||
hoodie:trips->compaction validate --compactionInstant <instantId> | ||
hoodie:stock_ticks_mor->compaction validate --instant 20181005222611 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add an example of what an invalid plan looks like and what error the user can expect ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
docs/admin_guide.md
Outdated
``` | ||
hoodie:trips->compaction unscheduleFileId --fileId <FileUUID> | ||
.... | ||
No File renames needed to unschedule compaction. Compaction successfully unscheduled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this text correct ? Here if we unschedule a fileId, may be we should say 👍
No File renames needed to unschedule file id from compaction. FileId successfully unscheduled from compaction plan.. ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was coming from a reused method. Customized the message.
``` | ||
|
||
##### Repair Compaction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!!
ObjectInputStream in = new ObjectInputStream(fsDataInputStream); | ||
T result = (T) in.readObject(); | ||
log.info("Result : " + result); | ||
in.close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put in try - finally ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
import java.io.Serializable; | ||
|
||
/** | ||
* Holds Operation result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doc doesn't help :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added more context around the usage.
Minor comments, didn't go through each command in very detail but in general looks fine to me. |
b7b88e0
to
de74153
Compare
@vinothchandar @n3nash : Addressed review comments. |
de74153
to
ee11726
Compare
CLI commands to validate, unschedule and repair pending compactions.