Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compaction validate, unschedule and repair #481

Merged
merged 1 commit into from
Oct 25, 2018

Conversation

bvaradar
Copy link
Contributor

@bvaradar bvaradar commented Oct 9, 2018

CLI commands to validate, unschedule and repair pending compactions.

@bvaradar
Copy link
Contributor Author

bvaradar commented Oct 9, 2018

@n3nash : Please review and let @vinothchandar know if the PR looks good.

Copy link
Member

@vinothchandar vinothchandar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments.. Most of it looks like you are moving out, we have internally..

import org.apache.spark.api.java.JavaSparkContext;

/**
* Client to perform admin operations related to compaction
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a plan to use this class to schedule compaction internally as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of keeping only potentially unsafe (non MVCC ) compaction operation in this CompactionAdminClient (manually invoked with no compaction/write is happening).


private final Config cfg;

public HoodieCompactorAdmin(Config cfg) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename to "HoodieCompactionDriver" ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or HoodieCompactionTool. Something that calls out this is an utility to run and not an internal class

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this the class one should consider to be scheduled via a Workflow manager ? I am wondering if we need a separate class or we can just run a series of hoodie-cli commands instead passed into stdin?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will rename to HoodieCompactionAdminTool. This class only contain methods to unschedule and repair compactions - which requires manual triggering and guarantee that no concurrent compaction/ingestion is happening. The general compaction triggering methods - compaction schedule and compaction run are in the original HoodieCompactor class.

I kept the pattern is similar to other Hoodie-CLI commands (e:g - hdfsparquetimport) which involves running Spark-Launcher process.
There are 2 options to run - One to invoke from Hoodie-CLI and another to run directly as spark-submit. Agree that this class must not be scheduled as part of Workflow-Manager. But, for whatever reasons, if someone want to submit this as spark-submit job, its possible with this approach.


Validating a compaction plan : Check if all the files necessary for compactions are present and are valid

```
hoodie:trips->compaction validate --compactionInstant <instantId>
hoodie:stock_ticks_mor->compaction validate --instant 20181005222611
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add an example of what an invalid plan looks like and what error the user can expect ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

```
hoodie:trips->compaction unscheduleFileId --fileId <FileUUID>
....
No File renames needed to unschedule compaction. Compaction successfully unscheduled
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this text correct ? Here if we unschedule a fileId, may be we should say 👍

No File renames needed to unschedule file id from compaction. FileId successfully unscheduled from compaction plan.. ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was coming from a reused method. Customized the message.

```

##### Repair Compaction
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!!

ObjectInputStream in = new ObjectInputStream(fsDataInputStream);
T result = (T) in.readObject();
log.info("Result : " + result);
in.close();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put in try - finally ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

import java.io.Serializable;

/**
* Holds Operation result
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doc doesn't help :(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added more context around the usage.

@n3nash
Copy link
Contributor

n3nash commented Oct 24, 2018

Minor comments, didn't go through each command in very detail but in general looks fine to me.

@bvaradar bvaradar force-pushed the async_compac_cli_part_2 branch from b7b88e0 to de74153 Compare October 25, 2018 17:00
@bvaradar
Copy link
Contributor Author

@vinothchandar @n3nash : Addressed review comments.

@bvaradar bvaradar force-pushed the async_compac_cli_part_2 branch from de74153 to ee11726 Compare October 25, 2018 17:28
@vinothchandar vinothchandar merged commit 07324e7 into apache:master Oct 25, 2018
vinishjail97 pushed a commit to vinishjail97/hudi that referenced this pull request Dec 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants