Useful Hudi CLI commands to debug/analyze production workloads #477

Merged
merged 1 commit into from
Oct 30, 2018

Contributor

@bvaradar bvaradar commented Oct 2, 2018

This PR does not include the Compaction CLI commands; those will come in a separate PR. It contains miscellaneous CLI commands that were useful when debugging issues that came up while testing async compaction in production.

Example output:

hoodie:stock_ticks_mor->show fsview all
18/10/02 18:22:24 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/stock_ticks_mor
18/10/02 18:22:24 INFO util.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://namenode:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1797410908_11, ugi=root (auth:SIMPLE)]]]
18/10/02 18:22:24 INFO table.HoodieTableConfig: Loading dataset properties from /user/hive/warehouse/stock_ticks_mor/.hoodie/hoodie.properties
18/10/02 18:22:24 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ from /user/hive/warehouse/stock_ticks_mor
18/10/02 18:22:24 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hive/warehouse/stock_ticks_mor
18/10/02 18:22:24 INFO timeline.HoodieActiveTimeline: Loaded instants [[20181002180759__clean__COMPLETED], [20181002180759__deltacommit__COMPLETED], [20181002181337__clean__COMPLETED], [20181002181337__deltacommit__COMPLETED]]
_______________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
| Partition | FileId | Base-Instant | Data-File | Data-File Size| Num Delta Files| Total Delta File Size| Delta Files |
|==============================================================================================================================================================================================================================================================================================================================================================================================================|
| 2018/08/31| 111415c3-f26d-4639-86c8-f9956f245ac3| 20181002180759| hdfs://namenode:8020/user/hive/warehouse/stock_ticks_mor/2018/08/31/111415c3-f26d-4639-86c8-f9956f245ac3_0_20181002180759.parquet| 432.5 KB | 1 | 20.8 KB | [HoodieLogFile {hdfs://namenode:8020/user/hive/warehouse/stock_ticks_mor/2018/08/31/.111415c3-f26d-4639-86c8-f9956f245ac3_20181002180759.log.1}]|

hoodie:stock_ticks_mor->show rollbacks
18/10/02 18:22:42 INFO timeline.HoodieActiveTimeline: Loaded instants []
__________________________________________________________________________________________
| Instant| Rolledback Instant| Total Files Deleted| Time taken in millis| Total Partitions|
|=========================================================================================|

hoodie:stock_ticks_mor->show fsview latest --partitionPath "2018/08/31"
18/10/02 18:22:54 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/stock_ticks_mor
18/10/02 18:22:54 INFO util.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://namenode:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1797410908_11, ugi=root (auth:SIMPLE)]]]
18/10/02 18:22:54 INFO table.HoodieTableConfig: Loading dataset properties from /user/hive/warehouse/stock_ticks_mor/.hoodie/hoodie.properties
18/10/02 18:22:54 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ from /user/hive/warehouse/stock_ticks_mor
18/10/02 18:22:54 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hive/warehouse/stock_ticks_mor
18/10/02 18:22:54 INFO timeline.HoodieActiveTimeline: Loaded instants [[20181002180759__clean__COMPLETED], [20181002180759__deltacommit__COMPLETED], [20181002181337__clean__COMPLETED], [20181002181337__deltacommit__COMPLETED]]
__________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
| Partition | FileId | Base-Instant | Data-File | Data-File Size| Num Delta Files| Total Delta Size| Delta Size - compaction scheduled| Delta Size - compaction unscheduled| Delta To Base Ratio - compaction scheduled| Delta To Base Ratio - compaction unscheduled| Delta Files - compaction scheduled | Delta Files - compaction unscheduled|
|=================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================|
| 2018/08/31| 111415c3-f26d-4639-86c8-f9956f245ac3| 20181002180759| hdfs://namenode:8020/user/hive/warehouse/stock_ticks_mor/2018/08/31/111415c3-f26d-4639-86c8-f9956f245ac3_0_20181002180759.parquet| 432.5 KB | 1 | 20.8 KB | 20.8 KB | 0.0 B | 0.0 B | 0.0 B | [HoodieLogFile {hdfs://namenode:8020/user/hive/warehouse/stock_ticks_mor/2018/08/31/.111415c3-f26d-4639-86c8-f9956f245ac3_20181002180759.log.1}]| [] |

hoodie:stock_ticks_mor->
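
As a reading aid for the "Loaded instants" log lines above, here is a minimal sketch that splits one logged instant name into its parts. It assumes the three-part timestamp__action__STATE layout visible in this output; the LoggedInstant class and its parse method are hypothetical names used only for illustration and are not part of Hudi.

```java
// Hypothetical helper, not part of Hudi: splits an instant name as printed in
// the "Loaded instants" log line, e.g. [20181002180759__deltacommit__COMPLETED].
final class LoggedInstant {
    final String timestamp; // e.g. 20181002180759, exactly as printed in the log
    final String action;    // e.g. clean, deltacommit
    final String state;     // e.g. COMPLETED

    LoggedInstant(String timestamp, String action, String state) {
        this.timestamp = timestamp;
        this.action = action;
        this.state = state;
    }

    // Assumes the layout timestamp__action__STATE, optionally wrapped in [ ].
    static LoggedInstant parse(String text) {
        String trimmed = text.trim();
        if (trimmed.startsWith("[") && trimmed.endsWith("]")) {
            trimmed = trimmed.substring(1, trimmed.length() - 1);
        }
        String[] parts = trimmed.split("__");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Unexpected instant format: " + text);
        }
        return new LoggedInstant(parts[0], parts[1], parts[2]);
    }

    public static void main(String[] args) {
        LoggedInstant instant = parse("[20181002180759__deltacommit__COMPLETED]");
        System.out.println(instant.timestamp + " " + instant.action + " " + instant.state);
    }
}
```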

Contributor Author

bvaradar commented Oct 2, 2018

@n3nash @vinothchandar : Please review when you get a chance.

Member

@vinothchandar vinothchandar left a comment

Minor comments. LG overall

@@ -50,6 +50,7 @@ public boolean isShowArchivedCommitAvailable() {

@CliCommand(value = "show archived commits", help = "Read commits from archived files and show details")
public String showCommits(
@CliOption(key = {"skipMetadata"}, help = "Ordering", unspecifiedDefaultValue = "false") boolean skipMetadata,
Member

Fix the copy-pasted help message. Should the default be true?

Contributor Author

done

import org.springframework.stereotype.Component;

@Component
public class FileSystemViewCommand implements CommandMarker {
Member

this is very useful.. thanks!

Contributor

+1

return HoodiePrintHelper.print(header, new HashMap<>(), sortByField, descending, limit, headerOnly, rows);
}

@CliCommand(value = "show rollback", help = "Read commits from archived files and show details")
Member

Having commits and rollbacks mixed as terminology is very confusing. Can you refer to them (in comments/strings/variables) as instants? Or am I missing something here?

Contributor

Also, I think the help string needs correction. Read "rollbacks"...

Contributor Author

Sure. Fixing the copy/paste errors.

List<Comparable[]> rows = new ArrayList<>();
fsView.getAllFileGroups().forEach(fg -> fg.getAllFileSlices().forEach(fs -> {
int idx = 0;
Comparable[] row = new Comparable[readOptimizedOnly ? 5 : 8];
Contributor

What are "5" and "8" ?

Contributor Author

Added a comment above this line. For ReadOptimized views, this command does not display any delta-file related columns. Because the display framework uses arrays, I need to size the row differently for ReadOptimized vs. Realtime displays.
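
The sizing described in this reply could be made self-documenting with named constants. The following is only a sketch: the class, constant, and method names are hypothetical, and the 5 + 3 split is inferred from the column headers in the "show fsview all" output in the PR description, not taken from the merged code.

```java
// Hypothetical sketch, not the merged Hudi code: names the magic numbers 5 and 8.
final class FsViewRowSketch {
    // Columns shown for every file slice:
    // Partition, FileId, Base-Instant, Data-File, Data-File Size.
    static final int BASE_COLUMNS = 5;
    // Extra columns shown only when delta files are displayed:
    // Num Delta Files, Total Delta File Size, Delta Files.
    static final int DELTA_COLUMNS = 3;

    // Read-optimized output omits the delta-file columns.
    static int columnCount(boolean readOptimizedOnly) {
        return readOptimizedOnly ? BASE_COLUMNS : BASE_COLUMNS + DELTA_COLUMNS;
    }

    // Allocates one display row with the right width for the chosen view.
    static Comparable<?>[] newRow(boolean readOptimizedOnly) {
        return new Comparable<?>[columnCount(readOptimizedOnly)];
    }

    public static void main(String[] args) {
        System.out.println("read-optimized columns: " + newRow(true).length);
        System.out.println("realtime columns: " + newRow(false).length);
    }
}
```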


HoodieTableFileSystemView fsView = buildFileSystemView(globRegex, maxInstant, readOptimizedOnly, includeMaxInstant,
includeInflight, excludeCompaction);
List<Comparable[]> rows = new ArrayList<>();
Contributor

For a large partition, or a glob that matches all partitions, will this list OOM?

Contributor Author

I ran show fsview all against the entire production trips dataset and watched the JVM memory usage. It went to around 2G of resident memory with no OOMs, so the default memory setting was good enough.

Contributor

okay awesome
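
For anyone who wants to repeat a rough version of this check, here is a minimal sketch that reports JVM heap usage before and after building an in-memory list of rows. It is an assumption-laden illustration: HeapUsageSketch and buildRows are hypothetical names, the rows are synthetic, and heap usage reported by Runtime is not the same as the resident memory observed above, which is measured at the OS level.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of Hudi: rough heap check around a row list.
final class HeapUsageSketch {
    // Heap currently in use by this JVM, in bytes (an approximation; GC timing affects it).
    static long usedHeapBytes() {
        Runtime runtime = Runtime.getRuntime();
        return runtime.totalMemory() - runtime.freeMemory();
    }

    // Builds synthetic display rows, standing in for the rows list in the command.
    static List<Comparable<?>[]> buildRows(int rowCount, int columns) {
        List<Comparable<?>[]> rows = new ArrayList<>(rowCount);
        for (int i = 0; i < rowCount; i++) {
            Comparable<?>[] row = new Comparable<?>[columns];
            row[0] = "partition-" + i; // placeholder value, not real table data
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        long before = usedHeapBytes();
        List<Comparable<?>[]> rows = buildRows(100_000, 8);
        long after = usedHeapBytes();
        System.out.println("rows: " + rows.size());
        System.out.println("approx. heap delta (bytes): " + (after - before));
    }
}
```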

return HoodiePrintHelper.print(header, fieldNameToConverterMap, sortByField, descending, limit, headerOnly, rows);
}

@CliCommand(value = "show fsview latest", help = "Show latest file-system view")
Contributor

In general, this command is very useful, but it's pretty involved as well, requiring deep knowledge of hoodie. I wonder if there's a way to simplify naming conventions for anybody to understand and use this command

Contributor

Maybe add some comments before each operation to say what its result is supposed to mean?

Contributor Author

@n3nash : Actually, most of the help texts were copy/paste. I went through one round of them and fixed the help texts. Please take a look after I update this PR.

Contributor

n3nash commented Oct 24, 2018

Few minor comments, rest LGTM

@n3nash n3nash self-requested a review October 24, 2018 15:10
Contributor

@n3nash n3nash left a comment

Can be merged after addressing minor comments.

@bvaradar
Contributor Author

@n3nash @vinothchandar : Addressed review comments. Please review

Contributor

n3nash commented Oct 29, 2018

@bvaradar Not sure if you got a chance to look at my comments...

Contributor

n3nash commented Oct 29, 2018

LGTM, ready to be merged

@vinothchandar vinothchandar merged commit 25cd05b into apache:master Oct 30, 2018