Useful Hudi CLI commands to debug/analyze production workloads #477
@n3nash @vinothchandar : Please review when you get a chance.
8fd7cbc to 96dc397
Minor comments. LG overall
@@ -50,6 +50,7 @@ public boolean isShowArchivedCommitAvailable() {

@CliCommand(value = "show archived commits", help = "Read commits from archived files and show details")
public String showCommits(
@CliOption(key = {"skipMetadata"}, help = "Ordering", unspecifiedDefaultValue = "false") boolean skipMetadata,
Fix the copy-pasted help message. Should the default be true?
done
import org.springframework.stereotype.Component;

@Component
public class FileSystemViewCommand implements CommandMarker {
this is very useful.. thanks!
+1
return HoodiePrintHelper.print(header, new HashMap<>(), sortByField, descending, limit, headerOnly, rows);
}

@CliCommand(value = "show rollback", help = "Read commits from archived files and show details")
Having commits and rollbacks mixed as terminology is very confusing. Can you call them (in comments/strings/variables) instants? Or am I missing something here?
Also, I think the help string needs correction. Read "rollbacks"...
Sure. Fixing the copy/paste errors.
List<Comparable[]> rows = new ArrayList<>();
fsView.getAllFileGroups().forEach(fg -> fg.getAllFileSlices().forEach(fs -> {
int idx = 0;
Comparable[] row = new Comparable[readOptimizedOnly ? 5 : 8];
What are "5" and "8" ?
Added a comment above this line. For ReadOptimized Views, this command does not display any delta-file related columns. As the display framework uses arrays, I need to bound the size differently for ReadOptimized vs Realtime displays
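One way to make the two row widths self-explanatory is to name them. The sketch below is not the PR's code: the class, constant, and method names are hypothetical, and the split into five base columns plus three delta columns is inferred from the `show fsview all` header shown in the example output (Partition, FileId, Base-Instant, Data-File, Data-File Size, then Num Delta Files, Total Delta File Size, Delta Files).

```java
import java.util.Arrays;
import java.util.List;

public class FsViewRowSketch {
    // Hypothetical constants; the PR itself uses the literals 5 and 8.
    // Base columns: Partition, FileId, Base-Instant, Data-File, Data-File Size.
    static final int BASE_COLUMNS = 5;
    // Delta columns: Num Delta Files, Total Delta File Size, Delta Files.
    static final int DELTA_COLUMNS = 3;

    // Read-optimized views show only base columns; realtime views add the delta columns.
    static int rowWidth(boolean readOptimizedOnly) {
        return readOptimizedOnly ? BASE_COLUMNS : BASE_COLUMNS + DELTA_COLUMNS;
    }

    @SuppressWarnings("rawtypes")
    static Comparable[] buildRow(boolean readOptimizedOnly, String partition, String fileId,
                                 String baseInstant, String dataFile, long dataFileSize,
                                 List<String> deltaFiles, long totalDeltaSize) {
        Comparable[] row = new Comparable[rowWidth(readOptimizedOnly)];
        int idx = 0;
        row[idx++] = partition;
        row[idx++] = fileId;
        row[idx++] = baseInstant;
        row[idx++] = dataFile;
        row[idx++] = dataFileSize;
        if (!readOptimizedOnly) {
            // Delta-file columns are filled only for the realtime display.
            row[idx++] = deltaFiles.size();
            row[idx++] = totalDeltaSize;
            row[idx++] = deltaFiles.toString();
        }
        return row;
    }

    public static void main(String[] args) {
        List<String> logs = Arrays.asList("example.log.1");
        System.out.println(Arrays.toString(
            buildRow(true, "2018/08/31", "file-1", "20181002180759", "file-1.parquet", 1024L, logs, 128L)));
        System.out.println(Arrays.toString(
            buildRow(false, "2018/08/31", "file-1", "20181002180759", "file-1.parquet", 1024L, logs, 128L)));
    }
}
```

Deriving the realtime width from the two constants keeps the array size in step with the header if a column is added later.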

HoodieTableFileSystemView fsView = buildFileSystemView(globRegex, maxInstant, readOptimizedOnly, includeMaxInstant,
includeInflight, excludeCompaction);
List<Comparable[]> rows = new ArrayList<>();
For a large partition, or a glob that matches all partitions, will this list OOM?
I ran show fsview all against the entire production trips dataset and watched the JVM memory usage. It went to around 2G resident memory, with no OOMs. The default memory setting was good enough.
okay awesome
return HoodiePrintHelper.print(header, fieldNameToConverterMap, sortByField, descending, limit, headerOnly, rows);
}

@CliCommand(value = "show fsview latest", help = "Show latest file-system view")
In general, this command is very useful, but it's pretty involved as well, requiring deep knowledge of hoodie. I wonder if there's a way to simplify naming conventions for anybody to understand and use this command
Maybe add some comments before each operation to say what its result is supposed to mean?
@n3nash : Actually, most of the help texts were copy/paste. Went through one round of it and fixed the help texts. Please take a look after I update this PR
Few minor comments, rest LGTM
Can be merged after addressing minor comments.
96dc397 to 9b6081e
@n3nash @vinothchandar : Addressed review comments. Please review.
9b6081e to 227009a
@bvaradar Not sure if you got a chance to look at my comments...
LGTM, ready to be merged
This PR does not include Compaction CLI commands. Those will come in a separate PR. This PR contains miscellaneous CLI commands that were useful when debugging issues that came up while testing async compaction in production.
E.g. output:
hoodie:stock_ticks_mor->show fsview all
18/10/02 18:22:24 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/stock_ticks_mor
18/10/02 18:22:24 INFO util.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://namenode:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1797410908_11, ugi=root (auth:SIMPLE)]]]
18/10/02 18:22:24 INFO table.HoodieTableConfig: Loading dataset properties from /user/hive/warehouse/stock_ticks_mor/.hoodie/hoodie.properties
18/10/02 18:22:24 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ from /user/hive/warehouse/stock_ticks_mor
18/10/02 18:22:24 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hive/warehouse/stock_ticks_mor
18/10/02 18:22:24 INFO timeline.HoodieActiveTimeline: Loaded instants [[20181002180759__clean__COMPLETED], [20181002180759__deltacommit__COMPLETED], [20181002181337__clean__COMPLETED], [20181002181337__deltacommit__COMPLETED]]
_______________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
| Partition | FileId | Base-Instant | Data-File | Data-File Size| Num Delta Files| Total Delta File Size| Delta Files |
|==============================================================================================================================================================================================================================================================================================================================================================================================================|
| 2018/08/31| 111415c3-f26d-4639-86c8-f9956f245ac3| 20181002180759| hdfs://namenode:8020/user/hive/warehouse/stock_ticks_mor/2018/08/31/111415c3-f26d-4639-86c8-f9956f245ac3_0_20181002180759.parquet| 432.5 KB | 1 | 20.8 KB | [HoodieLogFile {hdfs://namenode:8020/user/hive/warehouse/stock_ticks_mor/2018/08/31/.111415c3-f26d-4639-86c8-f9956f245ac3_20181002180759.log.1}]|
hoodie:stock_ticks_mor->show rollbacks
18/10/02 18:22:42 INFO timeline.HoodieActiveTimeline: Loaded instants []
__________________________________________________________________________________________
| Instant| Rolledback Instant| Total Files Deleted| Time taken in millis| Total Partitions|
|=========================================================================================|
hoodie:stock_ticks_mor->show fsview latest --partitionPath "2018/08/31"
18/10/02 18:22:54 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/stock_ticks_mor
18/10/02 18:22:54 INFO util.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://namenode:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1797410908_11, ugi=root (auth:SIMPLE)]]]
18/10/02 18:22:54 INFO table.HoodieTableConfig: Loading dataset properties from /user/hive/warehouse/stock_ticks_mor/.hoodie/hoodie.properties
18/10/02 18:22:54 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ from /user/hive/warehouse/stock_ticks_mor
18/10/02 18:22:54 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hive/warehouse/stock_ticks_mor
18/10/02 18:22:54 INFO timeline.HoodieActiveTimeline: Loaded instants [[20181002180759__clean__COMPLETED], [20181002180759__deltacommit__COMPLETED], [20181002181337__clean__COMPLETED], [20181002181337__deltacommit__COMPLETED]]
__________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
| Partition | FileId | Base-Instant | Data-File | Data-File Size| Num Delta Files| Total Delta Size| Delta Size - compaction scheduled| Delta Size - compaction unscheduled| Delta To Base Ratio - compaction scheduled| Delta To Base Ratio - compaction unscheduled| Delta Files - compaction scheduled | Delta Files - compaction unscheduled|
|=================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================|
| 2018/08/31| 111415c3-f26d-4639-86c8-f9956f245ac3| 20181002180759| hdfs://namenode:8020/user/hive/warehouse/stock_ticks_mor/2018/08/31/111415c3-f26d-4639-86c8-f9956f245ac3_0_20181002180759.parquet| 432.5 KB | 1 | 20.8 KB | 20.8 KB | 0.0 B | 0.0 B | 0.0 B | [HoodieLogFile {hdfs://namenode:8020/user/hive/warehouse/stock_ticks_mor/2018/08/31/.111415c3-f26d-4639-86c8-f9956f245ac3_20181002180759.log.1}]| [] |
hoodie:stock_ticks_mor->
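
When scanning CLI logs like the ones above, it can help to pull the instants out of the "Loaded instants" line. The following is a minimal sketch that assumes the `timestamp__action__STATE` layout seen in that log line (a 14-digit timestamp, a lowercase action, an uppercase state); the class and field names are hypothetical and it does not use any Hudi API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InstantLogSketch {
    // Hypothetical holder for one instant as printed in the log line.
    static final class LoggedInstant {
        final String timestamp;
        final String action;
        final String state;

        LoggedInstant(String timestamp, String action, String state) {
            this.timestamp = timestamp;
            this.action = action;
            this.state = state;
        }

        @Override
        public String toString() {
            return timestamp + " " + action + " " + state;
        }
    }

    // Assumed layout, taken from the log output only: [<14 digits>__<action>__<STATE>]
    private static final Pattern INSTANT =
        Pattern.compile("\\[(\\d{14})__([a-z]+)__([A-Z]+)\\]");

    // Returns every instant found in the line, in order of appearance.
    static List<LoggedInstant> parse(String logLine) {
        List<LoggedInstant> instants = new ArrayList<>();
        Matcher m = INSTANT.matcher(logLine);
        while (m.find()) {
            instants.add(new LoggedInstant(m.group(1), m.group(2), m.group(3)));
        }
        return instants;
    }

    public static void main(String[] args) {
        String line = "Loaded instants [[20181002180759__clean__COMPLETED], "
            + "[20181002180759__deltacommit__COMPLETED]]";
        for (LoggedInstant instant : parse(line)) {
            System.out.println(instant);
        }
    }
}
```

An empty list, as in the `show rollbacks` log line, simply yields no matches.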