Reduce Hive file system listing #18179
LGTM, just not sure how much this buys us given that we have a cache in place anyway.
Also, can the cache itself be keyed on the scheme rather than on the whole file system object?
I was finding that cache hits were somewhat uncommon because TrinoFileSystem instances are fairly short-lived in the Hive code.
No, unfortunately not. ADLS Gen2 can be used in either a hierarchical or non-hierarchical mode, and I don't think that is indicated by a different scheme.
40bbfd9 to 67a5c7b
67a5c7b to 7dcc4a4
Reduce the number of times Hive needs to call the file system listing API when using common file systems. Additionally, avoid unnecessary directory exists checks when file listing returns a non-empty result.
7dcc4a4 to f0ef639
try {
    return ImmutableList.copyOf(new HiveFileIterator(table, location, fs, directoryLister, hdfsNamenodeStats, FAIL));
    HiveFileIterator fileIterator = new HiveFileIterator(table, location, fs, directoryLister, hdfsNamenodeStats, FAIL);
Will iterator creation report an error if the location does not exist? If so, will this error be as nice as the one generated by checkPartitionLocationExists?
Will iterator creation report an error if the location does not exist?
It will not; that's why the check is there.
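For readers following this thread, here is a minimal sketch of the idea as I understand it from the commit message: list first, and only fall back to a directory-exists check when the listing comes back empty. Everything in it is hypothetical (`SimpleFileSystem`, `CountingFileSystem`, `listPartitionFiles`, the paths, and the error message); it is not the code in this PR and does not use Trino's actual `TrinoFileSystem` or `HiveFileIterator` APIs. It also assumes that listing a missing location returns an empty result rather than failing, which matches the reply above.

```java
import java.util.List;
import java.util.Map;

// Illustrative sketch only: none of these types are Trino's actual classes.
final class ListingSketch {
    private ListingSketch() {}

    // List first; only ask whether the directory exists when the listing is empty.
    // A non-empty listing already proves the location exists, so no extra call is made.
    static List<String> listPartitionFiles(SimpleFileSystem fs, String location) {
        List<String> files = List.copyOf(fs.listFiles(location));
        if (files.isEmpty() && !fs.directoryExists(location)) {
            // Hypothetical error message, standing in for a friendlier location check.
            throw new IllegalStateException("Partition location does not exist: " + location);
        }
        return files;
    }

    public static void main(String[] args) {
        CountingFileSystem fs = new CountingFileSystem(Map.of(
                "/table/p=1", List.of("/table/p=1/data.orc"),
                "/table/p=2", List.of()));
        System.out.println(listPartitionFiles(fs, "/table/p=1") + " exists checks: " + fs.existsCalls);
        System.out.println(listPartitionFiles(fs, "/table/p=2") + " exists checks: " + fs.existsCalls);
    }
}

// Hypothetical stand-in for a file system client.
interface SimpleFileSystem {
    List<String> listFiles(String location);

    boolean directoryExists(String location);
}

// In-memory fake that counts how often each call is made.
final class CountingFileSystem implements SimpleFileSystem {
    private final Map<String, List<String>> directories;
    int listCalls;
    int existsCalls;

    CountingFileSystem(Map<String, List<String>> directories) {
        this.directories = Map.copyOf(directories);
    }

    @Override
    public List<String> listFiles(String location) {
        listCalls++;
        // Assumption: listing a missing location yields an empty result, not an error.
        return directories.getOrDefault(location, List.of());
    }

    @Override
    public boolean directoryExists(String location) {
        existsCalls++;
        return directories.containsKey(location);
    }
}
```

In this sketch the exists check costs one extra call only for empty or missing locations; the common case of a populated partition needs a single listing call.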
Description
Reduce the number of times Hive needs to call the file system listing API when using common file systems.
Additional context and related issues
Relates to: #17323
Release notes
(x) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text: