-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Decouple OrcPageSource from HiveColumnHandle #1557
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Only one style comment.
presto-hive/src/main/java/io/prestosql/plugin/hive/orc/OrcPageSource.java
Outdated
Show resolved
Hide resolved
797eb74
to
f37659e
Compare
@dain Thanks. Updated. |
presto-hive/src/main/java/io/prestosql/plugin/hive/orc/OrcPageSource.java
Outdated
Show resolved
Hide resolved
presto-hive/src/main/java/io/prestosql/plugin/hive/orc/OrcPageSource.java
Outdated
Show resolved
Hide resolved
presto-hive/src/main/java/io/prestosql/plugin/hive/orc/OrcPageSourceFactory.java
Outdated
Show resolved
Hide resolved
ImmutableList.Builder<Type> typesBuilder = ImmutableList.builder(); | ||
for (int columnIndex = 0; columnIndex < columns.size(); columnIndex++) { | ||
HiveColumnHandle column = columns.get(columnIndex); | ||
checkState(column.getColumnType() == REGULAR, "column type must be regular"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where did this logic go?
Am I correct this is handled by HivePageSourceProvider
?
If so, I'd name this commit differently. Something signalling you're simply reusing information already collected (removing unnecessary code). Or, you may include this information as part of commit message body.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The useful logic in this block is the checking done by recordReader.isColumnPresent
. I moved it to getNextPage
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@findepi I had to do some digging when I reviewed this. This code is a duplicate of https://github.com/prestosql/presto/blob/f37659ee6ce907e51a75f325ed744c2fcbc459ca/presto-hive/src/main/java/io/prestosql/plugin/hive/orc/OrcPageSourceFactory.java#L193-L201 I think this is just cruft from refactorings.
presto-hive/src/main/java/io/prestosql/plugin/hive/orc/OrcPageSource.java
Outdated
Show resolved
Hide resolved
Block[] blocks = new Block[hiveColumnIndexes.length]; | ||
for (int fieldId = 0; fieldId < blocks.length; fieldId++) { | ||
if (constantBlocks[fieldId] != null) { | ||
blocks[fieldId] = constantBlocks[fieldId].getRegion(0, batchSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're no longer reusing constantBlocks
. I think this is OK. @dain
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For both this and the streaming, I'm not super concerned as this code is only executed once per page. That said, we haven't measured the impact, and the new code isn't dramatically more readable, so I'd would have left is as is to keep risk down. I don't have a strong preference, so I open to either solution.
@@ -66,43 +55,15 @@ | |||
public OrcPageSource( | |||
OrcRecordReader recordReader, | |||
OrcDataSource orcDataSource, | |||
List<HiveColumnHandle> columns, | |||
TypeManager typeManager, | |||
Map<Integer, Type> includedColumns, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like the usage of Map
here.
- it's very non-obvious what map keys are
- this is not used as a map. It's just a list of pairs
(hive column index, column type)
(... or triples(block no, hive column index, column type)
whereblock no
is implied from iteration order)
A List would be better.
I know this is pre-existing, but I think we should improve (might be a follow-up).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In #1552 you seem to be using List<Type> types, List<Optional<Field>> fields
for similar purpose.
I'm ok with two-list approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like the usage of
Map
here.
- it's very non-obvious what map keys are
- this is not used as a map. It's just a list of pairs
(hive column index, column type)
(... or triples(block no, hive column index, column type)
whereblock no
is implied from iteration order)A List would be better.
I know this is pre-existing, but I think we should improve (might be a follow-up).
Your description is precise. It's essentially a list of triples. I think we can do the improvement as a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the API we already use in OrcRecordReader
, so it makes sense to me.
That said, we are going to need some better abstractions around ORC reader columns in general to deal with features like ACID, and I think we should deal with the new API with that PR instead.
f37659e
to
18402fe
Compare
This is a counterpart of #1552 , also part of the efforts to remove
HiveColumnHandle
from Iceberg Connector (#1324 ). (We don't useOrcPageSource
in Iceberg Connector now but will do in the future)