New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Decouple OrcPageSource from HiveColumnHandle #1557

Merged

electrum merged 1 commit into trinodb:master from lxynov:orc-hive-column-handle

Sep 23, 2019

Member

lxynov commented Sep 19, 2019

This is a counterpart of #1552 , also part of the efforts to remove HiveColumnHandle from Iceberg Connector (#1324 ). (We don't use OrcPageSource in Iceberg Connector now but will do in the future)

cla-bot bot added the cla-signed label

dain reviewed

View reviewed changes

Member

dain left a comment

Looks good. Only one style comment.

presto-hive/src/main/java/io/prestosql/plugin/hive/orc/OrcPageSource.java Outdated Show resolved Hide resolved

lxynov force-pushed the orc-hive-column-handle branch from 797eb74 to f37659e Compare

September 19, 2019 06:18

Member Author

lxynov commented Sep 19, 2019

@dain Thanks. Updated.

findepi reviewed

View reviewed changes

presto-hive/src/main/java/io/prestosql/plugin/hive/orc/OrcPageSource.java Outdated Show resolved Hide resolved

presto-hive/src/main/java/io/prestosql/plugin/hive/orc/OrcPageSource.java Outdated Show resolved Hide resolved

presto-hive/src/main/java/io/prestosql/plugin/hive/orc/OrcPageSourceFactory.java Outdated Show resolved Hide resolved

presto-hive/src/main/java/io/prestosql/plugin/hive/orc/OrcPageSource.java

    
                      ImmutableList.Builder<Type> typesBuilder = ImmutableList.builder();

                      for (int columnIndex = 0; columnIndex < columns.size(); columnIndex++) {

                          HiveColumnHandle column = columns.get(columnIndex);

                          checkState(column.getColumnType() == REGULAR, "column type must be regular");

Member

findepi Sep 19, 2019

Where did this logic go?
Am I correct this is handled by HivePageSourceProvider?

If so, I'd name this commit differently. Something signalling you're simply reusing information already collected (removing unnecessary code). Or, you may include this information as part of commit message body.

Member Author

lxynov Sep 19, 2019

The useful logic in this block is the checking done by recordReader.isColumnPresent. I moved it to getNextPage.

Member

dain Sep 19, 2019

@findepi I had to do some digging when I reviewed this. This code is a duplicate of https://github.com/prestosql/presto/blob/f37659ee6ce907e51a75f325ed744c2fcbc459ca/presto-hive/src/main/java/io/prestosql/plugin/hive/orc/OrcPageSourceFactory.java#L193-L201 I think this is just cruft from refactorings.

presto-hive/src/main/java/io/prestosql/plugin/hive/orc/OrcPageSource.java Outdated Show resolved Hide resolved

presto-hive/src/main/java/io/prestosql/plugin/hive/orc/OrcPageSource.java Outdated

    
                          Block[] blocks = new Block[hiveColumnIndexes.length];

                          for (int fieldId = 0; fieldId < blocks.length; fieldId++) {

                              if (constantBlocks[fieldId] != null) {

                                  blocks[fieldId] = constantBlocks[fieldId].getRegion(0, batchSize);

Member

findepi Sep 19, 2019

We're no longer reusing constantBlocks. I think this is OK. @dain

Member

dain Sep 19, 2019

For both this and the streaming, I'm not super concerned as this code is only executed once per page. That said, we haven't measured the impact, and the new code isn't dramatically more readable, so I'd would have left is as is to keep risk down. I don't have a strong preference, so I open to either solution.

presto-hive/src/main/java/io/prestosql/plugin/hive/orc/OrcPageSource.java

    
            @@ -66,43 +55,15 @@
          
                  public OrcPageSource(

                          OrcRecordReader recordReader,

                          OrcDataSource orcDataSource,

                          List<HiveColumnHandle> columns,

                          TypeManager typeManager,

                          Map<Integer, Type> includedColumns,

Member

findepi Sep 19, 2019

I don't like the usage of Map here.

it's very non-obvious what map keys are
this is not used as a map. It's just a list of pairs (hive column index, column type) (... or triples (block no, hive column index, column type) where block no is implied from iteration order)

A List would be better.

I know this is pre-existing, but I think we should improve (might be a follow-up).

Member

findepi Sep 19, 2019

In #1552 you seem to be using List<Type> types, List<Optional<Field>> fields for similar purpose.
I'm ok with two-list approach.

Member Author

lxynov Sep 19, 2019

I don't like the usage of Map here.

it's very non-obvious what map keys are

this is not used as a map. It's just a list of pairs (hive column index, column type) (... or triples (block no, hive column index, column type) where block no is implied from iteration order)

A List would be better.

I know this is pre-existing, but I think we should improve (might be a follow-up).

Your description is precise. It's essentially a list of triples. I think we can do the improvement as a follow-up.

Member

dain Sep 19, 2019

This is the API we already use in OrcRecordReader, so it makes sense to me.

That said, we are going to need some better abstractions around ORC reader columns in general to deal with features like ACID, and I think we should deal with the new API with that PR instead.


          Decouple OrcPageSource from HiveColumnHandle

18402fe

lxynov force-pushed the orc-hive-column-handle branch from f37659e to 18402fe Compare

September 19, 2019 23:35

Member Author

lxynov commented Sep 19, 2019

@dain @findepi Thanks for the review. I've updated the PR.

dain approved these changes

View reviewed changes

dain added the ready to merge label

electrum merged commit b142e8a into trinodb:master

lxynov deleted the orc-hive-column-handle branch

September 23, 2019 22:02

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla-signed ready to merge