Add support for CombineTextInputFormat from Hive #23072

Merged
mosabua merged 1 commit into trinodb:master from combinetext-hive-format
Oct 8, 2024

Conversation

sjdurfey
Contributor

Description

As brought up in #21842, I am unable to query text-based Hive tables that use org.apache.hadoop.mapred.lib.CombineTextInputFormat. The query fails with this error:

io.trino.spi.TrinoException: Unsupported storage format: mydatabase.mytable:<UNPARTITIONED> StorageFormat{serde=org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, inputFormat=org.apache.hadoop.mapred.lib.CombineTextInputFormat, outputFormat=org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat}

This change implements the suggestion from the linked issue. To test it, I first ran a query against Trino 453 without the code change and got this error on the CLI:

$ ./trino https://localhost:8443 --catalog hive
trino> select count(*) from <my hive table>;
Query 20240819_203410_00002_cg577 failed: Unsupported storage format: <my hive table>:dateint=20240810/state=open StorageFormat{serde=org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, inputFormat=org.apache.hadoop.mapred.lib.CombineTextInputFormat, outputFormat=org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat}

Then I built the Hive plugin JAR with the code change, copied it onto the Trino Docker image I was using, and re-ran the query successfully:

$ ./trino https://localhost:8443 --catalog hive
trino> select count(*) from <my hive table>;
 _col0
--------
 242185
(1 row)

Query 20240819_203152_00000_r8kgb, FINISHED, 1 node
Splits: 5,616 total, 5,616 done (100.00%)
30.30 [242K rows, 131MB] [7.99K rows/s, 4.31MB/s]
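
For readers who have not hit this error, here is a minimal sketch of the kind of lookup involved. It is not the connector's actual code: the class `StorageFormatSketch`, the `resolve` method, and the alias map are hypothetical names used only for illustration. It assumes the fix amounts to treating CombineTextInputFormat like TextInputFormat when deciding whether a table is a readable text table, since both describe the same on-disk text layout.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified stand-in for a storage format lookup; not Trino's real implementation.
public class StorageFormatSketch {
    static final String LAZY_SIMPLE_SERDE = "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe";
    static final String TEXT_INPUT_FORMAT = "org.apache.hadoop.mapred.TextInputFormat";
    static final String COMBINE_TEXT_INPUT_FORMAT = "org.apache.hadoop.mapred.lib.CombineTextInputFormat";

    // Assumption: input formats listed here can be read exactly like their canonical counterpart.
    static final Map<String, String> INPUT_FORMAT_ALIASES = Map.of(
            COMBINE_TEXT_INPUT_FORMAT, TEXT_INPUT_FORMAT);

    // Returns the name of a supported format, or empty when the combination is unsupported.
    static Optional<String> resolve(String serde, String inputFormat) {
        String canonical = INPUT_FORMAT_ALIASES.getOrDefault(inputFormat, inputFormat);
        if (LAZY_SIMPLE_SERDE.equals(serde) && TEXT_INPUT_FORMAT.equals(canonical)) {
            return Optional.of("TEXTFILE");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // With the alias in place, the table from the error message resolves to a text format.
        System.out.println(resolve(LAZY_SIMPLE_SERDE, COMBINE_TEXT_INPUT_FORMAT)); // prints Optional[TEXTFILE]
        // An unknown input format still yields no match, which a caller could report as unsupported.
        System.out.println(resolve(LAZY_SIMPLE_SERDE, "com.example.UnknownInputFormat")); // prints Optional.empty
    }
}
```

Without the alias entry, the first call would also return an empty result, which corresponds to the "Unsupported storage format" failure shown above.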

Release notes

(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

c.c. @electrum

cla-bot bot commented Aug 19, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@github-actions github-actions bot added the hive Hive connector label Aug 19, 2024
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Sep 10, 2024
@mosabua
Member

mosabua commented Sep 10, 2024

@cla-bot check

@cla-bot cla-bot bot added the cla-signed label Sep 10, 2024
cla-bot bot commented Sep 10, 2024

The cla-bot has been summoned, and re-checked this pull request!

@mosabua
Member

mosabua commented Sep 10, 2024

Can you change the commit message to

Add support for CombineTextInputFormat from Hive

@github-actions github-actions bot removed the stale label Sep 11, 2024
@sjdurfey sjdurfey changed the title Added support for hives CombineTextInputFormat Add support for CombineTextInputFormat from Hive Sep 16, 2024
@sjdurfey sjdurfey force-pushed the combinetext-hive-format branch from 8d3c3e0 to 7d887dd Compare September 16, 2024 14:06
@sjdurfey
Contributor Author

@mosabua, the commit message has been changed. I'm not sure how to re-run the failing Delta Lake build.

@sjdurfey
Contributor Author

Is there anything else that I need to do on this PR for it to be merged?

Member

@hashhar hashhar left a comment

@mosabua PTAL at release notes. This is ready to be merged.

@mosabua
Member

mosabua commented Oct 8, 2024

I will take care of release notes in the 461 RN PR.

@mosabua mosabua merged commit 8590255 into trinodb:master Oct 8, 2024
56 checks passed
@github-actions github-actions bot added this to the 461 milestone Oct 8, 2024
@mosabua
Member

mosabua commented Oct 8, 2024

Added to the release notes now, with the suggested RN entry.

@mosabua
Member

mosabua commented Oct 8, 2024

Thank you for the PR, @sjdurfey. This will be in the upcoming 461 release.

@sjdurfey
Contributor Author

sjdurfey commented Oct 8, 2024

woohoo! Thanks, and you're welcome!

Labels
cla-signed hive Hive connector
4 participants