-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support Parquet Modular Encryption for Trino #20069
base: master
Are you sure you want to change the base?
Conversation
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla |
Done |
@@ -264,6 +264,12 @@ | |||
<scope>provided</scope> | |||
</dependency> | |||
|
|||
<dependency> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We recently completed a huge project to decouple Trino from Hive and Hadoop, so we can't reintroduce Hadoop here. Unfortunately, since the Parquet encryption APIs are tied to Hadoop, we'll need to reimplement the decryption code ourselves, inside of Trino, similar to what we have done for most of Parquet and the other file format readers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@electrum Hey, thanks for review. I will take a look and the complexity of it and get back here. (Also thanks for the link to the project. It's nice to learn the reasons behind it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @electrum I have migrated necessary classes and methods to Trino project and remove the dep on hadoop. I tested with HUDI catalog and it works with encrypted datasets. I could add more unit tests later if the PR looks OK to you. Could you take another look? Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@electrum Bump. No rush tho. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that the Hadoop dependency is still here. We need to remove this.
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla |
3bbc5a5
to
e6578c9
Compare
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla |
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua |
👋 @my7ym .. please let us know if you need any help and ping us when this is ready for review again. |
@amoghmargoor @findepi @electrum can you potentially help here and work on the removal of any Hadoop deps as discussed in the contributor call https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-1-feb-2024 In fact looking at the message in the PR it looks like these removals are done. Maybe all we need is for @my7ym to rebase and then coordinate reviews and changes. Also @my7ym .. I am not sure if you are on trino slack .. if so please let us know and chime in at https://trinodb.slack.com/archives/CP1MUNEUX/p1706832386414769 or sync up in general with us to continue this work together |
@my7ym Great work on reimplementing the Parquet code to not require Hadoop. I'll review this ASAP. |
Also fyi @raunaqmorarka |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey @my7ym - thanks for PR. I will try to run this internally and see if it works on our file. If any issues I will raise my patch so that it can help taking this forward. We have been running this in production for 1.5+ years a while and have fixed some crucial bugs on the way.
It looks like this code still depends on Hadoop. The config also contains a class name that is loaded at runtime, which is something we avoid in Trino. I think we'll want to have a list of supported KMS implementations that work out-of-the-box with appropriate config. It would be helpful to have a product test that shows this working end-to-end. |
I think we'll want to have a list of supported KMS implementations that work out-of-the-box with appropriate config. Good call. Will do.
Thanks! |
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua |
@mosabua A bit tied up in the past weeks. Will work on this in next week. Will let you know if I have more questions. Thanks for checking in. |
hi @my7ym, I'm working on a project that needs this feature, and I've been following this PR with interest. Is the branch featured in this PR in a workable state (with the understanding that there are Hadoop deps that currently prevent merging into Trino)? I'm eager to assist if there's any way I can help to move this forward. I'm relatively new to Trino but very enthusiastic about contributing to the community :) |
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua |
@electrum @amoghmargoor how do we want to proceed here? |
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua |
Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time. |
Can you provide guidance on how to proceed here @electrum ? |
@cla-bot check |
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla |
The cla-bot has been summoned, and re-checked this pull request! |
CLA check failed because it expects |
Related to #23583 |
Description
This large PR is to support Parquet Modular Encryption for Trino. The PR core logic is migrated from the one in Presto with several changes to bridge the big gaps between APIs in these two repos.
To avoid no trino-hdfs dep in other modules, we pass the trino decryption configs in parquet configs and then translate to parquet flags. Not sure if there is a better way to do this.
For Trino issue: #9383
Will fix the test and linting failures after the implementation is stable.
Additional context and related issues
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text: