Allow Hive GCS to pass JSON key as a config #17892

Merged
merged 5 commits into from
Jun 15, 2023

Conversation

findepi
Member

@findepi findepi commented Jun 14, 2023

No description provided.

findepi added 4 commits June 14, 2023 10:25
The validation is added in a non-standard manner (not annotation-driven),
because annotation-driven validation would make it impossible to write
`TestHiveGcsConfig.testExplicitPropertyMappings`.
@cla-bot cla-bot bot added the cla-signed label Jun 14, 2023
@findepi findepi force-pushed the findepi/gcs-credentials branch from 426b922 to 04e9fee Compare June 14, 2023 09:11
@@ -58,6 +58,7 @@ public GcsStorageFactory(HdfsEnvironment hdfsEnvironment, HiveGcsConfig hiveGcsC
            throws IOException
    {
        this.hdfsEnvironment = requireNonNull(hdfsEnvironment, "hdfsEnvironment is null");
        hiveGcsConfig.validate();
Member

Should support for a `validate` method in config classes be added to airlift?

Member

Doesn't `@PostConstruct` already exist for this?

Member Author

Airlift supports standard bean validation, so I don't think so.

Normally it works very well.
The problem with this particular config is `testExplicitPropertyMappings`, which wants to set every property to some value, while these properties are mutually exclusive.

Member Author

I think the proper solution would be to split this into two config classes and bind one of them conditionally.
That seems like overkill to me, though.

// This cannot be normal validation, as it would make it impossible to write TestHiveGcsConfig.testExplicitPropertyMappings

if (useGcsAccessToken) {
    checkState(jsonKeyFilePath == null, "Cannot specify 'hive.gcs.json-key-file-path' when 'hive.gcs.use-access-token' is set");
Member

What about the other way around?
If `jsonKeyFilePath` is null, is `useGcsAccessToken` expected to be true?

Member Author

They are mutually exclusive, so I don't think it matters which way around I write the check.
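
A minimal sketch of the explicit `validate()` approach discussed in this thread, assuming a simplified stand-in for `HiveGcsConfig`. The class name `GcsConfigSketch`, the local `checkState` helper (defined here only to keep the example self-contained), and the checks involving `hive.gcs.json-key` are assumptions for illustration; only the first error message is taken from the diff.

```java
// Hypothetical, simplified stand-in for HiveGcsConfig; not the actual Trino class.
final class GcsConfigSketch
{
    private boolean useGcsAccessToken;
    private String jsonKey;
    private String jsonKeyFilePath;

    GcsConfigSketch setUseGcsAccessToken(boolean useGcsAccessToken)
    {
        this.useGcsAccessToken = useGcsAccessToken;
        return this;
    }

    GcsConfigSketch setJsonKey(String jsonKey)
    {
        this.jsonKey = jsonKey;
        return this;
    }

    GcsConfigSketch setJsonKeyFilePath(String jsonKeyFilePath)
    {
        this.jsonKeyFilePath = jsonKeyFilePath;
        return this;
    }

    // Invoked explicitly by the consumer instead of through validation annotations,
    // so the setters themselves never reject a combination of values.
    void validate()
    {
        if (useGcsAccessToken) {
            // Message copied from the diff under review.
            checkState(jsonKeyFilePath == null, "Cannot specify 'hive.gcs.json-key-file-path' when 'hive.gcs.use-access-token' is set");
            // Assumed analogous check; the exact wording in the PR is not shown.
            checkState(jsonKey == null, "Cannot specify 'hive.gcs.json-key' when 'hive.gcs.use-access-token' is set");
        }
        // Assumed check that the two key sources are also mutually exclusive.
        checkState(jsonKey == null || jsonKeyFilePath == null, "'hive.gcs.json-key' and 'hive.gcs.json-key-file-path' cannot both be set");
    }

    // Local helper throwing IllegalStateException when the condition is false.
    private static void checkState(boolean condition, String message)
    {
        if (!condition) {
            throw new IllegalStateException(message);
        }
    }
}
```

Because the checks run only when `validate()` is called, a property-mappings test can still populate every property on an instance it never validates, while a consumer such as `GcsStorageFactory` calls `validate()` before using the config.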

{
    try {
        // Just create a temporary json key file.
        Path tempFile = Files.createTempFile("gcs-key-", ".json", PosixFilePermissions.asFileAttribute(EnumSet.of(OWNER_READ, OWNER_WRITE)));
Member

I am not a big fan of exposing credentials in a temporary file, but I assume this is the only way to integrate with the Hive Configuration.

Member Author

Yes, this will be gone after we de-hadoop the file systems.

Obviously, this should be used only when deploying in a trusted, isolated environment, like k8s.
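
For reference, a minimal sketch of writing a JSON key to an owner-only temporary file, assuming a POSIX file system (on other file systems `createTempFile` rejects the permissions attribute). The class and method names are hypothetical, and the `deleteOnExit` call is an added assumption, not something shown in the diff.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.EnumSet;

// Hypothetical helper illustrating the temp-file approach; not the code from the PR.
final class JsonKeyTempFileSketch
{
    private JsonKeyTempFileSketch() {}

    static Path writeJsonKey(String jsonKey)
            throws IOException
    {
        // Create the file with owner read/write only, as in the diff under review.
        Path tempFile = Files.createTempFile(
                "gcs-key-",
                ".json",
                PosixFilePermissions.asFileAttribute(EnumSet.of(PosixFilePermission.OWNER_READ, PosixFilePermission.OWNER_WRITE)));
        // Assumption: request cleanup on normal JVM shutdown.
        tempFile.toFile().deleteOnExit();
        Files.writeString(tempFile, jsonKey, StandardCharsets.UTF_8);
        return tempFile;
    }
}
```

Restricting permissions limits access to the file owner, but the key still sits on local disk for the lifetime of the process, which is why the trusted-environment caveat above applies.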

@github-actions github-actions bot added delta-lake Delta Lake connector iceberg Iceberg connector labels Jun 14, 2023
    return jsonKey;
}

@Config("hive.gcs.json-key")
Member

`@ConfigSecuritySensitive` seems to be missing.

Member Author

Good catch!

    return useGcsAccessToken;
}

@Config("hive.gcs.use-access-token")
Member

Can we update `hive-gcs-tutorial.rst`? A follow-up is fine, though.

Member Author

Let's follow up. The docs would need good advice on where this can be used, which is tricky.

Co-authored-by: Eric Hwang <[email protected]>
Co-authored-by: Slawomir Pajak <[email protected]>
Co-authored-by: Marius Grama <[email protected]>
@findepi findepi force-pushed the findepi/gcs-credentials branch from 04e9fee to a7d3f8e Compare June 15, 2023 12:10
@findepi findepi merged commit 5804fb1 into master Jun 15, 2023
@findepi findepi deleted the findepi/gcs-credentials branch June 15, 2023 12:10
@github-actions github-actions bot added this to the 420 milestone Jun 15, 2023
@colebow
Member

colebow commented Jun 21, 2023

Does this need release notes? @findepi

@findepi findepi added the no-release-notes This pull request does not require release notes entry label Jun 22, 2023
Labels
cla-signed delta-lake Delta Lake connector iceberg Iceberg connector no-release-notes This pull request does not require release notes entry
7 participants