Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow Hive GCS to pass JSON key as a config #17892

Merged
merged 5 commits into from
Jun 15, 2023
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,7 @@ public class GoogleGcsConfigurationInitializer
@Inject
public GoogleGcsConfigurationInitializer(HiveGcsConfig config)
{
config.validate();
this.useGcsAccessToken = config.isUseGcsAccessToken();
this.jsonKeyFilePath = config.getJsonKeyFilePath();
}
Expand Down
32 changes: 23 additions & 9 deletions lib/trino-hdfs/src/main/java/io/trino/hdfs/gcs/HiveGcsConfig.java
Original file line number Diff line number Diff line change
Expand Up @@ -17,11 +17,29 @@
import io.airlift.configuration.ConfigDescription;
import io.airlift.configuration.validation.FileExists;

import javax.annotation.Nullable;

import static com.google.common.base.Preconditions.checkState;

public class HiveGcsConfig
{
private boolean useGcsAccessToken;
private String jsonKeyFilePath;

public boolean isUseGcsAccessToken()
{
return useGcsAccessToken;
}

@Config("hive.gcs.use-access-token")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we update hive-gcs-tutorial.rst? Follow-up is fine though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's follow-up. Docs would need to have good advice on where this can be used, which is tricky.

@ConfigDescription("Use client-provided OAuth token to access Google Cloud Storage")
public HiveGcsConfig setUseGcsAccessToken(boolean useGcsAccessToken)
{
this.useGcsAccessToken = useGcsAccessToken;
return this;
}

@Nullable
@FileExists
public String getJsonKeyFilePath()
{
Expand All @@ -36,16 +54,12 @@ public HiveGcsConfig setJsonKeyFilePath(String jsonKeyFilePath)
return this;
}

public boolean isUseGcsAccessToken()
public void validate()
{
return useGcsAccessToken;
}
// This cannot be normal validation, as it would make it impossible to write TestHiveGcsConfig.testExplicitPropertyMappings

@Config("hive.gcs.use-access-token")
@ConfigDescription("Use client-provided OAuth token to access Google Cloud Storage")
public HiveGcsConfig setUseGcsAccessToken(boolean useGcsAccessToken)
{
this.useGcsAccessToken = useGcsAccessToken;
return this;
if (useGcsAccessToken) {
checkState(jsonKeyFilePath == null, "Cannot specify 'hive.gcs.json-key-file-path' when 'hive.gcs.use-access-token' is set");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about the other way around.
if jsonKeyFilePath is null is it expected that useGcsAccessToken is true?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they are mutually exclusive, and i don't think it matters which way around i write it?

}
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -24,15 +24,16 @@
import static io.airlift.configuration.testing.ConfigAssertions.assertFullMapping;
import static io.airlift.configuration.testing.ConfigAssertions.assertRecordedDefaults;
import static io.airlift.configuration.testing.ConfigAssertions.recordDefaults;
import static org.assertj.core.api.Assertions.assertThatThrownBy;

public class TestHiveGcsConfig
{
@Test
public void testDefaults()
{
assertRecordedDefaults(recordDefaults(HiveGcsConfig.class)
.setJsonKeyFilePath(null)
.setUseGcsAccessToken(false));
.setUseGcsAccessToken(false)
.setJsonKeyFilePath(null));
}

@Test
Expand All @@ -42,14 +43,25 @@ public void testExplicitPropertyMappings()
Path jsonKeyFile = Files.createTempFile(null, null);

Map<String, String> properties = ImmutableMap.<String, String>builder()
.put("hive.gcs.json-key-file-path", jsonKeyFile.toString())
.put("hive.gcs.use-access-token", "true")
.put("hive.gcs.json-key-file-path", jsonKeyFile.toString())
.buildOrThrow();

HiveGcsConfig expected = new HiveGcsConfig()
.setJsonKeyFilePath(jsonKeyFile.toString())
.setUseGcsAccessToken(true);
.setUseGcsAccessToken(true)
.setJsonKeyFilePath(jsonKeyFile.toString());

assertFullMapping(properties, expected);
}

@Test
public void testValidation()
{
assertThatThrownBy(
new HiveGcsConfig()
.setUseGcsAccessToken(true)
.setJsonKeyFilePath("/dev/null")::validate)
.isInstanceOf(IllegalStateException.class)
.hasMessage("Cannot specify 'hive.gcs.json-key-file-path' when 'hive.gcs.use-access-token' is set");
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -30,8 +30,6 @@
import io.trino.spi.connector.ConnectorSession;
import org.apache.hadoop.fs.Path;

import javax.annotation.Nullable;

import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
Expand All @@ -50,14 +48,14 @@ public class GcsStorageFactory

private final HdfsEnvironment hdfsEnvironment;
private final boolean useGcsAccessToken;
@Nullable
private final Optional<GoogleCredential> jsonGoogleCredential;

@Inject
public GcsStorageFactory(HdfsEnvironment hdfsEnvironment, HiveGcsConfig hiveGcsConfig)
throws IOException
{
this.hdfsEnvironment = requireNonNull(hdfsEnvironment, "hdfsEnvironment is null");
hiveGcsConfig.validate();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should support for validate method in Config class be added to airlift?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PostConstruct exists?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

airlift supports standard bean validation, so i don't think so

normally it works awesome
the problem for this particular config is with the testExplicitPropertyMappings that wants to set all properties to some value, and they are mutually exclusive

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the proper solution would be to split this into two config classes, and make one bound conditionally.
overkill i think.

this.useGcsAccessToken = hiveGcsConfig.isUseGcsAccessToken();
String jsonKeyFilePath = hiveGcsConfig.getJsonKeyFilePath();
if (jsonKeyFilePath != null) {
Expand Down Expand Up @@ -88,7 +86,7 @@ public Storage create(ConnectorSession session, String path)
}
}
else {
credential = jsonGoogleCredential.get();
credential = jsonGoogleCredential.orElseThrow(() -> new IllegalStateException("GCS credentials not configured"));
findepi marked this conversation as resolved.
Show resolved Hide resolved
}
return new Storage.Builder(httpTransport, JacksonFactory.getDefaultInstance(), new RetryHttpInitializer(credential, APPLICATION_NAME))
.setApplicationName(APPLICATION_NAME)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -106,8 +106,6 @@ public void extendEnvironment(Environment.Builder builder)
.withEnv("GCP_CREDENTIALS_FILE_PATH", containerGcpCredentialsFile));

builder.configureContainer(TESTS, container -> container
.withCopyFileToContainer(forHostPath(gcpCredentialsFile.toPath()), containerGcpCredentialsFile)
.withEnv("GCP_CREDENTIALS_FILE_PATH", containerGcpCredentialsFile)
.withEnv("GCP_STORAGE_BUCKET", gcpStorageBucket)
.withEnv("GCP_TEST_DIRECTORY", gcsTestDirectory));

Expand Down