Refactor HiveMetastore API and Implementations #4748
Conversation
Force-pushed cfed8bc to efa50d4
fix merge conflict and resulting compiler errors; minor format fix
Force-pushed 2ca72a4 to fa91389
debugging failed tests; will update today
Force-pushed dfa4ff4 to c906c6e
split out the single property in HiveConfig hive.assume-canonical-partition-keys to per-metastore; also debugging various test failures
Force-pushed c906c6e to 9b5a5d4
Still reviewing, but some initial comments. Overall looks good. Great work.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package io.prestosql.plugin.hive.spi.block;
Move to the io.prestosql.plugin.hive.util package.
done
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package io.prestosql.plugin.hive.spi.block;
Move to the io.prestosql.plugin.hive.util package.
@@ -124,6 +128,11 @@ public void configure(Binder binder)

jsonBinder(binder).addDeserializerBinding(Type.class).to(TypeDeserializer.class);

binder.bind(HiveBlockEncodingSerde.class).in(Scopes.SINGLETON);
We shouldn't bind RecordingHiveMetastore here unconditionally. There are three duplicated bindings for RecordingHiveMetastore in ThriftMetastoreModule and GlueMetastoreModule. Existing duplicated code:
binder.bind(HiveMetastore.class)
.annotatedWith(ForCachingHiveMetastore.class)
.to(RecordingHiveMetastore.class)
.in(Scopes.SINGLETON);
binder.bind(RecordingHiveMetastore.class).in(Scopes.SINGLETON);
newExporter(binder).export(RecordingHiveMetastore.class).withGeneratedName();
Create a RecordingHiveMetastoreModule and move that plus this code into the new module. Then install the module in those metastore modules.
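A sketch of what that extracted module might look like, assuming Guice's Module/Binder API and the Presto class names quoted above (RecordingHiveMetastoreModule itself is the hypothetical new class); this is dependency-injection wiring only and is not runnable outside the plugin:

```java
// Hypothetical module consolidating the duplicated RecordingHiveMetastore
// wiring. ThriftMetastoreModule and GlueMetastoreModule would then call
// install(new RecordingHiveMetastoreModule()) instead of repeating these bindings.
public class RecordingHiveMetastoreModule
        implements Module
{
    @Override
    public void configure(Binder binder)
    {
        binder.bind(HiveMetastore.class)
                .annotatedWith(ForCachingHiveMetastore.class)
                .to(RecordingHiveMetastore.class)
                .in(Scopes.SINGLETON);
        binder.bind(RecordingHiveMetastore.class).in(Scopes.SINGLETON);
        newExporter(binder).export(RecordingHiveMetastore.class).withGeneratedName();
    }
}
```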
relevant?
We now need additional code when binding RecordingHiveMetastore, so it makes sense to extract the module for it.
presto-hive/src/main/java/io/prestosql/plugin/hive/metastore/MetastoreUtil.java (resolved)
@@ -297,5 +296,21 @@ public static void main(String[] args)
Thread.sleep(10);
log.info("======== SERVER STARTED ========");
log.info("\n====\n%s\n====", queryRunner.getCoordinator().getBaseUrl());

Session session = createSession(Optional.empty());
Make sure to remove later
oh shoot, yes, done
Looks great. Remove WIP and "closes xxx" from the commit message, then we can merge it.
@@ -266,70 +241,17 @@ public static boolean partitionMatches(List<HiveColumnHandle> partitionColumns,
return true;
}

private List<String> getFilteredPartitionNames(SemiTransactionalHiveMetastore metastore, HiveIdentity identity, SchemaTableName tableName, List<HiveColumnHandle> partitionKeys, TupleDomain<ColumnHandle> effectivePredicate)
private List<String> getFilteredPartitionNames(
It looks like this method signature didn't change, so keep the original formatting for the arguments.
/**
 * @param columnNames
 * @param partitionKeysFilter
 * @param assumeCanonicalPartitionKeys allow conversion of
This seems to be missing something. Maybe remove all the @param and just explain what the method does.
}

/**
 * @param columnNames
We avoid having empty/useless documentation for obvious parameters.
my mistake, i left this half-finished
checkArgument(partitionKeysFilter.getDomains().isPresent());

Map<String, Domain> domainMap = partitionKeysFilter.getDomains().orElse(ImmutableMap.of());
List<String> partitionList = columnNames.stream().map(cn -> domainToString(domainMap.get(cn), assumeCanonicalPartitionKeys, HIVE_PARTITION_VALUE_WILDCARD)).collect(toImmutableList());
Nit: wrap stream operations:
List x = stream()
        .map(...)
        .collect(...);
will do. i'll check my other uses of stream()
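The translation that domainToString and partitionKeyFilterToStringList perform above can be sketched independently of the Presto types: each partition column contributes either its single canonical value or the wildcard. A minimal stand-in (the class name, the Map-based filter, and toParts are illustrative assumptions, not the Presto API):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PartitionParts {
    // Stand-in for HIVE_PARTITION_VALUE_WILDCARD.
    static final String WILDCARD = "";

    // One entry per partition column, in column order: a column constrained to a
    // single value contributes that value; any other domain (range, multiple
    // values, unconstrained) contributes the wildcard.
    static List<String> toParts(List<String> columnNames, Map<String, String> singleValues) {
        return columnNames.stream()
                .map(name -> singleValues.getOrDefault(name, WILDCARD))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // ds is pinned to one value, region is unconstrained.
        System.out.println(toParts(List.of("ds", "region"), Map.of("ds", "2020-06-01")));
        // prints [2020-06-01, ]
    }
}
```

For a TupleDomain with no single-value columns, every entry comes out as the wildcard, which matters for the TupleDomain.all() discussion later in this review.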
if (domain != null && domain.isNullableSingleValue()) {
    return sqlScalarToString(domain.getType(), domain.getNullableSingleValue(), assumeCanonicalPartitionKeys, partitionWildcardString);
}
else { // null or not a single value
Nit: else is redundant
presto-hive/src/main/java/io/prestosql/plugin/hive/metastore/MetastoreUtil.java (resolved)
{
    this.client = requireNonNull(client);
    this.assumeCanonicalPartitionKeys = requireNonNull(metastoreConfig).isAssumeCanonicalPartitionKeys();
No need for requireNonNull, since dereferencing it will throw NPE anyway.
ah good point
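The point above can be demonstrated stand-alone: with or without requireNonNull, a null receiver throws NullPointerException on the very same statement, so the explicit check adds nothing here. (NullCheckDemo and its Config record are hypothetical stand-ins, not the Presto classes.)

```java
import static java.util.Objects.requireNonNull;

public class NullCheckDemo {
    // Hypothetical stand-in for a metastore config object.
    record Config(boolean assumeCanonicalPartitionKeys) {}

    static String describe(Config config, boolean useRequireNonNull) {
        try {
            boolean value = useRequireNonNull
                    ? requireNonNull(config).assumeCanonicalPartitionKeys()
                    : config.assumeCanonicalPartitionKeys();
            return "ok: " + value;
        }
        catch (NullPointerException e) {
            return "NPE";
        }
    }

    public static void main(String[] args) {
        // A null config throws NPE either way; requireNonNull only changes
        // which expression raises it, not whether it is raised.
        System.out.println(describe(null, true));              // NPE
        System.out.println(describe(null, false));             // NPE
        System.out.println(describe(new Config(true), false)); // ok: true
    }
}
```

requireNonNull still earns its keep when the reference is stored for later use, where a deferred NPE would be harder to trace back to the constructor argument.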
if (partitionKeysFilter.isNone()) {
    return Optional.of(ImmutableList.of());
}
Optional<List<String>> parts = partitionKeyFilterToStringList(columnNames, partitionKeysFilter, assumeCanonicalPartitionKeys);
We shouldn't need the assumeCanonicalPartitionKeys config or filtering here at all, as filtering by the metastore is best effort. This isn't being pushed down, so we can let the caller do it.
I'm not sure I follow. previously, this information was indeed used by the caller to generate a List<String> parts that included the information. it would be a "regression" in the sense the caller would get a larger list.
are you saying just pass false and let the caller deal with it to avoid having this config for AlluxioHiveMetastore?
the Glue and Thrift impls will still need it
talked on slack and i get this, i've made the change
isn't assumeCanonicalPartitionKeys applicable to some metastores and not the others -- eg depending whether the call is over string representation or interpreted values?
for example it may turn out to be HMS- (and file-?) metastore specific?
{
    return delegate.getPartitionNamesByParts(identity, databaseName, tableName, parts);
if (partitionKeysFilter.isAll()) {
    return delegate.getPartitionNames(identity, databaseName, tableName);
Should we drop the getPartitionNames() method from the ThriftMetastore interface? Then this can be a straight delegation.
I thought about this, but I didn't know what to pass to thrift to say "all". This should, in theory, be just TupleDomain.all(), but i hit some snags.
i can look at it again
I made this change and I'm waiting for a new CI (pushed right now) to run to verify that my translation of TupleDomain.all() to a list of "" works. it should as this was the previous behavior.
I removed this and what happens is that once it's down to the thrift API, it seems like any value for a partitionKey in the 'parts' api that is "" (our wildcard value) gets removed. the parts api then fails when the list is empty
we might be able to solve this when we translate it to an expression, but for now, i think i need to restore the previous special case
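The failure mode described above can be reproduced with plain lists: TupleDomain.all() translates to one wildcard per partition column, and if empty-string values are dropped before the 'parts' call (as the Thrift path appears to do), nothing is left to send. A stand-alone sketch of that interaction, not the actual Thrift code:

```java
import java.util.List;

public class WildcardParts {
    public static void main(String[] args) {
        // TupleDomain.all() over two partition columns: every entry is the
        // wildcard "" because no column is constrained to a single value.
        List<String> parts = List.of("", "");

        // If empty values are stripped before the metastore call, the list
        // becomes empty and the 'parts' API rejects it, which is why the
        // all() case needs its own code path.
        List<String> stripped = parts.stream().filter(p -> !p.isEmpty()).toList();
        System.out.println(stripped.isEmpty()); // true
    }
}
```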
does this change mean a higher minimum version of hive is required?
@tooptoop4 No, the subsequent PR that adds support for the new API will include a fallback for the old API.
@@ -635,34 +652,17 @@ public synchronized void truncateUnpartitionedTable(ConnectorSession session, St
}
// add newly-added partitions to the results from underlying metastore
if (!partitionActionsOfTable.isEmpty()) {
    List<String> columnNames = table.get().getPartitionColumns().stream().map(Column::getName).collect(Collectors.toList());
// the top of the function guards against TupleDomain.none() so this cannot be empty
"this" means what here?
removing this comment, holdover from when this still did filtering. it referred to the columnNames
I'll remove the comment
Force-pushed 0ce0a8d to bb6119e
I just pushed the latest update that should address all comments. i'll make one last pass locally, though
Force-pushed bb6119e to 39faca6
Force-pushed a8327ae to c24a983
* @param partitionKeysFilter map of filters (Domain) for each partition column
* @return optionally, a list of strings where each entry is in the form of {key}={value}
* @see TupleDomain
*/
|
Remove blank line before Javadoc
if (partitionKeysFilter.isNone()) {
    return Optional.of(Collections.emptyList());
}
Optional<List<String>> optionalParts = partitionKeyFilterToStringList(columnNames, partitionKeysFilter, false);
We can remove all this client side filtering here since it's only a hint. So I think the whole method can be
return ProtoUtils.toPartitionInfoList(
client.readTable(databaseName, tableName, Constraint.getDefaultInstance()));
got it. it looks like I need a mapping of PartitionInfo::getPartitionName as well
@@ -69,7 +72,7 @@
private TableMasterClient client;

@Inject
public AlluxioHiveMetastore(TableMasterClient client)
public AlluxioHiveMetastore(TableMasterClient client, AlluxioHiveMetastoreConfig metastoreConfig)
The config is not needed
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package io.prestosql.plugin.hive.spi.block;
Move to the io.prestosql.plugin.hive.util package.
import static com.google.common.base.Preconditions.checkArgument;
import static java.nio.charset.StandardCharsets.UTF_8;

// This class is exactly the same as BlockEncodingManager. They are in SPI and don't have access to InternalBlockEncodingSerde.
Remove the second sentence as it doesn't really fit here
i thought i did these, will update
throws Exception
{
    Optional<List<String>> ds = MetastoreUtil.partitionKeyFilterToStringList(Arrays.asList("ds1", "ds2", "ds3"), TupleDomain.all(), false);
    System.err.println(ds.get());
Remove the printing. Probably add an assertion or remove the entire test method.
oops again, yea i'll assert or remove it
Force-pushed ecfd87c to a167bba
This commit changes the HiveMetastore API to remove the methods getPartitionNames() and getPartitionNamesByParts() and replace them with getPartitionNamesByFilter(). Each implementation is updated, as well as all call sites. The previous behavior is maintained as far as translation of any non-equal Domain values to the wildcard value. A subsequent commit will implement translation of a Domain into a proper Glue expression for the GlueHiveMetastore GetPartitions API in the Glue Data Catalog.
Force-pushed a167bba to 43843b2
Thanks!
return retry()
        .stopOn(NoSuchObjectException.class)
        .stopOnIllegalExceptions()
        .run("getPartitionNamesByParts", stats.getGetPartitionNamesByParts().wrap(() -> {
            try (ThriftMetastoreClient client = createMetastoreClient(identity)) {
                return Optional.of(client.getPartitionNamesFiltered(databaseName, tableName, parts));
                return Optional.of(client.getPartitionNamesFiltered(databaseName, tableName, parts.get()));
the "getPartitionNamesByParts", stats.getGetPartitionNamesByParts()... above should be renamed too.
nvrm, i see this is pre-existing... still worth updating
This commit changes the HiveMetastore API to remove the methods
getPartitionNames() and getPartitionNamesByParts() and replace them with
getPartitionNamesByFilter(). Each implementation is updated, as well as
all call sites. The previous behavior is maintained as far as
translation of any non-equal Domain values to the wildcard value.
A subsequent commit will implement this for the GlueHiveMetastore and
pass the supported translation of the Domain per column using the
GetPartitions API in the Glue Data Catalog.