-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pushdown subscript paths into project on top of scan #12571
Pushdown subscript paths into project on top of scan #12571
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
306ff04
to
4e56af1
Compare
4e56af1
to
8ae386d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took an initial pass. I have a question about the intent behind the SPI change.
import static java.lang.String.format; | ||
import static java.util.Objects.requireNonNull; | ||
|
||
public class SubfieldPathTokenizer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct me if I'm wrong but I see some inspiration from AbstractIterator :) If so I'd leave a comment explaining. Or perhaps for the benefit of others using SPI we copy it over?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tdcmeehan You are correct :-) Originally, I copied AbstractIterator verbatim, but it pulls in some unnecessary dependencies and code style in that class doesn't match Presto. Hence, I folded the logic into SubfieldPathTokenizer.
|
||
ProjectNode projectNode = new ProjectNode( | ||
idAllocator.getNextId(), | ||
new TableScanNode( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I came across a similar thing and thought it might be better to flatten. While having this nested with indentation shows the layout, it can make it hard to see which parameter belongs to which node. This is just my personal preference though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tdcmeehan How about this?
TableScanNode tableScanNode = new TableScanNode(
node.getId(),
node.getTable(),
node.getOutputSymbols().stream()
.map(s -> symbolMap.getOrDefault(s, s))
.collect(toImmutableList()),
newAssignments.build(),
node.getLayout(),
node.getCurrentConstraint(),
node.getEnforcedConstraint());
return new ProjectNode(idAllocator.getNextId(), tableScanNode, projections.build());
@@ -1652,6 +1652,33 @@ public boolean supportsMetadataDelete(ConnectorSession session, ConnectorTableHa | |||
return true; | |||
} | |||
|
|||
@Override | |||
public Map<ColumnHandle, ColumnHandle> pushdownSubfieldPruning( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this for prototyping or do we envision this being the standard for master?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tdcmeehan It depends on whether @highker's work on allowing connectors to participate in query planning lands soon enough. Ideally, we would present project(filter(scan))
subplan to the connector and it will decide that it can implement all of it and return a new table scan node with table and column handles encoding projections and filters. If this can't happen soon enough, I do envision introducing a couple of new metadata APIs to allow for pushing down some information into the connector. Specifically,
pushdownSubfieldPruning
tells the connector that it can prune structs, maps and arraysgetTableLayouts(TupleDomain<ColumnHandleWithSubfieldPaths>)
allows the connector to implement simple filters on subfields, e.g. c[2] = 5.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I would add a comment discouraging its use before it reaches master.
8ae386d
to
f9379c7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
% nits
Map<ColumnHandle, List<SubfieldPath>> desiredSubfields) | ||
{ | ||
if (!isSubfieldPushdownEnabled(session)) { | ||
return Collections.emptyMap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ImmutableMap.of()
@@ -119,6 +127,18 @@ | |||
.map(value -> new NullableValue(getPrestoType(PartColumn.CONTAINER), value)) | |||
.collect(toSet()); | |||
|
|||
private static final Map<String, List<ColumnMetadata>> EXTRA_TABLES = ImmutableMap.<String, List<ColumnMetadata>>builder() | |||
.put( | |||
"lineitem_ext", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would put a comment here. We're going to remove it before master right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tdcmeehan We'll need something like this to test on master as well. One option would be to add a new TpchPlusConnector and leave TpchConnector being purely TPC-H. Another option would be to allow extended TPC-H connector like this.
private static final List<PropertyMetadata<?>> sessionProperties = ImmutableList.of( | ||
booleanProperty( | ||
PUSHDOWN_SUBFIELDS, | ||
"Enable bucket-aware execution: only use a single worker per bucket", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems incorrect
@@ -27,14 +30,22 @@ | |||
{ | |||
private final String columnName; | |||
private final Type type; | |||
private final Optional<List<SubfieldPath>> subfieldPaths; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we represent emptiness with the list? Empty list rather than Optional.empty
{ | ||
SubfieldPathTokenizer tokenizer = new SubfieldPathTokenizer(path); | ||
List<PathElement> elements = new ArrayList<>(); | ||
while (tokenizer.hasNext()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while (tokenizer.hasNext()) { | |
tokenizer.forEachRemaining(elements::add); |
@JsonSubTypes.Type(value = LongSubscript.class, name = "longSubscript"), | ||
@JsonSubTypes.Type(value = StringSubscript.class, name = "stringSubscript"), | ||
@JsonSubTypes.Type(value = AllSubscripts.class, name = "allSubscripts"), | ||
}) | ||
public abstract static class PathElement |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I like this change
@Override | ||
public Map<ColumnHandle, ColumnHandle> pushdownSubfieldPruning(Session session, TableHandle tableHandle, Map<ColumnHandle, List<SubfieldPath>> desiredSubfields) | ||
{ | ||
throw new UnsupportedOperationException(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any harm in returning empty map?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tdcmeehan All other methods here throw, hence, I did the same.
@@ -1652,6 +1652,33 @@ public boolean supportsMetadataDelete(ConnectorSession session, ConnectorTableHa | |||
return true; | |||
} | |||
|
|||
@Override | |||
public Map<ColumnHandle, ColumnHandle> pushdownSubfieldPruning( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I would add a comment discouraging its use before it reaches master.
f9379c7
to
75dc441
Compare
@tdcmeehan Tim, thank you for review. Comments addressed. |
Rename PushdownSubscripts rule into PushdownSubfields. Change its logic to create project node on top of scan using new filter_by_subfield_paths function instead of modifying column handles in table scan node via calls to ColumnHandle.createSubfieldPruningColumnHandle. Remove ColumnHandle.createSubfieldPruningColumnHandle API.
Add an optimizer rule to pushdown subfield pruning into connectors. The new rule uses the new metadata API pushdownSubfieldPruning.
75dc441
to
2de138c
Compare
No description provided.