Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pushdown subscript paths into project on top of scan #12571

Merged

Conversation

mbasmanova
Copy link
Contributor

No description provided.

Copy link
Contributor

@elonazoulay elonazoulay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@mbasmanova mbasmanova force-pushed the pushdown-subscript-paths branch 6 times, most recently from 306ff04 to 4e56af1 Compare April 3, 2019 19:03
@mbasmanova mbasmanova force-pushed the pushdown-subscript-paths branch from 4e56af1 to 8ae386d Compare April 3, 2019 19:04
Copy link
Contributor

@tdcmeehan tdcmeehan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took an initial pass. I have a question about the intent behind the SPI change.

import static java.lang.String.format;
import static java.util.Objects.requireNonNull;

public class SubfieldPathTokenizer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if I'm wrong but I see some inspiration from AbstractIterator :) If so I'd leave a comment explaining. Or perhaps for the benefit of others using SPI we copy it over?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tdcmeehan You are correct :-) Originally, I copied AbstractIterator verbatim, but it pulls in some unnecessary dependencies and code style in that class doesn't match Presto. Hence, I folded the logic into SubfieldPathTokenizer.


ProjectNode projectNode = new ProjectNode(
idAllocator.getNextId(),
new TableScanNode(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I came across a similar thing and thought it might be better to flatten. While having this nested with indentation shows the layout, it can make it hard to see which parameter belongs to which node. This is just my personal preference though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tdcmeehan How about this?

            TableScanNode tableScanNode = new TableScanNode(
                    node.getId(),
                    node.getTable(),
                    node.getOutputSymbols().stream()
                            .map(s -> symbolMap.getOrDefault(s, s))
                            .collect(toImmutableList()),
                    newAssignments.build(),
                    node.getLayout(),
                    node.getCurrentConstraint(),
                    node.getEnforcedConstraint());

            return new ProjectNode(idAllocator.getNextId(), tableScanNode, projections.build());

@@ -1652,6 +1652,33 @@ public boolean supportsMetadataDelete(ConnectorSession session, ConnectorTableHa
return true;
}

@Override
public Map<ColumnHandle, ColumnHandle> pushdownSubfieldPruning(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this for prototyping or do we envision this being the standard for master?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tdcmeehan It depends on whether @highker's work on allowing connectors to participate in query planning lands soon enough. Ideally, we would present project(filter(scan)) subplan to the connector and it will decide that it can implement all of it and return a new table scan node with table and column handles encoding projections and filters. If this can't happen soon enough, I do envision introducing a couple of new metadata APIs to allow for pushing down some information into the connector. Specifically,

  • pushdownSubfieldPruning tells the connector that it can prune structs, maps and arrays
  • getTableLayouts(TupleDomain<ColumnHandleWithSubfieldPaths>) allows the connector to implement simple filters on subfields, e.g. c[2] = 5.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I would add a comment discouraging its use before it reaches master.

@highker highker added the aria Presto Aria performance improvements label Apr 4, 2019
@mbasmanova mbasmanova force-pushed the pushdown-subscript-paths branch from 8ae386d to f9379c7 Compare April 4, 2019 20:50
Copy link
Contributor

@tdcmeehan tdcmeehan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

% nits

Map<ColumnHandle, List<SubfieldPath>> desiredSubfields)
{
if (!isSubfieldPushdownEnabled(session)) {
return Collections.emptyMap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ImmutableMap.of()

@@ -119,6 +127,18 @@
.map(value -> new NullableValue(getPrestoType(PartColumn.CONTAINER), value))
.collect(toSet());

private static final Map<String, List<ColumnMetadata>> EXTRA_TABLES = ImmutableMap.<String, List<ColumnMetadata>>builder()
.put(
"lineitem_ext",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put a comment here. We're going to remove it before master right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tdcmeehan We'll need something like this to test on master as well. One option would be to add a new TpchPlusConnector and leave TpchConnector being purely TPC-H. Another option would be to allow extended TPC-H connector like this.

private static final List<PropertyMetadata<?>> sessionProperties = ImmutableList.of(
booleanProperty(
PUSHDOWN_SUBFIELDS,
"Enable bucket-aware execution: only use a single worker per bucket",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems incorrect

@@ -27,14 +30,22 @@
{
private final String columnName;
private final Type type;
private final Optional<List<SubfieldPath>> subfieldPaths;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we represent emptiness with the list? Empty list rather than Optional.empty

{
SubfieldPathTokenizer tokenizer = new SubfieldPathTokenizer(path);
List<PathElement> elements = new ArrayList<>();
while (tokenizer.hasNext()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
while (tokenizer.hasNext()) {
tokenizer.forEachRemaining(elements::add);

@JsonSubTypes.Type(value = LongSubscript.class, name = "longSubscript"),
@JsonSubTypes.Type(value = StringSubscript.class, name = "stringSubscript"),
@JsonSubTypes.Type(value = AllSubscripts.class, name = "allSubscripts"),
})
public abstract static class PathElement
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I like this change

@Override
public Map<ColumnHandle, ColumnHandle> pushdownSubfieldPruning(Session session, TableHandle tableHandle, Map<ColumnHandle, List<SubfieldPath>> desiredSubfields)
{
throw new UnsupportedOperationException();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any harm in returning empty map?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tdcmeehan All other methods here throw, hence, I did the same.

@@ -1652,6 +1652,33 @@ public boolean supportsMetadataDelete(ConnectorSession session, ConnectorTableHa
return true;
}

@Override
public Map<ColumnHandle, ColumnHandle> pushdownSubfieldPruning(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I would add a comment discouraging its use before it reaches master.

@mbasmanova mbasmanova force-pushed the pushdown-subscript-paths branch from f9379c7 to 75dc441 Compare April 8, 2019 17:31
@mbasmanova
Copy link
Contributor Author

@tdcmeehan Tim, thank you for review. Comments addressed.

Rename PushdownSubscripts rule into PushdownSubfields. Change its logic to create
project node on top of scan using new filter_by_subfield_paths function instead
of modifying column handles in table scan node via calls to
ColumnHandle.createSubfieldPruningColumnHandle.

Remove ColumnHandle.createSubfieldPruningColumnHandle API.
Add an optimizer rule to pushdown subfield pruning into
connectors. The new rule uses the new metadata API
pushdownSubfieldPruning.
@mbasmanova mbasmanova force-pushed the pushdown-subscript-paths branch from 75dc441 to 2de138c Compare April 9, 2019 00:00
@mbasmanova mbasmanova merged this pull request into prestodb:aria-scan-research Apr 9, 2019
@mbasmanova mbasmanova deleted the pushdown-subscript-paths branch May 24, 2019 15:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aria Presto Aria performance improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants