Hide table layouts from engine #363

martint · 2019-03-02T07:26:06Z

In preparation for https://github.com/prestosql/presto/wiki/Pushdown-of-complex-operations, this change removes support for multiple layouts. It's a feature that's not used by any known connector, complicates the work of the optimizer and just gets in the way. In the future, we may add this functionality again but in a different form.

dain

One comment, but otherwise looks good.

dain · 2019-03-02T21:47:27Z

presto-main/src/main/java/io/prestosql/sql/planner/plan/TableScanNode.java

    // Used during predicate refinement over multiple passes of predicate pushdown
    // TODO: think about how to get rid of this in new planner
    private final TupleDomain<ColumnHandle> currentConstraint;

    private final TupleDomain<ColumnHandle> enforcedConstraint;

+    public static TableScanNode newInstance(


This is a strange design, which I think predates this change. Maybe add a comment on why this factory is different from the constructor with the same signature.

Ah, thanks for pointing it out. I need to see if I can get rid of it. Without it, two of the constructors had the same shape (but ended up doing different things). I may be able to consolidate them now that we don’t have table layouts.

It turns out I can't. I tried making the static factory method the one that's used to deserialize from JSON, but it needs to set some fields to null, which is disallowed by the full constructor. Oh well. I added a comment to clarify.

But we used to have such constructor and it used to work. What was wrong with it?

kokosing

Can you please refresh my memory and explain why TableLayout* were introduced in the first place and why it no longer holds?

kokosing · 2019-03-04T10:48:45Z

presto-main/src/main/java/io/prestosql/metadata/TableHandle.java


 import static java.util.Objects.requireNonNull;

 public final class TableHandle
 {
    private final ConnectorId connectorId;
    private final ConnectorTableHandle connectorHandle;
+    private final ConnectorTransactionHandle transaction;
+    private final Optional<ConnectorTableLayoutHandle> layout;


To me it looks like too much information to be stored in TableHandle. To me TableHandle represents a handle to a table, the same way as a column handle. It is expected to be lightweight. What you have here is designed for a very specific use case, hence maybe a new entity should be created for that.

Notice that you are planning to put even more information here (aggregations, joins, projections). The more you add here, the less "table"-like it becomes; instead it becomes a more generic "relation" (no longer a table). Also notice that TableHandle is used for insert (perfect fit usage) and now with your changes it becomes questionable.

To me it looks like too much information to be stored in TableHandle. To me TableHandle represents a handle to a table, the same way as a column handle. It is expected to be lightweight. What you have here is designed for a very specific use case, hence maybe a new entity should be created for that.

Storing the TableLayout is only temporary. I plan to eventually get rid of them entirely, but didn't do so yet because it requires backward incompatible changes to the SPI.

But at the end of the day, yes, the plan is for TableHandle to potentially represent more than raw tables. The object doesn't need to be heavyweight -- it's up to the connector to decide what it holds on to internally (it could be just an id to a stored definition, etc). As to whether it becomes more "relation"-like, that's not much of a concern. In fact, the SQL spec treats everything as some form of a table:

A table is a collection of zero or more rows where each row is a sequence of one or more column values. [...] A table is either a base table, a derived table, or a transient table. [...] A derived table is a table derived directly or indirectly from one or more other tables by the evaluation of an expression, such as a <joined table>, <data change delta table>, <query expression>, or <table expression>. A <query expression> can contain an optional <order by clause>

Also notice that TableHandle is used for insert (perfect fit usage) and now with your changes it becomes questionable.

That's ok, because whether a table can be inserted into is decided at analysis time. So you'd never end up with a TableHandle to insert into that represents something that can't be inserted into. BTW, did you know the SQL spec allows for inserting into things that are more complex than simple tables? For example, there's an entire section describing "updatable views".

I see your point. Anyway, I still think that TableHandle was a simple wrapper around ConnectorTableHandle and there are still places where it is used as such. Now, they will get a complex type, so they would need to at least check if the layout is empty there.

Storing the TableLayout is only temporary. I plan to eventually get rid of them entirely, but didn't do so yet because it requires backward incompatible changes to the SPI.

But there will be something else that will replace TableLayout. So TableHandle won't become any simpler.

So you'd never end up with a TableHandle to insert into that represents something that can't be inserted into.

Having a separate type would make it explicit in the code that such a case is impossible (without knowing the actual flow).

BTW, did you know the SQL spec allows for inserting into things that are more complex than simple tables?

Mind blowing. Anyway, I don't think a TableHandle that is suited for reads will fit that.

martint · 2019-03-04T17:35:06Z

Can you please refresh my memory and explain why TableLayout* were introduced in the first place and why it no longer holds?

It's a feature we added a few years ago for a very specific use case at FB to allow connectors to provide multiple physical organizations of a table for the engine to use during optimization. No one else is using it. Since it makes it hard to introduce new features like complex operation pushdown, we're removing it for now. We may reintroduce it later in some other shape (materialized query tables/materialized views, etc).

kokosing · 2019-03-04T19:45:38Z

Would you mind sorting the commits? That way it will be easier to review.

martint · 2019-03-04T20:14:42Z

Would you mind sorting the commits? That way it will be easier to review.

done

kokosing

LGTM for all commits but Hide TableLayouts from engine

kokosing · 2019-03-05T11:02:51Z


kokosing · 2019-03-05T11:06:10Z

presto-main/src/main/java/io/prestosql/metadata/TableLayoutResult.java

@@ -29,15 +29,22 @@

 public class TableLayoutResult
 {
+    private final TableHandle newTableHandle;


Why newTableHandle and not just tableHandle?

It's just to indicate that this is the new table handle to be used to refer to the result of pushing the predicates into the table scan. It's named that way for clarity, but it doesn't matter in the medium term since I'm planning to remove this class altogether once the replacement from https://github.com/prestosql/presto/wiki/Pushdown-of-complex-operations is in place and we give connectors some time to migrate.

kokosing · 2019-03-05T11:06:25Z

presto-main/src/main/java/io/prestosql/metadata/TableLayoutResult.java

    {
+        this.newTableHandle = newTable;


kokosing · 2019-03-05T11:13:39Z

...o-main/src/main/java/io/prestosql/sql/planner/iterative/rule/PushPredicateIntoTableScan.java

@@ -90,20 +90,14 @@ public PickTableLayout(Metadata metadata, SqlParser parser)
    {
        return ImmutableSet.of(
                checkRulesAreFiredBeforeAddExchangesRule(),


Is this still required? I would replace it with a comment in PlanOptimizers. There is no reason to prevent this class from being invoked later.

Probably not. I'll remove it.


kokosing · 2019-03-05T11:40:03Z


martint · 2019-03-05T16:34:39Z

But we used to have such constructor and it used to work. What was wrong with it?

Before, we had two constructors:

@JsonCreator
public TableScanNode(
        @JsonProperty("id") PlanNodeId id,
        @JsonProperty("table") TableHandle table,
        @JsonProperty("outputSymbols") List<Symbol> outputs,
        @JsonProperty("assignments") Map<Symbol, ColumnHandle> assignments,
        @JsonProperty("layout") Optional<TableLayoutHandle> tableLayout)

and

public TableScanNode(
        PlanNodeId id,
        TableHandle table,
        List<Symbol> outputs,
        Map<Symbol, ColumnHandle> assignments)

After removing table layouts, both constructors have the same signature, but do something different.
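To make the clash concrete, here is a minimal, self-contained sketch. It is not the actual Presto code: `ScanNodeSketch`, the `String` fields, and the `"ALL"` placeholder are simplified stand-ins, Jackson annotations are omitted so the snippet compiles on its own, and treating the two constraints as the fields left null on deserialization is an assumption. It shows why a named static factory is needed once two entry points share a parameter list but initialize state differently.

```java
import java.util.List;
import java.util.Objects;

// Simplified stand-in for TableScanNode. All names here are illustrative.
final class ScanNodeSketch
{
    private final String id;
    private final String table;
    private final List<String> outputs;
    // Planning-only state (assumed to be the two constraints); null after deserialization.
    private final String currentConstraint;
    private final String enforcedConstraint;

    // Deserialization shape (the @JsonCreator constructor in the real class):
    // planning-only fields are deliberately left null.
    ScanNodeSketch(String id, String table, List<String> outputs)
    {
        this.id = Objects.requireNonNull(id, "id is null");
        this.table = Objects.requireNonNull(table, "table is null");
        this.outputs = Objects.requireNonNull(outputs, "outputs is null");
        this.currentConstraint = null;
        this.enforcedConstraint = null;
    }

    // Full constructor: rejects null constraints, so it cannot serve the JSON path.
    ScanNodeSketch(String id, String table, List<String> outputs, String currentConstraint, String enforcedConstraint)
    {
        this.id = Objects.requireNonNull(id, "id is null");
        this.table = Objects.requireNonNull(table, "table is null");
        this.outputs = Objects.requireNonNull(outputs, "outputs is null");
        this.currentConstraint = Objects.requireNonNull(currentConstraint, "currentConstraint is null");
        this.enforcedConstraint = Objects.requireNonNull(enforcedConstraint, "enforcedConstraint is null");
    }

    // Same parameter list as the first constructor, so it cannot be a second
    // constructor. A named factory disambiguates and fills in non-null defaults.
    static ScanNodeSketch newInstance(String id, String table, List<String> outputs)
    {
        return new ScanNodeSketch(id, table, outputs, "ALL", "ALL");
    }

    String getCurrentConstraint()
    {
        return currentConstraint;
    }

    String getEnforcedConstraint()
    {
        return enforcedConstraint;
    }
}
```

In this sketch, planner code would call `ScanNodeSketch.newInstance(...)`, while the three-argument constructor stays reserved for the deserialization path.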

martint · 2019-03-05T16:46:00Z

I see your point. Anyway, I still think that TableHandle was a simple wrapper around ConnectorTableHandle and there are still places where it is used as such. Now, they will get a complex type, so they would need to at least check if the layout is empty there.

It will still be a simple wrapper around ConnectorTableHandle once we finish removing table layouts from the SPI. But we can't do that quite yet: we need to 1) add the new APIs and mechanism for predicate pushdown, and 2) have a grace period for connectors to adapt (or do it automatically, if possible).

But there will be something else that will replace TableLayout. So TableHandle won't become any simpler.

No, there won't. It will be just TableHandles wrapping ConnectorTableHandles. It's up to connectors to decide what they need to track in a ConnectorTableHandle based on what can be pushed down into the connector.

Having a separate type would make it explicit in the code that such a case is impossible (without knowing the actual flow).

That's really hard to do in general. It forces entire parallel hierarchies and APIs to represent similar objects with different constraints (e.g., we'd need separate APIs to resolve a table during analysis for each possible use so we can get a table handle for insert, a table handle for delete, etc.). Also, where do we stop? Would we have one of each expression kind for each SQL type to prevent misuse? It's a fine balance, and I think the current concepts provide good separation between action (expression/plan node type), object (columns, tables, arguments) and attributes (physical or logical properties) without requiring an NxMxR explosion of valid combinations.

sopel39

well done!

presto-main/src/main/java/io/prestosql/sql/planner/optimizations/MetadataQueryOptimizer.java

presto-main/src/main/java/io/prestosql/metadata/Metadata.java

presto-main/src/main/java/io/prestosql/metadata/MetadataManager.java

presto-main/src/main/java/io/prestosql/metadata/TableHandle.java

sopel39 · 2019-03-07T10:19:09Z

presto-main/src/main/java/io/prestosql/metadata/TableLayoutResult.java

@@ -25,15 +25,22 @@

 public class TableLayoutResult
 {
+    private final TableHandle newTableHandle;


Why do we need this field since we still have layout?

Layout doesn't have a reference to the ConnectorTableHandle that's needed to construct the new TableHandle. I could add that, but it seemed conceptually cleaner to do it this way.

presto-main/src/main/java/io/prestosql/split/SplitManager.java

presto-main/src/main/java/io/prestosql/sql/planner/optimizations/HashGenerationOptimizer.java

presto-main/src/test/java/io/prestosql/operator/BenchmarkScanFilterAndProjectOperator.java


dain · 2019-03-07T18:39:13Z

presto-main/src/main/java/io/prestosql/sql/relational/SqlToRowExpressionTranslator.java

@@ -144,12 +145,14 @@ public static RowExpression translate(
            FunctionRegistry functionRegistry,
            TypeManager typeManager,
            Session session,
-            boolean optimize)
+            boolean optimize,
+            Map<Symbol, Integer> layout)


I'd put this below type

dain · 2019-03-07T18:52:52Z

presto-main/src/test/java/io/prestosql/operator/scalar/FunctionAssertions.java

+            }
+        });
+
+        Block block = Utils.nativeValueToBlock(expectedType, result);


Add a comment: // convert result from stack type to Type ObjectValue

This test relies on a layout being picked during planning. For LocalQueryRunner this only happens because of a synthetic rule (PickLayout w/o predicate) that attaches a layout to the table scan. So, in a sense, it's just a coincidence that it works. In distributed execution, the job is done by AddExchanges, so we want to make sure we're testing that behavior.


This change hides table layouts from the engine as a first-class concept. We keep the SPI as is for backward compatibility for now. When predicates are pushed into a table scan by PickLayout (now PushPredicateIntoTableScan) or AddExchanges, we now replace the table handle associated with the table scan with a new one that contains the reference to the ConnectorTableLayoutHandle under the covers.
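The handle swap described above can be sketched roughly as follows. This is a simplified illustration, not the engine's code: `HandleSketch`, `ScanSketch`, `withLayout`, and `pushPredicate` are hypothetical names, and plain strings stand in for `ConnectorTableHandle` and `ConnectorTableLayoutHandle`. The point is only that the scan node ends up holding a new handle that carries the layout internally, so the rest of the engine never sees a layout directly.

```java
import java.util.Objects;
import java.util.Optional;

// Illustrative stand-in for TableHandle; the real class carries more state.
final class HandleSketch
{
    private final String connectorHandle;    // stands in for ConnectorTableHandle
    private final Optional<String> layout;   // stands in for Optional<ConnectorTableLayoutHandle>

    HandleSketch(String connectorHandle, Optional<String> layout)
    {
        this.connectorHandle = Objects.requireNonNull(connectorHandle, "connectorHandle is null");
        this.layout = Objects.requireNonNull(layout, "layout is null");
    }

    String getConnectorHandle()
    {
        return connectorHandle;
    }

    Optional<String> getLayout()
    {
        return layout;
    }

    // Hypothetical helper: a new handle for the same table that carries the chosen layout.
    HandleSketch withLayout(String layoutHandle)
    {
        return new HandleSketch(connectorHandle, Optional.of(layoutHandle));
    }
}

// Illustrative stand-in for a table scan plan node.
final class ScanSketch
{
    private final HandleSketch table;

    ScanSketch(HandleSketch table)
    {
        this.table = Objects.requireNonNull(table, "table is null");
    }

    HandleSketch getTable()
    {
        return table;
    }

    // Hypothetical pushdown step: instead of attaching a layout to the scan node,
    // return a new scan whose table handle embeds it. Nodes are immutable, so the
    // original scan is left untouched.
    static ScanSketch pushPredicate(ScanSketch scan, String predicate)
    {
        // Pretend the connector produced this layout handle for the predicate.
        String layoutHandle = "layout[" + predicate + "]";
        return new ScanSketch(scan.getTable().withLayout(layoutHandle));
    }
}
```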


In order to translate expressions to row expressions, the code was first replacing all symbol references with field references for the corresponding ordinal inputs. This is unnecessary, as the translation can be done on demand as the expression is translated to a row expression.
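A rough sketch of the on-demand approach, under simplifying assumptions: the tiny expression model (`SymbolRef`, `Call`), the string output format, and the class name `OnDemandTranslationSketch` are all hypothetical and much simpler than the real `SqlToRowExpressionTranslator`. Only the idea of passing a symbol-to-channel `layout` map into the translation comes from the change itself. Each symbol is resolved at the moment it is visited, so no separate rewriting pass is needed beforehand.

```java
import java.util.List;
import java.util.Map;

final class OnDemandTranslationSketch
{
    // Minimal expression model: a symbol reference or a function call.
    interface Expr {}

    static final class SymbolRef implements Expr
    {
        final String name;

        SymbolRef(String name)
        {
            this.name = name;
        }
    }

    static final class Call implements Expr
    {
        final String function;
        final List<Expr> args;

        Call(String function, List<Expr> args)
        {
            this.function = function;
            this.args = args;
        }
    }

    // Translates in a single pass. A symbol is mapped to its input channel
    // ("#<channel>") when it is encountered, using the layout map.
    static String translate(Expr expr, Map<String, Integer> layout)
    {
        if (expr instanceof SymbolRef) {
            String name = ((SymbolRef) expr).name;
            Integer channel = layout.get(name);
            if (channel == null) {
                throw new IllegalArgumentException("Unknown symbol: " + name);
            }
            return "#" + channel;
        }
        Call call = (Call) expr;
        StringBuilder result = new StringBuilder(call.function).append("(");
        for (int i = 0; i < call.args.size(); i++) {
            if (i > 0) {
                result.append(", ");
            }
            result.append(translate(call.args.get(i), layout));
        }
        return result.append(")").toString();
    }
}
```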


rongrong · 2019-03-22T21:52:21Z

presto-main/src/main/java/io/prestosql/sql/planner/ExpressionInterpreter.java

@@ -252,10 +251,10 @@ public Object evaluate()
        return visitor.process(expression, new NoPagePositionContext());
    }

-    public Object evaluate(int position, Page page)
+    public Object evaluate(SymbolResolver inputs)


What's the difference between this evaluate and the optimize now? If you provide SymbolResolver, using optimize would be just fine, right?

cla-bot bot added the cla-signed label Mar 2, 2019

martint mentioned this pull request Mar 2, 2019

Allow connectors to participate in query optimization #18

Open

26 tasks

martint force-pushed the tl-wip3 branch from 402630b to 9b3c617 Compare March 2, 2019 17:14

martint changed the title ~~[WIP] Hide table layouts from engine~~ Hide table layouts from engine Mar 2, 2019

dain approved these changes Mar 2, 2019


martint force-pushed the tl-wip3 branch from 9b3c617 to aab3854 Compare March 2, 2019 23:55

kokosing reviewed Mar 4, 2019


martint force-pushed the tl-wip3 branch from aab3854 to e6e6e7a Compare March 4, 2019 20:13

kokosing reviewed Mar 5, 2019


martint force-pushed the tl-wip3 branch 2 times, most recently from 858edcb to 6773bb3 Compare March 7, 2019 03:54

martint mentioned this pull request Mar 7, 2019

[WIP] Complex pushdown #402

Closed

sopel39 reviewed Mar 7, 2019


martint added 2 commits March 7, 2019 09:47

Rename listTableLayouts method

d7cde8c

More accurately, constructs a set of alternate plans with the filter pushed into the table scan based on available table layouts.

Add null checks

c68109c

dain approved these changes Mar 7, 2019


martint added 5 commits March 7, 2019 10:59

Remove support for multiple table layouts

801423b

Remove unused parameter

c6abef5

Simplify unconditional PickLayout

1285a77

It's just selecting a layout for the raw table scan, so no need to go through the logic for pushing a predicate, etc.

Inline unnecessary method

4aa9539

martint force-pushed the tl-wip3 branch from 6773bb3 to c705d2d Compare March 7, 2019 19:04

kokosing approved these changes Mar 8, 2019


martint force-pushed the tl-wip3 branch 2 times, most recently from 2eebec6 to f10d5f2 Compare March 8, 2019 21:24

martint force-pushed the tl-wip3 branch from f10d5f2 to c9d4d04 Compare March 9, 2019 02:34

martint added 8 commits March 8, 2019 19:51

Make PushPredicateIntoTableScan top level rule

1249ff2

It now contains a single rule, so no point in having it return a rule set.

Fix expression type

7323922

The inferred type of the former expression is INTEGER, which doesn't match the signature of combineHash function call.

Remove interpreted page processors

eac0b47

They were only being used in tests. The engine no longer relies on them for query execution.

Remove unused functions

379f54e

Rename analyzeExpressionsWithSymbols method

4dabf12

There's no longer a conflict with analyzeExpressionsWithInputs so simplify the name

Inline analyzeExpressions method

feb70d7

There's only one caller, so no need for an extra indirection.

martint force-pushed the tl-wip3 branch from c9d4d04 to feb70d7 Compare March 9, 2019 03:53

martint merged commit 2a23e10 into trinodb:master Mar 9, 2019

rongrong reviewed Mar 22, 2019


wenleix mentioned this pull request Apr 21, 2019

Remove tableLayoutHandle prestodb/presto#12674

Merged
