Skip to content

Commit

Permalink
[Doc] Add view-based Rewrite (#44747)
Browse files Browse the repository at this point in the history
Signed-off-by: 絵空事スピリット <[email protected]>
  • Loading branch information
EsoragotoSpirit authored Apr 28, 2024
1 parent 1292550 commit 51034de
Show file tree
Hide file tree
Showing 4 changed files with 555 additions and 35 deletions.
Binary file added docs/en/assets/Rewrite-view-based.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
294 changes: 277 additions & 17 deletions docs/en/using_starrocks/query_rewrite_with_materialized_views.md
Original file line number Diff line number Diff line change
Expand Up @@ -710,34 +710,280 @@ As shown above, `agg_mv5` contains the data from partitions `p1` to `p7`, and th

## View-based materialized view rewrite

StarRocks supports creating materialized views based on views. Subsequent queries against the views can be transparently rewritten.
From v3.1.0 onwards, StarRocks supports creating materialized views based on views. Subsequent queries against the views can be rewritten if they are of the SPJG pattern. By default, queries against views are automatically transcribed into queries against the base tables of the views and then transparently matched and rewritten.

For example, create the following views:
However, in real-world scenarios, data analysts may perform data modeling upon complex, nested views, which cannot be directly transcribed. As a result, materialized views created based on such views cannot rewrite queries. To improve its capability in the preceding scenario, StarRocks optimizes the view-based materialized view query rewrite logic from v3.3.0 onwards.

### Fundamentals

In the previous query rewrite logic, StarRocks will transcribe queries against a view into queries against the base tables of the view. Query rewrite will encounter failures if the execution plan of the transcribed query mismatches the SPJG pattern.

To solve this problem, StarRocks introduces a new operator - LogicalViewScanOperator, to simplify the structure of the execution plan tree without transcribing the query. This operator seeks to match the execution plan tree with the SPJG pattern, therefore facilitating query rewrite.

The following example lists a query with an AGGREGATE sub-query, a view built upon the sub-query, the transcribed query based on the view, and the materialized view built upon the view:

```SQL
CREATE VIEW customer_view1
-- Original query:
SELECT
v1.a,
t2.b,
v1.total
FROM(
SELECT
a,
sum(c) AS total
FROM t1
GROUP BY a
) v1
INNER JOIN t2 ON v1.a = t2.a;

-- View:
CREATE VIEW view_1 AS
SELECT
t1.a,
sum(t1.c) AS total
FROM t1
GROUP BY t1.a;

-- Transcribed query:
SELECT
v1.a,
t2.b,
v1.total
FROM view_1 v1
JOIN t2 ON v1.a = t2.a;

-- Materialized view:
CREATE MATERIALIZED VIEW mv1
DISTRIBUTED BY hash(a)
REFRESH MANUAL
AS
SELECT c_custkey, c_name, c_address
FROM customer;
SELECT
v1.a,
t2.b,
v1.total
FROM view_1 v1
JOIN t2 ON v1.a = t2.a;
```

CREATE VIEW lineorder_view1
AS
SELECT lo_orderkey, lo_linenumber, lo_custkey, lo_revenue
FROM lineorder;
The execution plan of the original query, as shown on the left of the following diagram, mismatches the SPJG pattern due to the LogicalAggregateOperator within the JOIN. StarRocks does not support query rewrite for such cases. However, by defining a view based on the sub-query, the original query can be transcribed into a query against the view. With the LogicalViewScanOperator, StarRocks can transfer the mismatched part into the SPJG pattern, therefore allowing query rewrite under this circumstance.

![img](../assets/Rewrite-view-based.png)

### Usage

View-based materialized view query rewrite is disabled by default.

To enable this feature, you must set the following variable:

```SQL
SET enable_view_based_mv_rewrite = true;
```

Then, create the following materialized view based on the views:
### Use cases

#### Rewrite queries using single-view-based materialized views

StarRocks supports rewriting queries with a materialized view built upon a single view, including queries with aggregations.

For example, you can build the following view and materialized view for the TPC-H Query 18:

```SQL
CREATE MATERIALIZED VIEW join_mv1
DISTRIBUTED BY hash(lo_orderkey)
CREATE VIEW q18_view
AS
SELECT lo_orderkey, lo_linenumber, lo_revenue, c_name
FROM lineorder_view1 INNER JOIN customer_view1
ON lo_custkey = c_custkey;
SELECT
c_name,
c_custkey,
o_orderkey,
o_orderdate,
o_totalprice,
sum(l_quantity)
FROM
customer,
orders,
lineitem
WHERE
o_orderkey IN (
SELECT
l_orderkey
FROM
lineitem
GROUP BY
l_orderkey having
sum(l_quantity) > 315
)
AND c_custkey = o_custkey
AND o_orderkey = l_orderkey
GROUP BY
c_name,
c_custkey,
o_orderkey,
o_orderdate,
o_totalprice;

CREATE MATERIALIZED VIEW q18_mv
DISTRIBUTED BY hash(c_custkey, o_orderkey)
REFRESH MANUAL
AS
SELECT * FROM q18_view;
```

During query rewrite, queries against `customer_view1` and `lineorder_view1` are automatically expanded to the base tables and then transparently matched and rewritten.
The materialized view can rewrite both the following queries:

```Plain
mysql> EXPLAIN LOGICAL SELECT * FROM q18_view;
+-------------------------------------------------------------------------------------------------------+
| Explain String |
+-------------------------------------------------------------------------------------------------------+
| - Output => [2:c_name, 1:c_custkey, 9:o_orderkey, 10:o_orderdate, 13:o_totalprice, 52:sum] |
| - SCAN [q18_mv] => [1:c_custkey, 2:c_name, 52:sum, 9:o_orderkey, 10:o_orderdate, 13:o_totalprice] |
# highlight-start
| MaterializedView: true |
# highlight-end
| Estimates: {row: 9, cpu: 486.00, memory: 0.00, network: 0.00, cost: 243.00} |
| partitionRatio: 1/1, tabletRatio: 96/96 |
| 1:c_custkey := 60:c_custkey |
| 2:c_name := 59:c_name |
| 52:sum := 64:sum(l_quantity) |
| 9:o_orderkey := 61:o_orderkey |
| 10:o_orderdate := 62:o_orderdate |
| 13:o_totalprice := 63:o_totalprice |
+-------------------------------------------------------------------------------------------------------+
```

```Plain
mysql> EXPLAIN LOGICAL SELECT c_name, sum(`sum(l_quantity)`) FROM q18_view GROUP BY c_name;
+-----------------------------------------------------------------------------------------------------+
| Explain String |
+-----------------------------------------------------------------------------------------------------+
| - Output => [2:c_name, 59:sum] |
| - AGGREGATE(GLOBAL) [2:c_name] |
| Estimates: {row: 9, cpu: 306.00, memory: 306.00, network: 0.00, cost: 1071.00} |
| 59:sum := sum(59:sum) |
| - EXCHANGE(SHUFFLE) [2] |
| Estimates: {row: 9, cpu: 30.60, memory: 0.00, network: 30.60, cost: 306.00} |
| - AGGREGATE(LOCAL) [2:c_name] |
| Estimates: {row: 9, cpu: 61.20, memory: 30.60, network: 0.00, cost: 244.80} |
| 59:sum := sum(52:sum) |
| - SCAN [q18_mv] => [2:c_name, 52:sum] |
# highlight-start
| MaterializedView: true |
# highlight-end
| Estimates: {row: 9, cpu: 306.00, memory: 0.00, network: 0.00, cost: 153.00} |
| partitionRatio: 1/1, tabletRatio: 96/96 |
| 2:c_name := 60:c_name |
| 52:sum := 65:sum(l_quantity) |
+-----------------------------------------------------------------------------------------------------+
```

#### Rewrite queries with JOIN using view-based materialized views

StarRocks supports rewriting queries with JOINs between views or between views and tables, including aggregations upon JOINs.

For example, you can create the following views and materialized view:

```SQL
CREATE VIEW view_1 AS
SELECT
l_partkey,
l_suppkey,
sum(l_quantity) AS total_quantity
FROM lineitem
GROUP BY
l_partkey,
l_suppkey;


CREATE VIEW view_2 AS
SELECT
l_partkey,
l_suppkey,
sum(l_tax) AS total_tax
FROM lineitem
GROUP BY
l_partkey,
l_suppkey;


CREATE MATERIALIZED VIEW mv_1
DISTRIBUTED BY hash(l_partkey, l_suppkey)
REFRESH MANUAL AS
SELECT
v1.l_partkey,
v2.l_suppkey,
total_quantity,
total_tax
FROM view_1 v1
JOIN view_2 v2 ON v1.l_partkey = v2.l_partkey
AND v1.l_suppkey = v2.l_suppkey;
```

The materialized view can rewrite both the following queries:

```Plain
mysql> EXPLAIN LOGICAL
-> SELECT v1.l_partkey,
-> v2.l_suppkey,
-> total_quantity,
-> total_tax
-> FROM view_1 v1
-> JOIN view_2 v2 ON v1.l_partkey = v2.l_partkey
-> AND v1.l_suppkey = v2.l_suppkey;
+--------------------------------------------------------------------------------------------------------+
| Explain String |
+--------------------------------------------------------------------------------------------------------+
| - Output => [4:l_partkey, 25:l_suppkey, 17:sum, 37:sum] |
| - SCAN [mv_1] => [17:sum, 4:l_partkey, 37:sum, 25:l_suppkey] |
# highlight-start
| MaterializedView: true |
# highlight-end
| Estimates: {row: 799541, cpu: 31981640.00, memory: 0.00, network: 0.00, cost: 15990820.00} |
| partitionRatio: 1/1, tabletRatio: 96/96 |
| 17:sum := 43:total_quantity |
| 4:l_partkey := 41:l_partkey |
| 37:sum := 44:total_tax |
| 25:l_suppkey := 42:l_suppkey |
+--------------------------------------------------------------------------------------------------------+
```

```Plain
mysql> EXPLAIN LOGICAL
-> SELECT v1.l_partkey,
-> sum(total_quantity),
-> sum(total_tax)
-> FROM view_1 v1
-> JOIN view_2 v2 ON v1.l_partkey = v2.l_partkey
-> AND v1.l_suppkey = v2.l_suppkey
-> group by v1.l_partkey;
+--------------------------------------------------------------------------------------------------------------------+
| Explain String |
+--------------------------------------------------------------------------------------------------------------------+
| - Output => [4:l_partkey, 41:sum, 42:sum] |
| - AGGREGATE(GLOBAL) [4:l_partkey] |
| Estimates: {row: 196099, cpu: 4896864.00, memory: 3921980.00, network: 0.00, cost: 29521223.20} |
| 41:sum := sum(41:sum) |
| 42:sum := sum(42:sum) |
| - EXCHANGE(SHUFFLE) [4] |
| Estimates: {row: 136024, cpu: 489686.40, memory: 0.00, network: 489686.40, cost: 19228831.20} |
| - AGGREGATE(LOCAL) [4:l_partkey] |
| Estimates: {row: 136024, cpu: 5756695.20, memory: 489686.40, network: 0.00, cost: 18249458.40} |
| 41:sum := sum(17:sum) |
| 42:sum := sum(37:sum) |
| - SCAN [mv_1] => [17:sum, 4:l_partkey, 37:sum] |
# highlight-start
| MaterializedView: true |
# highlight-end
| Estimates: {row: 799541, cpu: 28783476.00, memory: 0.00, network: 0.00, cost: 14391738.00} |
| partitionRatio: 1/1, tabletRatio: 96/96 |
| 17:sum := 45:total_quantity |
| 4:l_partkey := 43:l_partkey |
| 37:sum := 46:total_tax |
+--------------------------------------------------------------------------------------------------------------------+
```

#### Rewrite queries using materialized views built upon external table-based views

You can build views upon tables in external catalogs and then materialized views upon the views to rewrite queries. The usage is similar to that for internal tables.

## External catalog-based materialized view rewrite

Expand Down Expand Up @@ -802,10 +1048,24 @@ For asynchronous materialized views created based on an external catalog, you ca

## Limitations

In terms of materialized view-based query rewrite, StarRocks currently has the following limitations:
In terms of materialized view query rewrite, StarRocks currently has the following limitations:

- StarRocks does not support rewriting queries with non-deterministic functions, including rand, random, uuid, and sleep.
- StarRocks does not support rewriting queries with window functions.
- Materialized views defined with statements containing LIMIT, ORDER BY, UNION, EXCEPT, INTERSECT, MINUS, GROUPING SETS, WITH CUBE, or WITH ROLLUP cannot be used for query rewrite.
- Strong consistency of query results is not guaranteed between base tables and materialized views built on external catalogs.
- Asynchronous materialized views created on base tables in a JDBC catalog do not support query rewrite.

In terms of view-based materialized view query rewrite, StarRocks currently has the following limitations:

- Currently, StarRocks does not support Partition Union rewrite.
- Query rewrite is not supported if the view contains random functions, including rand(), random(), uuid(), and sleep().
- Query rewrite is not supported if the view contains columns with same names. You must assign different aliases for columns with the same names.
- Views that are used to create a materialized view must contain at least one column of the following data types: integer types, date types, and string types. For example, you cannot create a materialized that queries the view, because `total_cost` is a DOUBLE-type column.

```SQL
CREATE VIEW v1
AS
SELECT sum(cost) AS total_cost
FROM t1;
```
Binary file added docs/zh/assets/Rewrite-view-based.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading

0 comments on commit 51034de

Please sign in to comment.