ALTER TABLE EXECUTE + Hive OPTIMIZE (explicit TableExecuteNode) #9665
Conversation
Force-pushed from d2629fe to 2851668
Force-pushed from 2851668 to 39a35ed
lots of editorials, nothing material. good job
core/trino-main/src/main/java/io/trino/sql/planner/optimizations/BeginTableWrite.java (review thread resolved, outdated)
core/trino-main/src/main/java/io/trino/sql/planner/optimizations/BeginTableWrite.java (review thread resolved, outdated)
    PlanNode source = rewriteModifyTableScan(((SemiJoinNode) node).getSource(), handle);
    return replaceChildren(node, ImmutableList.of(source, ((SemiJoinNode) node).getFilteringSource()));
}
if (node instanceof JoinNode) {
    PlanNode source = rewriteModifyTableScan(((JoinNode) node).getLeft(), handle);
    return replaceChildren(node, ImmutableList.of(source, ((JoinNode) node).getRight()));
Previously we required that the updated table scan is on the left side of a Join / SemiJoin.
I think this is an important property. I don't imagine how we could get a wrong SemiJoin, but getting a flipped Join is certainly possible. cc @kasiafi
Is it still validated somewhere? Or are we going to hit
trino/core/trino-main/src/main/java/io/trino/operator/AbstractRowChangeOperator.java (line 161 in 7c1a277):
checkState(source.isPresent(), "UpdatablePageSource not set");
Actually, that is still enforced by the logic in findTableScanHandleForDeleteOrUpdate.
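The invariant under discussion can be illustrated with a small, self-contained sketch. This is a hypothetical mini-model, not the actual Trino plan-node or BeginTableWrite classes: the rewrite descends only into the left/source child of a join, so it reaches exactly one table scan, and a flipped join would surface as a failure rather than silently rewriting the wrong side.

```java
import java.util.*;

// Hypothetical stand-ins for plan nodes (not the Trino classes).
interface Node {}
record Scan(String table) implements Node {}
record Join(Node left, Node right) implements Node {}

public class LeftSideRewrite {
    // Recursively rewrite, descending only into the left child of joins,
    // mirroring the idea that the modified table scan must sit on the left side.
    static Node rewrite(Node node, String handle) {
        if (node instanceof Scan scan) {
            if (!scan.table().equals(handle)) {
                throw new IllegalStateException("modified table scan not on the left side");
            }
            return new Scan(scan.table() + ":forUpdate");
        }
        if (node instanceof Join join) {
            return new Join(rewrite(join.left(), handle), join.right());
        }
        throw new IllegalStateException("unexpected node");
    }

    public static void main(String[] args) {
        Node plan = new Join(new Scan("orders"), new Scan("lineitem"));
        System.out.println(rewrite(plan, "orders"));
        // A flipped join, new Join(new Scan("lineitem"), new Scan("orders")),
        // would throw instead of reaching the "orders" scan.
    }
}
```

In the real code the check is not local to the join rewrite; as noted above, it is enforced by findTableScanHandleForDeleteOrUpdate.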
core/trino-spi/src/main/java/io/trino/spi/connector/BeginTableExecuteResult.java (review thread resolved, outdated)
core/trino-spi/src/main/java/io/trino/spi/connector/BeginTableExecuteResult.java (review thread resolved, outdated)
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveTableExecuteHandle.java (review thread resolved)
plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestHiveConnectorTest.java (review thread resolved)
plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestHiveConnectorTest.java (review thread resolved, outdated)
plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestHiveConnectorTest.java (review thread resolved, outdated)
core/trino-main/src/main/java/io/trino/sql/planner/iterative/rule/RemoveEmptyTableExecute.java (review thread resolved)
import static java.util.Objects.requireNonNull;

public final class TableExecuteHandle
What does this handle identify, exactly? It's not clear from the name.
I updated the javadoc a bit. Hope it is better now.
core/trino-main/src/main/java/io/trino/metadata/AbstractCatalogPropertyManager.java (review thread resolved)
core/trino-parser/src/main/antlr4/io/trino/sql/parser/SqlBase.g4 (review thread resolved, outdated)
@@ -21,6 +21,7 @@
import io.trino.operator.OperationTimer.OperationTiming;
This commit should be folded into whichever commit introduces execution for ALTER TABLE EXECUTE. Otherwise, that other commit would be "broken".
In fact, I would squash all the commits related to adding the new feature into a single one before it gets merged. You can leave the unrelated changes separate (as long as they have value and stand on their own).
> In fact, I would squash all the commits related to adding the new feature into a single one before it gets merged

Yeah - that is the plan.
Docs PR here, can make additional updates as needed, but otherwise waiting until this code is merged. #9682
Force-pushed from f42d8f4 to fadee63
AC. I will squash the commits that are not self-contained later today or tomorrow. @martint please let me know if you want to dig deeper into this one.
Force-pushed from de2bbc2 to aed3887
Squashed implementation commits and rebased on top of current master
Force-pushed from aed3887 to 8469bd4
Allow using a different key than the catalog name for storing properties metadata. This is preparatory work for introducing table procedures, where each table procedure, even within a single catalog, may allow a different set of properties.
Commit adds SPI and execution support for new table procedures. The new syntax extends the ALTER TABLE family, allowing for proper semantic analysis of the target table (unlike what we have with CALL, where the table schema and table name are passed as string literals via table arguments). New syntax examples:
* ALTER TABLE <table> EXECUTE procedure
* ALTER TABLE <table> EXECUTE procedure(value1, value2, ...)
* ALTER TABLE <table> EXECUTE procedure(param1 => value1, param2 => value2, ...) WHERE ...
The new table procedures allow for rewriting table data, which makes them suitable for implementing data cleansing routines such as:
* compacting small files into larger ones for a Hive table
* changing the file sorting or bucketing scheme for a table
* Iceberg OPTIMIZE
Currently, an execution flow which _does not_ rewrite table data is not implemented; it will be implemented as a follow-up. The current procedures available via `CALL` will then be migrated to the new mechanism. Procedures are exposed by connectors to the engine via a set of new SPI methods and classes:
* io.trino.spi.connector.TableProcedureMetadata
* io.trino.spi.connector.TableProcedureExecutionMode
* io.trino.spi.connector.ConnectorTableExecuteHandle
* io.trino.spi.connector.ConnectorMetadata#getTableHandleForExecute
* io.trino.spi.connector.ConnectorMetadata#getLayoutForTableExecute
* io.trino.spi.connector.ConnectorMetadata#beginTableExecute
* io.trino.spi.connector.ConnectorMetadata#finishTableExecute
Add support for compacting small files for non-transactional, non-bucketed Hive tables: ALTER TABLE xxxxx EXECUTE OPTIMIZE WITH(file_size_threshold = ...)
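The property-metadata change described in the commits above can be sketched with a small, self-contained model. This is illustrative only, not the actual AbstractCatalogPropertyManager code: the point is that property sets are stored under an arbitrary key (here a hypothetical "catalog:procedure" string) rather than the catalog name alone, so each table procedure in one catalog can register its own properties.

```java
import java.util.*;

public class PropertyMetadataStore {
    // Before the change: conceptually Map<catalogName, properties>.
    // After: any key may be used, e.g. "catalog:procedure".
    private final Map<String, Set<String>> propertiesByKey = new HashMap<>();

    void register(String key, Set<String> propertyNames) {
        propertiesByKey.put(key, propertyNames);
    }

    Set<String> propertiesFor(String key) {
        return propertiesByKey.getOrDefault(key, Set.of());
    }

    public static void main(String[] args) {
        PropertyMetadataStore store = new PropertyMetadataStore();
        // Two procedures in the same catalog, each with a different property set
        store.register("hive:optimize", Set.of("file_size_threshold"));
        store.register("hive:another_procedure", Set.of("some_property"));
        System.out.println(store.propertiesFor("hive:optimize"));
    }
}
```

The "hive:optimize" key and the file_size_threshold property name here merely echo the Hive OPTIMIZE commit above; the actual keying scheme used in the PR may differ.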
Force-pushed from 8469bd4 to 57e48d2
PR adds SPI and execution support for new table procedures.
The new syntax extends the ALTER TABLE family, allowing for proper
semantic analysis of the target table (unlike what we have with CALL,
where the table schema and table name are passed as string literals via
table arguments).
New syntax examples:
ALTER TABLE <table> EXECUTE procedure
ALTER TABLE <table> EXECUTE procedure(value1, value2, ...)
ALTER TABLE <table> EXECUTE procedure(param1 => value1, param2 => value2, ...) WHERE ...
The new table procedures allow for rewriting table data, which makes
them suitable for implementing data cleansing routines such as:
compacting small files into larger ones for a Hive table
changing the file sorting or bucketing scheme for a table
Iceberg OPTIMIZE
Currently, an execution flow which does not rewrite table data is not
implemented; it will be implemented as a follow-up. The current
procedures available via CALL will then be migrated to the new
mechanism.
Procedures are exposed by connectors to the engine via a set of new SPI
methods and classes:
io.trino.spi.connector.TableProcedureMetadata
io.trino.spi.connector.TableProcedureExecutionMode
io.trino.spi.connector.ConnectorTableExecuteHandle
io.trino.spi.connector.ConnectorMetadata#getTableHandleForExecute
io.trino.spi.connector.ConnectorMetadata#getLayoutForTableExecute
io.trino.spi.connector.ConnectorMetadata#beginTableExecute
io.trino.spi.connector.ConnectorMetadata#finishTableExecute
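The lifecycle implied by the SPI method names above (obtain a handle, begin the execute, finish and commit) can be sketched as a minimal mock. This is a hypothetical, heavily simplified model: the method and record shapes below are not the actual Trino SPI signatures, only an illustration of the call order the engine would drive for a data-rewriting procedure such as OPTIMIZE.

```java
import java.util.*;

public class TableExecuteLifecycle {
    // Hypothetical simplified handle (the real one is a connector-specific
    // ConnectorTableExecuteHandle wrapped by the engine).
    record TableExecuteHandle(String table, String procedure, Map<String, Object> properties) {}

    static final List<String> calls = new ArrayList<>();

    // Analysis time: resolve the procedure against the target table.
    static TableExecuteHandle getTableHandleForExecute(String table, String procedure, Map<String, Object> properties) {
        calls.add("getTableHandleForExecute");
        return new TableExecuteHandle(table, procedure, properties);
    }

    // Execution start: connector prepares to receive rewritten data.
    static TableExecuteHandle beginTableExecute(TableExecuteHandle handle) {
        calls.add("beginTableExecute");
        return handle;
    }

    // Execution end: connector commits, e.g. new compacted files replace old ones.
    static void finishTableExecute(TableExecuteHandle handle, List<String> fragments) {
        calls.add("finishTableExecute");
    }

    public static void main(String[] args) {
        // Roughly what ALTER TABLE orders EXECUTE optimize(file_size_threshold => ...) would drive:
        TableExecuteHandle handle = getTableHandleForExecute(
                "orders", "optimize", Map.of("file_size_threshold", "128MB"));
        handle = beginTableExecute(handle);
        finishTableExecute(handle, List.of("compacted-file-0"));
        System.out.println(calls);
    }
}
```

getLayoutForTableExecute (requesting a write layout between handle creation and begin) is omitted here for brevity, and TableProcedureExecutionMode would govern whether the procedure rewrites data at all.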