Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL MERGE Fixes #14650

Merged
merged 10 commits into from
Oct 18, 2022
Merged

Conversation

djsstarburst
Copy link
Member

Description

This PR contains the accumulation of commits in PR #13926 that fix or improve the infrastructure developed to support SQL MERGE.

Non-technical explanation

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Oct 15, 2022
@djsstarburst djsstarburst requested a review from electrum October 15, 2022 22:02
@djsstarburst djsstarburst force-pushed the david.stryker/sql-merge-fixes branch 2 times, most recently from fea5a9e to f6f7981 Compare October 16, 2022 21:13
Copy link
Contributor

@ksobolew ksobolew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to add regression tests for these fixes?

@djsstarburst djsstarburst force-pushed the david.stryker/sql-merge-fixes branch from f6f7981 to eabdc67 Compare October 17, 2022 18:47
@djsstarburst
Copy link
Member Author

djsstarburst commented Oct 17, 2022

Is it possible to add regression tests for these fixes?

We need to look at them one by one. With only one exception, we haven't seen these problems with SQL MERGE itself. Most of these problems showed up in tests in the PR that implements UPDATE and DELETE using MERGE plumbing, but migrating many of those tests would be difficult.

I'll take a look, and consult @electrum, and see what new tests it makes sense to add.

Thanks for looking @ksobolew!

Copy link
Member

@electrum electrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! A few minor comments

* - Values (0)
* </pre>
*/
public final class RemoveEmptyMergeWriterRuleSet
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test for this? Maybe TestRemoveEmptyDeleteRuleSet is a good example

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This turned out to be somewhat challenging. I spent an hour on it and didn't get the test working. I'll add the test in a second pass.

The bug showed up as an exception in BeginTableWrite.rewriteModifyTableScan.
However, that method doesn't need to be called for SQL MERGE.  It is only
needed if the connector operation requires an UpdatablePageSource, and
MERGE does not, so the bug is fixed by this one-liner.
Preprocess pages passed as inputs to MergeProcessorOperator
and MergeWriterOperator using the canonical pagePreprocessor.
This commit adds RemoveEmptyMergeWriterRuleSet, which
turns MergeWriter over an empty TableScan into an
empty values node, analogous to RemoveEmptyDeleteRuleSet.

Rename Pattern.merge() to Pattern.mergeWriter(), since
there are several kinds of merge nodes.
Null values aren't allowed in ImmutableLists, so use
an ArrayList for partition values, and update the
test to succeed when changing partition columns.
This commit keep track of the number of rows written
in DeltaLakeMergeSink.rewriteParquetFile, and only
commits the fileWriter if at least one row was written.
Otherwise it calls fileWriter.rollback().  This fixes
the Delte Lake tests that failed because of unexpected
filex.  I think this fix also applies to SQL MERGE.
In BeginTableWrite, when returning the MergeTarget,
get the table handle from the TableScanNode.
This commit adds HiveMetadata.getUpdateLayout and classes
HiveUpdateHandle and HiveUpdateBucketFunction.
KuduTableHandle had a boolean field isDeleteHandle.  If the
field was not set, rowIds would not be generated.  This
commit renames the field to more appropriate requiresRowId,
and sets it in KuduMetadata.beginMerge.  This fixes bugs
in delete and update using merge components.
@colebow
Copy link
Member

colebow commented Oct 19, 2022

Is there a release note that we should include for this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants