[Feature Request] Identity Column #1959

Closed
1 of 5 tasks
felipepessoto opened this issue Aug 3, 2023 · 14 comments

@felipepessoto
Contributor

felipepessoto commented Aug 3, 2023

Feature request

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Overview

Identity Column (writer version 6) as defined by https://github.com/delta-io/delta/blob/master/PROTOCOL.md#identity-columns.

Design doc: https://docs.google.com/document/d/1G8Vj6wOxswMx1JklllLoSn-obEpJ-iE_Lhpbd-RfIr4/edit?usp=sharing

PR:

Motivation

This is probably the biggest missing part in Open Source Spark Delta.

Further details

Willingness to contribute

@c27kwan volunteered to work on this feature and posted the design doc linked above.

@felipepessoto felipepessoto added the enhancement New feature or request label Aug 3, 2023
@felipepessoto
Contributor Author

@dennyglee, @allisonport-db, do you have any update on this? This is probably the most important missing feature in OSS Delta.

Thanks.

@felipepessoto
Contributor Author

@tdas any chance this can be prioritized for next release?

Thanks.

@keen85

keen85 commented Feb 8, 2024

duplicate of #1072?

@felipepessoto
Contributor Author

I think so. But I would update #1072 to be broader. The way the request is made it seems the Identity feature is already done, and it is only the DeltaTableBuilder API that is missing.

@c27kwan
Contributor

c27kwan commented Mar 26, 2024

I'm interested in working on this!

@c27kwan
Contributor

c27kwan commented Mar 27, 2024

I can't modify the main comment because I'm not a maintainer. Here's the design doc: https://docs.google.com/document/d/1G8Vj6wOxswMx1JklllLoSn-obEpJ-iE_Lhpbd-RfIr4/edit?usp=sharing

@felipepessoto
Contributor Author

@c27kwan that is great.

Have you discussed with any of the maintainers about your intention to contribute? I’m asking because this is a big feature and I just want to make sure they aren’t internally working on it and they are open to accept your implementation.

Thanks.

@vkorukanti
Collaborator

Hi @felipepessoto, we don't have anyone else working on this feature. Had an offline chat with @c27kwan before assigning the issue to @c27kwan. Feel free to look at the design doc and post any questions you have.

vkorukanti pushed a commit that referenced this issue Apr 11, 2024
## Description
This PR is part of #1959

In this PR, we introduce the IdentityColumnsTableFeature as a test-only
table feature so that we can start developing with it.

Note: we do not yet add support for minWriterVersion 6 to
properties.defaults.minWriterVersion, because that would enable the table
feature outside of testing.

## How was this patch tested?
Existing tests pass. 

## Does this PR introduce _any_ user-facing changes?
No, this is a test-only change.
andreaschat-db pushed a commit to andreaschat-db/delta that referenced this issue Apr 16, 2024
tdas pushed a commit that referenced this issue Apr 18, 2024
#### Which Delta project/connector is this regarding?
- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description
This PR is part of #1959

In this PR, we introduce IdentityColumn.scala, a common file which
contains most of the helpers for Identity Columns, necessary for
unblocking future PRs.

## How was this patch tested?
This PR commits dead code. Existing tests pass.

## Does this PR introduce _any_ user-facing changes?
No.
@tdas tdas added this to the 3.3.0 milestone Apr 19, 2024
scottsand-db pushed a commit that referenced this issue Apr 25, 2024
#### Which Delta project/connector is this regarding?

- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description

This PR is part of #1959

In this PR, we introduce the `GenerateIdentityValues` UDF used for
populating Identity Column values. The UDF is not used in Delta in this
PR yet.

`GenerateIdentityValues` is a simple non-deterministic UDF which keeps a
counter with the user specified `start` and `step`. It counts in
increments of `numPartitions` so that it can be parallelized in
different tasks.
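
The partitioned counting described above can be illustrated with a small pure-Python model. This is only a sketch under an assumed interleaving scheme (partition `p` takes slots `p`, `p + numPartitions`, `p + 2 * numPartitions`, ... of the sequence `start`, `start + step`, ...); the function names are hypothetical and this is not the actual `GenerateIdentityValues` code, which is a Spark expression.

```python
from itertools import islice
from typing import Iterator, List


def identity_values(start: int, step: int, partition: int, num_partitions: int) -> Iterator[int]:
    """Yield the values one partition would produce (illustrative model only)."""
    # Assumption: each partition owns every num_partitions-th slot of the
    # global sequence, so no two partitions ever produce the same value.
    slot = partition
    while True:
        yield start + slot * step
        slot += num_partitions


def preview(start: int, step: int, num_partitions: int, rows: int) -> List[List[int]]:
    """Return the first `rows` values of every partition, one list per partition."""
    return [
        list(islice(identity_values(start, step, p, num_partitions), rows))
        for p in range(num_partitions)
    ]


if __name__ == "__main__":
    # Three tasks, three rows each, with the default start = 1 and step = 1.
    print(preview(start=1, step=1, num_partitions=3, rows=3))
```

Because tasks may write different numbers of rows, a scheme like this can leave gaps in the generated values; values are unique but not necessarily consecutive.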

## How was this patch tested?
New test suite and unit tests for the UDF.

## Does this PR introduce _any_ user-facing changes?
No.
allisonport-db pushed a commit that referenced this issue Apr 30, 2024
#### Which Delta project/connector is this regarding?
- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description
This PR is part of #1959
* We introduce the `generatedAlwaysAsIdentity` and
`generatedByDefaultAsIdentity` APIs into DeltaColumnBuilder so that users
can create a Delta table with an identity column.
* We guard the creation of identity column tables with a feature flag
until development is complete.

## How was this patch tested?
New tests. 

## Does this PR introduce _any_ user-facing changes?

Yes, we introduce the `generatedAlwaysAsIdentity` and
`generatedByDefaultAsIdentity` interfaces in DeltaColumnBuilder for
creating identity columns.
**Interfaces**
```
def generatedAlwaysAsIdentity(): DeltaColumnBuilder
def generatedAlwaysAsIdentity(start: Long, step: Long): DeltaColumnBuilder
def generatedByDefaultAsIdentity(): DeltaColumnBuilder
def generatedByDefaultAsIdentity(start: Long, step: Long): DeltaColumnBuilder
```
When the `start` and `step` parameters are not specified, they
default to `1L`. `generatedByDefaultAsIdentity` allows users to insert
values into the column, while a column specified with
`generatedAlwaysAsIdentity` can only ever have system-generated
values.

**Example Usage**
```
import org.apache.spark.sql.types.LongType

// Creates a Delta identity column (assumes an active SparkSession named `spark`).
io.delta.tables.DeltaTable.columnBuilder(spark, "id")
      .dataType(LongType)
      .generatedAlwaysAsIdentity()
// Which is equivalent to the call
io.delta.tables.DeltaTable.columnBuilder(spark, "id")
      .dataType(LongType)
      .generatedAlwaysAsIdentity(start = 1L, step = 1L)
```
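
The difference between the two modes can be modelled in a few lines of plain Python. This is a toy sketch, not Delta code: the class `IdentityColumnModel` and its `allow_explicit_insert` flag are hypothetical names used only to show that a GENERATED ALWAYS column rejects user-supplied values, while a GENERATED BY DEFAULT column accepts them and otherwise falls back to the generated sequence.

```python
from typing import Optional


class IdentityColumnModel:
    """Toy model of an identity column (hypothetical, for illustration only)."""

    def __init__(self, start: int = 1, step: int = 1, allow_explicit_insert: bool = False):
        if step == 0:
            raise ValueError("step must be non-zero")
        self.step = step
        # True models GENERATED BY DEFAULT, False models GENERATED ALWAYS.
        self.allow_explicit_insert = allow_explicit_insert
        self._next = start

    def resolve(self, user_value: Optional[int] = None) -> int:
        """Return the value to write for one row."""
        if user_value is not None:
            if not self.allow_explicit_insert:
                raise ValueError("cannot provide a value for a GENERATED ALWAYS column")
            return user_value
        value = self._next
        self._next += self.step
        return value


if __name__ == "__main__":
    always = IdentityColumnModel()  # start and step default to 1
    print([always.resolve() for _ in range(3)])

    by_default = IdentityColumnModel(start=10, step=10, allow_explicit_insert=True)
    print(by_default.resolve(), by_default.resolve(7), by_default.resolve())
```

Note that in this model an explicitly inserted value does not advance the counter, which is one reason a BY DEFAULT column may later need its high watermark brought back in line with the data.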
@c27kwan
Contributor

c27kwan commented Jul 12, 2024

Sorry for the lack of updates over the past 2.5 months -- I was on vacation for a month and haven't had the opportunity to return to this. I've been talking to @zhipengmao-db and he volunteered to pick up the remainder of the implementation so we can make progress again. 🎉

allisonport-db pushed a commit that referenced this issue Jul 19, 2024

#### Which Delta project/connector is this regarding?
- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description
This PR is part of #1959

In this PR, we enable basic ingestion for Identity Columns. 
* We use a custom UDF `GenerateIdentityValues` to generate values when
they are not supplied by the user.
* We introduce classes to help update and track the high watermark of
identity columns.
* We also clean up and improve the readability of
ColumnWithDefaultExprUtils.

Note: This does NOT enable Ingestion with MERGE INTO yet. That will come
in a follow up PR, to make this easier to review.

## How was this patch tested?
We introduce a new test suite IdentityColumnIngestionSuite.

## Does this PR introduce _any_ user-facing changes?
No.
vkorukanti pushed a commit that referenced this issue Aug 5, 2024
## Description
This PR is part of #1959 .
It adds support for cloning and restoring tables with identity columns.

## How was this patch tested?
Clone and restore related test cases.
vkorukanti pushed a commit that referenced this issue Aug 6, 2024
## Description
This PR is part of #1959
In this PR, we add SQL support for `ALTER TABLE ALTER COLUMN SYNC
IDENTITY`.

This is used for GENERATED BY DEFAULT Identity Columns, where a user may
want to manually update the identity column high watermark.

## How was this patch tested?
This PR adds a new test suite `IdentityColumnSyncSuite`.

## Does this PR introduce _any_ user-facing changes?
Yes. We introduce the SQL syntax `ALTER TABLE <tableName> (ALTER | CHANGE)
COLUMN? <colName> SYNC IDENTITY` into Delta. This updates the high watermark
stored in the metadata for that specific identity column.
**Example Usage**
```
-- `tbl` is a placeholder table name
ALTER TABLE tbl ALTER COLUMN id SYNC IDENTITY
ALTER TABLE tbl CHANGE COLUMN id SYNC IDENTITY
ALTER TABLE tbl ALTER id SYNC IDENTITY
ALTER TABLE tbl CHANGE id SYNC IDENTITY
```
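
To make the idea of syncing the high watermark concrete, here is a simplified pure-Python model. It rests on assumptions not stated in this PR description: that the watermark follows the largest value in the column for a positive step (the smallest for a negative step) and that it never moves backwards. The function name is hypothetical and the real implementation may differ, for example in how it aligns the result with `start` and `step`.

```python
from typing import Iterable, Optional


def sync_high_watermark(current: Optional[int], step: int, column_values: Iterable[int]) -> Optional[int]:
    """Return the watermark after a sync (simplified, assumed semantics)."""
    values = list(column_values)
    if not values:
        # Nothing in the column: keep whatever watermark was recorded.
        return current
    if step > 0:
        extreme = max(values)
        return extreme if current is None else max(current, extreme)
    # Negative step: the sequence decreases, so track the smallest value.
    extreme = min(values)
    return extreme if current is None else min(current, extreme)


if __name__ == "__main__":
    # A user manually inserted id = 100 into a GENERATED BY DEFAULT column
    # whose recorded watermark was still 3.
    print(sync_high_watermark(current=3, step=1, column_values=[1, 2, 3, 100]))
```

After such a sync, newly generated values continue past the manually inserted ones instead of colliding with them.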

---------

Co-authored-by: zhipeng.mao <[email protected]>
Co-authored-by: Thang Long Vu <[email protected]>
scottsand-db pushed a commit that referenced this issue Aug 7, 2024

#### Which Delta project/connector is this regarding?

- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description


This PR is part of #1959
In this PR, we block unsupported operations on identity columns
including:

- ALTER TABLE ALTER COLUMN is not supported for IDENTITY columns.
- Providing values for GENERATED ALWAYS AS IDENTITY column <colName> is
not supported.
- PARTITIONED BY IDENTITY column <colName> is not supported.
- ALTER TABLE REPLACE COLUMNS is not supported for table with IDENTITY
columns.
- UPDATE on IDENTITY column <colName> is not supported.

## How was this patch tested?
A new test suite `IdentityColumnAdmissionScalaSuite` is added.


## Does this PR introduce _any_ user-facing changes?
Yes. The aforementioned operations on identity columns are blocked.

scottsand-db pushed a commit that referenced this issue Aug 8, 2024

#### Which Delta project/connector is this regarding?

- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description

This PR is part of #1959

In this PR, we extend the addColumn interface in DeltaTableBuilder to
allow for Identity Columns creation.

Resolves #1072

## How was this patch tested?


New tests.

## Does this PR introduce _any_ user-facing changes?

We update the arguments of the `addColumn` method:
- Support a new data type for parameter `generatedAlwaysAs`. Users can
specify `generatedAlwaysAs` as `IdentityGenerator` to add an identity
column that is GENERATED ALWAYS.

- Add a new parameter `generatedByDefaultAs`. Users can specify
`generatedByDefaultAs` as `IdentityGenerator` to add an identity column
that is GENERATED BY DEFAULT.

- Users can optionally pass in `start` (default = 1) and `step` (default
= 1) values to construct `IdentityGenerator` object, which specify the
start and step value to generate the identity column.


Interface
```
 def addColumn(
        self,
        colName: str,
        dataType: Union[str, DataType],
        nullable: bool = True,
        generatedAlwaysAs: Optional[Union[str, IdentityGenerator]] = None,
        generatedByDefaultAs: Optional[IdentityGenerator] = None,
        comment: Optional[str] = None,
) -> "DeltaTableBuilder"
```
Example Usage

```
from delta.tables import DeltaTable, IdentityGenerator
from pyspark.sql.types import LongType

# Each builder below is an alternative way to define the `id` column.
# The outer parentheses let the chained calls span several lines.
(DeltaTable.create()
    .tableName("tableName")
    .addColumn("id", dataType=LongType(), generatedAlwaysAs=IdentityGenerator())
    .execute())

(DeltaTable.create()
    .tableName("tableName")
    .addColumn("id", dataType=LongType(), generatedAlwaysAs=IdentityGenerator(start=1, step=1))
    .execute())

(DeltaTable.create()
    .tableName("tableName")
    .addColumn("id", dataType=LongType(), generatedByDefaultAs=IdentityGenerator())
    .execute())

(DeltaTable.create()
    .tableName("tableName")
    .addColumn("id", dataType=LongType(), generatedByDefaultAs=IdentityGenerator(start=1, step=1))
    .execute())
```

---------

Co-authored-by: Carmen Kwan <[email protected]>
scottsand-db pushed a commit that referenced this issue Aug 13, 2024

#### Which Delta project/connector is this regarding?

- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description

This PR is part of #1959.
The PR relaxes the metadata conflict check for the identity column SYNC
high watermark operation. When the winning transaction contains an
identity column metadata change and the current transaction contains no
metadata change, we treat the current transaction as having no metadata
conflict.
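
The rule can be sketched as a small decision function. This is a hypothetical model, not the conflict checker's real code: `TxnSummary` and its fields are invented for illustration, and the sketch assumes that any other metadata change in the winning transaction is still reported as a conflict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TxnSummary:
    """Hypothetical summary of what a transaction changed."""
    changes_metadata: bool
    # True when the only metadata change is an identity column high watermark update.
    identity_watermark_only: bool = False


def has_metadata_conflict(winning: TxnSummary, current: TxnSummary) -> bool:
    """Decide whether `current` conflicts with the already committed `winning` txn."""
    if not winning.changes_metadata:
        return False
    if winning.identity_watermark_only and not current.changes_metadata:
        # Relaxed case described above: a watermark sync alone does not
        # invalidate a concurrent transaction that leaves metadata untouched.
        return False
    return True


if __name__ == "__main__":
    sync = TxnSummary(changes_metadata=True, identity_watermark_only=True)
    plain_write = TxnSummary(changes_metadata=False)
    print(has_metadata_conflict(winning=sync, current=plain_write))
```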

## How was this patch tested?

A new test suite.
## Does this PR introduce _any_ user-facing changes?

No.
scottsand-db pushed a commit that referenced this issue Aug 14, 2024

#### Which Delta project/connector is this regarding?

- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description
This PR is part of #1959.
It adds more tests for identity columns, covering:
- logging identity column properties and stats
- reading a table should not expose identity column properties
- compatibility with tables using older protocols
- identity value generation starting at the range boundaries of the long
data type


## How was this patch tested?
Test only change.

## Does this PR introduce _any_ user-facing changes?

No.
allisonport-db pushed a commit that referenced this issue Aug 15, 2024

#### Which Delta project/connector is this regarding?

- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description


It makes `MergeIntoCommandBase` extend `SupportsNonDeterministicExpression`,
a Spark trait that logical plans can extend to declare whether they allow
non-deterministic expressions and still pass the CheckAnalysis rule.

`MergeIntoCommandBase` implements the trait by checking that all the
conditions in the MERGE command are deterministic.

This is harmless and allows more flexible usage of merge. For example,
we use a non-deterministic UDF to generate identity values for identity
columns, so non-deterministic expressions must be allowed in the
updated/inserted column values of merge statements in order to support
merge on target tables with identity columns. This PR is therefore part
of #1959.



## How was this patch tested?
New test cases.

## Does this PR introduce _any_ user-facing changes?


Yes.
We are changing the behavior to allow non-deterministic expressions in
updated/inserted column values of merge statements. We still don't allow
non-deterministic expressions in conditions of merge statements.

e.g.
Before this PR, we did not allow a merge statement to add random noise
to the value that it writes:

```
MERGE INTO target USING source
ON target.key = source.key
WHEN MATCHED THEN UPDATE SET target.value = source.value + rand()
```

Now we are allowing this as this may be helpful in terms of data privacy
to not disclose the actual data while preserving the data properties
e.g. mean values etc.
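
The resulting rule (non-deterministic expressions are accepted in assigned values but still rejected in conditions) can be modelled as below. This is a plain-Python sketch with hypothetical `Expr` and `MergePlan` types; it only mirrors the idea behind the trait and is not the Spark or Delta implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Expr:
    """Hypothetical expression node: SQL text plus a determinism flag."""
    sql: str
    deterministic: bool = True


@dataclass
class MergePlan:
    """Hypothetical MERGE plan split into conditions and assigned values."""
    conditions: List[Expr]
    assignments: List[Expr] = field(default_factory=list)


def allows_non_deterministic_expressions(plan: MergePlan) -> bool:
    # Only the conditions are inspected: a non-deterministic value such as
    # `source.value + rand()` in an assignment is acceptable, but a
    # non-deterministic ON / WHEN condition is not.
    return all(cond.deterministic for cond in plan.conditions)


if __name__ == "__main__":
    noisy_update = MergePlan(
        conditions=[Expr("target.key = source.key")],
        assignments=[Expr("source.value + rand()", deterministic=False)],
    )
    noisy_condition = MergePlan(conditions=[Expr("rand() > 0.5", deterministic=False)])
    print(allows_non_deterministic_expressions(noisy_update))
    print(allows_non_deterministic_expressions(noisy_condition))
```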
allisonport-db pushed a commit that referenced this issue Aug 19, 2024

#### Which Delta project/connector is this regarding?

- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description

This PR is part of #1959 .
The change refactors `IdentityColumnTestUtils` to reuse
`createTableWithIdColAndIntValueCol` to create tables and to unify the
column names in identity column tests.
## How was this patch tested?

This is a test-only change.
## Does this PR introduce _any_ user-facing changes?

No.
vkorukanti pushed a commit that referenced this issue Aug 22, 2024
…3594)

## Description
This PR is part of #1959 .
`IdentityColumnSuite` is flaky because the duplicate table name
'identity_test' is used across tests. This PR generates all table names
in identity column related suites using UUIDs to make them unique.

## How was this patch tested?
It is a test-only change.
vkorukanti pushed a commit that referenced this issue Aug 23, 2024
## Description
This PR is part of #1959 .

It adds support for the MERGE command to provide system-generated IDENTITY
values in INSERT and UPDATE actions. Unlike INSERT, where the identity
columns that need writing are collected in
`WriteIntoDelta.writeAndReturnCommitData` right before writing in
`TransactionalWrite.writeFiles`, MERGE expressions are resolved earlier.

Specifically, we resolve the table's identity columns to track for high
water marks in `PreprocessTableMerge.apply`. The column set will be
passed to `OptimisticTransaction` and be written in
`TransactionalWrite.writeFiles`.

## How was this patch tested?
New test suite `IdentityColumnDMLScalaSuite`.
vkorukanti pushed a commit that referenced this issue Aug 26, 2024
## Description
This PR is part of #1959 .
We have implemented identity column support and all the tests pass. We
can now move the identity column feature out of developer mode.

## How was this patch tested?
Existing tests.
longvu-db pushed a commit to longvu-db/delta that referenced this issue Aug 28, 2024
longvu-db pushed a commit to longvu-db/delta that referenced this issue Aug 28, 2024
@tigerhawkvok

Very exciting! Will this make it to the 3.2.1 release?

@zhipengmao-db
Contributor

> Very exciting! Will this make it to the 3.2.1 release?

It will be in 3.3 release.

@felipepessoto
Contributor Author

@zhipengmao-db do you know the ETA to release 3.3.0?

If you are not planning additional changes, should we close this as done?

Thanks

@zhipengmao-db
Contributor

@felipepessoto The ETA for 3.3.0 is 11/20. We could close it as done already. Thanks!

@felipepessoto
Contributor Author

Completed #3598
