[Feature Request] Table Features to specify the features needed to read/write to a table #1408

Closed
allisonport-db opened this issue Oct 3, 2022 · 1 comment
Labels
enhancement New feature or request

allisonport-db commented Oct 3, 2022

Motivation

A Delta table specifies two integers, called “protocol versions”, that indicate the minimum reader and writer versions required to access it. The higher the number, the more features a client must support to read or write the table correctly. However, each protocol version increment bundles multiple features, so a reader or writer has to implement every bundled feature to move from one version to the next. This causes significant work for Delta developers and therefore gets pushback. Furthermore, protocol versions are linear: an upgrade to protocol version N+2 also requires implementing all features from version N+1, even if those features are unimportant or rarely used.

Overview

This proposal replaces the linear Delta reader/writer versions with Table Features: a list of features that a client must support to read or write a table properly. The version numbers remain, but only to indicate that a table uses the new approach. As a result, connectors can implement only the features they are interested in, instead of having to work on all of them.
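To make the idea concrete, here is a minimal Python sketch of how a connector might compare a table's feature list with the features it implements. The function and exception names are hypothetical and are not part of any Delta API; the feature names are used only as illustrative strings.

```python
class UnsupportedFeaturesError(Exception):
    """Raised when a table requires features the connector lacks (hypothetical)."""


def check_supported(required_features, supported_features):
    """Fail if the table needs any feature this connector does not implement."""
    missing = sorted(set(required_features) - set(supported_features))
    if missing:
        raise UnsupportedFeaturesError(
            "cannot access table, unsupported features: " + ", ".join(missing)
        )


# A connector that implements only two features (illustrative names).
connector_features = {"appendOnly", "columnMapping"}

# A table that requires a subset of them can be accessed without error.
check_supported({"columnMapping"}, connector_features)
```

The point of the design is visible in the set difference: the connector is rejected only for features the table actually lists, not for every feature bundled into a version number.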

Further details

See the design doc for more details: https://docs.google.com/document/d/1UZ4W4nnKH4x9t3hy0eh68P0RFRAchoHihdXftdnycDQ/edit?usp=sharing

@allisonport-db allisonport-db added the enhancement New feature or request label Oct 3, 2022
allisonport-db pushed a commit that referenced this issue Dec 16, 2022
This PR implements Table Features proposed in the feature request (#1408) and the PROTOCOL doc (#1450).

This PR implements the basic functionality, including
- The protocol structure and necessary APIs
- Protocol upgrade logic
- Append-only feature ported to Table Features
- Protocol upgrade path
- User-facing APIs, such as allowing referencing features manually
- Partial test coverage

Not covered by this PR:
- Adapt all features
- Full test coverage
- Make `DESCRIBE TABLE` show referenced features
- Enable table clone and time travel paths

Table Features support starts from reader protocol version `3` and writer version `7`. When supported, features can be **referenced** by a protocol by placing a `DeltaFeatureDescriptor` into the protocol's `readerFeatures` and/or `writerFeatures`.

A feature can be one of two types: writer-only and reader-writer. The first type means that only writers must care about such a feature, while the latter means that in addition to writers, readers must also be aware of the feature to read the data correctly. We now have the following features released:

- `appendOnly`: legacy, writer-only
- `invariants`: legacy, writer-only
- `checkConstraints`: legacy, writer-only
- `changeDataFeed`: legacy, writer-only
- `generatedColumns`: legacy, writer-only
- `columnMapping`: legacy, reader-writer
- `identityColumn`: legacy, writer-only
- `deletionVector`: native, reader-writer
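
The two classifications in this list (legacy vs. native, writer-only vs. reader-writer) can be modelled as a small registry. This is only an illustrative Python sketch: `TableFeature`, `FEATURES`, and `lists_referencing` are hypothetical names, not the classes added by the PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableFeature:
    name: str
    legacy: bool         # True if the feature predates Table Features
    reader_writer: bool  # True if readers must also understand the feature


# The features listed above, in the same order.
FEATURES = [
    TableFeature("appendOnly", legacy=True, reader_writer=False),
    TableFeature("invariants", legacy=True, reader_writer=False),
    TableFeature("checkConstraints", legacy=True, reader_writer=False),
    TableFeature("changeDataFeed", legacy=True, reader_writer=False),
    TableFeature("generatedColumns", legacy=True, reader_writer=False),
    TableFeature("columnMapping", legacy=True, reader_writer=True),
    TableFeature("identityColumn", legacy=True, reader_writer=False),
    TableFeature("deletionVector", legacy=False, reader_writer=True),
]


def lists_referencing(feature):
    """Return the protocol lists in which a feature of this type is referenced."""
    if feature.reader_writer:
        return ["readerFeatures", "writerFeatures"]
    return ["writerFeatures"]
```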

Some examples of the `protocol` action:

```scala
// Valid protocol. Both reader and writer versions are capable.
Protocol(
  minReaderVersion = 3,
  minWriterVersion = 7,
  readerFeatures = {(columnMapping,enabled), (changeDataFeed,enabled)},
  writerFeatures = {(appendOnly,enabled), (columnMapping,enabled), (changeDataFeed,enabled)})

// Valid protocol. Only the writer version is capable. "columnMapping" is implicitly supported by reader version 2.
Protocol(
  minReaderVersion = 2,
  minWriterVersion = 7,
  readerFeatures = None,
  writerFeatures = {(columnMapping,enabled)})

// Invalid protocol. Reader version 1 does not enable "columnMapping" implicitly.
Protocol(
  minReaderVersion = 1,
  minWriterVersion = 7,
  readerFeatures = None,
  writerFeatures = {(columnMapping,enabled)})
```
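
The rule these three examples illustrate can be sketched as a small validity check. This Python sketch is an approximation written for this discussion, not the PR's implementation: `is_valid_protocol` and `READER_WRITER_FEATURES` are hypothetical names, and the check covers only the reader-side rule shown above (it does not reject extra entries in `readerFeatures`). The legacy reader version `2` for `columnMapping` is inferred from the examples.

```python
# Features that readers must understand, mapped to the legacy reader version
# that implicitly supports them (None means there is no such legacy version).
READER_WRITER_FEATURES = {"columnMapping": 2, "deletionVector": None}


def is_valid_protocol(min_reader, min_writer, reader_features, writer_features):
    """Check that every reader-writer feature is covered on the reader side."""
    # Feature lists may only appear on Table Features-capable versions.
    if writer_features is not None and min_writer < 7:
        return False
    if reader_features is not None and min_reader < 3:
        return False
    for name in writer_features or set():
        if name not in READER_WRITER_FEATURES:
            continue  # writer-only feature: readers can ignore it
        if min_reader >= 3:
            # Capable reader version: the feature must be referenced explicitly.
            if name not in (reader_features or set()):
                return False
        else:
            # Legacy reader version: the feature must be implied by the version.
            legacy_version = READER_WRITER_FEATURES[name]
            if legacy_version is None or min_reader < legacy_version:
                return False
    return True
```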

When reading or writing a table, clients MUST respect all enabled features.

Upon table creation, the system assigns the table the minimum protocol that satisfies all features that are **automatically enabled** in the table's metadata. If all active features are legacy, the table stays on a "legacy" protocol with both `readerFeatures` and `writerFeatures` set to `None`, which is the current behavior. If at least one active feature is Table Features-native, or the user has specified a Table Features-capable protocol version, the table is on a Table Features-capable protocol with all active features explicitly referenced in `readerFeatures` and/or `writerFeatures`.
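
A rough Python sketch of this selection follows. It rests on several assumptions that are not stated in this description: the legacy `(reader, writer)` version pairs are taken from a reading of the Delta protocol specification, the floor of `(1, 1)` for a table with no features is arbitrary, and the choice of reader version `3` whenever a reader-writer feature is active is a simplification. `minimal_protocol` and `FEATURES` are hypothetical names.

```python
from typing import NamedTuple, Optional, Tuple


class Feature(NamedTuple):
    reader_writer: bool
    # (min reader, min writer) legacy versions, or None for a native feature.
    legacy_versions: Optional[Tuple[int, int]]


# Assumed values for a few features; not taken from this PR.
FEATURES = {
    "appendOnly": Feature(False, (1, 2)),
    "changeDataFeed": Feature(False, (1, 4)),
    "columnMapping": Feature(True, (2, 5)),
    "deletionVector": Feature(True, None),
}


def minimal_protocol(active, use_table_features=False):
    """Pick a protocol for a new table from its set of active feature names."""
    feats = [FEATURES[name] for name in active]
    all_legacy = all(f.legacy_versions is not None for f in feats)
    if all_legacy and not use_table_features:
        # Legacy protocol: plain version numbers, no feature lists.
        return {
            "minReaderVersion": max((f.legacy_versions[0] for f in feats), default=1),
            "minWriterVersion": max((f.legacy_versions[1] for f in feats), default=1),
            "readerFeatures": None,
            "writerFeatures": None,
        }
    # Table Features-capable protocol: reference every active feature.
    reader_features = {name for name in active if FEATURES[name].reader_writer}
    return {
        "minReaderVersion": 3 if reader_features else 1,
        "minWriterVersion": 7,
        "readerFeatures": reader_features or None,
        "writerFeatures": set(active),
    }
```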

It's possible to upgrade an existing table to support Table Features. The upgrade can apply to writers only or to both readers and writers. During the upgrade, the system explicitly references all legacy features that are implicitly supported by the old protocol.
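
As an illustration of that last step, the following Python sketch derives the explicit feature lists from a legacy protocol. The version pairs in `LEGACY_FEATURES` are assumptions taken from a reading of the Delta protocol specification, the input is assumed to be a legacy protocol, and `upgrade` is a hypothetical helper rather than the PR's upgrade logic.

```python
# Assumption: (min reader version, min writer version) at which each legacy
# feature becomes implicitly supported.
LEGACY_FEATURES = {
    "appendOnly": (1, 2),
    "invariants": (1, 2),
    "checkConstraints": (1, 3),
    "changeDataFeed": (1, 4),
    "generatedColumns": (1, 4),
    "columnMapping": (2, 5),
    "identityColumn": (1, 6),
}
READER_WRITER = {"columnMapping"}


def upgrade(min_reader, min_writer, upgrade_readers):
    """Turn a legacy protocol into a Table Features-capable one."""
    # Legacy features that the old version pair already supports implicitly.
    implied = {
        name
        for name, (reader, writer) in LEGACY_FEATURES.items()
        if reader <= min_reader and writer <= min_writer
    }
    if upgrade_readers:
        # Upgrade both sides: reader-writer features are listed for readers too.
        return {
            "minReaderVersion": 3,
            "minWriterVersion": 7,
            "readerFeatures": implied & READER_WRITER,
            "writerFeatures": implied,
        }
    # Writer-only upgrade: the reader version and its implicit support stay.
    return {
        "minReaderVersion": min_reader,
        "minWriterVersion": 7,
        "readerFeatures": None,
        "writerFeatures": implied,
    }
```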

Users can mark a feature as required by a table using the following commands:
```sql
-- for an existing table
ALTER TABLE table_name SET TBLPROPERTIES (delta.feature.featureName = 'enabled')
-- for a new table
CREATE TABLE table_name ... TBLPROPERTIES (delta.feature.featureName = 'enabled')
-- for all new tables
SET spark.databricks.delta.properties.defaults.feature.featureName = 'enabled'
```
When some features are set to `enabled` in table properties and others in the Spark session, the resulting table enables the features from both places:
```sql
SET spark.databricks.delta.properties.defaults.feature.featureA = 'enabled';
CREATE TABLE table_name ... TBLPROPERTIES (delta.feature.featureB = 'enabled')
-- 'table_name' will have 'featureA' and 'featureB' enabled.
```
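
The merge behaviour can be sketched in Python as a union over two key-value maps. The two prefixes are the ones used in the commands above; the helper names are hypothetical, and the sketch makes no claim about how the PR resolves these settings internally.

```python
SESSION_PREFIX = "spark.databricks.delta.properties.defaults.feature."
TABLE_PREFIX = "delta.feature."


def enabled_features(conf, prefix):
    """Collect feature names whose prefixed key is set to 'enabled'."""
    return {
        key[len(prefix):]
        for key, value in conf.items()
        if key.startswith(prefix) and value == "enabled"
    }


def features_for_new_table(session_conf, table_properties):
    """Union of features enabled by session defaults and by table properties."""
    return (
        enabled_features(session_conf, SESSION_PREFIX)
        | enabled_features(table_properties, TABLE_PREFIX)
    )


session_conf = {SESSION_PREFIX + "featureA": "enabled"}
table_properties = {TABLE_PREFIX + "featureB": "enabled"}
merged = features_for_new_table(session_conf, table_properties)
```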
Closes #1520

GitOrigin-RevId: 2b05f397b24e57f1804761b3242a0f29098a209c
scottsand-db pushed a commit that referenced this issue Jan 18, 2023
## Description

This PR proposes a change to the Delta Protocol to accommodate Table Features discussed in [[Feature Request] Table Features to specify the features needed to read/write to a table #1408](#1408).

TOC is updated using `doctoc`.

Not needed.

Closes #1450

Signed-off-by: Paddy Xu <[email protected]>
GitOrigin-RevId: 66a0b89a12d0c387a6ac03a1458ab4d64af5ac3d