[DOC] Doc changes for InCommitTimestamps #3978

Open
wants to merge 6 commits into
base: master
31 changes: 31 additions & 0 deletions docs/source/delta-batch.md
Expand Up @@ -742,6 +742,37 @@ Each time a checkpoint is written, Delta automatically cleans up log entries old
.. note::
Due to log entry cleanup, instances can arise where you cannot time travel to a version that is less than the retention interval. <Delta> requires all consecutive log entries since the previous checkpoint to time travel to a particular version. For example, with a table initially consisting of log entries for versions [0, 19] and a checkpoint at version 10, if the log entry for version 0 is cleaned up, then you cannot time travel to versions [1, 9]. Increasing the table property `delta.logRetentionDuration` can help avoid these situations.

### In-Commit Timestamps

#### Overview
<Delta> 3.3 introduced [In-Commit Timestamps](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#in-commit-timestamps) to provide a more reliable and consistent way to track table modification timestamps. These timestamps are needed for various use cases, such as time travel to a specific point in the past. The feature addresses limitations of the traditional approach, which relies on file modification timestamps, particularly in scenarios involving data migration or replication.

#### Feature Details
In-Commit Timestamps stores modification timestamps within the commit itself, ensuring they remain unchanged regardless of file system operations. This provides several benefits:

- **Immutable History**: Timestamps become part of the table's permanent commit history
- **Consistent Time Travel**: Queries using timestamp-based time travel produce reliable results even after table migration

Without the In-Commit Timestamps feature, <Delta> uses file modification timestamps as the commit timestamp. This approach has several limitations:

1. Data Migration Issues: When tables are moved between storage locations, file modification timestamps change, potentially disrupting historical tracking.
2. Replication Scenarios: Timestamp inconsistencies can arise when replicating data across different environments.
3. Time Travel Reliability: These timestamp changes can affect the accuracy and consistency of time travel queries.
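
As a rough illustration of the kind of query these limitations affect, timestamp-based time travel in SQL looks like the following. The table name `events` and the timestamp value are hypothetical placeholders, not taken from this documentation.

```sql
-- Hypothetical table name and timestamp, for illustration only.
-- The timestamp is resolved to a table version using commit timestamps,
-- so a changed file modification time can change which version is read.
SELECT * FROM events TIMESTAMP AS OF '2024-11-01 00:00:00';
```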

#### Enabling the Feature
This feature can be enabled by setting the table property `delta.enableInCommitTimestamps` to `true`:

```sql
ALTER TABLE <table_name>
SET TBLPROPERTIES ('delta.enableInCommitTimestamps' = 'true');
```

After enabling In-Commit Timestamps:
- Only new write operations will include the embedded timestamps
- File modification timestamps will continue to be used for historical commits performed before enablement

See the [Versioning](./versioning) section for more details about compatibility.
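
The following is a minimal sketch of enabling the property at table creation and then checking the result, assuming the property can be set through `TBLPROPERTIES` at creation time in the same way as other Delta table properties. The table name `events` and its schema are hypothetical.

```sql
-- Hypothetical table name and schema, for illustration only.
CREATE TABLE events (id BIGINT, event_time TIMESTAMP)
USING DELTA
TBLPROPERTIES ('delta.enableInCommitTimestamps' = 'true');

-- Confirm that the property is set on the table.
SHOW TBLPROPERTIES events ('delta.enableInCommitTimestamps');

-- Inspect the timestamp recorded for each table version.
DESCRIBE HISTORY events;
```

Comparing the `timestamp` column of `DESCRIBE HISTORY` before and after copying a table to a new location is one way to check whether commit timestamps survive the move.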

<a id="deltadataframewrites"></a>

## Write to a table
2 changes: 1 addition & 1 deletion docs/source/delta-drop-feature.md
Expand Up @@ -27,7 +27,7 @@ You can drop the following Delta table features:
- `deletionVectors`. See [_](delta-deletion-vectors.md).
- `typeWidening-preview`. See [_](delta-type-widening.md). Type widening is available in preview in <Delta> 3.2.0 and above.
- `v2Checkpoint`. See [V2 Checkpoint Spec](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#v2-spec). Drop support for V2 Checkpoints is available in <Delta> 3.1.0 and above.

- `inCommitTimestamp`. See [In-Commit Timestamps Spec](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#in-commit-timestamps).

You cannot drop other [Delta table features](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#valid-feature-names-in-table-features).
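
As a hedged sketch, dropping the feature would use the same `ALTER TABLE ... DROP FEATURE` statement as the other features in this list, with the feature name taken from the entry above. `<table_name>` is a placeholder.

```sql
-- <table_name> is a placeholder; the feature name comes from the list above.
ALTER TABLE <table_name> DROP FEATURE inCommitTimestamp;
```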

## How are Delta table features dropped?
11 changes: 11 additions & 0 deletions docs/source/table-properties.md
Expand Up @@ -169,6 +169,17 @@ properties are set. Available Delta table properties include:
| |
| Default: `classic` |
+-------------------------------------------------------------------------------------------+
| `delta.enableInCommitTimestamps` |
| |
| `true` for enabling the InCommitTimestamps table feature. |
| |
| |
| See [_](delta-batch.md#in-commit-timestamps).                                             |
| |
| Data type: `Boolean` |
| |
| Default: `false` |
+-------------------------------------------------------------------------------------------+

.. <Delta> replace:: Delta Lake
.. <AS> replace:: Apache Spark
2 changes: 2 additions & 0 deletions docs/source/versioning.md
Expand Up @@ -29,6 +29,7 @@ The following <Delta> features break forward compatibility. Features are enabled
Row Tracking, [Delta Lake 3.2.0](https://github.com/delta-io/delta/releases/tag/v3.2.0),[_](/delta-row-tracking.md)
Type widening (Preview),[Delta Lake 3.2.0](https://github.com/delta-io/delta/releases/tag/v3.2.0),[_](/delta-type-widening.md)
Identity columns, [Delta Lake 3.3.0](https://github.com/delta-io/delta/releases/tag/v3.3.0),[_](/delta-batch.md#use-identity-columns)
In-Commit Timestamps, [Delta Lake 3.3.0](https://github.com/delta-io/delta/releases/tag/v3.3.0),[_](/delta-batch.md#in-commit-timestamps)

<a id="table-protocol"></a>

Expand Down Expand Up @@ -113,6 +114,7 @@ The following table shows minimum protocol versions required for <Delta> feature
Vacuum Protocol Check,7,3,[Vacuum Protocol Check Spec](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#vacuum-protocol-check)
Row Tracking,7,3,[_](/delta-row-tracking.md)
Type widening (Preview),7,3,[_](/delta-type-widening.md)
In-Commit Timestamps,7,3,[In-Commit Timestamps Spec](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#in-commit-timestamps)

<a id="upgrade"></a>
