Add open/closed range arguments for incremental #1991

Merged
merged 8 commits into devel from incremental-open-closed-ranges
Dec 10, 2024

Conversation

steinitzu
Collaborator

Description

This allows configuring whether the incremental range is open or closed on each side (start and end value).
In practice this switches the comparison operators between > and >= (start) and between < and <= (end), both in the sql_database source and in incremental filtering.

The default is range_start="closed" and range_end="open", meaning the exact initial value is included and the exact end value is excluded.
With range_start="open" you get WHERE cursor > last_value instead of >=.
With range_end="closed" you get ... cursor <= end_value, so non-overlapping chunks are still possible without the start-value deduplication logic.
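
As a rough illustration of the mapping described above, here is a minimal sketch in plain Python. The helper `where_clause`, the `TRange` alias, and the `:last_value` / `:end_value` placeholders are hypothetical names chosen for this example and are not part of dlt; only the argument names `range_start` and `range_end` and the values "open" / "closed" come from the PR.

```python
from typing import Literal

# "open" excludes the boundary value, "closed" includes it.
TRange = Literal["open", "closed"]


def where_clause(
    cursor: str,
    range_start: TRange = "closed",
    range_end: TRange = "open",
    with_end_value: bool = True,
) -> str:
    """Render the filter implied by the range settings (hypothetical helper, not dlt API)."""
    start_op = ">=" if range_start == "closed" else ">"
    clause = f"{cursor} {start_op} :last_value"
    if with_end_value:
        end_op = "<" if range_end == "open" else "<="
        clause += f" AND {cursor} {end_op} :end_value"
    return clause


default_clause = where_clause("cursor")
open_start_clause = where_clause("cursor", range_start="open", with_end_value=False)
closed_end_clause = where_clause("cursor", range_start="open", range_end="closed")

# Chunks with an open start and a closed end, (lo, hi], tile a range without overlap.
ids = list(range(1, 21))
chunks = [(0, 10), (10, 20)]
loaded = [[i for i in ids if lo < i <= hi] for lo, hi in chunks]
```

With the defaults, the boundary value 10 would be excluded from the first chunk and included in the second. With an open start and a closed end it lands in the first chunk only, so in both cases adjacent chunks share no rows.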

Related Issues

Additional Context

netlify bot commented Oct 25, 2024

Deploy Preview for dlt-hub-docs canceled.

Name Link
🔨 Latest commit 4388b73
🔍 Latest deploy log https://app.netlify.com/sites/dlt-hub-docs/deploys/67524fff08e68f0008e6e6f8

Collaborator

@rudolfix rudolfix left a comment

this looks pretty complete to me! let's work on docs and a few proposed improvements

@@ -111,6 +112,8 @@ class Incremental(ItemTransform[TDataItem], BaseConfiguration, Generic[TCursorVa
row_order: Optional[TSortOrder] = None
allow_external_schedulers: bool = False
on_cursor_value_missing: OnCursorValueMissing = "raise"
range_start: TIncrementalRange = "closed"
Collaborator

do not forget docstrings and docs

Collaborator

also add this to TIncrementalArgs (so we can define those in REST API)

Collaborator

also when range_start is open - disable boundary deduplication. there's no reason to deduplicate, since there's no boundary overlap
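
To make the "no boundary overlap" point concrete, here is a small sketch that simulates two consecutive runs over an in-memory table. The `extract` function and the row layout are assumptions made up for this example; they only mimic the comparison that the range setting selects and do not reproduce dlt's actual deduplication code.

```python
import operator


def extract(rows, last_value, range_start="closed"):
    """Return ids of rows at or after (closed) / strictly after (open) last_value."""
    compare = operator.ge if range_start == "closed" else operator.gt
    return [row["id"] for row in rows if compare(row["cursor"], last_value)]


first_run_rows = [{"id": "a", "cursor": 1}, {"id": "b", "cursor": 2}]
second_run_rows = first_run_rows + [{"id": "c", "cursor": 3}]

# After the first run the stored cursor value is the maximum seen so far.
last_value = max(row["cursor"] for row in first_run_rows)

# Closed start: the boundary row "b" is read again and must be deduplicated.
closed_second_run = extract(second_run_rows, last_value, "closed")
# Open start: extraction begins strictly after "b", so nothing is read twice.
open_second_run = extract(second_run_rows, last_value, "open")
```

In this toy setup the closed start returns the boundary row a second time, while the open start returns only the new row, which is why deduplication has nothing to do in the open case.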

Collaborator Author

Added all here

dlt/extract/incremental/transform.py (resolved)
dlt/extract/incremental/transform.py (outdated, resolved)
- filter_op = operator.le
- filter_op_end = operator.gt
+ filter_op = operator.le if self.range_start == "closed" else operator.lt
+ filter_op_end = operator.gt if self.range_end == "open" else operator.ge
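
The operators in this snippet (le/lt for the start, gt/ge for the end) read like the branch for a descending cursor, where the last value is tracked with min. That reading is an assumption, as is the ascending counterpart below, which is inferred from the PR description rather than from the diff. A minimal sketch of how both branches could be selected, using a hypothetical helper `select_filter_ops`:

```python
import operator
from typing import Any, Callable, Tuple

Compare = Callable[[Any, Any], bool]


def select_filter_ops(
    last_value_func: Callable[..., Any],
    range_start: str = "closed",
    range_end: str = "open",
) -> Tuple[Compare, Compare]:
    """Pick (start_op, end_op) for a cursor tracked with max or min (hypothetical helper)."""
    if last_value_func is max:
        # Ascending cursor: keep values after last_value and before end_value.
        start_op = operator.ge if range_start == "closed" else operator.gt
        end_op = operator.lt if range_end == "open" else operator.le
    elif last_value_func is min:
        # Descending cursor: the comparisons are mirrored, as in the diff above.
        start_op = operator.le if range_start == "closed" else operator.lt
        end_op = operator.gt if range_end == "open" else operator.ge
    else:
        raise ValueError("only max and min are handled in this sketch")
    return start_op, end_op


# Descending example: last_value=4, end_value=2, open start and closed end.
start_op, end_op = select_filter_ops(min, range_start="open", range_end="closed")
values = [5, 4, 3, 2, 1]
kept = [v for v in values if start_op(v, 4) and end_op(v, 2)]
```

Here `kept` holds the values strictly below 4 and down to and including 2.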
Collaborator

we should document this in the sql_database docs. we have a separate chapter for incremental loading.
for example, if we load incrementally by id or a high-resolution timestamp, or we do not expect more rows to be added (i.e. if we have a cursor on day), it is better to keep the start range open (that disables deduplication and produces faster code)
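
The trade-off behind this advice can be sketched with a day-resolution cursor. Everything below is illustrative: the `extract` helper, the row layout, and the id-based deduplication are stand-ins invented for this example, not dlt's implementation.

```python
from datetime import date


def extract(rows, last_value, range_start):
    """Filter rows by a day cursor using a closed (>=) or open (>) start."""
    if range_start == "closed":
        return [r for r in rows if r["day"] >= last_value]
    return [r for r in rows if r["day"] > last_value]


already_loaded = [
    {"id": 1, "day": date(2024, 12, 1)},
    {"id": 2, "day": date(2024, 12, 2)},
]
last_value = date(2024, 12, 2)

# After the first run, one more row arrives for the boundary day, plus one for a new day.
table_now = already_loaded + [
    {"id": 3, "day": date(2024, 12, 2)},
    {"id": 4, "day": date(2024, 12, 3)},
]

open_ids = [r["id"] for r in extract(table_now, last_value, "open")]
closed_ids = [r["id"] for r in extract(table_now, last_value, "closed")]

# Simplified deduplication: drop ids that were loaded in the previous run.
seen = {r["id"] for r in already_loaded}
closed_new_ids = [i for i in closed_ids if i not in seen]
```

In this scenario the open start skips the late row for the boundary day (id 3), while the closed start picks it up at the cost of re-reading id 2 and deduplicating it. When the cursor is unique or no late rows are expected, the open start loses nothing and avoids that extra work.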

Collaborator Author

Wrote a section on the sql docs, hope it's clear.

@steinitzu steinitzu force-pushed the incremental-open-closed-ranges branch from 762d7cf to 10e0770 Compare December 6, 2024 00:58
@steinitzu steinitzu marked this pull request as ready for review December 6, 2024 01:24
Collaborator

@rudolfix rudolfix left a comment

this is very good now!

@rudolfix rudolfix merged commit 51b11d2 into devel Dec 10, 2024
58 of 59 checks passed
@rudolfix rudolfix deleted the incremental-open-closed-ranges branch December 10, 2024 22:35
donotpush pushed a commit that referenced this pull request Dec 11, 2024
* Add open/closed range arguments for incremental

* Docs for incremental range args

* Docstring

* Typo

* Ensure deduplication is disabled when range_start=='open'

* Cache transformer settings
Successfully merging this pull request may close these issues.

Support to customize incremental compare operators for sql_table
2 participants