-
Notifications
You must be signed in to change notification settings - Fork 197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add open/closed range arguments for incremental #1991
Conversation
✅ Deploy Preview for dlt-hub-docs canceled.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks pretty complete to me! let's work on docs and a few proposed improments
@@ -111,6 +112,8 @@ class Incremental(ItemTransform[TDataItem], BaseConfiguration, Generic[TCursorVa | |||
row_order: Optional[TSortOrder] = None | |||
allow_external_schedulers: bool = False | |||
on_cursor_value_missing: OnCursorValueMissing = "raise" | |||
range_start: TIncrementalRange = "closed" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do not forget docstrings and docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also add this to TIncrementalArgs
(so we can define those in REST API)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also when start_range is open - disable boundary deduplication. there's no reason to deduplicate. there's no boundary overlap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added all here
filter_op = operator.le | ||
filter_op_end = operator.gt | ||
filter_op = operator.le if self.range_start == "closed" else operator.lt | ||
filter_op_end = operator.gt if self.range_end == "open" else operator.ge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should document this in sql_database docs. we have separate chapter for incremental loading.
for example if we load incrementally by id or high resolution timestamp or we do not expect stuff to be added (ie. if we have a cursor on day) it is better to keep range open (that disables deduplication and produces faster code)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrote a section on the sql docs, hope it's clear.
762d7cf
to
10e0770
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is very good now!
* Add open/closed range arguments for incremental * Docs for incremental range args * Docstring * Typo * Ensure deduplication is disabled when range_start=='open' * Cache transformer settings
Description
This allows configuring whether the incremental range is open or closed on both sides (start/end value).
Translates to changing the operators between
> | >=
/< | <=
in sql database source and in incremental filtering.Default is
start_range=closed
andend_range=open
, meaning the exact initial value is included and the exact end value is excluded.With
start_range=open
you getWHERE cursor > last_value
instead of>=
.With
end_range=closed
you get... cursor <= end_value
so non-overlapping chunks are still possible without the start_value deduplication logic.Related Issues
Additional Context