Add initial cdc options for sql databases #1643

Merged
merged 5 commits into from
Jan 12, 2024

kartik4949
Contributor

Description

#1614

Related Issues

Checklist

  • Is this code covered by new or existing unit tests or integration tests?
  • Did you run make unit-testing and make integration-testing successfully?
  • Do new classes, functions, methods and parameters all have docstrings?
  • Were existing docstrings updated, if necessary?
  • Was external documentation updated, if necessary?

Additional Notes or Comments

@kartik4949 requested a review from blythed January 4, 2024 16:45
@kartik4949 self-assigned this Jan 4, 2024
@kartik4949 marked this pull request as draft January 4, 2024 16:46
stop_event: Event,
identifier: 'str' = '',
timeout: t.Optional[float] = None,
strategy: t.Dict = {'strategy': 'polling', 'options': {'frequency': 3600, 'auto_increment_field': None}}
Collaborator

How does this get injected into the class? From CFG?

Contributor Author

Uhm... good question. No, it's not injected from CFG.
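Since the default is not injected from CFG, one option is to accept `strategy=None` and fall back to a config-provided default. This is a minimal sketch under assumptions: `CDC_DEFAULTS` and `resolve_strategy` are hypothetical names, not superduperdb settings or functions. It also sidesteps the dict literal used as a default in the signature quoted above, which Python creates once and shares across every call, so any mutation would leak between listeners.

```python
import copy
import typing as t

# Hypothetical stand-in for a CDC section loaded from config (not a real
# superduperdb setting); values copied from the default in the signature above.
CDC_DEFAULTS: t.Dict[str, t.Any] = {
    'strategy': 'polling',
    'options': {'frequency': 3600, 'auto_increment_field': None},
}


def resolve_strategy(
    strategy: t.Optional[t.Dict[str, t.Any]] = None,
    defaults: t.Optional[t.Dict[str, t.Any]] = None,
) -> t.Dict[str, t.Any]:
    """Return a fresh strategy dict in which explicit options override the defaults."""
    # Deep-copy so callers can never mutate the shared defaults.
    base = copy.deepcopy(defaults if defaults is not None else CDC_DEFAULTS)
    if strategy:
        base['strategy'] = strategy.get('strategy', base['strategy'])
        base['options'].update(strategy.get('options', {}))
    return base
```

For example, `resolve_strategy({'options': {'frequency': 0.5}})` keeps `'polling'` and `auto_increment_field=None` but overrides the frequency.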

Collaborator

@blythed left a comment

So, is the strategy to use the current CDC code, but simply with a new producer?

@kartik4949 marked this pull request as ready for review January 8, 2024 19:56
@kartik4949 requested a review from blythed January 8, 2024 19:56
codecov-commenter commented Jan 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (34830a7) 80.33% compared to head (87828d8) 67.42%.
Report is 1374 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1643       +/-   ##
===========================================
- Coverage   80.33%   67.42%   -12.92%     
===========================================
  Files          95      118       +23     
  Lines        6602     8371     +1769     
===========================================
+ Hits         5304     5644      +340     
- Misses       1298     2727     +1429     
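The percentages in the table are covered lines ("Hits") over coverable lines ("Lines"). A small check of the table's figures, assuming Codecov's default of rounding down to two decimal places (a Codecov configuration default, not something this report states); `BASE`, `HEAD` and `round_down_2` are illustrative names.

```python
import math

# Figures copied from the Codecov "Coverage Diff" table.
BASE = {"files": 95, "lines": 6602, "hits": 5304}
HEAD = {"files": 118, "lines": 8371, "hits": 5644}


def coverage_pct(hits: int, lines: int) -> float:
    """Line coverage: covered lines as a percentage of coverable lines."""
    return 100.0 * hits / lines


def round_down_2(x: float) -> float:
    """Round toward negative infinity at two decimals (assumed Codecov default)."""
    return math.floor(x * 100) / 100


base_pct = coverage_pct(BASE["hits"], BASE["lines"])
head_pct = coverage_pct(HEAD["hits"], HEAD["lines"])
delta_pct = head_pct - base_pct

# Misses are simply the coverable lines that were not hit.
misses_base = BASE["lines"] - BASE["hits"]
misses_head = HEAD["lines"] - HEAD["hits"]
```

Under that rounding the figures come out as 80.33%, 67.42% and -12.92%: the head counts 1,769 more lines but only 340 more hits, and the report notes it is 1374 commits behind main.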


@kartik4949
Contributor Author

> So, is the strategy to use the current CDC code, but simply with a new producer?

@blythed exactly!
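The approach confirmed here (keep the existing CDC consumer, swap in a new producer) can be sketched as a queue between a polling producer and an unchanged consumer. This is a minimal sketch, not superduperdb's implementation: `polling_producer`, `consumer`, `fetch_new_rows` and `handle` are hypothetical names, and the `stop_event`, `timeout` and `frequency` parameters only mirror the ones quoted earlier in this thread.

```python
import queue
import threading
import typing as t


def polling_producer(
    fetch_new_rows: t.Callable[[], t.List[dict]],
    out: "queue.Queue[dict]",
    stop_event: threading.Event,
    frequency: float,
) -> None:
    """New producer: poll the SQL source every `frequency` seconds, push rows to `out`."""
    while not stop_event.is_set():
        for row in fetch_new_rows():
            out.put(row)
        # Event.wait returns early once stop_event is set, so shutdown is prompt.
        stop_event.wait(frequency)


def consumer(
    inp: "queue.Queue[dict]",
    handle: t.Callable[[dict], None],
    stop_event: threading.Event,
    timeout: float = 0.1,
) -> None:
    """Existing consumer: agnostic to which producer fills the queue."""
    # Keep draining after stop is requested until the queue is empty.
    while not stop_event.is_set() or not inp.empty():
        try:
            row = inp.get(timeout=timeout)
        except queue.Empty:
            continue
        handle(row)
```

Because the consumer only sees the queue, a log-based producer could later replace the polling one without touching it.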

Collaborator

@jieguangzhou left a comment

Great work.

I think we should add a simple integration test for the fault tolerance of CDC tasks, for example checking the behavior when, after the latest batch of data is fetched, prediction fails on two of its records.

It doesn't have to be added now; it can be tracked as a TODO item.
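The suggested test can be sketched as follows: prediction fails on two rows of a fetched batch, and the batch is still processed. This assumes a hypothetical `process_batch` helper that records per-row failures instead of aborting; neither the helper nor that failure policy comes from the PR.

```python
import typing as t


def process_batch(
    rows: t.List[dict],
    predict: t.Callable[[dict], t.Any],
) -> t.Tuple[t.Dict[int, t.Any], t.Dict[int, str]]:
    """Predict on every row, collecting failures per row id instead of raising."""
    outputs: t.Dict[int, t.Any] = {}
    failures: t.Dict[int, str] = {}
    for row in rows:
        try:
            outputs[row['id']] = predict(row)
        except Exception as exc:  # one bad row must not stop the rest of the batch
            failures[row['id']] = repr(exc)
    return outputs, failures


def test_batch_survives_two_bad_rows() -> None:
    bad_ids = {2, 4}

    def flaky_predict(row: dict) -> int:
        if row['id'] in bad_ids:
            raise ValueError(f"cannot predict row {row['id']}")
        return row['x'] * 2

    batch = [{'id': i, 'x': i} for i in range(1, 6)]
    outputs, failures = process_batch(batch, flaky_predict)

    # Exactly the two bad rows fail; the other three still get predictions.
    assert set(failures) == bad_ids
    assert outputs == {1: 2, 3: 6, 5: 10}
```

A real integration test would also check that the failed ids are retried or reported on the next poll, which depends on how the listener tracks progress.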

superduperdb/backends/ibis/cdc/listener.py (outdated, resolved)
superduperdb/backends/ibis/cdc/listener.py (resolved)
test/integration/test_cdc.py (outdated, resolved)
superduperdb/backends/ibis/cdc/listener.py (outdated, resolved)
Collaborator

@blythed left a comment

Looking really good!

Some questions:

How do we activate this class via configuration?

Can we document the developer journey for setting up this class to work with a SQL database?

@kartik4949
Contributor Author

> Looking really good!
>
> Some questions:
>
> How do we activate this class via configuration?
>
> Can we document the developer journey to setting up this class to work with some SQL database?

@blythed Sure, I can add the documentation in a separate PR.

Here is the basic usage:

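from superduperdb import superduper

# 'SQL/URI' is a placeholder for your database connection URI
db = superduper('SQL/URI')

# Start change data capture
db.cdc.start()

# or: listen on a specific table with an explicit polling strategy.
# Table and PollingStrategy must be imported from superduperdb first
# (their import paths are not shown in this thread).
table = Table('my_table')
strategy = PollingStrategy(type='incremental', frequency=0.5, auto_incremental_field='id_field')

# `type` can be either:
#   'incremental': the table has an incremental field
#   'join_id': the table has no incremental field; we create a separate
#              metadata table that stores the processed IDs and do a left
#              anti-join against the user table to find new rows

db.cdc.listen(on=table, strategy=strategy)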
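The two `type` values can be illustrated in plain SQL. This is a minimal sketch using Python's built-in `sqlite3` as a stand-in for the user's database; the tables `events`, `docs` and `_cdc_processed_ids`, the helper functions, and the "last seen id" high-water mark for `incremental` are all hypothetical, and these are not the queries superduperdb's listener actually issues.

```python
import sqlite3
import typing as t


def make_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for a user's SQL database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # Table WITH an incremental field: suits type='incremental'.
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, text TEXT)")
    # Table WITHOUT an incremental field: suits type='join_id'.
    conn.execute("CREATE TABLE docs (doc_id TEXT PRIMARY KEY, text TEXT)")
    # Metadata table holding the ids that CDC has already processed.
    conn.execute("CREATE TABLE _cdc_processed_ids (doc_id TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO events (text) VALUES (?)", [("a",), ("b",)])
    conn.executemany(
        "INSERT INTO docs (doc_id, text) VALUES (?, ?)",
        [("x", "first"), ("y", "second")],
    )
    return conn


def poll_incremental(
    conn: sqlite3.Connection, last_id: int
) -> t.Tuple[t.List[t.Tuple[int, str]], int]:
    """Return rows whose incremental id exceeds last_id, plus the new high-water mark."""
    rows = conn.execute(
        "SELECT id, text FROM events WHERE id > ? ORDER BY id", (last_id,)
    ).fetchall()
    new_last_id = rows[-1][0] if rows else last_id
    return rows, new_last_id


def poll_join_id(conn: sqlite3.Connection) -> t.List[t.Tuple[str, str]]:
    """Return rows missing from the metadata table (left anti-join), then mark them processed."""
    rows = conn.execute(
        """
        SELECT d.doc_id, d.text
        FROM docs AS d
        LEFT JOIN _cdc_processed_ids AS p ON p.doc_id = d.doc_id
        WHERE p.doc_id IS NULL
        ORDER BY d.doc_id
        """
    ).fetchall()
    conn.executemany(
        "INSERT INTO _cdc_processed_ids (doc_id) VALUES (?)",
        [(doc_id,) for doc_id, _ in rows],
    )
    return rows
```

With `conn = make_demo_db()`, `poll_incremental(conn, 0)` returns `([(1, 'a'), (2, 'b')], 2)`, and the first `poll_join_id(conn)` returns both docs while a second call returns `[]`. In this sketch both variants detect only newly inserted rows; `join_id` avoids needing an incremental column at the cost of a metadata table that grows with every processed row.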

Collaborator

@blythed left a comment

Great work.

@kartik4949 merged commit 56c4524 into superduper-io:main Jan 12, 2024
2 checks passed
Labels
None yet
Projects
None yet

4 participants