fix prepare temp tables fails if production table has no values #1039

Merged

@KatunaNorbert KatunaNorbert commented May 14, 2024

Fixes #1038.

  • Fixed lake raw drop and lake etl drop so that both work when invoked as, for example: pdr lake raw drop ppss.yaml sapphire-network 2023-01-01
  • You can now drop rows from raw or etl and then resume building from where you left off, with all the data loaded again from the CSV.
  • You can very quickly run through loops such as:
  1. Update raw and view the results, then drop part of raw: raw update -> describe -> raw drop (partial)
  2. Update raw again and verify it is back to full: raw update (to full) -> describe (see full)
  3. Update etl to build the full bronze tables: etl update -> describe (see full bronze tables)
  4. Run lake raw update to get the latest data from GQL and sync the raw data to DuckDB
  5. Run lake etl update to update the bronze tables from the data you just fetched, processing only the newly fetched rows
  6. Alternatively, lake etl drop everything, then run lake etl update and have your bronze tables rebuilt in a second
  • ETL now uses the right mixture of etl_tables (start_ts) and raw_tables (end_ts) to work out what to process, so only the new data is processed. Both bronze SQL queries also enforce this.
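
The start_ts / end_ts logic in the last bullet can be sketched as below. This is only an illustration under stated assumptions, not the code in this PR: it uses Python's built-in sqlite3 as a stand-in for DuckDB, and the names incremental_window, etl_update, raw_rows and bronze_rows are hypothetical. The point it shows is that MAX() over an empty ETL table returns NULL, so start_ts needs a fallback when the production table has no values.

```python
import sqlite3


def incremental_window(con, raw_table, etl_table, default_start_ts=0):
    """Return (start_ts, end_ts) for rows still to process, or None if up to date.

    Assumes positive integer timestamps in a column named `timestamp`.
    """
    # start_ts: newest timestamp already present in the ETL table.
    # MAX() over an empty table is NULL (None in Python), so fall back to a default.
    (etl_max,) = con.execute(f"SELECT MAX(timestamp) FROM {etl_table}").fetchone()
    start_ts = default_start_ts if etl_max is None else etl_max

    # end_ts: newest timestamp available in the raw table.
    (raw_max,) = con.execute(f"SELECT MAX(timestamp) FROM {raw_table}").fetchone()
    if raw_max is None or raw_max <= start_ts:
        return None
    return start_ts, raw_max


def etl_update(con, raw_table, etl_table):
    """Copy only the rows inside the incremental window into the ETL table."""
    window = incremental_window(con, raw_table, etl_table)
    if window is None:
        return None
    start_ts, end_ts = window
    con.execute(
        f"INSERT INTO {etl_table} SELECT id, timestamp FROM {raw_table} "
        "WHERE timestamp > ? AND timestamp <= ?",
        (start_ts, end_ts),
    )
    return window


# Hypothetical demo tables: raw is full, the ETL (bronze) table starts empty.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE raw_rows (id INTEGER, timestamp INTEGER)")
con.execute("CREATE TABLE bronze_rows (id INTEGER, timestamp INTEGER)")
con.executemany("INSERT INTO raw_rows VALUES (?, ?)", [(i, i) for i in range(1, 7)])

print(etl_update(con, "raw_rows", "bronze_rows"))   # first run: empty ETL table
con.execute("DELETE FROM bronze_rows WHERE timestamp > 3")  # partial drop
print(etl_update(con, "raw_rows", "bronze_rows"))   # resumes after the drop point
```

Dropping part of the ETL table and running the update again only reprocesses the rows after the new maximum timestamp, which mirrors the drop-then-resume loops listed above.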

@KatunaNorbert KatunaNorbert marked this pull request as ready for review May 14, 2024 15:31
@idiom-bytes idiom-bytes self-requested a review May 15, 2024 03:29
)
pds.move_table_data(TempTable(table.table_name), table.table_name)

assert len(pds.query_data("SELECT * FROM {}".format(table.table_name))) == 6
Very happy to see this test written @KatunaNorbert!!! This covers the use-case bang on, and gives me confidence that things are working like we expected them to.

Note: Please look at the fixtures inside test_etl.py. Perhaps try rewriting this test afterwards to use the existing setup and reduce the amount of code you need to write.
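
For readers without the diff open, the scenario under test (staged temp-table rows being moved into a production table that has no values yet) can be sketched roughly as follows. This is a hedged illustration, not the project's move_table_data: it uses sqlite3 instead of DuckDB, and table_exists, move_temp_to_production and the _temp_ prefix are hypothetical names chosen for the example.

```python
import sqlite3


def table_exists(con, name):
    """True if a regular table with this name exists in the SQLite database."""
    row = con.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None


def move_temp_to_production(con, table_name):
    """Move rows staged in the (hypothetical) _temp_<table> into <table>."""
    temp_name = f"_temp_{table_name}"
    if not table_exists(con, temp_name):
        return  # nothing was staged
    if table_exists(con, table_name):
        # Production table exists (possibly empty): append, then drop the temp table.
        con.execute(f"INSERT INTO {table_name} SELECT * FROM {temp_name}")
        con.execute(f"DROP TABLE {temp_name}")
    else:
        # No production table yet: promote the temp table by renaming it.
        con.execute(f"ALTER TABLE {temp_name} RENAME TO {table_name}")


# Hypothetical demo: an empty production table and six staged rows.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE predictions (id INTEGER, timestamp INTEGER)")
con.execute("CREATE TABLE _temp_predictions (id INTEGER, timestamp INTEGER)")
con.executemany(
    "INSERT INTO _temp_predictions VALUES (?, ?)", [(i, i) for i in range(6)]
)
move_temp_to_production(con, "predictions")
print(con.execute("SELECT COUNT(*) FROM predictions").fetchone()[0])
```

Both branches leave the production table holding the staged rows and no temp table behind, which is the end state the assertion in the excerpt checks for.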

@idiom-bytes idiom-bytes left a comment

Fantastic work @KatunaNorbert

Thank you for isolating everything in here, writing test coverage, and writing enough details throughout the ticket/PR for me to review and complete everything with ease.

@idiom-bytes idiom-bytes merged commit 7b4c0f4 into issue685-duckdb-integration May 15, 2024
4 checks passed
@idiom-bytes idiom-bytes deleted the issue1038-update-fails-on-empty-table branch May 15, 2024 03:35
2 participants