Introduce TableLastModifiedMetadataBatch and implement BaseAdapter.calculate_freshness_from_metadata_batch #127

Merged
merged 29 commits into from
Apr 12, 2024

@MichelleArk MichelleArk commented Mar 11, 2024

resolves #138
docs dbt-labs/docs.getdbt.com/#

Problem

No base abstraction or implementation exists for calculating metadata-based freshness in batch.

Solution

  1. Introduce the TableLastModifiedMetadataBatch capability, which adapter implementations use to indicate whether a batch version of metadata-based freshness calculation is available.
  2. Implement BaseAdapter.calculate_freshness_from_metadata_batch.
    • Groups sources by information schema and runs 1 query per information schema under the hood, returning a response per query and a mapping of relations -> freshness results.
    • I also considered updating calculate_freshness_from_metadata, but the interfaces were different enough that it would have been difficult to do in a backward-compatible way. However, calculate_freshness_from_metadata was rewritten in terms of calculate_freshness_from_metadata_batch, so there are some wins in terms of code reuse here.
  3. Add needs_conn to BaseAdapter.execute_macro so callers can signal whether a macro requires an open connection. By default this is false (preserving existing behaviour), but calculate_freshness_from_metadata_batch sets it to true.
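
For readers skimming the description, here is a rough sketch of the grouping step from item 2. It is not the code in dbt/adapters/base/impl.py: `Relation`, `QueryFn`, `freshness_from_metadata_batch`, and the shape of the result dict are hypothetical stand-ins chosen for illustration, and the sketch assumes one information schema per database.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Relation:
    """Hypothetical stand-in for an adapter relation (not the real BaseRelation)."""

    database: str
    schema: str
    identifier: str

    @property
    def information_schema(self) -> str:
        # Assumption: each database exposes exactly one information schema.
        return f"{self.database}.information_schema"


# Hypothetical query runner: given an information schema and the relations to
# look up, return (response, rows), where rows maps (schema, identifier) to
# the table's last-modified timestamp.
QueryFn = Callable[
    [str, List[Relation]],
    Tuple[str, Dict[Tuple[str, str], datetime]],
]


def freshness_from_metadata_batch(
    sources: Iterable[Relation],
    run_query: QueryFn,
    now: datetime,
) -> Tuple[List[str], Dict[Relation, dict]]:
    """Run one metadata query per information schema instead of one per source."""
    # Group sources by the information schema that holds their metadata.
    grouped: Dict[str, List[Relation]] = defaultdict(list)
    for relation in sources:
        grouped[relation.information_schema].append(relation)

    responses: List[str] = []
    results: Dict[Relation, dict] = {}
    for info_schema, relations in grouped.items():
        # One query per group, however many sources the group contains.
        response, rows = run_query(info_schema, relations)
        responses.append(response)
        for relation in relations:
            last_modified = rows.get((relation.schema, relation.identifier))
            if last_modified is None:
                # No metadata row came back for this relation; skip it here.
                continue
            # Illustrative result shape; the field names are an assumption.
            results[relation] = {
                "max_loaded_at": last_modified,
                "snapshotted_at": now,
                "age": (now - last_modified).total_seconds(),
            }
    return responses, results
```

In this sketch a single-source calculation is just the batch call with a one-element list, which mirrors the code-reuse point in the second bullet.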

Checklist

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

@MichelleArk MichelleArk force-pushed the batch-metadata-freshness branch from 9f8da16 to c541ca2 Compare March 11, 2024 18:25
@MichelleArk MichelleArk force-pushed the batch-metadata-freshness branch from c541ca2 to 727093c Compare March 11, 2024 20:35
@MichelleArk MichelleArk force-pushed the batch-metadata-freshness branch from b4cdd48 to ed81529 Compare March 25, 2024 20:57
@MichelleArk MichelleArk changed the title [wip] Batch metadata freshness Introduce TableLastModifiedMetadataBatch and implement BaseAdapter.calculate_freshness_from_metadata_batch Mar 25, 2024
@MichelleArk MichelleArk marked this pull request as ready for review March 25, 2024 22:07
@MichelleArk MichelleArk requested a review from mikealfare March 25, 2024 22:07

@mikealfare mikealfare left a comment

I left a few comments/questions, some might even be useful.

dbt/adapters/base/impl.py (resolved)
dbt/adapters/base/impl.py (resolved)
dbt/adapters/base/impl.py (resolved)
dbt/adapters/base/impl.py (outdated, resolved)
dbt/adapters/base/impl.py (resolved)
@MichelleArk MichelleArk merged commit b65b761 into main Apr 12, 2024
13 checks passed
@MichelleArk MichelleArk deleted the batch-metadata-freshness branch April 12, 2024 19:43
Development

Successfully merging this pull request may close these issues.

[Applied State] Implement batch strategy for metadata-based source freshness computation
3 participants