[Feature] make staging log model an append-strategy incremental table #82

Open
2 of 4 tasks
fivetran-jamie opened this issue May 11, 2023 · 2 comments

Comments

@fivetran-jamie
Contributor

Is there an existing feature request for this?

  • I have searched the existing issues

Describe the Feature

Add logic to stg_fivetran_log__log to make it an incremental model. Since existing records should not change once they are inserted, an append strategy would work well here and should hopefully cut runtime substantially!

Describe alternatives you've considered

  • keep it non-incremental
    • the log table can get very large and expensive to rebuild on every run
  • use another incremental strategy:
    • more complicated than what we need, though with append we need to be very careful not to insert duplicate records
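
For illustration, an append-strategy version of the staging model could look roughly like the sketch below. This is not the package's actual code: the `source('fivetran_log', 'log')` reference is an assumed name for the raw log table, and the filter relies on the standard Fivetran `_fivetran_synced` column as a high-water mark.

```sql
-- Sketch only. The source name below is an assumption, not the package's real reference.
{{ config(
    materialized='incremental',
    incremental_strategy='append'
) }}

select *
from {{ source('fivetran_log', 'log') }}

{% if is_incremental() %}
-- On incremental runs, only insert rows synced after the newest row already in this model.
-- The strict ">" is what guards against duplicates, since append never updates or deletes.
where _fivetran_synced > (select max(_fivetran_synced) from {{ this }})
{% endif %}
```

The trade-off is that a strict `>` filter can skip rows that land later with the same `_fivetran_synced` value as the current maximum, while `>=` would insert duplicates. Not every adapter supports `append` either, so this would need checking per warehouse.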

Are you interested in contributing this feature?

  • Yes.
  • Yes, but I will need assistance and will schedule time during your office hours for guidance.
  • No.

Anything else?

No response

@fivetran-jamie fivetran-jamie mentioned this issue May 11, 2023
19 tasks
@fivetran-edkoo

I have a Fivetran customer who is using our Fivetran Platform models but is seeing large costs in their BigQuery destination due to Fivetran log processing. They say:

"I'm not sure which specific [query] may be responsible but looking at the BigQuery job logs directly I can see these three queries account for around $400 of that. These are the queries that start with the headers of:

/* {"app": "dbt", "dbt_version": "1.3.1", "profile_name": "fivetran", "target_name": "prod", "node_id": "model.fivetran_log.stg_fivetran_log__log"} */
/* {"app": "dbt", "dbt_version": "1.3.1", "profile_name": "fivetran", "target_name": "prod", "node_id": "model.fivetran_log.fivetran_log__connector_status"} */
/* {"app": "dbt", "dbt_version": "1.3.1", "profile_name": "fivetran", "target_name": "prod", "node_id": "model.fivetran_log.fivetran_log__schema_changelog"} */

I think primarily this all comes back to the source Fivetran log table (in BigQuery) being > 200 GB in size and the queries in these models reading all that data on every run, rather than perhaps just the past couple of days worth. I see a lot of tables being fully rebuilt every time."

Would this PR address this issue and help reduce the cost of processing the staging tables of the Fivetran Platform package?

@fivetran-catfritz
Contributor

@fivetran-joemarkiewicz bumping this issue following our discussion on performance for this package and also based on my work on #126. It would make sense to make stg_fivetran_log__log and its downstream models incremental, though probably not with the append strategy.
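
As one possible non-append direction (a sketch under assumptions, not a decided design), a `merge` strategy with a unique key would let the filter overlap with already-loaded rows without creating duplicates. The source name and the use of `id` as the unique key are assumptions here; substitute whatever uniquely identifies a log record.

```sql
-- Sketch only. The source name and the `id` unique key are assumptions.
{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='id'
) }}

select *
from {{ source('fivetran_log', 'log') }}

{% if is_incremental() %}
-- ">=" re-reads rows at the current high-water mark; merge then updates
-- matching keys instead of inserting them a second time.
where _fivetran_synced >= (select max(_fivetran_synced) from {{ this }})
{% endif %}
```

Compared with append, this costs a key comparison on each run but removes the risk of duplicate inserts, and the source scan is still limited to recent rows only if the warehouse can prune on `_fivetran_synced`.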

3 participants