BUG - fivetran_log__audit_table should be partitioned #27

Closed
CraigWilson-ZOE opened this issue Dec 13, 2021 · 4 comments
Labels
type:bug Something is broken or incorrect

Comments

@CraigWilson-ZOE

Are you a current Fivetran customer?
Craig Wilson, ZOE, Data Engineer

Describe the bug
We are monitoring how long each model in our dbt pipeline takes to run, and the fivetran_log__audit_table model is one of the longest-running models we have, with an average execution time of 490 seconds.
Looking at the code, I believe this model would benefit from being partitioned and processing only the latest day rather than all of the data.
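To illustrate why processing only the latest day keeps runtime roughly constant, here is a minimal Python sketch of the idea. It is not the package's actual dbt implementation: in dbt this pattern is usually expressed with an incremental materialization and an `is_incremental()` filter, optionally combined with warehouse partitioning (for example BigQuery's `partition_by` config). The function and field names below (`build_partitions`, `full_refresh`, `incremental_run`, `event_id`) are hypothetical, and the row counts are made-up illustration data, not measurements from this issue.

```python
from datetime import date, timedelta


def build_partitions(start, days, rows_per_day):
    """Hypothetical daily-partitioned source log: {day: [row, ...]}."""
    partitions = {}
    for offset in range(days):
        day = start + timedelta(days=offset)
        partitions[day] = [{"day": day, "event_id": i} for i in range(rows_per_day)]
    return partitions


def full_refresh(partitions):
    """Rebuild the target from every partition: cost grows with history."""
    rows = [row for day in sorted(partitions) for row in partitions[day]]
    return rows, len(rows)


def incremental_run(partitions, target):
    """Read only partitions newer than the target's latest day (high-water mark).

    Late-arriving rows for days already loaded are ignored in this sketch.
    """
    high_water_mark = max((row["day"] for row in target), default=None)
    new_days = sorted(
        day for day in partitions
        if high_water_mark is None or day > high_water_mark
    )
    new_rows = [row for day in new_days for row in partitions[day]]
    return target + new_rows, len(new_rows)


# Ten days of history, 100 log rows per day (illustrative numbers only).
start = date(2021, 12, 1)
source = build_partitions(start, days=10, rows_per_day=100)
target, _ = full_refresh(source)  # initial build reads all 1000 rows

# One new day of log data arrives.
source.update(build_partitions(start + timedelta(days=10), days=1, rows_per_day=100))

_, full_cost = full_refresh(source)  # reads 1100 rows
target, incremental_cost = incremental_run(source, target)  # reads 100 rows
print(full_cost, incremental_cost)
```

The full rebuild reads every day of history on each run, so its cost keeps growing, while the incremental run reads only the partitions after the high-water mark. A real implementation would usually also reprocess a short lookback window to catch late-arriving log rows.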

Steps to reproduce

  1. Run the dbt fivetran_log package.
  2. The screenshot below shows the package's average runtime per day, which is increasing slowly but steadily. Based on this trend, I would expect it to keep increasing indefinitely.

[Screenshot 2021-12-13 at 13 35 20: average runtime per day for the package]

Expected behavior
I would expect the execution time to stay roughly constant and to be lower than it currently is.

Project variables configuration
Only the relevant section of the configuration is copied here, for security reasons.

    # Fivetran log package configuration
    fivetran_log:
      fivetran_log_database: xxxxxx    # hidden for security
      fivetran_log_schema: fivetran_log
      fivetran_log_using_transformations: false # this will disable all transformation + trigger_table logic
      fivetran_log_using_triggers: false # this will disable only trigger_table logic

Package Version

packages:
  # includes dbt_utils, so there is no need to import it separately
  - package: calogica/dbt_date
    version: [">=0.4.0", "<0.5.0"]
  - package: fivetran/mixpanel
    version: [">=0.4.0", "<0.5.0"]
  - package: calogica/dbt_expectations
    version: [">=0.4.0", "<0.5.0"]
  - package: dbt-labs/codegen
    version: 0.4.0
  - package: data-mie/dbt_profiler
    version: 0.1.4
  - package: fivetran/stripe_source
    version: 0.4.3
  - package: fivetran/stripe
    version: 0.5.0
  - package: fivetran/fivetran_log
    version: [">=0.4.0", "<0.5.0"]

Warehouse

  • BigQuery
  • Redshift
  • Snowflake
  • Postgres
  • Databricks
  • Other (provide details below)

Additional context
N/A

Screenshots
Attached above.

Please indicate the level of urgency
This isn't super urgent, but the model is taking more and more time and is increasing our costs, as we process more rows each day.

Are you interested in contributing to this package?

  • Yes, I can do this and open a PR for your review.
  • Possibly, but I'm not quite sure how to do this. I'd be happy to do a live coding session with someone to get this fixed.
  • No, I'd prefer if someone else fixed this. I don't have the time and/or don't know what the root cause of the problem is.
@fivetran-jamie
Contributor

hey @CraigWilson-ZOE -- we've added some incremental + partitioning logic to the audit table model in the feature/audit-incrementality working branch. would you mind testing the branch out to see how the runtime is affected? the first run will probably be a full refresh and won't make a difference, but hopefully we'll see a big difference with the incremental runs.

# packages.yml
packages:
  - git: https://github.com/fivetran/dbt_fivetran_log.git
    revision: feature/audit-incrementality

@CraigWilson-ZOE
Author

Hi Jamie,

Sorry for the delay, just back from the holidays.

I will try this out and get back to you, thanks.

@fivetran-jamie
Contributor

no worries -- if you have a chance to test it out soon, we were aiming to release the fix before our sprint ends this week 🙂

@CraigWilson-ZOE
Author

Hey Jamie,

I've just managed to try this; we had a few issues upgrading to v1.0.1.

The package ran OK, and I can see that fewer records are processed for the audit_table model.

Thanks

@fivetran-sheringuyen added the type:bug (Something is broken or incorrect) label on Dec 6, 2022
Projects
None yet
Development

No branches or pull requests

3 participants