Bug/redshift json parse #114

Merged
merged 47 commits into from
Feb 20, 2024

Conversation

fivetran-catfritz
Contributor

@fivetran-catfritz fivetran-catfritz commented Feb 12, 2024

PR Overview

This PR will address the following Issue/Feature:

This PR will result in the following new package version:

  • v1.5.0: I marked this release as breaking mainly because I changed the partitioning for BigQuery and the clustering for Snowflake. This won't cause errors, but the warehouses won't apply the new partitioning or clustering until a full refresh is run. I was also being cautious because we are changing the logic of the json_parse macro and adding the lookback to the incremental fivetran_platform__audit_table model. Though tested, these updates could introduce unexpected behavior.

Please provide the finalized CHANGELOG entry which details the relevant changes included in this PR:

Breaking Changes

  • The following changes are marked as a breaking change out of caution, as a full refresh may be required if you are experiencing issues after the update.
  • For BigQuery and Databricks users, updated the partition_by config to coordinate with the filter used in the incremental logic.
  • For Snowflake users, added a cluster_by config for performance.

Feature Updates

  • Updated incremental logic for fivetran_platform__audit_table so that it looks back 7 days to catch any late-arriving records.
  • Updated JSON parsing logic to prevent run failures when incoming JSON-like strings are invalid.
  • Added filter to fivetran_platform__connector_status so only necessary log records will be parsed.
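
To illustrate why a lookback window helps with late-arriving records, here is a minimal Python sketch of the idea. It is not the package's SQL; the function name, row fields, and dates are hypothetical and chosen only for illustration.

```python
from datetime import date, timedelta


def rows_to_process(source_rows, max_loaded_date, lookback_days=7):
    """Return source rows dated on or after (max_loaded_date - lookback_days)."""
    cutoff = max_loaded_date - timedelta(days=lookback_days)
    return [row for row in source_rows if row["sync_date"] >= cutoff]


# Hypothetical data: the target table already holds rows through Feb 10,
# and a record dated Feb 8 shows up in the source after that run.
source_rows = [
    {"id": "late", "sync_date": date(2024, 2, 8)},
    {"id": "new", "sync_date": date(2024, 2, 11)},
    {"id": "old", "sync_date": date(2024, 1, 1)},
]
max_loaded = date(2024, 2, 10)

# A strict "newer than the max loaded date" filter skips the late record.
strict = [r["id"] for r in source_rows if r["sync_date"] > max_loaded]

# A 7-day lookback reprocesses the recent window and picks it up.
with_lookback = [r["id"] for r in rows_to_process(source_rows, max_loaded)]
```

The trade-off is that each incremental run reprocesses the last few days of data, so the model needs to handle rows it has already loaded.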

Under The Hood

  • Added macros:
    • fivetran_log_json_parse to handle the updated JSON parsing.
    • fivetran_log_lookback for use in fivetran_platform__audit_table.
  • Updated testing of invalid JSON strings.
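
The behavior described for the updated JSON parsing, returning nothing instead of failing the run when a string is not valid JSON, can be sketched in Python as follows. This only illustrates the intended behavior and is not the macro's warehouse-specific SQL; `safe_json_parse` and the sample field names are assumptions.

```python
import json


def safe_json_parse(raw, path):
    """Return the value at `path` in a JSON string, or None if it cannot be read.

    `path` is a list of keys to follow, e.g. ["a", "b"] for {"a": {"b": 1}}.
    """
    try:
        value = json.loads(raw)
    except (TypeError, ValueError):
        # Invalid or missing JSON: return None rather than raising.
        return None
    for key in path:
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


# Hypothetical log payloads: one valid, one truncated.
valid = safe_json_parse('{"status": "SUCCESSFUL"}', ["status"])
truncated = safe_json_parse('{"status": ', ["status"])
```

With this approach a malformed record yields a null field downstream instead of stopping the whole run.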

PR Checklist

Basic Validation

Please acknowledge that you have successfully performed the following commands locally:

  • dbt run --full-refresh && dbt test
  • dbt run (if incremental models are present) && dbt test

Before marking this PR as "ready for review" the following have been applied:

  • The appropriate issue has been linked, tagged, and properly assigned
  • All necessary documentation and version upgrades have been applied
  • docs were regenerated (unless this PR does not include any code or yml updates)
  • BuildKite integration tests are passing
  • Detailed validation steps have been provided below

Detailed Validation

Please share any and all of your validation steps:

  • See validation worksheet

If you had to summarize this PR in an emoji, which would it be?

🔍

@fivetran-catfritz fivetran-catfritz self-assigned this Feb 15, 2024
Contributor

@fivetran-joemarkiewicz fivetran-joemarkiewicz left a comment

@fivetran-catfritz Thanks so much for working through this update!! I have only a few final questions to be addressed below. Coincidentally, when testing these changes I ran into a real-world scenario where I had late-arriving sync events from Fivetran. If we were using the old method those late events would have been missed; however, with this update they were captured successfully!

Additionally, one last request would be to double-check against the original customer's data that the proposed JSON fix addresses this issue. I have 99% confidence that it does, but it would be great to fully validate that the compiled code works as expected with the customer's data.

Let me know if you have any questions!

{% macro default__fivetran_log_lookback(from_date, datepart='day', interval=7, default_start_date='2010-01-01') %}

coalesce(
Contributor

This may be a strange question, but in what scenario would we need a default_start_date? Shouldn't there always be a max(date) on incremental runs? When would the coalesce actually need to fall back to the default start date?

Contributor Author

@fivetran-catfritz fivetran-catfritz Feb 19, 2024

Good point. I tried removing this and it seems to compile and run fine. However, for now I will leave it in pending discussion on Tuesday.

Contributor

Thanks, I just wanted to make sure there likely wouldn't be a scenario where this field would actually be null. I don't believe including it will hurt, so I am comfortable leaving it in.
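
For reference, the coalesce fallback discussed in this thread can be sketched in Python. The parameter names and defaults mirror the macro signature quoted above, but the body is an assumption about the intended behavior, not the macro's actual SQL. One case where the fallback would apply is a max(date) computed over a table with no rows, which is NULL in SQL.

```python
from datetime import date, timedelta


def lookback_start(max_date, interval_days=7, default_start_date=date(2010, 1, 1)):
    """Mimic coalesce(max_date - interval, default_start_date).

    `max_date` is None when the existing table has no rows, which is the
    Python stand-in for a NULL max(date) in SQL.
    """
    if max_date is None:
        return default_start_date
    return max_date - timedelta(days=interval_days)


# Normal incremental run: start 7 days before the latest loaded date.
normal_start = lookback_start(date(2024, 2, 19))

# Empty target table: fall back to the default start date.
fallback_start = lookback_start(None)
```

If the table always holds at least one row on incremental runs, the fallback branch is never reached, which matches the observation that removing it still compiled and ran.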

Contributor

@fivetran-joemarkiewicz fivetran-joemarkiewicz left a comment

@fivetran-catfritz thanks again for working through these updates! This PR looks good to go!

@fivetran-catfritz fivetran-catfritz merged commit 1627958 into main Feb 20, 2024
9 checks passed
Contributor

@fivetran-avinash fivetran-avinash left a comment

lgtm

3 participants