-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug/postgres performance #126
Conversation
fixed links
patch/update-macro-readme
Update README.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fivetran-catfritz Thanks for investigating and working through this update! It is also great to see the performance improvements it brought with our sample data set. I am hopeful this will save runtimes for our customers using Postgres.
I do have one small callout to include a change in the CHANGELOG and for your eyes on my thoughts around a changing datatype in the json parse macro. Those are relatively small and I am comfortable approving this release. Let me know if you have any questions around my comment. Thanks!
@@ -28,7 +28,7 @@ | |||
{% macro postgres__fivetran_log_json_parse(string, string_path) %} | |||
|
|||
case when {{ string }} ~ '^\s*[\{].*[\}]?\s*$' -- Postgres has no native json check, so this will check the string for indicators of a JSON object | |||
then {{ string }}::json #>> '{ {%- for s in string_path -%}{{ s }}{%- if not loop.last -%},{%- endif -%}{%- endfor -%} }' | |||
then {{ string }}::jsonb #>> '{ {%- for s in string_path -%}{{ s }}{%- if not loop.last -%},{%- endif -%}{%- endfor -%} }' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is incredibly small, but we should call this out as a change in the CHANGELOG. Also, I am fairly certain this won't be breaking, but can you confirm that this datatype change won't cause any datatype changes for Postgres users? I tested myself and didn't see any issues when running prod then dev right after, but just want to double check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added to under the hood. I also tested this and it would make sense there is no issue since the end results in either version are strings, and the cast to json is an intermediate step.
CHANGELOG.md
Outdated
- Updated the sequence of JSON parsing for model `fivetran_platform__audit_table` to reduce runtime. | ||
|
||
## Bug Fixes | ||
- Updated model `fivetran_platform__audit_user_activity` to correct the JSON parsing used to determine column `email`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be good to note that was causing fivetran_platform__audit_user_activity
to potentially have 0 rows
PR Overview
This PR will address the following Issue/Feature:
fivetran_log_json_parse.string_path
arg should be an array infivetran_platform__audit_user_activity
model #124This PR will result in the following new package version:
Please provide the finalized CHANGELOG entry which details the relevant changes included in this PR:
PR Checklist
Basic Validation
Please acknowledge that you have successfully performed the following commands locally:
Before marking this PR as "ready for review" the following have been applied:
Detailed Validation
Please share any and all of your validation steps:
See internal ticket for data quality validation
Runtime comparison:
log
data limited to about 250k rows. This was done by filtering the staging log model using filterwhich produces these results:
v1.7.2 run: ~ 1hr
v1.7.3 run: ~ 10 mins
If you had to summarize this PR in an emoji, which would it be?
🌳