Improve Performance of Source/Model/Exposure extraction #20

Closed
kgpayne opened this issue Apr 15, 2021 · 2 comments
Comments

@kgpayne
Contributor

kgpayne commented Apr 15, 2021

With ~1 year of historical `manifest.json` and `run_results.json` data, we have started experiencing timeouts when running `--full-refresh` on `dbt_artifacts`.


```
2021-04-15 15:43:49: 2021-04-15 15:43:49,243 - root - INFO - Database Error in model dim_dbt__sources (models/incremental/dim_dbt__sources.sql)
2021-04-15 15:43:49: 2021-04-15 15:43:49,243 - root - INFO - 000630 (57014): Statement reached its statement or warehouse timeout of 1,200 second(s) and was canceled.
```

Possible solutions:

  • Archive old data.
  • Replace the 3 individual models that extract Sources, Models and (soon) Exposures with 1 wide table, so JSON extraction happens once per artefact rather than at least 3 times.
  • Snapshot the extracted json(?)
  • Unpack all required fields in the COPY command executed by either the run-operation or by Snowpipe (see the sketch after this list). This is probably the most performant but least flexible option. What happens when we want to extract more details from previous runs? Gets a bit difficult 😬 It does, however, solve the case where JSON objects become too big to fit into a single VARIANT (16MB).
  • Do something with the orchestration layer to avoid ever running full-refresh on these models, effectively relying on incremental models to extract JSON values only once per artefact. We'd probably use tags.
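
A minimal sketch of the COPY-time unpacking idea, assuming a Snowflake stage named `@dbt_artifacts_stage` and a pre-created flat landing table (all names here are hypothetical, not the actual `dbt_artifacts` schema). Note that Snowflake COPY transformations only allow simple expressions over `$1`, so per-node flattening of sources/models/exposures would still need a downstream step:

```sql
-- Hypothetical flat landing table: scalar metadata unpacked at load time,
-- with the raw document retained for anything we decide to extract later.
copy into dbt_artifacts_flat (generated_at, command_invocation_id, artifact_file, data)
from (
    select
        $1:metadata:generated_at::timestamp_ntz,  -- unpacked once, on load
        $1:metadata:invocation_id::string,
        metadata$filename,                        -- source file, for traceability
        $1                                        -- raw document, for later re-extraction
    from @dbt_artifacts_stage
)
file_format = (type = 'json')
pattern = '.*manifest\\.json';
```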
@kgpayne
Contributor Author

kgpayne commented Jul 7, 2021

@NiallRees FYI. We had resolved to try unpacking required fields in the COPY command here, in the hope of solving our 'too much history' problem at the same time as #29 🤔 It's less flexible, but it ensures that JSON extraction happens on load, meaning the base tables are flat and therefore not a problem during a full refresh.
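
For illustration, once the base table is flat, a downstream incremental model reduces to a plain column selection, so even a `--full-refresh` no longer re-parses JSON. This is just a sketch under that assumption; the model and column names are hypothetical, not the actual `dbt_artifacts` code:

```sql
-- models/incremental/dim_dbt__sources.sql (illustrative shape only)
{{ config(materialized='incremental') }}

select
    command_invocation_id,
    generated_at,
    node_id,
    name
from {{ ref('stg_dbt__artifacts_flat') }}  -- hypothetical flat base table
{% if is_incremental() %}
-- on incremental runs, only pick up artefacts newer than what's already loaded
where generated_at > (select max(generated_at) from {{ this }})
{% endif %}
```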

@alanmcruickshank
Contributor

I propose that I try to fix this at the same time as #62. Proposed approach there.

NiallRees pushed a commit that referenced this issue Jul 21, 2022
All new and modified files for invocations vertical