Improve the Performance Characteristics of add_test_edges() #11092

peterallenwebb · 2024-12-03T22:41:48Z

Resolves #10950

Problem

The existing add_test_edges() function, executed on every dbt build, has poor performance characteristics. In the average case, it increases the number of edges in the graph by a factor of five. In extreme cases the factor can be 100 or more. This causes slow run times and high memory usage for a small but important subset of users. In the extreme case that kicked off this investigation, over 8,000,000 edges are added, which takes several minutes and consumes over a gigabyte of memory.

Solution

Create a new version of the add_test_edges() function, with the same overall behavior (as defined and explained in the code comments) but taking a faster approach which also adds fewer edges.

For now, this new behavior is behind the --use-fast-test-edges flag, also accessible via the DBT_USE_FAST_TEST_EDGES=True env var.

The before and after results across a set of >7K real-world graph structures are recorded in this spreadsheet.

For each graph in the test set, the old algorithm and the new algorithm were run separately to produce two result graphs. The transitive closures of the two result graphs were calculated and confirmed to be equal, meaning they impose exactly the same restrictions on execution order.
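The equivalence check described above can be illustrated with a minimal sketch. This is not the code used for the comparison in this PR; the `transitive_closure` helper, the node names, and the two small edge sets are all hypothetical, chosen only to show how a graph with fewer edges can still impose the same ordering constraints.

```python
from collections import defaultdict


def transitive_closure(edges):
    """Return every (u, v) pair such that v is reachable from u."""
    children = defaultdict(set)
    for u, v in edges:
        children[u].add(v)

    closure = set()
    for start in list(children):
        # Iterative depth-first search from each node with outgoing edges.
        stack = list(children[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(children.get(node, ()))
    return closure


# Hypothetical "old style" result: the test is wired to every downstream node.
dense_edges = {
    ("model_a", "test_a"),
    ("model_a", "model_b"),
    ("test_a", "model_b"),
    ("model_b", "model_c"),
    ("test_a", "model_c"),  # redundant: already implied via model_b
}

# Hypothetical "new style" result: the redundant edge is left out.
sparse_edges = dense_edges - {("test_a", "model_c")}

# Equal closures mean both graphs allow exactly the same execution orders.
same_ordering = transitive_closure(dense_edges) == transitive_closure(sparse_edges)
```

In this toy case `test_a` still runs before `model_c` in the sparser graph, because the constraint is implied by the path through `model_b`.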

Across the test set, the new algorithm saved a median of 134 edges and an average of 8227 edges, emphasizing the outsized role played by the worst-case graphs. The new algorithm was strictly faster on graphs of appreciable size (>20 nodes). The average speedup was 17x with a median of 9x.

In the most extreme case, the new algorithm added ~96,000 edges instead of ~8,000,000, and it completed in 0.27s instead of 140s.

Checklist

I have read the contributing guide and understand what's expected of me.
I have run this code in development, and it appears to resolve the stated issue.
This PR includes tests, or tests are not required or relevant for this PR.
This PR has no interface changes (e.g., macros, CLI, logs, JSON artifacts, config files, adapter interface, etc.) or this PR has already received feedback and approval from Product or DX.
This PR includes type annotations for new and modified functions.

github-actions · 2024-12-03T22:42:05Z

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

codecov · 2024-12-03T22:44:06Z

Codecov Report

Attention: Patch coverage is 22.89157% with 64 lines in your changes missing coverage. Please review.

Project coverage is 88.89%. Comparing base (1b7d9b5) to head (234956d).
Report is 5 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #11092      +/-   ##
==========================================
- Coverage   89.18%   88.89%   -0.29%     
==========================================
  Files         183      183              
  Lines       23783    23864      +81     
==========================================
+ Hits        21211    21215       +4     
- Misses       2572     2649      +77

Flag	Coverage Δ
integration	`86.22% <22.89%> (-0.35%)`	⬇️
unit	`62.02% <20.48%> (-0.15%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components	Coverage Δ
Unit Tests	`62.02% <20.48%> (-0.15%)`	⬇️
Integration Tests	`86.22% <22.89%> (-0.35%)`	⬇️

gshank

Yay! Looks great. Good comments. Nice optimizations.

New function to add graph edges.

7c13c39

cla-bot bot added the cla:yes label Dec 3, 2024

Clean up, leave out flag temporarily for testing.

333294f

peterallenwebb marked this pull request as ready for review December 4, 2024 20:32

peterallenwebb requested a review from a team as a code owner December 4, 2024 20:32

peterallenwebb added 2 commits December 4, 2024 15:34

Put new test edge behavior behind flag.

7557712

Final draft of documentation.

234956d

gshank approved these changes Dec 5, 2024

peterallenwebb merged commit afe25a9 into main Dec 5, 2024
52 of 56 checks passed

peterallenwebb deleted the paw/add_better_edges branch December 5, 2024 21:33
