Describe the bug
In dbt v0.17.0-rc1, it appears that performance degrades greatly when a nontrivial number of sources are added to a project. This is a regression: I could not reproduce this performance failure mode in dbt v0.16.x.
I tested this out by repeatedly running dbt ls and recording runtimes. The data looks like:
| sources | runtime (s) | time per node (s) |
| ------- | ----------- | ----------------- |
| 25 | 4.049509287 | 0.1619803715 |
| 50 | 7.128911018 | 0.1425782204 |
| 75 | 12.29483604 | 0.1639311473 |
| 100 | 18.71185374 | 0.1871185374 |
| 125 | 29.37486005 | 0.2349988804 |
| 150 | 38.06821609 | 0.2537881072 |
| 175 | 50.92429304 | 0.2909959602 |
| 200 | 67.39295316 | 0.3369647658 |
| 225 | 79.97533703 | 0.3554459423 |
This data indicates that a dbt ls command with 225 sources takes 1m20s to run. A corresponding dbt ls on 0.16.1 runs in 4s!
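For reference, these timings could be reproduced with a sketch along the following lines (the schema file path, source names, and project layout are assumptions on my part; the actual test project isn't attached here): write a schema file declaring N sources, run dbt ls, and record the wall-clock time.

```python
# Rough reproduction sketch. Assumptions: run from inside a scaffolded dbt
# project with a working profile; the file path and source/table names below
# are made up for illustration.
import subprocess
import time

import yaml


def write_sources(n: int, path: str = "models/generated_sources.yml") -> None:
    """Write a schema file declaring n sources, one table each."""
    sources = [
        {"name": f"src_{i}", "schema": "raw", "tables": [{"name": f"table_{i}"}]}
        for i in range(n)
    ]
    with open(path, "w") as f:
        yaml.safe_dump({"version": 2, "sources": sources}, f)


for n in range(25, 250, 25):
    write_sources(n)
    start = time.time()
    subprocess.run(["dbt", "ls"], check=True, capture_output=True)
    elapsed = time.time() - start
    print(f"{n} sources: {elapsed:.1f}s total, {elapsed / n:.3f}s per source")
```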
There may be some not-ideal algorithmic complexity at play here: the time per node grows with the number of sources, which points to superlinear (roughly quadratic) scaling. Additionally, the fixed cost of parsing a single source is super high. Most of this latency appears to come from the serialization and deserialization of data in the source patching part of the codebase.
The patch_source method accounts for the majority of the runtime of this dbt ls command, but notably, there are no sources to patch in my example project! The relevant part of the codebase is around here:
https://github.com/fishtown-analytics/dbt/blob/75dbb0bc19376b2905d5bbb66284b9be3bf3c93c/core/dbt/parser/sources.py#L44-L68
Possible resolutions
Can we skip the source patching code if the source is not patched?
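As a toy illustration of that idea (this is not dbt's actual code; the classes and patch lookup below are made up), the patching step could bail out early when no patch exists for a source, so unpatched sources never pay the serialize/deserialize round trip:

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class UnpatchedSource:
    # Stand-in for dbt's unpatched source node; not the real class.
    name: str
    description: str = ""


@dataclass(frozen=True)
class SourcePatch:
    # Stand-in for a schema.yml source patch entry.
    description: str


def patch_source(source: UnpatchedSource,
                 patches: Dict[str, SourcePatch]) -> UnpatchedSource:
    patch: Optional[SourcePatch] = patches.get(source.name)
    if patch is None:
        # No patch defined for this source: return it unchanged and skip
        # any serialize/deserialize work entirely.
        return source
    # Only sources that actually have a patch pay the merge cost.
    return replace(source, description=patch.description)


# With no patches defined (as in the example project), this is a no-op.
print(patch_source(UnpatchedSource("orders"), {}))
```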
The slowest parts of this execution are around serialization and deserialization in hologram (I think). Is there an easy way to make this serialization/deserialization significantly faster?
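As a rough, made-up illustration of why that round trip hurts (this is not hologram or dbt code), compare rebuilding an object from a dict against a shallow dataclasses.replace copy:

```python
# Illustrative micro-benchmark with a made-up Node class: a full dict
# round trip vs. a shallow copy of the same object.
import timeit
from dataclasses import asdict, dataclass, replace
from typing import Tuple


@dataclass
class Node:
    name: str
    description: str
    columns: Tuple[str, ...]


node = Node("orders", "raw orders table",
            tuple(f"col_{i}" for i in range(50)))

round_trip = timeit.timeit(lambda: Node(**asdict(node)), number=10_000)
shallow = timeit.timeit(lambda: replace(node), number=10_000)
print(f"dict round trip: {round_trip:.3f}s   replace(): {shallow:.3f}s")
```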
The output of dbt --version:
dbt v0.17.0-rc1
The operating system you're using: macOS
The output of python --version: 3.7.7
I assure you, there's no way dbt 0.17.0rc1 is running with python 2.7.7. 😄
Maybe we should have people run dbt debug instead, so we can capture homebrew/virtualenv installs?