[Performance] 1.8 slower partial parsing than 1.7 #10127
Comments
@d-cole Thank you so much for the exceptional repro steps and analysis! I am looking into this.
My local evaluation (on an M1 MacBook) shows that partial parsing of jaffle_shop_duckdb is slightly faster on 1.8 than it was on 1.7.3:
I think the profiles you attached are correct, but I don't think they give the full picture for this small project. Python's built-in profiler ignores time spent loading Python modules, which is a significant fraction of dbt's runtime on short runs. In addition, we improved startup time in 1.8 by moving a significant amount of work from module load time to post-load run time. So I think the amount of work stayed about the same, but now the built-in profiler actually notices it. I believe your claim that partial parsing slowed down for the larger project you described, but I think we'll need more information to get to the bottom of it. If you're able to supply profiles for the larger project, I might be able to find a solution. Ideally, I would recommend you use py-spy to collect the profiles in speedscope format, which would look like this:
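As a rough sketch of that kind of collection step (not necessarily the exact command used in this thread), a small Python wrapper could launch py-spy around `dbt parse` and write a speedscope-format profile. It assumes py-spy and dbt are installed and on the PATH; the output filename and the helper names `build_pyspy_command` and `record_profile` are hypothetical.

```python
import subprocess


def build_pyspy_command(output_path, target):
    """Build a py-spy invocation that records `target` in speedscope format."""
    return [
        "py-spy", "record",
        "--format", "speedscope",  # JSON viewable at https://www.speedscope.app/
        "--output", output_path,   # hypothetical output filename
        "--",                      # everything after this is the profiled command
        *target,
    ]


def record_profile(output_path, target):
    """Run py-spy; raises CalledProcessError if the profiled command fails."""
    subprocess.run(build_pyspy_command(output_path, target), check=True)


# Example (assumes py-spy and dbt are installed):
# record_profile("dbt-1_8-partial-parse.json", ["dbt", "parse"])
```

Because py-spy samples the process from the outside, the recording covers the whole run, including the module-loading time that the built-in profiler misses.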
I'm attaching the profiles I gathered during my investigation for reference. These can be viewed at https://www.speedscope.app/
dbt-1_7_3-parse-no-parrtial-parse.json
@peterallenwebb Thank you!
Makes sense! On the real dbt project, I initially gathered times using a script that wraps a parse call and takes the median over 10 runs. I believe this won't have the same issues as the profiler.
start_time = time.time()
subprocess.run(command, shell=True, check=True)
elapsed_time = time.time() - start_time
Using py-spy, we see similar results: the write_manifest time is longer in 1.8 (see screenshot below). I've attached the partial parse profiles below. Thanks again for taking a look!
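For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of a median-over-N-runs timing wrapper. It is not the script used above; the function name `median_wall_time` is hypothetical, and it uses `time.perf_counter()` and an argument list in place of `time.time()` and `shell=True`.

```python
import statistics
import subprocess
import sys
import time


def median_wall_time(command, runs=10):
    """Return the median wall-clock seconds over `runs` executions of `command`.

    `command` is an argument list such as ["dbt", "parse"]. Each run is a
    fresh subprocess, so interpreter startup and module loading are included
    in the measurement.
    """
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)


# Example (assumes dbt is installed and a partial_parse file already exists):
# print(median_wall_time(["dbt", "parse"], runs=10))
```

Taking the median instead of the mean keeps a single slow run (for example, a cold disk cache) from skewing the result.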
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.
Is this your first time submitting a feature request?
Performance issue
In a large dbt project, we noticed a 26% increase in partial parse time when updating from 1.7.3 to 1.8.0. This was replicated in a smaller reproducible project. Measurements below are the median time over 10 runs of the same command:
Steps to reproduce
The results can be replicated in a reproducible way using https://github.com/dbt-labs/jaffle_shop_duckdb.
dbt -r 1.7_parse_timing.txt parse
dbt -r 1.7_partial_parse_timing.txt parse
dbt -r 1.8_parse_timing.txt parse
dbt -r 1.8_partial_parse_timing.txt parse
Results from partial parses are attached below.
1.7_partial_parse_timing.txt
1.8_partial_parse_timing.txt
The results shown in the recorded timing for dbt-duckdb match what we see in our large dbt project. That is, the parse_manifest time is significantly greater in 1.8 (5.6x greater in the duckdb example: 0.0568s -> 0.320s). This increase can be broken down into:
Who will this benefit?
Users of dbt 1.8+