degraded performance in docker container due to slow node parsing #2948
The slowdown inside the container is definitely not expected. I'm wondering if the relative time reflects slower availability of Snowflake connections, or rather slower parse + render time for each node in the project.

There was another thread about this log message a few weeks ago. dbt may need a database connection while parsing, so it acquires one, more eagerly than it should. While parsing each node, dbt renders it to understand its calls to:

As part of rendering, dbt runs:

Which ultimately logs the message you see:

I believe that dbt should only be getting one novel connection, and then "re-acquiring" it in each subsequent message, but I may be wrong, and in any case, the logging here isn't all that intuitive. One obvious (though tricky) way around this is to avoid rendering most nodes, if we don't need to, by instead tapping into the Jinja AST and statically analyzing instances of:

@gshank I'd love to get your eyes on this issue, since you've been down these code paths more than I have.
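The static-analysis idea above could be sketched roughly as follows. This is my own illustration, not dbt's actual code, and I'm assuming the calls of interest are things like `ref()` and `source()`: parse the model's Jinja into an AST and walk it for call nodes, with no rendering and therefore no database connection required.

```python
# Rough illustration (not dbt's implementation): statically extract
# ref()/source() calls from a model's Jinja AST instead of rendering it.
from jinja2 import Environment, nodes

def find_static_calls(sql, names=("ref", "source")):
    ast = Environment().parse(sql)
    calls = []
    for call in ast.find_all(nodes.Call):
        # Only plain-name calls like ref('x'), not attribute calls
        if isinstance(call.node, nodes.Name) and call.node.name in names:
            args = tuple(a.value for a in call.args if isinstance(a, nodes.Const))
            calls.append((call.node.name, args))
    return calls
```

For example, `find_static_calls("select * from {{ ref('my_model') }}")` returns `[("ref", ("my_model",))]` without ever rendering the template. The tricky part, as noted, is that this only covers statically analyzable instances; dynamically constructed calls would still need rendering.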
It sounds like dbt is simply having a harder time parsing nodes in the container. The longest parsing time is for the:

One other thing I noticed in Airflow was these logs:
Some of these connections are taking 10+ seconds to open/close. In the logs I posted initially, these calls take about 1-2s. Again, I'm not sure if these are actual connections being opened in Snowflake or if it's simply the logging. I'll rename this issue accordingly if it's not actually a problem with making calls to Snowflake.
I have an update here: when running dbt in a KubernetesPodOperator, set this as the resources argument:
I believe the default levels are much too low for dbt, which is why the initial compilation takes so long. Now our project compiles on Kubernetes nodes within 20 seconds.
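The actual values from the comment above weren't captured in this thread, so the following is only an illustrative sketch of what raising the pod's requests/limits looks like. Older Airflow versions of the KubernetesPodOperator accepted a plain dict for the `resources` argument; the numbers here are placeholder guesses, not the poster's.

```python
# Illustrative only: placeholder values, not the ones from the original comment.
# The point is to request enough CPU/memory up front that dbt's parse/compile
# phase isn't throttled by the cluster's low defaults.
dbt_pod_resources = {
    "request_cpu": "1000m",    # a full core for the parse/compile phase
    "request_memory": "2Gi",
    "limit_cpu": "2000m",
    "limit_memory": "4Gi",
}

# KubernetesPodOperator(
#     task_id="dbt_run",
#     resources=dbt_pod_resources,
#     ...
# )
```

Newer Airflow releases take a `container_resources` argument built from a `kubernetes.client.models.V1ResourceRequirements` object instead, but the requests/limits idea is the same.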
@jtcohen6 shall I close this issue?
@jtalmi Yes please!! Amazing find, thank you for contributing back your newfound knowledge.
Describe the bug
dbt --no-partial-parse run
takes 20x longer in a lightweight Docker container than locally, because of the Snowflake connection calls in the parsing/compiling stage (everything before the pre-hook). I've tested this both running the container locally and deploying it as a KubernetesPodOperator in Airflow.

tl;dr: The time to get from:
Running with dbt=0.18.1
to
Running 1 on-run-start hook
takes a lot longer in my container than locally because of certain Snowflake calls.

Steps To Reproduce
Here is my docker image:
This occurs with any dbt run, e.g.:
dbt --no-partial-parse run -m mymodel
I've narrowed it down to the specific snowflake calls that take 20 times longer in the container than locally.
Locally:
Acquiring new snowflake connection...
logs take 33s overall.

In container:
Acquiring new snowflake connection...
logs take 3 mins. Interestingly, the biggest jump is when acquiring the Snowflake connection for the pre-hook:

^ three minute gap
Full logs
In container:
Expected behavior
dbt should work as fast in the container as it does locally.
System information
Which database are you using dbt with? Snowflake
The output of
dbt --version
:

The operating system you're using:
macOS Big Sur (although the container gets deployed on the Astronomer Kubernetes cluster)
The output of
python --version
:

Additional context
I would love more context on what is actually happening with the "acquiring new snowflake connection" calls, and I recognize that this may simply be an issue with my Docker image configuration.