Reduce memory consumption in docs generation #2009

Closed
drewbanin opened this issue Dec 16, 2019 · 0 comments · Fixed by #2037
Labels: enhancement (New feature or request), performance

@drewbanin (Contributor)

Describe the feature

The dbt docs generate command fetches all of the columns in all of the source and target schemas that dbt touches. As a result, even moderately sized dbt projects can produce catalogs containing hundreds of thousands of columns. In practice, dbt docs generate can consume hundreds of megabytes of memory, which causes problems for tools that orchestrate dbt.

Let's profile the dbt docs generate command to understand its memory usage patterns and to find the parts of the codebase that may be responsible for the ballooning memory usage. If there are straightforward changes we can make to the dbt codebase to reduce the memory footprint of this command, we should implement them.
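One way to start that profiling is Python's built-in tracemalloc module, which reports the current and peak memory allocated by the interpreter while tracing is on. The sketch below is a minimal example, assuming the step under investigation can be wrapped in a plain function. build_fake_catalog is a hypothetical stand-in that fabricates rows shaped like catalog query results; it is not dbt code and the column names are assumptions.

```python
import tracemalloc


def build_fake_catalog(n_rows):
    """Hypothetical stand-in for a catalog query result: one dict per column."""
    return [
        {
            "table_schema": "analytics",
            "table_name": f"table_{i // 50}",
            "column_name": f"column_{i}",
            "column_index": i % 50,
            "column_type": "varchar",
        }
        for i in range(n_rows)
    ]


def measure_peak_bytes(fn, *args):
    """Run fn(*args) and return (result, peak bytes traced while it ran)."""
    tracemalloc.start()
    try:
        result = fn(*args)
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


if __name__ == "__main__":
    rows, peak = measure_peak_bytes(build_fake_catalog, 140_000)
    print(f"{len(rows)} rows, peak {peak / 1024 / 1024:.1f} MiB, "
          f"~{peak / len(rows):.0f} bytes/row")
```

In a real investigation the wrapped function would be the actual catalog-building step, and tracemalloc.take_snapshot() can break the allocations down by source line.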

Describe alternatives you've considered

Download more RAM

Additional context

This doesn't appear to be database-specific. The absolute numbers aren't especially useful here, but I have seen a report of a catalog query that returns 140k records and consumes over 600 MB of memory. Some of the Agate operations performed on the results returned by the database are plausible culprits.
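The reported figures work out to roughly 4.3 KB per record (600,000,000 bytes / 140,000 rows ≈ 4,286 bytes, taking 1 MB as 10^6 bytes). At that scale, per-row Python object overhead is one plausible contributor. The sketch below, again using tracemalloc, compares holding the same hypothetical rows as one dict per row versus one tuple per row with the column names stored once. The column names and row shape are assumptions, and this does not model Agate's internals; it only illustrates how the row representation alone changes the retained footprint.

```python
import tracemalloc

# Hypothetical column layout for a catalog row (assumption, not dbt's schema).
COLUMNS = ("table_schema", "table_name", "column_name", "column_index", "column_type")


def make_row(i):
    return ("analytics", f"table_{i // 50}", f"column_{i}", i % 50, "varchar")


def rows_as_dicts(n):
    # One dict per row: every row carries its own key/value hash table.
    return [dict(zip(COLUMNS, make_row(i))) for i in range(n)]


def rows_as_tuples(n):
    # One tuple per row: column names are stored once, in COLUMNS.
    return [make_row(i) for i in range(n)]


def retained_bytes(builder, n):
    """Bytes still allocated after builder(n) returns, while its result is alive."""
    tracemalloc.start()
    try:
        result = builder(n)  # keep the result referenced during measurement
        current, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return current


if __name__ == "__main__":
    n = 140_000
    for builder in (rows_as_dicts, rows_as_tuples):
        size = retained_bytes(builder, n)
        print(f"{builder.__name__}: ~{size / n:.0f} bytes/row")
```

Exact byte counts depend on the Python version and platform, so only the relative difference is meaningful here.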

drewbanin added the enhancement and performance labels on Dec 16, 2019
drewbanin added this to the Barbara Gittings milestone on Dec 16, 2019
beckjake added a commit that referenced this issue Feb 6, 2020
…lake-catalogs

Feature: faster snowflake catalogs (#2009)