Improve memory efficiency of process_results by iterating. #217

peterallenwebb · 2024-05-16T14:54:29Z

resolves #218

Problem

Returning query results from execute() is memory-inefficient because multiple intermediate copies of the result data are held in memory simultaneously.

In the case of docs generate, we are sometimes querying for information about every column in a schema. This can mean that a million or more records are returned in more extreme cases, resulting in gigabytes of memory allocation. In this scenario, maintaining multiple copies of the results, even temporarily, is untenable.

Solution

Yield data rows one by one from process_results() rather than returning every row as a list, to eliminate one full copy of the result table. We could still do more work in this direction, but I documented a 33% reduction in memory associated with the get_catalog query with this approach.
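
For illustration, here is a minimal sketch of the difference between returning a list and yielding rows. It is not the code from this PR: the function names `process_results_as_list` and `process_results_iter`, their signatures, and the sample data are assumptions made for the example.

```python
from typing import Any, Dict, Iterable, Iterator, List, Sequence


def process_results_as_list(
    column_names: Sequence[str], rows: Iterable[Sequence[Any]]
) -> List[Dict[str, Any]]:
    # Eager version (hypothetical): every row dict is built and held in a
    # list before the caller sees any of them, so a full extra copy of the
    # result set exists alongside the raw rows.
    return [dict(zip(column_names, row)) for row in rows]


def process_results_iter(
    column_names: Sequence[str], rows: Iterable[Sequence[Any]]
) -> Iterator[Dict[str, Any]]:
    # Lazy version (hypothetical): each row dict is created only when the
    # caller asks for it, so this function itself never holds the whole
    # converted result set.
    for row in rows:
        yield dict(zip(column_names, row))


# Made-up sample data standing in for a catalog query result.
columns = ["table_name", "column_name"]
raw_rows = [("orders", "id"), ("orders", "status")]

# The caller consumes rows one at a time.
first = next(process_results_iter(columns, raw_rows))
```

Note that a generator can be consumed only once, so any caller that needs to index into the result or iterate over it twice has to materialize it first.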

Checklist

I have read the contributing guide and understand what's expected of me
I have run this code in development, and it appears to resolve the stated issue
This PR includes tests, or tests are not required/relevant for this PR
This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc.) or this PR has already received feedback and approval from Product or DX

github-actions · 2024-05-16T14:54:44Z

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

ChenyuLInx

Thanks Peter!

Improve memory efficiency of process_results by iterating.

a3d1e17

cla-bot bot added the cla:yes label May 16, 2024

Add changelog entry.

2941243

peterallenwebb marked this pull request as ready for review May 16, 2024 21:04

peterallenwebb requested a review from a team as a code owner May 16, 2024 21:04

ChenyuLInx approved these changes May 16, 2024

colin-rogers-dbt approved these changes May 16, 2024

colin-rogers-dbt merged commit fd33aaf into main May 16, 2024
15 checks passed

colin-rogers-dbt deleted the paw/process-results-iteration branch May 16, 2024 21:27

colin-rogers-dbt mentioned this pull request Sep 10, 2024

update dbt-common pin to >=1.8 #299

Merged

4 tasks
