-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add option to skip relation cache population #7307
Changes from all commits
65e29ed
a9356bd
150f9f2
e06eb00
46bc735
eb53272
80ea55c
78b1a0d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
kind: Features | ||
body: Add --no-populate-cache to optionally skip relation cache population | ||
time: 2023-04-10T11:55:38.360997-05:00 | ||
custom: | ||
Author: stu-k | ||
Issue: "1751" |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -371,6 +371,9 @@ def _mark_dependent_errors(self, node_id, result, cause): | |
self._skipped_children[dep_node_id] = cause | ||
|
||
def populate_adapter_cache(self, adapter, required_schemas: Set[BaseRelation] = None): | ||
if not self.args.populate_cache: | ||
return | ||
Comment on lines
+374
to
+375
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. makes sense to me! out of curiosity - is there no real difference between There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There is no real difference, no. |
||
|
||
start_populate_cache = time.perf_counter() | ||
if get_flags().CACHE_SELECTED_ONLY is True: | ||
adapter.set_relations_cache(self.manifest, required_schemas=required_schemas) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jtcohen6 this actually means we are still retrieving all relations under a schema even if we are only running one models. Just at a later time.
It is better than before since compile now doesn't cache all relations. I am happy that this part doesn't change in this PR, but this is probably something we should consider another approach sometime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ChenyuLInx Good callout, and worth rethinking in the future. For now, this schema-level behavior is baked into how the cache works:
database.schema
, and up to a certain size, the slow bit is running that query in the DWH versus loading more information than strictly necessary into memorydatabase.schema
, and do all our lookups on that basis. If we trimmed down the relations we were looking for, we'd risk a false negative, where we think a relation isn't present in the schema but it actually is.I think something like this is definitely worth doing as a shorter-term win for "catalog" queries run during
docs generate
: