Disabled Models in schema files #5868

Merged
emmyoop merged 22 commits into main from er/ct-232-model-schema-configs
Sep 29, 2022

Conversation

emmyoop
Member

@emmyoop emmyoop commented Sep 16, 2022

resolves #3992

Description

  • throw correct exception for models disabled in schema files
  • fix bug when chained disabled models incorrectly throw exception
  • Add tests

Checklist

@cla-bot cla-bot bot added the cla:yes label Sep 16, 2022
@github-actions
Contributor

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

@emmyoop
Member Author

emmyoop commented Sep 16, 2022

This feels a little hacky but does resolve the exceptions thrown to be correct. I opened #5869 to explore the possible underlying issue.

@emmyoop emmyoop marked this pull request as ready for review September 16, 2022 16:08
@emmyoop emmyoop requested review from a team as code owners September 16, 2022 16:08
@emmyoop emmyoop requested review from nathaniel-may, VersusFacit and gshank and removed request for nathaniel-may and VersusFacit September 16, 2022 16:08
Contributor

@gshank gshank left a comment

I think this needs a few changes, as described in the comments.

core/dbt/parser/schemas.py (outdated)
# If this yaml file is enabled but the project config is not, we need to move
# the node from disabled to manifest.nodes
if patch.config.get("enabled"):
    test_from = {"key": block.target.yaml_key, "name": block.target.name}
Contributor

The "test_from" variation on add_node is only for tests, not regular nodes. I don't think we should need to do "add_node" plus "remove_node". I would just do something like popping it from "disabled" and adding it to "nodes". Also, we can't really do this if there are multiple disabled nodes, since we wouldn't know which one to enable. So I think we'll have to limit this to cases where there's only one disabled node with this unique_id. You also shouldn't need to do a ref_lookup to get the unique_id; the node will already have a unique_id in it. So there would be an if/else after "if patch.config.get("enabled")" with len(found_nodes) == 1, plus throwing an error if len is more than 1.

I think it might be possible to just add the node to ref_lookup by manifest.ref_lookup().add_node(node) too. Also need to remove the node from disabled_lookup. Could either rebuild it or add a function to remove node. Probably simpler to rebuild it.
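A minimal sketch of the "pop from disabled, add to nodes" idea from the comment above. It uses plain dicts in place of the manifest, and `ParsingError` and `enable_disabled_node` are hypothetical names chosen for illustration, not dbt's actual API:

```python
from typing import Dict, List


class ParsingError(Exception):
    """Hypothetical stand-in for the parser's own exception type."""


def enable_disabled_node(
    nodes: Dict[str, dict],
    disabled: Dict[str, List[dict]],
    unique_id: str,
) -> dict:
    """Move the single disabled node for unique_id into nodes."""
    found_nodes = disabled.get(unique_id, [])
    if len(found_nodes) > 1:
        # Ambiguous: there is no way to tell which duplicate should be enabled.
        raise ParsingError(
            f"Found {len(found_nodes)} matching disabled nodes for '{unique_id}'."
        )
    if not found_nodes:
        raise KeyError(unique_id)
    # Exactly one match: pop it from disabled and add it to nodes.
    node = disabled.pop(unique_id)[0]
    node["enabled"] = True
    nodes[unique_id] = node
    return node
```

In the real parser the lookups (ref_lookup, disabled_lookup) would also need to be updated or rebuilt after the move; this sketch leaves that out.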

Contributor

@gshank gshank left a comment

On thinking about this more, I think we're not handling some other cases. The precedence order is 1) dbt_project.yml config, 2) schema yaml config, 3) model config. So in theory the config in the model file could override the schema yaml config, and we have no code to move around the disabled/not disabled nodes for that case.
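To illustrate the precedence order as stated in this comment, here is a rough sketch in which later layers override earlier ones. `resolve_config` and the three example dicts are hypothetical; this is not how dbt builds configs internally:

```python
from typing import Any, Dict


def resolve_config(*layers: Dict[str, Any]) -> Dict[str, Any]:
    """Merge config layers in order; later layers override earlier ones."""
    merged: Dict[str, Any] = {}
    for layer in layers:
        merged.update(layer)
    return merged


# Layers listed in the order given in the comment (illustrative values only).
project_config = {"enabled": True, "materialized": "view"}  # dbt_project.yml
schema_config = {"enabled": False}                          # schema yaml
model_config = {"enabled": True}                            # in-file model config

resolved = resolve_config(project_config, schema_config, model_config)
```

Here the model file re-enables a node that the schema yaml disabled, which is the case the comment says has no code to move the node between dictionaries.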

I really think we should switch to not having a separate dictionary for disabled, but that's probably for a later release.

For now I think that it might be best to always apply the patches to disabled nodes, except for the case where enabled is set to True and there is more than 1 matching disabled node (I kind of wish we never allowed that...) where we raise an error. Then when schema parsing is done, before refs are resolved, loop through nodes and disabled and make sure the nodes are in the right dictionary.

In addition, I think there's a hole in the "add_disabled" code that we didn't handle in the ticket for disabling metrics and exposures. It needs to be updated to check for test nodes (which used to be the only nodes from a SchemaSourceFile that could be updated), because the "test_from" piece only applies to tests.

Can you think of any additional holes?

@emmyoop emmyoop force-pushed the er/ct-232-model-schema-configs branch from bf44806 to e464f5d Compare September 26, 2022 15:49
@emmyoop emmyoop requested a review from gshank September 26, 2022 21:21
@emmyoop emmyoop requested a review from gshank September 28, 2022 18:55
Contributor

@gshank gshank left a comment

A comment from last time seems to have gotten lost. The processing in parse_patch looks okay, but I think the 'process_nodes' code still has issues.

if node.config.enabled:
    for dis_index, dis_node in enumerate(disabled):
        # Remove node from disabled and unique_id from disabled dict if necessary
        enable_nodes[dis_index] = dis_node
Contributor

I commented on this last time, but it looks like github lost it somehow... Making the index the key in the 'enable_nodes' dictionary means that if you have multiple enabled nodes at the same index, the first one will be overwritten.

I'm not sure why you are saving these in separate structures. It seems like for both looping through the nodes and looping through the disabled you could just move them when encountered.
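The overwrite described above can be reproduced with a small sketch. The data and variable names are made up for illustration; only the keying scheme matters:

```python
# Two different unique_ids, each with one node at index 0 of its own list.
disabled = {
    "model.proj.a": [{"name": "a", "enabled": True}],
    "model.proj.b": [{"name": "b", "enabled": True}],
}

# Keyed by list index only: both nodes map to key 0, so one replaces the other.
by_index = {}
for unique_id, node_list in disabled.items():
    for dis_index, dis_node in enumerate(node_list):
        by_index[dis_index] = dis_node

# Keyed by (unique_id, index): every node keeps its own slot.
by_id_and_index = {}
for unique_id, node_list in disabled.items():
    for dis_index, dis_node in enumerate(node_list):
        by_id_and_index[(unique_id, dis_index)] = dis_node
```

With index-only keys, only the last node written at each index survives.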

Member Author

@emmyoop emmyoop Sep 28, 2022

@gshank I'm saving them as a separate structure because python doesn't like when you modify the dict you are looping through.

disabled on line 947 is the list of nodes for the unique id. Probably worth renaming if it's confusing. So the index is just the index of the list item. There can't be multiple nodes at a single index since each index represents a single node... Now I see it. Will fix it.
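For context on the "python doesn't like when you modify the dict you are looping through" point: changing a dict's size during iteration raises RuntimeError, and iterating over a snapshot of its items avoids that without a second holding structure. A sketch with hypothetical function names and plain dicts standing in for the manifest:

```python
def unsafe_move(nodes, disabled):
    # Popping while iterating the same dict changes its size mid-loop,
    # which raises RuntimeError on the next iteration step.
    for unique_id in disabled:
        nodes[unique_id] = disabled.pop(unique_id)


def move_enabled(nodes, disabled):
    # list(...) takes a snapshot of the items, so deleting keys from
    # `disabled` inside the loop is safe.
    for unique_id, node_list in list(disabled.items()):
        still_disabled = []
        for node in node_list:
            if node["enabled"]:
                # If several entries are enabled, the last one wins here.
                nodes[unique_id] = node
            else:
                still_disabled.append(node)
        if still_disabled:
            disabled[unique_id] = still_disabled
        else:
            del disabled[unique_id]
```

The snapshot copies only references to the keys and lists, so it is far cheaper than a deep copy.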

@emmyoop emmyoop requested a review from gshank September 29, 2022 01:09
Comment on lines 893 to 900
# There are multiple disabled nodes for this model and the schema file wants to enable one.
# We have no way to know which one to enable.
msg = (
    f"Found {len(found_nodes)} matching disabled nodes for '{patch.name}'. "
    "Multiple nodes for the same unique id cannot be disabled in the schema "
    "file. They must be disabled in `dbt_project.yml` or in the sql files."
)
raise ParsingException(msg)
Member Author

@jtcohen6 can I get some input on the error message here, please?

Contributor

Thanks for tagging me in — this looks mostly good to me!

Could we include the resource type? Is that available from the patch?

How about:

Found {len(found_nodes)} matching disabled nodes for {patch.resource_type} '{patch.name}'.
Multiple nodes with the same unique_id cannot be disabled in yaml resource properties.

If you need to have multiple disabled nodes with the same names, you should instead disable them
using in-file config, or resource-path config in `dbt_project.yml`.
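A sketch of how the suggested wording could be assembled as one string. `Patch` here is a hypothetical stand-in with only the two fields the message uses; whether the real patch object exposes `resource_type` is exactly the open question in the comment above:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Patch:
    """Hypothetical stand-in carrying only the fields the message needs."""
    name: str
    resource_type: str


def multiple_disabled_msg(patch: Patch, found_nodes: List[object]) -> str:
    """Build the suggested error text for ambiguous disabled nodes."""
    return (
        f"Found {len(found_nodes)} matching disabled nodes for "
        f"{patch.resource_type} '{patch.name}'. "
        "Multiple nodes with the same unique_id cannot be disabled in yaml "
        "resource properties.\n\n"
        "If you need to have multiple disabled nodes with the same names, "
        "you should instead disable them using in-file config, or "
        "resource-path config in `dbt_project.yml`."
    )
```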

core/dbt/parser/schemas.py (outdated)
# make sure the nodes are in the manifest.nodes or the disabled dict,
# correctly now that the schema files are also parsed
disable_node_copy = deepcopy(self.manifest.nodes)
for node in disable_node_copy.values():
Contributor

I'm a little concerned by the overhead of deepcopying the entire nodes dictionary. Maybe we could have a list of disabled unique_ids, leave the 'add_disabled_nofile' in place, and just remove the disabled nodes afterward? I'm not so concerned by the disabled dictionary because it will be much smaller.
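One way to read this suggestion, sketched with plain dicts: record the unique_ids that need to move while iterating, then remove them from nodes afterward, so no copy of the nodes dictionary is needed. `reconcile` is a hypothetical name, and the append stands in for whatever 'add_disabled_nofile' does in the real code:

```python
from typing import Dict, List


def reconcile(nodes: Dict[str, dict], disabled: Dict[str, List[dict]]) -> List[str]:
    """Move nodes whose config is now disabled out of `nodes`."""
    # First pass: only record unique_ids; `nodes` is not mutated while iterating.
    to_disable = [uid for uid, node in nodes.items() if not node["enabled"]]
    # Second pass: remove each recorded node and file it under `disabled`.
    for uid in to_disable:
        disabled.setdefault(uid, []).append(nodes.pop(uid))
    return to_disable
```

The list holds only strings, so its cost grows with the number of disabled nodes rather than with the size of the whole project.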

Member Author

Oh, that's cleaner than creating a list of all nodes. 👍

@emmyoop emmyoop merged commit fa4f9d3 into main Sep 29, 2022
@emmyoop emmyoop deleted the er/ct-232-model-schema-configs branch September 29, 2022 20:24
github-actions bot pushed a commit that referenced this pull request Oct 3, 2022
* clean up debugging

* reword some comments

* changelog

* add more tests

* move around the manifest.node

* fix typos

* all tests passing

* move logic for moving around nodes

* add tests

* more cleanup

* fix failing pp test

* remove comments

* add more tests, patch all disabled nodes

* fix test for windows

* fix node processing to not overwrite enabled nodes

* add checking disabled in pp, fix error msg

* stop deepcopying all nodes when processing

* update error message

(cherry picked from commit fa4f9d3)
emmyoop added a commit that referenced this pull request Oct 3, 2022
(cherry picked from commit fa4f9d3)

Co-authored-by: Emily Rockman <[email protected]>

Successfully merging this pull request may close these issues.

[CT-232] Setting enabled to False in schema config does not move model to disabled
4 participants