Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/yaml selections #2640

Merged
merged 3 commits into from
Jul 24, 2020
Merged

Conversation

beckjake
Copy link
Contributor

@beckjake beckjake commented Jul 22, 2020

resolves #2172

Description

This PR implements the advanced node selection syntax, mostly as described in #2172. I've made some changes to accommodate yaml syntax things, but I tried to keep with the spirit of the thing.

Here's a selectors.yml example file (taken from tests):

selectors:
  - name: views-or-foos-not-bars
    definition:
      union:
        - method: config.materialized
          value: view
        - tag: foo
        - exclude:
          - tag: bar

It defines a selector that is the all models materialized as views or tagged foo, except without any models tagged bar.

I didn't use hologram for parsing here - hologram (well, actually Python's type system) doesn't support recursive type definitions. Hologram also does a poor job of handling ambiguous specifications like this one: it's very hard to get hologram to support both the semi-arbitrary {'method': 'tag', 'value': 'foo'} spec and the fully-arbitrary-but-one-key {'tag': 'foo'} spec.

Selector definitions

There are 3 ways to define a simple selector:

definition:
  tag: foo

This parses to a list entry containing a dict like {'tag': 'foo'}. It's converted, but you can't use modifiers like @ or + here. We could add support for modifiers by examining the keys and values and taking any prefixes/suffixes, or more likely doing ':'.join([key, value]) and passing that in to the string parsing logic. The more I think about this, the more I think it would be good, if only for consistency with the next form.
We could also add support for exclude in this syntax if we wanted, though I think it makes the subtle distinction between tag: foo and tag:foo much more confusing.

definition:
  tag:foo

This parses to a string entry like 'tag:foo', which is then parsed like CLI arguments are. You can use modifiers like @ or + here, though yaml will want you to quote them: (- "@tag:foo")

definition:
  method: tag
  value: foo
  childrens_parents: true

This parses to a dictionary entry like {'method': 'tag', 'value': 'foo', 'childrens_parents': True}. This is what the string form is converted into, and then it goes down the same conversion route.

Combinations

Internally, this code still uses the same basic ideas introduced in previous PRs: You can combine values as unions, differences, and intersections of sets of selector definitions. The union and intersection combinations are themselves selector definitions, and can be used anywhere. Set differences are discussed below, but basically exclude can exist anywhere a selector definition could, or within a simple selector definition (which makes it... not so simple).

Only the third form of simple selector definition can be used with exclude. For example, to do what dbt run --exclude @tag:foo does:

- method: fqn
   value: "*"
   exclude:
   - "@tag:foo"

Of course, you can define a one-element union with exclusions if you prefer that syntax:

- union:
    - "fqn:*"
    - exclude:
      - "@tag:foo"

Set differences

The 'exclude' key is the only way to specify set differences. It accepts a list of definitions that are then unioned together. I could definitely be convinced that that's wrong and the value should instead be just a definition. My reasoning rests on the assertion that (at least in Python!) difference(a, union(b, c, d)) is the same as difference(a, b, c, d), which I feel reasonably confident about.

I think it'd be reasonable to add a difference key that acts as an explicit set difference. It would be its own value, as opposed to exclude, which I think of as modifying its parents: {method: 'fqn', value: '*', exclude: ['@tag:foo']} actually becomes difference("fqn:*", "@tag:foo").

Exclude syntax

The syntax isn't immensely satisfying to me, especially around exclude. Currently, a union with exclusions has exclude: as one of its elements. Does it make more sense for exclude: to live on the same level as union?:

name: my-selector
definition:
  union:
    - tag:foo
    - config.materialized: view
  exclude:
    - tag: bar

I don't feel like this is really better at all (I think the lack of indentation makes it hard to parse mentally at a glance), but I don't have great taste on this kind of thing.

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

Jacob Beck added 2 commits July 22, 2020 07:58
This doesn't use hologram, as it needs more permissive schemas than what hologram provides.
Added some unit tests
Added tests
Added RPC and CLI support
@cla-bot cla-bot bot added the cla:yes label Jul 22, 2020
@beckjake beckjake marked this pull request as ready for review July 22, 2020 16:22
@beckjake beckjake requested review from jtcohen6 and kwigley July 22, 2020 16:22
Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very excited about this! I read through the description yesterday, wrote up some examples for docs last night, and then checked to see if my pseudo-syntax worked in practice. Almost perfect.

I'm fine with using exclude syntax for set differences, for now. I agree that it feels more intuitive to have exclude live as a nested object within union (or single-arg definition). "Give me all of this/these, except this carved-out subset."

I think this syntax will get us a lot of mileage before we need to add any more complexity. I'm confident there are talented community members who will find interesting places to go with these selectors + YML anchors.

@beckjake beckjake merged commit 3ec911b into dev/marian-anderson Jul 24, 2020
@beckjake beckjake deleted the feature/yaml-selections branch July 24, 2020 20:30
@mlavoie-sm360
Copy link

Can we make suggestions?

When executing a run using a --select, it would be nice if the logs printed the expanded version of that selector, i.e.

selectors:
  - name: my_selector
    definition:
      union:
        - method: tag
          value: something
          parents: true
        - exclude:
            - method: tag
              value: something_else

Would then print out something like:

dbt run --select my_selector
Running with dbt=0.18.0-rc1
Found 562 models, 578 tests, 8 snapshots, 177 analyses, 323 macros, 4 operations, 40 seed files, 198 sources

15:40:52 | Executing dbt run --models +tag:something --exclude tag:something_else
15:40:52 | 
15:40:52 | Running 2 on-run-start hooks
...

@jtcohen6
Copy link
Contributor

@mlavoie-sm360 Suggestions very welcome! There's a separate open issue (#2700) around making YAML selectors user-friendlier, specifically for better logging / errors. Would you mind posting your comment there so we don't lose track of it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Advanced node selection syntax
4 participants