Add the ability to perform dbt code/lint checks without actually connecting #3135

Closed
robbruce opened this issue Mar 1, 2021 · 3 comments
Labels
enhancement New feature or request


robbruce commented Mar 1, 2021

Describe the feature

Currently, dbt compile connects to the database where the data is stored. This was unexpected: compiling code normally doesn't connect to anything, it just compiles.

It would be very useful to have a way to check that the engineer has written valid dbt code, for example that the correct parameters are passed into macros and that .yml files are structurally correct.

Proposing a dbt check command for this.

Describe alternatives you've considered

Tried dbt compile, but it does more than just compile: it also connects to the database.

Additional context

As part of a CI/CD pipeline, having the ability for code checks allows for system generated feedback to be provided as part of a peer review process.

Who will this benefit?

dbt developers

Are you interested in contributing this feature?

Would contribute, but don't know where to start or whether this is actually doable.

@robbruce robbruce added enhancement New feature or request triage labels Mar 1, 2021
@jtcohen6 jtcohen6 removed the triage label Mar 2, 2021
Contributor

jtcohen6 commented Mar 2, 2021

Hey @robbruce, thanks for the thoughtful issue. You're right that the words here can be a bit confusing, and we could do more to clarify exactly what's happening.

dbt invocations, such as dbt run, consist of a few steps:

  1. Parsing your project: reading and validating your files, constructing a DAG and manifest
  2. Running metadata queries against your database to build a runtime cache
  3. Stepping through the DAG, node by node. For each, dbt:
    a. compiles its SQL
    b. executes that SQL

The dbt compile command goes all the way to step 3a: It performs Jinja compilation of all models/tests/snapshots/etc in the project. Since some of those models may have dynamic templates, i.e. requiring metadata from the database, or introspective queries against objects already in the database, dbt compile requires a valid database connection. The only thing it doesn't do is make mutative changes to the database (materialization DDL/DML).

Instead, it sounds like what you want is step 1 only: parse the project, and make sure there are no Jinja/YAML syntax errors. There are two commands you can use today that do not require database connections, and which will catch that category of error:

  1. dbt ls (docs): Returns a list of resources in your project matching a set of criteria. Additionally, writes target/manifest.json (unless you pass another flag).
  2. dbt parse: Outputs detailed timing information to target/perf_info.json. (We need to add this to the docs!)
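
To summarize the steps and commands described in this comment, here is a minimal Python sketch. It is only a toy model of the explanation above, not dbt's internal code; the names Step, LAST_STEP, and needs_database are made up for illustration.

```python
from enum import IntEnum


class Step(IntEnum):
    """Steps of a dbt invocation, as described in this thread."""
    PARSE = 1      # step 1: read and validate files, build the DAG and manifest
    METADATA = 2   # step 2: metadata queries to build a runtime cache
    COMPILE = 3    # step 3a: Jinja compilation of each node's SQL
    EXECUTE = 4    # step 3b: execute the compiled SQL


# Last step each command reaches, according to the explanation above.
LAST_STEP = {
    "run": Step.EXECUTE,
    "compile": Step.COMPILE,
    "ls": Step.PARSE,
    "parse": Step.PARSE,
}


def needs_database(command: str) -> bool:
    """Return True if the command goes past parsing and so may query the database."""
    return LAST_STEP[command] > Step.PARSE


if __name__ == "__main__":
    for name in LAST_STEP:
        print(f"dbt {name}: needs database = {needs_database(name)}")
```

In this model, only the commands that stop at step 1 can be used as offline checks.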

Does either of dbt ls or dbt parse get you what you're after?

Author

robbruce commented Mar 2, 2021

Hi @jtcohen6

I don't think either command does a code check for what we're asking for.

dbt parse still tries to connect. I only tested this with Snowflake, but when using the --debug flag against a non-existent Snowflake account (i.e. with dummy values), lines like this appear in the debug logs:

2021-03-02 10:37:50.069677 (MainThread): Acquiring new snowflake connection "fully qualified model"

dbt ls does the code checks, so it gives us what we need. It is limited by Jinja2's parsing, but that's understandable.

    • checks to see if parameters are missing when using a macro results in cannot pickle 'Undefined' object
    • checks to see if a macro was ended properly by removing the closing }} results in unexpected end of template, expected 'end of print statement'.
    • checks to see if a macro was started properly, by removing the leading {{ results in no code failures

Scenario 3 doesn't raise an error when the --strict flag is used either. However, the ls command will help with what we need!

Contributor

jtcohen6 commented Mar 2, 2021

Thanks for checking those out @robbruce!

The dbt parse command does not require a database connection. It does require a valid connection profile, defined in profiles.yml and specified by name in dbt_project.yml, but it doesn't actually do anything with it. You should be able to run this command with your Internet turned off :) That log line has been confusing folks for a while, so I just opened an issue to reword or remove it (#3137).
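
For a CI job, one way to satisfy the "valid connection profile" requirement without real credentials might be to generate a throwaway profiles.yml. The sketch below is an assumption-laden example, not an official recipe: the profile name ci_profile, the target name ci, and every dummy_* value are hypothetical, and the Snowflake keys shown are the commonly documented ones, which may differ between adapter versions.

```python
import tempfile
from pathlib import Path

# Every value below is a placeholder; none of it points at a real account.
# "ci_profile" is a hypothetical name and would need to match the `profile:`
# entry in dbt_project.yml.
DUMMY_PROFILE = """\
ci_profile:
  target: ci
  outputs:
    ci:
      type: snowflake
      account: dummy_account
      user: dummy_user
      password: dummy_password
      role: dummy_role
      database: dummy_db
      warehouse: dummy_wh
      schema: dummy_schema
      threads: 1
"""


def write_dummy_profile(directory: Path) -> Path:
    """Write a placeholder profiles.yml into `directory` and return its path."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / "profiles.yml"
    path.write_text(DUMMY_PROFILE, encoding="utf-8")
    return path


if __name__ == "__main__":
    profiles_dir = Path(tempfile.mkdtemp())
    print(write_dummy_profile(profiles_dir))
```

The resulting directory could then be passed to dbt with the --profiles-dir flag, for example dbt parse --profiles-dir followed by the printed directory.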

checks to see if a macro was started properly, by removing the leading {{ results in no code failures

If you're calling my_macro in one of your models as select my_macro(arg1, arg2) instead of select {{ my_macro(arg1, arg2) }}, it's not checked/compiled at all—it's just a string in the template, i.e. SQL. It would be hard for dbt/Jinja to help out here.
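
Since dbt/Jinja sees a bare call as plain SQL, catching it would need a separate check outside dbt. The following is a rough heuristic sketch, not a dbt feature: it blanks out everything inside Jinja delimiters and then looks for known macro names followed by an opening parenthesis. The function name find_bare_macro_calls is hypothetical, the list of macro names must be supplied by the caller, and a SQL function sharing a name with a macro would be a false positive.

```python
import re
from typing import List, Sequence

# Matches Jinja expression {{ ... }}, statement {% ... %}, and comment {# ... #} blocks.
JINJA_BLOCK = re.compile(r"\{\{.*?\}\}|\{%.*?%\}|\{#.*?#\}", re.DOTALL)


def find_bare_macro_calls(sql: str, macro_names: Sequence[str]) -> List[str]:
    """Return the macro names that look like calls outside any Jinja block."""
    # Blank out everything Jinja would evaluate; what remains is literal SQL text.
    literal_sql = JINJA_BLOCK.sub(" ", sql)
    found = []
    for name in macro_names:
        # A macro name followed by "(" in literal SQL is suspicious.
        if re.search(rf"\b{re.escape(name)}\s*\(", literal_sql):
            found.append(name)
    return found


if __name__ == "__main__":
    print(find_bare_macro_calls("select my_macro(arg1, arg2)", ["my_macro"]))
    print(find_bare_macro_calls("select {{ my_macro(arg1, arg2) }}", ["my_macro"]))
```

The first call flags my_macro because nothing wraps it in {{ }}; the second returns an empty list.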

FYI the --strict flag does not do much of anything on newer versions of dbt (>0.15.0), since we moved to using python dataclasses + hologram for internal type checking and object validation. You can read this comment for a full enumeration of its behavior. As such, it doesn't surprise me that the behavior with and without the --strict flag is identical.

I'm going to close this issue, since it sounds like the primary thing you're asking for—raise Jinja/YAML errors without connecting to the database—can be accomplished with existing commands.
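
As a rough illustration of how this could fit into the CI/CD pipeline mentioned in the original request, here is a small Python wrapper. It is a sketch under assumptions: it presumes dbt is on the PATH and that dbt exits with a non-zero status when parsing fails. The helper names run_check and main are made up for this example.

```python
import subprocess
import sys
from typing import Sequence


def run_check(command: Sequence[str]) -> bool:
    """Run `command`, echo its output, and report whether it exited with status 0."""
    try:
        result = subprocess.run(list(command), capture_output=True, text=True)
    except FileNotFoundError:
        # The executable is not installed or not on the PATH.
        sys.stderr.write(f"command not found: {command[0]}\n")
        return False
    sys.stdout.write(result.stdout)
    sys.stderr.write(result.stderr)
    return result.returncode == 0


def main() -> int:
    """Run the parse-only check and return a process exit code."""
    # `dbt parse` is one of the two commands suggested in this thread;
    # `dbt ls` could be substituted here.
    return 0 if run_check(["dbt", "parse"]) else 1
```

A CI entry point could end with raise SystemExit(main()) so that a parse failure fails the build.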

@jtcohen6 jtcohen6 closed this as completed Mar 2, 2021