Add the ability to perform dbt code/lint checks without actually connecting #3135

robbruce · 2021-03-01T16:22:23Z

Describe the feature

Currently, dbt compile connects to the database where the data is stored. This was unexpected: normally, compiling code doesn't connect to anything, it just compiles.

It would be very useful to have a feature that checks whether the engineer has written valid dbt code, for example that the correct parameters are passed into macros and that .yml files are structurally correct.

Proposing a dbt check command for this.

Describe alternatives you've considered

Tried dbt compile, but it does more than just compile, it connects to the database.

Additional context

As part of a CI/CD pipeline, code checks allow system-generated feedback to be provided during the peer review process.

Who will this benefit?

DBT Developers

Are you interested in contributing this feature?

Would contribute, but don't know where to start or whether this is actually doable.


jtcohen6 · 2021-03-02T10:49:33Z

Hey @robbruce, thanks for the thoughtful issue. You're right that the words here can be a bit confusing, and we could do more to clarify exactly what's happening.

dbt invocations, such as dbt run, consist of a few steps:

1. Parsing your project: reading and validating your files, constructing a DAG and manifest
2. Running metadata queries against your database to build a runtime cache
3. Stepping through the DAG, node by node. For each, dbt:
   a. compiles its SQL
   b. executes that SQL

The dbt compile command goes all the way to step 3a: It performs Jinja compilation of all models/tests/snapshots/etc in the project. Since some of those models may have dynamic templates, i.e. requiring metadata from the database, or introspective queries against objects already in the database, dbt compile requires a valid database connection. The only thing it doesn't do is make mutative changes to the database (materialization DDL/DML).

Instead, it sounds like what you want is step 1 only: parse the project, and make sure there are no Jinja/YAML syntax errors. There are two commands you can use today that do not require database connections, and which will catch that category of error:

- dbt ls (docs): Returns a list of resources in your project matching a set of criteria. Additionally, writes target/manifest.json (unless you pass another flag).
- dbt parse: Outputs detailed timing information to target/perf_info.json. (We need to add this to the docs!)
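
To make the distinction concrete, here is a minimal Python sketch of how far each command goes, based only on the description in this comment. The `Stage` enum, the `FURTHEST_STAGE` table, and `needs_database` are hypothetical names for illustration; they are not part of dbt.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Steps of a dbt invocation, in the order described in this thread."""
    PARSE = 1    # step 1: read and validate files, build the DAG and manifest
    CACHE = 2    # step 2: metadata queries against the database
    COMPILE = 3  # step 3a: Jinja-compile each node's SQL
    EXECUTE = 4  # step 3b: execute the compiled SQL


# Furthest stage each command reaches, as described in this thread.
# This table is an illustration, not something dbt exposes.
FURTHEST_STAGE = {
    "ls": Stage.PARSE,
    "parse": Stage.PARSE,
    "compile": Stage.COMPILE,
    "run": Stage.EXECUTE,
}


def needs_database(command: str) -> bool:
    """A command needs a live connection once it goes past parsing."""
    return FURTHEST_STAGE[command] > Stage.PARSE
```

Under this model, `needs_database("compile")` is true while `needs_database("ls")` and `needs_database("parse")` are false, which is the distinction that matters for a connection-free check.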

Does either of dbt ls or dbt parse get you what you're after?

robbruce · 2021-03-02T14:17:26Z

Hi @jtcohen6

I don't think either command does a code check for what we're asking for.

dbt parse still tries to connect. I have only tested this with Snowflake, but when running with the --debug flag against a non-existent Snowflake account (i.e. with dummy values), lines like this appear in the debug logs:

2021-03-02 10:37:50.069677 (MainThread): Acquiring new snowflake connection "fully qualified model"

dbt ls does the code checks, so it gives us what we need. It is limited by Jinja2's parsing, but that's understandable.

- Leaving out a required parameter when calling a macro results in: cannot pickle 'Undefined' object
- Removing the closing }} from a macro call results in: unexpected end of template, expected 'end of print statement'.
- Removing the leading {{ from a macro call results in no code failures

Scenario 3 doesn't raise an error when the --strict flag is used either. However, the ls command will help with what we need!

jtcohen6 · 2021-03-02T17:36:48Z

Thanks for checking those out @robbruce!

The dbt parse command does not require a database connection. It does require a valid connection profile, defined in profiles.yml and specified by name in dbt_project.yml, but it doesn't actually do anything with it. You should be able to run this command with your Internet turned off :) That logline has been confusing folks for a while, so I just opened an issue to reword or remove it (#3137).
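
For a CI step, a rough Python wrapper around dbt parse might look like the sketch below. The function names are hypothetical, and it assumes that your dbt version accepts the --project-dir and --profiles-dir flags on dbt parse and exits with a non-zero status when parsing fails; check both against the version you run.

```python
import subprocess
from typing import Callable, List, Sequence


def build_parse_command(project_dir: str, profiles_dir: str) -> List[str]:
    """Build the argument list for a connection-free `dbt parse` run."""
    return [
        "dbt", "parse",
        "--project-dir", project_dir,    # assumed to be supported by your dbt version
        "--profiles-dir", profiles_dir,  # directory containing profiles.yml
    ]


def project_parses(
    command: Sequence[str],
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Run the command and report success.

    Assumes dbt exits non-zero on a parse error. `runner` is injectable
    so the wrapper can be exercised without dbt installed.
    """
    result = runner(list(command), capture_output=True, text=True)
    return result.returncode == 0
```

In a pipeline you would call `project_parses(build_parse_command(".", "./ci_profiles"))` and fail the job when it returns False; the `./ci_profiles` path is only a placeholder for wherever a profiles.yml with dummy credentials lives.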

> checks to see if a macro was started properly, by removing the leading {{ results in no code failures

If you're calling my_macro in one of your models as select my_macro(arg1, arg2) instead of select {{ my_macro(arg1, arg2) }}, it's not checked/compiled at all—it's just a string in the template, i.e. SQL. It would be hard for dbt/Jinja to help out here.
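
A small Python sketch of why the bare call goes unnoticed: only text between {{ and }} is treated as an expression at all. The regex here is a deliberately simplified stand-in for Jinja's lexer (it ignores {% %} blocks, comments, and whitespace-control markers), and `jinja_expressions` is a hypothetical helper, not a dbt or Jinja API.

```python
import re

# Simplified stand-in for Jinja's lexer: match only {{ ... }} print expressions.
JINJA_EXPRESSION = re.compile(r"\{\{(.*?)\}\}", re.DOTALL)


def jinja_expressions(template: str) -> list:
    """Return the expressions found between {{ and }} in a template string."""
    return [match.strip() for match in JINJA_EXPRESSION.findall(template)]


# With the delimiters, the macro call is seen as an expression to evaluate.
with_braces = jinja_expressions("select {{ my_macro(arg1, arg2) }}")

# Without them, there is nothing to evaluate: the whole line is plain SQL text.
without_braces = jinja_expressions("select my_macro(arg1, arg2)")
```

Here `with_braces` holds the single expression `my_macro(arg1, arg2)`, while `without_braces` is empty, so nothing in the second string ever reaches macro resolution or argument checking.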

FYI the --strict flag does not do much of anything on newer versions of dbt (>0.15.0), since we moved to using python dataclasses + hologram for internal type checking and object validation. You can read this comment for a full enumeration of its behavior. As such, it doesn't surprise me that the behavior with and without the --strict flag is identical.

I'm going to close this issue, since it sounds like the primary thing you're asking for—raise Jinja/YAML errors without connecting to the database—can be accomplished with existing commands.

robbruce added the enhancement and triage labels Mar 1, 2021

jtcohen6 removed the triage label Mar 2, 2021

jtcohen6 mentioned this issue Mar 2, 2021

Reword or remove "Acquiring new ... connection" logline #3137

Closed

jtcohen6 closed this as completed Mar 2, 2021
