Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-1584] [Feature] New top level commands: interactive compile #6358

Closed
Tracked by #6356
ChenyuLInx opened this issue Dec 2, 2022 · 3 comments · Fixed by #7008
Closed
Tracked by #6356

[CT-1584] [Feature] New top level commands: interactive compile #6358

ChenyuLInx opened this issue Dec 2, 2022 · 3 comments · Fixed by #7008
Assignees
Labels
cli enhancement New feature or request python_api Issues related to dbtRunner Python entry point

Comments

@ChenyuLInx
Copy link
Contributor

ChenyuLInx commented Dec 2, 2022

Describe the feature(Updated Feb 9 after chatting with @aranke )

In dbt-rpc and dbt-server, we support using some kind of manifest object(in mem object for dbt-rpc and manifest.msgpack for dbt-server) that regenerated whenever a dbt project file got modified to to support interactive compile.
Here's the link of how it is done in dbt-rpc and dbt-server

For input options

One of the goals of creating such command is to get rid of the custom code we are doing in lib.py and dbt-server, and have dbt-server go through a proper interface in dbt-core for this functionalities. There are two main options we want to support

  1. --allow-introspection/--no-allow-introspection (name tbd)
  2. compile one existing model vs provide a --in-line(name tbd) option to allow user compile some code that's not attached to a specific model
    • the in line code supplied by user doesn't belong to a specific node in the current project, so we will have to create a temp node for it, resolve the ref/source/config, then do the compilation.(link)
    • for inline code things like {this} probably will be wrong
    • for a existing model we don't need to create a separate temp node

For output

  • all commands should log to stdout, and return the compiled code via the result object
  • for compile one mode, we should update the file in target dir

Other non functional requirement

  • We should have functional test test the input options
  • We should remove the existing code linked above to make the previous mentioned function happen or create followup ticket for removing them if directly removing those would cause issue downstream.
@ChenyuLInx ChenyuLInx added enhancement New feature or request triage labels Dec 2, 2022
@github-actions github-actions bot changed the title [Feature] New top level commands: interactive compile [CT-1584] [Feature] New top level commands: interactive compile Dec 2, 2022
@dbeatty10 dbeatty10 added Refinement Maintainer input needed and removed triage labels Dec 2, 2022
@jtcohen6
Copy link
Contributor

jtcohen6 commented Dec 2, 2022

a compile by any other name

What do we think of creating this "interactive compile" as a modification / extension of the existing compile command?

$ dbt compile --code "select * from {{ ref('my_model') }}" --no-allow-introspection
  • Should the argument to --code be base64 encoded (same as dbt-rpc), to support a full range of characters?
  • How should dbt-core "return" the compiled code? My instinct here is, a log event that shows up in stdout (for CLI users) and can be programmatically parsed (for dbt-server). But we could find a way to do what dbt-server expects right now, which is returning the RemoteCompileResult as a Python object.
  • We'll want a flag/argument for allowing/disallowing introspective queries, to achieve this conditional behavior

There are a few important ways in which this would be different from the dbt compile command that exists today.
If those differences feel too big/confusing, let's create a totally new command instead (e.g. dbt compile_code).

Differences:

  • dbt compile --code is mutually exclusive with selection criteria (dbt compile --select/--exclude/--selector). If the --code argument is provided, dbt should skip over node selection entirely.
  • If dbt compile receives the --code argument, it should not populate the adapter cache. This is a behavior of standard dbt compile, but we don't want the added overhead for interactively compiling exactly this query. (Should this be another optional flag? We used to have a --bypass-cache/--no-use-cache flag before v1.0; we could add it back in.)

Similarities:

  • Under the hood, the GenericSqlRunner we're using for this today inherits from CompileRunner and operates mostly the same. The distinctions are around exception handling (seems fine, we could support this), and no-op'ing ephemeral models (? not sure when/how this would come up).
  • I think dbt compile --code would still want the ability to use --defer, though! Imagine that you've built some models in your dev schema, the rest are in prod, and you want to interactively compile a query that touches both:
select * from {{ ref('changed_model_in_dev') }}
union all
select * from {{ ref('unchanged_model_in_prod') }}
$ dbt compile --defer --state path/to/prod/artifacts/ --code "c2VsZWN0ICogZnJvbSB7eyByZWYoJ2NoYW5nZWRfbW9kZWxfaW5fZGV2JykgfX0KdW5pb24gYWxsCnNlbGVjdCAqIGZyb20ge3sgcmVmKCd1bmNoYW5nZWRfbW9kZWxfaW5fcHJvZCcpIH19"
select * from dbt_jcohen.changed_model_in_dev
union all
select * from analytics.unchanged_model_in_prod

re: manifests

IMO, we don't need to support a CLI --manifest flag, as a way to skip over parsing. For local runs, this is already the implicit behavior achieved via partial parsing (using target/partial_parse.msgpack). We should support passing in an existing manifest only when accessing the top-level commands programmatically, via dbt-core's Python API. And, in that sense, we should support it in the exact same way for every single command that uses a manifest.

how do we imagine update that manifest.msgpack when something changes but before user do another command that could generate that file? Should we just load the outdated ones? or should we try to do something like partial parsing to check project and determine whether we need a full parsing?

IMO this is not in scope for dbt-core. We can operate under the assumption that, if the user is modifying the manifest at the same time they're running commands, it's their responsibility to manage state for those manifests, and pass in the appropriate one to the appropriate command. In that case, the role of dbt-core is to support that concurrency by not mucking with globals, and ensuring that Manifest A being used by Command X won't be incidentally mutated by Command Y which wants to use Manifest B.

last but not least

As part of this work, we should also deprecate the existing "sql tasks": https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/task/sql.py

@jtcohen6 jtcohen6 added python_api Issues related to dbtRunner Python entry point cli Team:Execution and removed Refinement Maintainer input needed labels Dec 2, 2022
@jtcohen6
Copy link
Contributor

jtcohen6 commented Dec 5, 2022

Another consideration, not mentioned above: We should support interactive compilation of a specific model, within the context of that model, while still being able to skip over cache population. That will enable us to avoid this issue: dbt-labs/dbt-rpc#46

Think something like:

$ dbt compile --select specific_model --no-populate-cache --allow-introspection

That model's compiled SQL should then be included in the logs, and possibly also "returned" from the method (?) if called directly / programmatically.

@ChenyuLInx
Copy link
Contributor Author

ChenyuLInx commented Feb 9, 2023

for --allow-introspection vs --no-allow-introspection(name tbd), here's how different runner implement it. Link for without connect to warehouse, link for being able to connect to warehouse. And how we switch between those two now link
The only difference seems to be removing line

with self.adapter.connection_for(self.node):

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cli enhancement New feature or request python_api Issues related to dbtRunner Python entry point
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants