Investigate more complete compilation output #588

Closed
drewbanin opened this issue Nov 8, 2017 · 5 comments

@drewbanin
Contributor

drewbanin commented Nov 8, 2017

Presently, compiled output in the target/ directory only shows the main statement in a dbt resource. This is undesirable for multi-step resources like archives, incremental models, or table models with flags applied (e.g. --non-destructive). Instead, dbt should show every executed SQL statement involved in creating a resource, as well as pre- and post-hooks.

dbt writes rendered SQL to the target/ directory twice: once during parsing and then again during model running.

Both "parsing" and "running" invoke the statement block. In that block, a line checks if name == 'main' and only then writes the compiled output to a file. Instead, the block should write the compiled output to a file for every statement. The "main" statement should still be used to determine the overall status of the resource.
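
A minimal Python sketch of the proposed behavior, assuming a simplified stand-in for the statement block: StatementRecorder, record, and the status strings are hypothetical, not dbt's real internals. Every statement's SQL is kept for the compiled output, while only the statement named 'main' sets the resource's status.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class StatementRecorder:
    """Hypothetical stand-in for dbt's statement block (not dbt's real API)."""
    statements: List[Tuple[str, str]] = field(default_factory=list)
    status: Optional[str] = None

    def record(self, name: str, sql: str, result_status: str) -> None:
        # Proposed behavior: keep the compiled SQL of every statement,
        # not just the one named 'main'.
        self.statements.append((name, sql))
        # The 'main' statement still determines the resource's overall status.
        if name == "main":
            self.status = result_status


recorder = StatementRecorder()
recorder.record("pre_hook", "create schema if not exists analytics", "CREATE SCHEMA")
recorder.record("main", "create table analytics.t as (select 1 as id)", "SELECT 1")
recorder.record("post_hook", "grant select on analytics.t to reporter", "GRANT")
```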

The write function exists in the dbt context and is responsible for writing compiled SQL out to a file. This method should be changed (or a new method created) so that it writes the output of multiple statements into the same file. This could be implemented either by 1) writing each statement's SQL to the file in append mode, or 2) buffering the entire compiled SQL output for a resource, then writing it all to the file once the resource has finished running.
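
Both options can be sketched in a few lines of Python. This is a minimal illustration, not dbt's write implementation: write_append and BufferedWriter are hypothetical names, and a temporary directory stands in for target/.

```python
import os
import tempfile
from typing import List


def write_append(path: str, sql: str) -> None:
    """Option 1: append each statement to the resource's file as it runs."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(sql.rstrip() + ";\n\n")


class BufferedWriter:
    """Option 2: buffer all statements, then write the file once at the end."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.buffer: List[str] = []

    def add(self, sql: str) -> None:
        self.buffer.append(sql.rstrip() + ";\n\n")

    def flush(self) -> None:
        # "w" mode overwrites the file, so a second pass never leaves stale statements.
        with open(self.path, "w", encoding="utf-8") as f:
            f.write("".join(self.buffer))


statements = [
    "create table t_tmp as (select 1 as id)",
    "drop table if exists t",
    "alter table t_tmp rename to t",
]

with tempfile.TemporaryDirectory() as out_dir:
    appended = os.path.join(out_dir, "model_append.sql")
    for sql in statements:
        write_append(appended, sql)

    buffered = BufferedWriter(os.path.join(out_dir, "model_buffered.sql"))
    for sql in statements:
        buffered.add(sql)
    buffered.flush()

    # Read both files back before the temporary directory is cleaned up.
    with open(appended, encoding="utf-8") as f:
        append_text = f.read()
    with open(buffered.path, encoding="utf-8") as f:
        buffered_text = f.read()
```

One design consideration: because dbt writes compiled SQL twice (once during parsing and again during running), append mode would need the file truncated at the start of each pass, or statements would accumulate. The buffered approach sidesteps that by opening the file in "w" mode once per resource.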

@drewbanin drewbanin self-assigned this Nov 8, 2017
@drewbanin drewbanin added the bug Something isn't working label Nov 8, 2017
@drewbanin drewbanin added this to the 0.9.1 milestone Nov 10, 2017
@drewbanin drewbanin removed this from the 0.9.1 milestone Jan 2, 2018
@drewbanin drewbanin added this to the Jinja Improvements milestone Mar 6, 2018
@cmcarthur
Member

Add comments to each statement explaining what it is.
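
One way this could look, sketched in Python under the assumption that each statement is available as a (name, sql) pair; annotate is a hypothetical helper and the comment format is only a suggestion. Each statement is preceded by a SQL line comment, which the database ignores, so the labels don't change what the file would execute.

```python
from typing import List, Tuple


def annotate(statements: List[Tuple[str, str]]) -> str:
    """Render (name, sql) pairs as one SQL file, labeling each with a comment."""
    parts = []
    for index, (name, sql) in enumerate(statements, start=1):
        # "--" starts a line comment in SQL, so the label is ignored on execution.
        parts.append(f"-- statement {index}: {name}\n{sql.rstrip()};\n")
    return "\n".join(parts)


compiled = annotate([
    ("pre-hook", "create schema if not exists analytics"),
    ("main", "create table analytics.orders as (select 1 as id)"),
])
print(compiled)
```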

@lewish
Contributor

lewish commented Jul 10, 2018

Would it be possible to output all the compiled statements during dbt compile?
Currently only the raw select statements are written out during compile, so at the moment there isn't really any way to check what's going to run before actually running it.

Alternatively, something like a --dry-run flag that runs only the parsing stage would be useful. This might be preferable, since compiling all the materialization macros, checking for existing tables, etc. could trigger a lot more DB calls than a regular compile!

@drewbanin
Contributor Author

hey @lewish! Yeah - I really want to do this. We prioritized and then deprioritized it for 0.9.1 because it proved to be a bigger chunk to bite off than we could manage at the time. This is scheduled for our next minor release, 0.11.0.

dbt previously had a --dry-run flag, but it proved to be pretty ineffective in our initial implementation. Our current thinking for --dry-run might look more like running an explain on the SQL to get some quick validation. More info on that here if you're interested
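
To illustrate the idea of validating SQL with explain instead of executing it, here is a sketch that uses SQLite (through Python's standard sqlite3 module) purely as a stand-in database. EXPLAIN syntax and output differ across warehouses, and explain_ok is a hypothetical helper, not part of dbt.

```python
import sqlite3
from typing import Optional


def explain_ok(conn: sqlite3.Connection, sql: str) -> Optional[str]:
    """Return None if the database can plan `sql`, else the error message.

    EXPLAIN makes the database parse and plan the query without running it,
    so syntax errors and missing tables surface without touching any data.
    """
    try:
        conn.execute("EXPLAIN " + sql).fetchall()
        return None
    except sqlite3.Error as exc:
        return str(exc)


conn = sqlite3.connect(":memory:")
conn.execute("create table events (id integer, created_at text)")

good = explain_ok(conn, "select max(created_at) from events")
bad = explain_ok(conn, "select max(created_at) from evnts")  # typo in table name
```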

@drewbanin
Contributor Author

@lewish thinking about this further, it's not exactly clear how dbt should handle introspective adapter functions during compilation. If you have a model like:

```sql
select *
from events

{% if adapter.already_exists(this.schema, this.table) %}
  where created_at > (select max(created_at) from {{ this }})
{% endif %}
```

I think our only real option here is to run the query. Ideally, we'd be able to cache this query so that it's only executed once during the whole parsing stage. Just wanted to point out that there's a bit of a challenge here, and there's sort of a tradeoff between speed and correctness.

@cmcarthur cmcarthur changed the title Compiled sql should show _all_ sql run for a model Investigate more complete compilation output Aug 29, 2018
@cmcarthur cmcarthur added estimate: 4 and removed bug Something isn't working labels Aug 29, 2018
@drewbanin
Contributor Author

Example branch: https://github.com/fishtown-analytics/dbt/compare/investigate/compiled-output?expand=1

This is very doable, but will require non-trivial changes to the materialization code. Closing this as the "investigation" is complete. We should prioritize an issue in the future to actually implement the functionality described here.
