-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Turn static parser on by default with the ability to shut it off. #3939
Conversation
308f591
to
e46c6c0
Compare
I have to sign off, but the only thing I can't get to pass yet is the testing the Opening the PR for reviews though so I can address any concerns when I return from PTO. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Broadly, looking really good! No blockers from me, just a couple of questions
logger.debug( | ||
f"1602: parser fallback to jinja because of extractor failure for {node.path}" | ||
f"1602: parser fallback to jinja rendering on {node.path}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we're turning on the experimental parser by default, worth calling out: we see a lot of this debug message in full re-parses. You figure ~40% of models, in a project with 1k models in it, means ~400 log lines like this. That's not necessarily a bad thing, and it's definitely preferable to the other (very misleading) log line that we currently print once per node (#3137)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahhh good call out. This is why I do not like that we're cornered into parsing the log for testing in this area of the code which is the only reason this message is here. We could add another log level "Test" that we put this stuff in for tests to read so we don't muddy the debug log.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see how it's needed for tests. Ok, it's decidedly debug-level logging, I don't feel a need to be too precious about taking up space. We can reserve the right to swing back later and cut it.
) | ||
return "cannot_parse" | ||
|
||
def run_experimental_parser( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this is an exact dupe of the logic in run_static_parser
, except subbing out "static" for "experimental." I get that, at a future point where we have more experimental features to sample/enable, we'll want to do other things here. Is this how we want the code to be in the meantime? Honest q, I'm happy with either answer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the team wanted this to logic to be deduplicated, I can totally get behind that. However, the rationale for this decision was to leave the scaffolding for what this feature actually needs to maximize maintainability. It's not just for the rest of the team, but also for future me when I forget what the plan was when we go to change this a few months down the line 😅
6a8b9a5
to
d8e1588
Compare
e1e4f6c
to
6925ceb
Compare
To resolve my previous comment, @gshank helped me figure out how to handle the environment variable issue I was having. Everything should work exactly as intended now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good. I do agree that all of those debug lines in the logs could make for a lot of noise. Could you open a testing ticket to add a 'test' log output? Just so we don't forget about it.
resolves #3377
Description
This change turns the static parser on by default, but also adds the ability to turn it off by introducing
--no-static-parser
which can be set via the cli, or an env var (STATIC_PARSER=false
) and in the profile config. The sampling logic used to help determine the correctness of the experimental parser is disabled as part of this PR, but the code paths remain intact. The plan is that the "experimental parser" will be the next version of the static parser that includes new features that need to be sampled for correctness in the wild. For this same reason, I chose to duplicate logic between the two functionsrun_static_parser
andrun_experimental_parser
so that we can easily add a next iteration of the static parser behind the experimental flag with minimal changes.To test this change, I heavily rely on parsing debug log lines to get at the internals of the function because this pattern is used in other difficult-to-test places as well. I chose to use numerical codes in these log lines so that the user-facing wording can change. I have no good reason behind my choice of the 1600 block of numbers for these codes. One disadvantage of this pattern as pointed out by @jtcohen6 is that debug log lines are actually user-facing, and this will generate one per model file which could be a lot. In a future change we could consider adding a new log level for tests to reduce noise for users debugging their dbt runs.
Reviewers
The feedback I would like the most is for the quality of my test coverage. Because of the sensitivity of the correctness of this feature, I would like to cover any gaps that you might see.
render_update
is a little bit complex now, and I would like any recommendations you have around making the flow simpler.Checklist
CHANGELOG.md
and added information about my change to the "dbt next" section.