Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Turn static parser on by default with the ability to shut it off. #3939

Merged
merged 1 commit into from
Sep 29, 2021

Conversation

nathaniel-may
Copy link
Contributor

@nathaniel-may nathaniel-may commented Sep 22, 2021

resolves #3377

Description

This change turns the static parser on by default, but also adds the ability to turn it off by introducing --no-static-parser which can be set via the cli, or an env var (STATIC_PARSER=false) and in the profile config. The sampling logic used to help determine the correctness of the experimental parser is disabled as part of this PR, but the code paths remain intact. The plan is that the "experimental parser" will be the next version of the static parser that includes new features that need to be sampled for correctness in the wild. For this same reason, I chose to duplicate logic between the two functions run_static_parser and run_experimental_parser so that we can easily add a next iteration of the static parser behind the experimental flag with minimal changes.

To test this change, I heavily rely on parsing debug log lines to get at the internals of the function because this pattern is used in other difficult-to-test places as well. I chose to use numerical codes in these log lines so that the user-facing wording can change. I have no good reason behind my choice of the 1600 block of numbers for these codes. One disadvantage of this pattern as pointed out by @jtcohen6 is that debug log lines are actually user-facing, and this will generate one per model file which could be a lot. In a future change we could consider adding a new log level for tests to reduce noise for users debugging their dbt runs.

Reviewers

The feedback I would like the most is for the quality of my test coverage. Because of the sensitivity of the correctness of this feature, I would like to cover any gaps that you might see.

render_update is a little bit complex now, and I would like any recommendations you have around making the flow simpler.

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

@cla-bot cla-bot bot added the cla:yes label Sep 22, 2021
@nathaniel-may nathaniel-may changed the title bump sampling back to 100 turn static parser on by default Sep 22, 2021
@nathaniel-may nathaniel-may force-pushed the static-parser-by-default branch 3 times, most recently from 308f591 to e46c6c0 Compare September 22, 2021 20:21
@nathaniel-may
Copy link
Contributor Author

nathaniel-may commented Sep 22, 2021

I have to sign off, but the only thing I can't get to pass yet is the testing the DBT_NO_STATIC_PARSER environment flag in test_postgres_env_no_static_parser. It looks like it's not actually setting the flag to true.

Opening the PR for reviews though so I can address any concerns when I return from PTO.

@nathaniel-may nathaniel-may marked this pull request as ready for review September 22, 2021 21:13
@nathaniel-may nathaniel-may changed the title turn static parser on by default Turn static parser on by default with the ability to shut it off. Sep 22, 2021
Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Broadly, looking really good! No blockers from me, just a couple of questions

core/dbt/main.py Outdated Show resolved Hide resolved
core/dbt/parser/models.py Outdated Show resolved Hide resolved
logger.debug(
f"1602: parser fallback to jinja because of extractor failure for {node.path}"
f"1602: parser fallback to jinja rendering on {node.path}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we're turning on the experimental parser by default, worth calling out: we see a lot of this debug message in full re-parses. You figure ~40% of models, in a project with 1k models in it, means ~400 log lines like this. That's not necessarily a bad thing, and it's definitely preferable to the other (very misleading) log line that we currently print once per node (#3137)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahhh good call out. This is why I do not like that we're cornered into parsing the log for testing in this area of the code which is the only reason this message is here. We could add another log level "Test" that we put this stuff in for tests to read so we don't muddy the debug log.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see how it's needed for tests. Ok, it's decidedly debug-level logging, I don't feel a need to be too precious about taking up space. We can reserve the right to swing back later and cut it.

)
return "cannot_parse"

def run_experimental_parser(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is an exact dupe of the logic in run_static_parser, except subbing out "static" for "experimental." I get that, at a future point where we have more experimental features to sample/enable, we'll want to do other things here. Is this how we want the code to be in the meantime? Honest q, I'm happy with either answer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the team wanted this to logic to be deduplicated, I can totally get behind that. However, the rationale for this decision was to leave the scaffolding for what this feature actually needs to maximize maintainability. It's not just for the rest of the team, but also for future me when I forget what the plan was when we go to change this a few months down the line 😅

@nathaniel-may nathaniel-may force-pushed the static-parser-by-default branch 3 times, most recently from 6a8b9a5 to d8e1588 Compare September 28, 2021 17:04
@nathaniel-may
Copy link
Contributor Author

To resolve my previous comment, @gshank helped me figure out how to handle the environment variable issue I was having. Everything should work exactly as intended now.

@nathaniel-may nathaniel-may removed the request for review from kwigley September 29, 2021 13:59
Copy link
Contributor

@gshank gshank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. I do agree that all of those debug lines in the logs could make for a lot of noise. Could you open a testing ticket to add a 'test' log output? Just so we don't forget about it.

@nathaniel-may
Copy link
Contributor Author

#3977

@nathaniel-may nathaniel-may merged commit 38eb46d into develop Sep 29, 2021
@nathaniel-may nathaniel-may deleted the static-parser-by-default branch September 29, 2021 20:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Tracking] Static / Experimental Parser (Tree Sitter)
3 participants