Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: pass string to process_template #31329

Merged
merged 3 commits into from
Dec 7, 2024
Merged

Conversation

betodealmeida
Copy link
Member

@betodealmeida betodealmeida commented Dec 6, 2024

SUMMARY

When extracting tables from a Jinja template we're passing a jinja2.nodes.Template object to the process_template method, but it expects a string. This can cause errors depending on the ENABLE_TEMPLATE_PROCESSING feature flag and the SQL dialect.

This PR fixes the logic to re-render the modified template before passing it as a string to process_template.

(Note that a jinja2.nodes.Template is an internal object, different from a jinja2.environment.Template.)

Why wasn't this caught before?

This only happens when ENABLE_TEMPLATE_PROCESSING is off. When the feature flag is ON we use the JinjaTemplateProcessor, which parses the string passed to process_template using env.from_string. Despite the name, env.from_string also works on jinja2.nodes.Template objects, so the template is rendered correctly and process_template returns valid SQL.

When the feature flag is OFF we use the NoOpTemplateProcessor to extract tables, which just returns str(sql) in its process_template. This converts the jinja2.nodes.Template into a string that can't be parsed correctly most of the time:

Template(body=[Output(nodes=[TemplateData(data='SELECT 1 FROM t')])])

Confusingly, the string above is a valid SQL expression depending on the dialect:

>>> import sqlglot
>>> sqlglot.parse("Template(body=[Output(nodes=[TemplateData(data='SELECT 1 FROM t')])])")
[Anonymous(
  this=Template,
  expressions=[
    EQ(
      this=Column(
        this=Identifier(this=body, quoted=False)),
      expression=Array(
        expressions=[
          Anonymous(
            this=Output,
            expressions=[
              EQ(
                this=Column(
                  this=Identifier(this=nodes, quoted=False)),
                expression=Array(
                  expressions=[
                    Anonymous(
                      this=TemplateData,
                      expressions=[
                        EQ(
                          this=Column(
                            this=Identifier(this=data, quoted=False)),
                          expression=Literal(this=SELECT 1 FROM t, is_string=True))])]))])]))])]
>>> sqlglot.parse("Template(body=[Output(nodes=[TemplateData(data='SELECT 1 FROM t')])])", "tsql")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/beto/.pyenv/versions/superset/lib/python3.11/site-packages/sqlglot/__init__.py", line 102, in parse
    return Dialect.get_or_raise(read or dialect).parse(sql, **opts)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/beto/.pyenv/versions/superset/lib/python3.11/site-packages/sqlglot/dialects/dialect.py", line 919, in parse
    return self.parser(**opts).parse(self.tokenize(sql), sql)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/beto/.pyenv/versions/superset/lib/python3.11/site-packages/sqlglot/parser.py", line 1398, in parse
    return self._parse(
           ^^^^^^^^^^^^
  File "/Users/beto/.pyenv/versions/superset/lib/python3.11/site-packages/sqlglot/parser.py", line 1470, in _parse
    self.raise_error("Invalid expression / Unexpected token")
  File "/Users/beto/.pyenv/versions/superset/lib/python3.11/site-packages/sqlglot/parser.py", line 1511, in raise_error
    raise error
sqlglot.errors.ParseError: Invalid expression / Unexpected token. Line 1, Col: 68.
  Template(body=[Output(nodes=[TemplateData(data='SELECT 1 FROM t')])])

Fixes #31307 and #31183.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

I added a unit test capturing the bug. It failed before this fix.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@betodealmeida betodealmeida changed the title fix: pass string to process_template fix: pass string to process_template Dec 6, 2024
@pull-request-size pull-request-size bot added size/S and removed size/XS labels Dec 6, 2024
@betodealmeida betodealmeida marked this pull request as ready for review December 6, 2024 20:43
@pull-request-size pull-request-size bot added size/M and removed size/S labels Dec 7, 2024
@betodealmeida betodealmeida merged commit 9315a88 into master Dec 7, 2024
33 of 34 checks passed
@rusackas rusackas deleted the fix-sql-parsing-template branch December 9, 2024 19:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to parse SQL (Dialects.POSTGRES)
2 participants