Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ADAP-549] [CT-2577] [Regression] Error Only %s and %% are supported in the query #441

Closed
2 tasks done
craigchurch opened this issue May 12, 2023 · 27 comments
Closed
2 tasks done
Assignees
Labels
bug Something isn't working regression

Comments

@craigchurch
Copy link

craigchurch commented May 12, 2023

Is this a regression in a recent version of dbt-core?

  • I believe this is a regression in dbt-core functionality
  • I have searched the existing issues, and I could not find an existing issue for this regression

Current Behavior

URLs in markdown files that have a percent sign are causing the error

Only %s and %% are supported in the query.

For example this markdown will trigger the error message:

{% docs some_doc %}
Please see this URL:  [Reporting and Analytics Projects](https://somewebsite.com/Reporting%20and%20Analytics%20Projects/AllItems.aspx)
{% enddocs %}

Also a single percent sign in the .md file will trigger the error. For example:

{% docs some_doc %}
80% of statistics are made up on the spot
{% enddocs %}

Also a single percent sign in the sql file will trigger the error. Such as a percent sign in a comment in the SQL like this:

select *
from a_table --this is a table where 50% of the records have a null start_date

Expected/Previous Behavior

These errors were not present in version dbt 1.2.

Steps To Reproduce

Build the dbt project

Relevant log output

2023-05-11T22:27:30.313-06:00	�[0m04:27:30 �[33mRuntime Error in model reporting_and_analytics_projects_init (models/work/initial/sharepoint/reporting_and_analytics_projects_init/reporting_and_analytics_projects_init.sql)�[0m

2023-05-11T22:27:30.313-06:00	�[0m04:27:30 Only %s and %% are supported in the query.

Environment

- OS: debian linux
- Python:Python 3.8.16
- dbt (working version): 1.2.1
- dbt (regression version): 1.5.0

Which database adapter are you using with dbt?

redshift

Additional Context

No response

@craigchurch craigchurch added bug Something isn't working regression triage labels May 12, 2023
@github-actions github-actions bot changed the title [Regression] Error Only %s and %% are supported in the query [CT-2577] [Regression] Error Only %s and %% are supported in the query May 12, 2023
@dbeatty10 dbeatty10 transferred this issue from dbt-labs/dbt-core May 12, 2023
@github-actions github-actions bot changed the title [CT-2577] [Regression] Error Only %s and %% are supported in the query [ADAP-549] [CT-2577] [Regression] Error Only %s and %% are supported in the query May 12, 2023
@dbeatty10
Copy link
Contributor

Thank you for reporting this @craigchurch !

In dbt 1.5 we switched from using psycopg2 to redshift_connector, and I suspect your report has a common root cause as #432.

This is a delightful example of the regression -- 100% guaranteed to get a laugh 😂

80% of statistics are made up on the spot

@dbeatty10 dbeatty10 removed the triage label May 12, 2023
@jaklan
Copy link

jaklan commented May 15, 2023

@dbeatty10 we are actually not laughing, because that's yet another regression in 1.5 which broke our models... Could we expect patch release anytime soon or do we need to downgrade to 1.4?

@dbeatty10
Copy link
Contributor

@jaklan we do not have a fix for this regression yet, and I would recommend downgrading to 1.4 in the meantime.

I'm sorry

We take it seriously whenever users are affected by a bug in a release. I am very sorry if it seemed that I was laughing that this regression broke your models, especially if you have experienced multiple issues with dbt-redshift 1.5.

I want to clarify that my intention was purely to show appreciation for the clever way that @craigchurch wrote up the following in the original bug report:

Also a single percent sign in the .md file will trigger the error. For example:
{% docs some_doc %}
80% of statistics are made up on the spot
{% enddocs %}

I didn't fully consider how my words could be interpreted, and I want to extend my heartfelt apology.

Context

We made a significant change for dbt-redshift between 1.4 and 1.5 by switching from psycopg2 to redshift_connector as the Python driver / connector.

Prior to the April 27, 2023 1.5 release, we did five public releases and release candidates between Feb 22, 2023, and we fixed all known regressions prior to then.

This issue was present since 1.5.0b1, but unfortunately it wasn't included in our automated test cases, and also wasn't discovered and reported by any end users until just a few days ago.

@jaklan
Copy link

jaklan commented May 15, 2023

@dbeatty10 no worries, we are just a bit tired of addressing next and next problems after the upgrade (i.a. this one, dbt-labs/dbt-core#7465, #427), so that's why I just wanted to bring the discussion back to the issue 😉

@jaklan
Copy link

jaklan commented May 15, 2023

@dbeatty10 maybe you would consider reverting the switch and returning to psycopg2 until redshift_connector-related issues are resolved and covered with tests, so it could be released again with 1.6? We can downgrade to 1.4, but then we couldn't start utilising great new features provided by 1.5.

@dbeatty10
Copy link
Contributor

@jaklan That is totally understandable that you are tired especially experiencing several issues during this upgrade. Thank you for sticking with us, raising these issues, and sharing how they are affecting you.

We've got line of sight to resolve all known redshift_connector-related regressions by dbt-redshift 1.5.2, so we're planning to stay on that path rather than reverting the switch (which would bring its own risks and affect other users).

@jaklan
Copy link

jaklan commented May 15, 2023

@dbeatty10 got it, do you have any planned timeline for releasing 1.5.2? Going back to 1.4 would also imply some work on our side, so that could help us decide whether we should downgrade or rather find temporary workarounds. I mean - you already mentioned you don't have a fix yet, but maybe you have at least any deadline defined?

@dbeatty10
Copy link
Contributor

@Fleid do you have an expected timeline for dbt-redshift 1.5.2?

@dbeatty10 dbeatty10 removed the triage label May 15, 2023
@dbeatty10
Copy link
Contributor

dbeatty10 commented May 24, 2023

I'm assuming that this issue has the same underlying cause as #432, and we could consider closing one as a duplicate of the other.

Also guessing that it has to do with paramstyle being 'pyformat' in psycopg2 but defaulting to 'format' in redshift_connector.

We should try setting redshift_connector.paramstyle = 'pyformat' within connections.py and see if that resolves the reported issues.

@jiezhen-chen
Copy link
Contributor

@dbeatty10 Thanks for the suggested solution. Working on implementing the suggestion and will open up a PR for this!

@dbeatty10
Copy link
Contributor

@jiezhen-chen I gave redshift_connector.paramstyle = 'pyformat' a shot, but it didn't work for me.

I suspect there is a bug that affects redshift_connector (and probably pg8000 too) where a comment like the following doesn't work:

  comment on table customers is '95% of customer records';

In the meantime, adding the following to here could be a workaround :

{#
  By using dollar-quoting like this, users can embed anything they want into their comments
  (including nested dollar-quoting), as long as they do not use this exact dollar-quoting
  label. It would be nice to just pick a new one but eventually you do have to give up.
#}
{% macro postgres_escape_comment(comment) -%}
  {% if comment is not string %}
    {% do exceptions.raise_compiler_error('cannot escape a non-string: ' ~ comment) %}
  {% endif %}
  {%- set magic = '$dbt_comment_literal_block$' -%}
  {%- if magic in comment -%}
    {%- do exceptions.raise_compiler_error('The string ' ~ magic ~ ' is not allowed in comments.') -%}
  {%- endif -%}
  {#- -- escape % until the underlying issue is fixed -#}
  {%- set comment = comment|replace("%", "%%") -%}
  {{ magic }}{{ comment }}{{ magic }}
{%- endmacro %}

@jiezhen-chen
Copy link
Contributor

jiezhen-chen commented May 24, 2023

@dbeatty10 Seems like both pg8000 and redshift_connector has this convert_paramstyle method that is raising this error. Another potential solution would be to replace % to %% in connections.py itself

@dbeatty10
Copy link
Contributor

@jiezhen-chen I saw that! Read a lot more pg8000 code in the last 24 hours than I ever thought I would trying to learn the history of various code like that.

The InterfaceError is not affecting a bunch of other SQL queries I tried that contain a % -- I could only get it to raise that error when creating a comment.

It might be that this error only pops up with multi-line comments -- I didn't try both single line and multi-line comments -- only tried multi-line comments using dollar-quoted string constants like:

comment on table customers is $dbt_comment_literal_block$Dianne's 100% horse
$dbt_comment_literal_block$;

@dbeatty10
Copy link
Contributor

ooooooh, looking at the comment above, we can see why it's in the section of code that is raising that error. It's 'cause it doesn't have any single quotes (or any of the rest of these):

    INSIDE_SQ: int = 1  # inside single-quote string '...'
    INSIDE_QI: int = 2  # inside quoted identifier   "..."
    INSIDE_ES: int = 3  # inside escaped single-quote string, E'...'
    INSIDE_PN: int = 4  # inside parameter name eg. :name
    INSIDE_CO: int = 5  # inside inline comment eg. --

So I'm guessing redshift_connector (and pg8000) would need to add logic to handle the additional case of dollar-quoted string constants.

@dbeatty10 dbeatty10 removed the triage label May 24, 2023
@dbeatty10
Copy link
Contributor

@jiezhen-chen here's some code that roughly matches the coding style of convert_paramstyle and might be able to be merged into the existing implementation within redshift_connector to add handling for dollar-quoted string constants:

Code

dollar_quoting.py

import typing


def convert_paramstyle(style: str, query) -> typing.Tuple[str, typing.Any]:

    OUTSIDE: int = 0  # outside quoted string
    INSIDE_SQ: int = 1  # inside single-quote string '...'
    INSIDE_QI: int = 2  # inside quoted identifier   "..."
    INSIDE_ES: int = 3  # inside escaped single-quote string, E'...'
    INSIDE_PN: int = 4  # inside parameter name eg. :name
    INSIDE_CO: int = 5  # inside inline comment eg. --
    INSIDE_DQ: int = 6  # inside dollar-quoted tag eg. $tag$ inside $tag$
    OPENING_DQ: int = 7  # opening a dollar-quoted tag eg. $tag$
    CLOSING_DQ: int = 8  # closing a dollar-quoted tag eg. $tag$ inside $tag$

    string_constant: str = ""  # TODO - remove
    starting_tag: str = ""
    closing_tag: str = ""
    output_query: typing.List[str] = []
    state: int = OUTSIDE

    for i, c in enumerate(query):
        if state == OUTSIDE:
            if c == "$":
                # starting tag for a dollar-quoted string constant
                output_query.append(c)
                state = OPENING_DQ
                starting_tag = ""
            else:
                output_query.append(c)
        elif state == OPENING_DQ:
            if c == "$":
                # end of a tag for a dollar-quoted string constant
                # and start of the actual string constant
                output_query.append(c)
                state = INSIDE_DQ
            else:
                output_query.append(c)
                starting_tag += c
        elif state == INSIDE_DQ:
            string_constant += c  # TODO - remove
            if c == "$":
                # potential closing tag for a dollar-quoted string constant
                output_query.append(c)
                state = CLOSING_DQ
            else:
                output_query.append(c)

        elif state == CLOSING_DQ:
            string_constant += c  # TODO - remove
            if c == "$" and closing_tag == starting_tag:
                # end of a closing tag for a dollar-quoted string constant

                # TODO - remove the next 3 lines - only for demo purposes
                string_constant = string_constant[:-(len(closing_tag) + 2)]
                print(f"string_constant: {string_constant}")
                string_constant = ""

                output_query.append(c)
                closing_tag = ""
                state = OUTSIDE
            else:
                output_query.append(c)
                closing_tag += c

                if not starting_tag.startswith(closing_tag):
                    # failed potential closing tag
                    state = INSIDE_DQ
                    closing_tag = ""

        else:
            output_query.append(c)

    return "".join(output_query)


query = "$$Dan's cats$$ $<tag>$Dianne's $$horse$$ $<tag$ $<tagg $<tag> $ $<tag>$ $animal$ Dani's dog $animal$"
result  = convert_paramstyle('faux style', query)
print(result)
python dollar_quoting.py
$ python dollar_quoting.py

string_constant: Dan's cats
string_constant: Dianne's $$horse$$ $<tag$ $<tagg $<tag> $ 
string_constant:  Dani's dog 

$$Dan's cats$$ $<tag>$Dianne's $$horse$$ $<tag$ $<tagg $<tag> $ $<tag>$ $animal$ Dani's dog $animal$

@dbeatty10
Copy link
Contributor

dbeatty10 commented May 25, 2023

Re-opening since I believe #441 only mitigates certain use-cases but doesn't resolve this issue. The ultimate solution is probably to fully support dollar-quoted string literals (which will need to be done within redshift_connector unless we monkey patch it 🙉 🙈 🙊).

@dbeatty10 dbeatty10 reopened this May 25, 2023
@jiezhen-chen
Copy link
Contributor

jiezhen-chen commented May 25, 2023

@dbeatty10 Seems like the only time this is an issue is when '%' is used instead of '%%' in a COMMENT. Is that correct? I haven't been able to reproduce the same error running other types of queries that contain '%'.

To consolidate our findings, and correct me if I'm wrong - COMMENT statements that contain one single '%' are triggering the error in redshift_connector code here, because of these reasons:

  1. redshift_connector does not handle dollar-quoted string literals in convert_paramstyle.
  2. dbt wraps COMMENTs in dollar-quotes here

These options can be viable paths forward:

  1. Add the macro postgres_escape_comment to replace % with %% in redshift adapters.sql (solution mentioned in your comment here.

  2. Detect COMMENT queries in add_query in connections.py, and wrangle % into %%.

  3. Have redshift_connector support dollar-quoted string literals.

@sathiish-kumar and I are leaning towards option2 - which is to handle this within python rather than relying on jinja. Option3 would introduce unnecessary engineering cost for the redshift driver team. Considering that I dug around and haven't found any complaints of this issue for redshift_connector, I think this may be a dbt-redshift specific issue that can be handled within dbt.

@jiezhen-chen
Copy link
Contributor

@dbeatty10 Thanks for merging 466 so quickly!

While the change in 466 takes care of the issue in comments, you're right about it failing with cases like the one below:

select length($MyCoolString$
80% of statistics are made up on the spot, and you miss 100% of the shots you don't take.
$MyCoolString$); 

We could try detecting dollar-quoted string literals in add_query in connections.py, and replace instances of % to %%.

@dataders
Copy link
Contributor

@craigchurch, I haven't been able to reproduce the scenario where there's a percent sign in a SQL comment of a model .sql file

Also a single percent sign in the sql file will trigger the error. Such as a percent sign in a comment in the SQL like this:

select *
from a_table --this is a table where 50% of the records have a null start_date

@jiezhen-chen
Copy link
Contributor

jiezhen-chen commented May 26, 2023

@dataders Seems like inline comments are handled by convert_paramstyle in redshift_connector. % triggers an error when a comment is wrapped in dollar-quotes or in /* */ because convert_paramstyle does not handle these cases. I haven't been able to reproduce this error with inline comments either.

@dataders
Copy link
Contributor

@jiezhen-chen would you be interested in doing a proof of concept of amalgamating @dbeatty10's code snippet within #441 (comment) into the existing convert_paramstyle def as a PR on the redshift connector?

This seems more and more to be a bug in the connector library as opposed to a bug in dbt-redshift. For example, aws/amazon-redshift-python-driver#156 looks to be the same case. @Brooke-white does this track for you as well? I'd appreciate your help here

@jiezhen-chen
Copy link
Contributor

jiezhen-chen commented May 26, 2023

@dbeatty10's change in 466 mitigated this issue for the COMMENT keyword, but this issue pertains when parsing a comment within a query, if the comment is wrapped in /* */.

I agree that the ultimate solution is to have redshift_connector support dollar-quoted string literals and comments in /* */. I will initiate a discussion with the driver folks and figure out next steps. Meanwhile @sathiish-kumar and I can work on a solution within the dbt-redshift code to solve this issue, by replacing % with %% in query comments before the query is executed.

@dbeatty10
Copy link
Contributor

@jiezhen-chen I agree with you that the ultimate solution is having redshift_connector support dollar-quoted string literals and multi-line comments 👍

Unless #466 is reverted first, I'd be leery of doing further replacements of % with %% within dbt-redshift . Otherwise, too many %% could be the result.

My suggestion would be to focus our effort towards implementing multiline + $$ support directly within redshift_connector or adding the relevant monkey patch within dbt-redshift. The former seems much preferable to the latter.

@stlee-aurora
Copy link

I just ran into this issue with a model in which the author included an apostrophe in a multi-line comment:

/*
  Let's write a query!
*/

SELECT * FROM t WHERE c LIKE 'hello%'

The convert_paramstyle throws up on the same exception, because when it parses the apostrophe in the comment block, it thinks it's moved into a single-quoted string, which it only exits when it reaches the first apostrophe of 'hello%', leaving the % in the OUTSIDE state and throwing the exception.

Hilariously, if only the author had included a second apostrophe in their comment, things would run fine!

@shanehodgkins
Copy link

It doesn't need to be part of a comment. This also happens if you use % for modulo operations. Something a simple as select 3 % 2 will trigger the Only %s and %% are supported in the query. error.

@Fleid Fleid removed the triage label May 31, 2023
@Brooke-white
Copy link

Brooke-white commented Jun 1, 2023

Hi folks, I maintain redshift-connector. This is a long-standing but just recently discovered bug in our convert_paramstyle method like a few of you identified above :) . We are tracking it in aws/amazon-redshift-python-driver#156 and have a fix ready for our next release scheduled for next week (week of June 5th). Thank you all for your patience!

@dataders
Copy link
Contributor

dataders commented Jun 6, 2023

@craigchurch @jaklan @stlee-aurora @shanehodgkins thanks so much for both your
patience and support here. can you all please confirm if this issue is resolved by installed the latest patch of both dbt-redshift and redshift_connector?

pip install dbt-redshift~=1.5.4 redshift_connector~=2.0.911

I'm going to close this issue, but will happily re-open if you report more issues. thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working regression
Projects
None yet
Development

No branches or pull requests

9 participants