Add support for quoted string backslash escaping #1177

Merged (1 commit) on Apr 21, 2024

Conversation

iffyio
Contributor

@iffyio iffyio commented Mar 14, 2024

This adds support for parsing string literals on
dialects that treat the backslash character as an escape
character. As an example, the following previously failed
to parse on dialects like BigQuery, where the syntax is valid.

SELECT 'a\'b';

Moves the SQL `like` and `similar_to` tests from individual
dialects to common since the tests were identical.
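
To see why `SELECT 'a\'b';` is rejected when backslash is not an escape character, here is a minimal sketch of finding where a single-quoted literal ends. The helper `literal_end` and its `backslash_escape` parameter are hypothetical names for illustration only; this is not the tokenizer code from this PR, and it ignores `''` quote doubling.

```rust
/// Returns the byte offset just past the closing quote of the single-quoted
/// literal that starts at the beginning of `input`, or `None` if `input` does
/// not start with a quote or the literal is never closed.
/// Hypothetical helper for illustration; not part of sqlparser-rs.
fn literal_end(input: &str, backslash_escape: bool) -> Option<usize> {
    let mut chars = input.char_indices();
    // The literal must open with a single quote.
    if chars.next()?.1 != '\'' {
        return None;
    }
    while let Some((i, c)) = chars.next() {
        match c {
            // With backslash escaping enabled, skip the escaped character
            // so that an escaped quote does not close the literal.
            '\\' if backslash_escape => {
                chars.next()?;
            }
            // An unescaped quote closes the literal.
            '\'' => return Some(i + 1),
            _ => {}
        }
    }
    None
}
```

Under these assumptions, `literal_end(r"'a\'b';", false)` stops right after the second quote and leaves `b';` behind, which then contains an unterminated string, while passing `true` treats the whole `'a\'b'` as one literal.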

@coveralls

coveralls commented Mar 14, 2024

Pull Request Test Coverage Report for Build 8678508768

Details

  • 157 of 176 (89.2%) changed or added relevant lines in 9 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage decreased (-0.03%) to 88.061%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| src/dialect/bigquery.rs | 1 | 2 | 50.0% |
| src/dialect/clickhouse.rs | 1 | 2 | 50.0% |
| src/dialect/mod.rs | 3 | 4 | 75.0% |
| src/dialect/mysql.rs | 1 | 2 | 50.0% |
| src/dialect/snowflake.rs | 1 | 2 | 50.0% |
| tests/sqlparser_snowflake.rs | 9 | 11 | 81.82% |
| src/tokenizer.rs | 51 | 55 | 92.73% |
| tests/sqlparser_common.rs | 88 | 96 | 91.67% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| src/dialect/mod.rs | 1 | 81.89% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 8660968190 | -0.03% |
| Covered Lines | 20948 |
| Relevant Lines | 23788 |

💛 - Coveralls

@iffyio iffyio force-pushed the escaped-string-literals branch from 1b9ff2a to 7273ded Compare March 23, 2024 08:12
@iffyio iffyio changed the title from "Add support for quoted string escaping" to "Add support for quoted string backslash escaping" Mar 23, 2024
Contributor

@alamb alamb left a comment

Thank you for this PR @iffyio -- I am sorry it took me so long to review it properly. I am a little concerned about the difference between how this PR works and how it works for MySqlDialect.

Is there any way we can unify the behavior?

/// ```sql
/// SELECT '\';
/// ```
fn supports_string_literal_backslash_escape(&self) -> bool {
Contributor

❤️ and 😍 for the doc comments
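
As a rough illustration of how a per-dialect flag like this can be wired up, here is a simplified sketch. Only the method name `supports_string_literal_backslash_escape` comes from the diff above; the cut-down `Dialect` trait, the `false` default, the types `BigQueryLike` and `AnsiLike`, and the helper `backslash_is_escape` are assumptions made for the example.

```rust
/// Simplified stand-in for a dialect trait (assumption: the real trait
/// has many more methods and may differ in detail).
trait Dialect {
    /// Whether `\` escapes the next character inside a string literal.
    /// Assumed to default to `false`, so dialects must opt in.
    fn supports_string_literal_backslash_escape(&self) -> bool {
        false
    }
}

/// Hypothetical dialect that opts in to backslash escaping.
struct BigQueryLike;
/// Hypothetical dialect that keeps the default behavior.
struct AnsiLike;

impl Dialect for BigQueryLike {
    fn supports_string_literal_backslash_escape(&self) -> bool {
        true
    }
}

// Relies on the default: backslash is an ordinary character.
impl Dialect for AnsiLike {}

/// What a tokenizer could ask before deciding how to treat `\`.
fn backslash_is_escape(dialect: &dyn Dialect) -> bool {
    dialect.supports_string_literal_backslash_escape()
}
```

A default method keeps existing dialects unchanged unless they override it, which fits a change that touches only a line or two per dialect file.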

src/tokenizer.rs Outdated
@@ -1235,6 +1245,11 @@ impl<'a> Tokenizer<'a> {
'\\' => {
// consume
chars.next();

if allow_escape {
Contributor

This behavior seems different than how MySqlDialect behaves

Specifically, with MySQL the escape characters are transformed into their literal values (e.g. `'a"b'` would be parsed to `a"b` while this PR would parse it to `a"b`)

What do you think about making this consistent?

Contributor Author

Ah yes, I'll take a closer look at this to keep the same behavior for MySQL.

Contributor Author

@alamb While looking into updating this, it turns out I didn't fully follow the comment and was unable to infer the inconsistency for MySQL. Could you clarify the problem once more? It seems GitHub/Markdown unfortunately reformatted the expected and desired output in your example so that they became identical 😅

Contributor

As in, I think the tokenizer for MySQL (assuming `self.unescape` is true) actually does the unescaping -- so a string with an escape character (`"\x20"`) would actually be tokenized as a space (`" "`).

Contributor Author

That makes sense! I've updated this to have the new logic respect the `unescape` setting, similar to MySQL. Let me know if that's what you had in mind!
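
The difference being discussed in this thread can be sketched as follows. This is an illustrative reading of the behavior, not the code from `src/tokenizer.rs`: the function `read_literal` is hypothetical, and the small set of recognized escapes (`\n`, `\t`, `\0`, with any other character passed through) is an assumption chosen to keep the example short.

```rust
/// Reads the body of a single-quoted literal that starts at the beginning
/// of `input`. Backslash always escapes the next character; `unescape`
/// decides whether the escape is resolved or kept as written.
/// Hypothetical helper for illustration; not part of sqlparser-rs.
fn read_literal(input: &str, unescape: bool) -> Option<String> {
    let mut chars = input.chars();
    // The literal must open with a single quote.
    if chars.next()? != '\'' {
        return None;
    }
    let mut out = String::new();
    while let Some(c) = chars.next() {
        match c {
            // An unescaped quote closes the literal.
            '\'' => return Some(out),
            '\\' => {
                let next = chars.next()?;
                if unescape {
                    // Resolve the escape to the character it stands for.
                    out.push(match next {
                        'n' => '\n',
                        't' => '\t',
                        '0' => '\0',
                        other => other,
                    });
                } else {
                    // Keep the source text exactly as written.
                    out.push('\\');
                    out.push(next);
                }
            }
            other => out.push(other),
        }
    }
    // Ran out of input before the closing quote.
    None
}
```

With `unescape` set to `true`, the input `'a\'b'` yields the three characters `a'b`; with `false`, the backslash is preserved and the result is the four characters `a\'b`. Either way the literal is accepted, which is the consistency the review asked for.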

@iffyio iffyio force-pushed the escaped-string-literals branch from a9fdd33 to 0458e4b Compare April 14, 2024 05:29
Contributor

@alamb alamb left a comment

Looks great to me -- thank you @iffyio 🙏

@alamb alamb merged commit d2c2b15 into apache:main Apr 21, 2024
10 checks passed
JichaoS pushed a commit to luabase/sqlparser-rs that referenced this pull request May 7, 2024
@iffyio iffyio deleted the escaped-string-literals branch July 16, 2024 11:37