Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Representing infix REGEXP #863

Open
max-sixty opened this issue May 2, 2023 · 8 comments
Open

Representing infix REGEXP #863

max-sixty opened this issue May 2, 2023 · 8 comments

Comments

@max-sixty
Copy link
Contributor

MySQL & Sqlite have unusual regex operators:

 SELECT 'a' REGEXP '^[a-d]';

i.e. it's an infix function.

We are big users (and big fans!) of sqlparser-rs in PRQL We have a PR PRQL/prql#2458 for implementing regex in the other dialects.

Could I ask — is there some way of representing an infix function in sqlparser-rs AST?

@alamb
Copy link
Contributor

alamb commented May 2, 2023

@max-sixty
Copy link
Contributor Author

Thanks for the swift response @alamb

👋 How about https://docs.rs/sqlparser/0.33.0/sqlparser/ast/enum.Expr.html#variant.BinaryOp ?

The only one there which is custom is PGCustomBinaryOperator. The official docs of that don't allow for words (i.e. REGEXP):

The operator name is a sequence of up to NAMEDATALEN-1 (63 by default) characters from the following list:

      • / < > = ~ ! @ # % ^ & | ` ?

...though maybe sqlparser-rs don't enforce that and we can squeeze it in?

@alamb
Copy link
Contributor

alamb commented May 2, 2023

...though maybe sqlparser-rs don't enforce that and we can squeeze it in?

Since sqlparser-rs parses many different dialects I don't think there is any reason to limit sqlparser to posgres's limitations

I think https://github.com/sqlparser-rs/sqlparser-rs#syntax-vs-semantics may be relevant here

@max-sixty
Copy link
Contributor Author

I think https://github.com/sqlparser-rs/sqlparser-rs#syntax-vs-semantics may be relevant here

OK — I had thought that allowed characters might be syntax rather than semantics, but very happy to take it!

Thanks @alamb , I'll close

@max-sixty
Copy link
Contributor Author

Sorry, actually I spoke too soon — that seems to write OPERATOR when writing it out.

This would be the diff I need — happy to put a PR in — but:

  • It doesn't parse anything — I'm not sure how to parse SELECT 'a' REGEXP '^[a-d]'; with REGEXP as the operator... (but fine for my narrow purpose)
  • Where would I add tests?
diff --git a/src/ast/operator.rs b/src/ast/operator.rs
index 75877c9..2981799 100644
--- a/src/ast/operator.rs
+++ b/src/ast/operator.rs
@@ -85,6 +85,7 @@ pub enum BinaryOperator {
     BitwiseOr,
     BitwiseAnd,
     BitwiseXor,
+    Custom(String),
     PGBitwiseXor,
     PGBitwiseShiftLeft,
     PGBitwiseShiftRight,
@@ -122,6 +123,7 @@ fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
             BinaryOperator::BitwiseOr => f.write_str("|"),
             BinaryOperator::BitwiseAnd => f.write_str("&"),
             BinaryOperator::BitwiseXor => f.write_str("^"),
+            BinaryOperator::Custom(s) => f.write_str(s),
             BinaryOperator::PGBitwiseXor => f.write_str("#"),
             BinaryOperator::PGBitwiseShiftLeft => f.write_str("<<"),
             BinaryOperator::PGBitwiseShiftRight => f.write_str(">>"),

Thanks!

@max-sixty max-sixty reopened this May 7, 2023
max-sixty added a commit to max-sixty/sqlparser-rs that referenced this issue May 7, 2023
From apache#863

- It doesn't parse anything — I'm not sure how to parse ` SELECT 'a' REGEXP '^[a-d]';` with `REGEXP` as the operator... (but fine for my narrow purpose)
- If we need tests, where would I add them?
@ankrgyl
Copy link
Contributor

ankrgyl commented May 9, 2023

@max-sixty is my understanding correct that in #868 you're adding a "catch all" custom operator and intend to capture the (built-in) regex syntax in MySQL that way?

If so, why not add a regex operator and then for the MySQL/sqlite dialects parse it as a binary op?

@eitsupi
Copy link
Contributor

eitsupi commented May 9, 2023

Given something like MySQL's DIV operator, I am wondering if merging #868 makes sense.
https://dev.mysql.com/doc/refman/8.0/en/arithmetic-functions.html

SELECT 5 DIV 2, -5 DIV 2, 5 DIV -2, -5 DIV -2;

@max-sixty
Copy link
Contributor Author

If so, why not add a regex operator and then for the MySQL/sqlite dialects parse it as a binary op?

Initial effort at #874 !

alamb added a commit that referenced this issue May 17, 2023
* feat: Add custom operator

From #863

- It doesn't parse anything — I'm not sure how to parse ` SELECT 'a' REGEXP '^[a-d]';` with `REGEXP` as the operator... (but fine for my narrow purpose)
- If we need tests, where would I add them?

* Update src/ast/operator.rs

---------

Co-authored-by: Andrew Lamb <[email protected]>
serprex pushed a commit to serprex/sqlparser-rs that referenced this issue Nov 6, 2023
* Support identifiers beginning with digits in MySQL (apache#856)

* support COPY INTO in snowflake (apache#841)

Signed-off-by: Pawel Leszczynski <[email protected]>

* Add `dialect_from_str` and improve `Dialect` documentation (apache#848)

* Add `dialect_from_str` and improve `Dialect` documentation

* cleanup

* fix compilation with nostd

* Support multiple-table DELETE syntax (apache#855)

* Support `DISTINCT ON (...)` (apache#852)

* Support "DISTINCT ON (...)"

* a test

* fix the merge

* Test trailing commas (apache#859)

* test: add tests for trailing commas

* tweaks

* Add support for query source in COPY .. TO statement (apache#858)

* Add support for query source in COPY .. TO statement

* Fix compile error

* Fix logical merge conflict (apache#865)

* Fix tiny typo in custom_sql_parser.md (apache#864)

* Make Expr::Interval its own struct (apache#872)

* Make Expr::Interval its own struct

* Add test interval display

* Fix cargo fmt

* Include license file in published crate (apache#871)

* Add support for multiple expressions, order by in aggregations (apache#879)

* Add support for multiple expressions, order by in aggregations

* Fix formatting errors

* Resolve linter errors

* Add parse_multipart_identifier function to parser (apache#860)

* Add parse_multipart_identifier function to parser

* Update doc for parse_multipart_identifier

* Fix conflict

* feat: Add custom operator (apache#868)

* feat: Add custom operator

From apache#863

- It doesn't parse anything — I'm not sure how to parse ` SELECT 'a' REGEXP '^[a-d]';` with `REGEXP` as the operator... (but fine for my narrow purpose)
- If we need tests, where would I add them?

* Update src/ast/operator.rs

---------

Co-authored-by: Andrew Lamb <[email protected]>

* feat: Support MySQL's `DIV` operator (apache#876)

* feat: MySQL's DIV operator

* fix: do not use `_` prefix for used variable

---------

Co-authored-by: Andrew Lamb <[email protected]>

* truncate: table as optional keyword (apache#883)

Signed-off-by: Maciej Obuchowski <[email protected]>

* feat: add DuckDB dialect (apache#878)

* feat: add DuckDB dialect

* formatting

* fix conflict

* support // in GenericDialect

* add DucDbDialect to all_dialects

* add comment from suggestion

Co-authored-by: Andrew Lamb <[email protected]>

* fix: support // in GenericDialect

---------

Co-authored-by: Andrew Lamb <[email protected]>

* Add support for first, last aggregate function parsing (apache#882)

* Add order by parsing to functions

* Fix doc error

* minor changes

* Named window frames (apache#881)

* after over clause, named window can be parsed with window ... as after having clause

* Lint errors are fixed

* Support for multiple windows

* fix lint errors

* simplifications

* rename function

* Rewrite named window search in functional style

* Test added and some minor changes

* Minor changes on tests and namings, and semantic check is removed

---------

Co-authored-by: Mustafa Akur <[email protected]>
Co-authored-by: Mehmet Ozan Kabak <[email protected]>

* Fix merge conflict (apache#885)

* Update CHANGELOG for `0.34.0` release (apache#884)

* chore: Release sqlparser version 0.34.0

* Update criterion requirement from 0.4 to 0.5 in /sqlparser_bench (apache#890)

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

---------

Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Maciej Obuchowski <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: AviRaboah <[email protected]>
Co-authored-by: pawel.leszczynski <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
Co-authored-by: Aljaž Mur Eržen <[email protected]>
Co-authored-by: Armin Primadi <[email protected]>
Co-authored-by: Okue <[email protected]>
Co-authored-by: Andrew Kane <[email protected]>
Co-authored-by: Mustafa Akur <[email protected]>
Co-authored-by: Jeffrey <[email protected]>
Co-authored-by: Maximilian Roos <[email protected]>
Co-authored-by: eitsupi <[email protected]>
Co-authored-by: Maciej Obuchowski <[email protected]>
Co-authored-by: Berkay Şahin <[email protected]>
Co-authored-by: Mustafa Akur <[email protected]>
Co-authored-by: Mehmet Ozan Kabak <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants