Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: use separate lookahead of WITH for BUCKET_COUNT #75878

Conversation

chengxiong-ruan
Copy link
Contributor

Fixes #75604

In #75231 we use the WITH_LA to resolve conflicts on the
WITH BUCKET_COUNT clause. I suspect it somehow made the
random sql generator generate sqls like this:

IMPORT INTO FAMILY . FAMILY . CHAR ( ident , ident ) PUBLICATION DATA ( PLACEHOLDER ) WITH OPTIONS ( BUCKET_COUNT = PLACEHOLDER );

which is formatted into:

IMPORT INTO "family"."family"."char"(ident, ident) PUBLICATION DATA ('placeholder') WITH bucket_count = 'placeholder'

this fails the parser for sure, since it's not a legit statement in crdb.
This PR creates a new lookahead token to prevent the sql generator
from generating IMPORT statement with BUCKET_COUNT in it.

Release note: None

Fixes cockroachdb#75604

In cockroachdb#75231 we use the `WITH_LA` to resolve conflicts on the
`WITH BUCKET_COUNT` clause. However, it somehow makes the
random sql generator generate sqls like this:
```
IMPORT INTO FAMILY . FAMILY . CHAR ( ident , ident ) PUBLICATION DATA ( PLACEHOLDER ) WITH OPTIONS ( BUCKET_COUNT = PLACEHOLDER );
```
which is formatted into:
```
IMPORT INTO "family"."family"."char"(ident, ident) PUBLICATION DATA ('placeholder') WITH bucket_count = 'placeholder'
```
this fails the parser for sure, since it's not a legit statement in crdb.
This PR creates a new lookahead token to prevent the sql generator
from generating `IMPORT` statement with `BUCKET_COUNT` in it.

Release note: None
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor Author

@chengxiong-ruan chengxiong-ruan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear that I ran a roachprod stress to repro but I was just not lucky enough to make it fail on the same case. So I wasn't able to prove my theory with a it failed before but fixed after verification :(

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @rafiss)

Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for fixing this! can you add a regression test case in pkg/sql/parser/testdata?

@rafiss
Copy link
Collaborator

rafiss commented Feb 2, 2022

i.e. apply this patch

diff --git a/pkg/sql/parser/testdata/import_export b/pkg/sql/parser/testdata/import_export
index e7f367c4c2..1223d1c033 100644
--- a/pkg/sql/parser/testdata/import_export
+++ b/pkg/sql/parser/testdata/import_export
@@ -85,3 +85,7 @@ EXPORT INTO CSV 's3://my/path/%part%.csv' WITH delimiter = '|' FROM SELECT a, su
 EXPORT INTO CSV ('s3://my/path/%part%.csv') WITH delimiter = ('|') FROM SELECT (a), ((sum)((b))) FROM c WHERE ((d) = (1)) ORDER BY ((sum)((b))) DESC LIMIT (10) -- fully parenthesized
 EXPORT INTO CSV '_' WITH delimiter = '_' FROM SELECT a, sum(b) FROM c WHERE d = _ ORDER BY sum(b) DESC LIMIT _ -- literals removed
 EXPORT INTO CSV 's3://my/path/%part%.csv' WITH _ = '|' FROM SELECT _, sum(_) FROM _ WHERE _ = 1 ORDER BY sum(_) DESC LIMIT 10 -- identifiers removed
+
+parse
+IMPORT INTO FAMILY . FAMILY . CHAR ( ident , ident ) PUBLICATION DATA ( PLACEHOLDER ) WITH OPTIONS ( BUCKET_COUNT = PLACEHOLDER );
+----

then run make test PKG=./pkg/sql/parser TESTS='TestParseDatadriven' TESTFLAGS='-rewrite'

@chengxiong-ruan
Copy link
Contributor Author

@rafiss thanks for the hint, but the issue here is the random sql generator generated unexpected sql statement. so the statement would fail anyway. I played with the test you mentioned above and it just failed with parse_test.go:47: at or near "with": syntax error.

@rafiss
Copy link
Collaborator

rafiss commented Feb 2, 2022

Ah I see, then I'm not sure this fix will change that -- the opt_with_options rule in sql.y could still lead to an option with a name of an unreserved_keyword to be used. Maybe we need to update the random syntax generator itself.

@chengxiong-ruan
Copy link
Contributor Author

oh yeah, agreed... I just suspect my previous change cause it so hope this change would solve it.
Now looks like it's probably just the timing that we were lucky enough and never hit into the failure before, but only after my change merged. Closing this pr for now. Do you know who might be the best person to ask for advice how the random generator can be fixed?

@rafiss
Copy link
Collaborator

rafiss commented Feb 2, 2022

Yeah looks like unlucky timing with your PR! Since it wasn't caused by that, feel free to reassign the issue back to me.

But if you would still like to keep working on it, that would be very appreciated :)

Do you know who might be the best person to ask for advice how the random generator can be fixed?

No one's worked on it for a while unfortunately. Asking in #sql-experience might be the best way.

Copy link
Contributor Author

@chengxiong-ruan chengxiong-ruan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I probably don't have enough bandwidth at the moment :( Assigning back to you for now. Thanks!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)

@chengxiong-ruan chengxiong-ruan deleted the separate-lookahead-for-with-bucket-count branch February 4, 2022 05:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql/tests: TestRandomSyntaxGeneration failed [bucket count syntax]
3 participants