-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
V1 ddl extended keyword #1455
V1 ddl extended keyword #1455
Conversation
Conformance comparison report-Cross Engine
Number failing in both: 228 Number passing in legacy engine but fail in eval engine: 313 Number failing in legacy engine but pass in eval engine: 207 Conformance comparison report-Cross Commit-LEGACY
Number failing in both: 435 Number passing in Base (28edbb7) but now fail: 0 Number failing in Base (28edbb7) but now pass: 1 Click here to see
Conformance comparison report-Cross Commit-EVAL
Number failing in both: 541 Number passing in Base (28edbb7) but now fail: 1 Number failing in Base (28edbb7) but now pass: 1 Click here to see
Click here to see
|
@@ -829,6 +837,9 @@ table_definition::{ | |||
name: '.identifier.symbol', | |||
type: '.type', | |||
constraints: list::[constraint], | |||
// Should this be a constraint? | |||
is_optional: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can prepend this with a TODO
to make discoverable later. Also, on syntax, do this matter to be a constraint or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it matters on the syntax.
The only thing is that Attribute Constraint in SQL is often after the data type declaration, but this is more of a terminology thing.
| TBLPROPERTIES PAREN_LEFT keyValuePair (COMMA keyValuePair)* PAREN_RIGHT # TblProperties | ||
; | ||
|
||
keyValuePair : key=LITERAL_STRING EQ value=literal; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if the full literal
is required. Maybe it can get limited to LITERAL_STRING
for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think we should permit Literal, i.e., key to be a string or timestamp does not make much sense to me.
I think (and forget to leave a comment here), the question is more towards identifier
vs string
?
i.e.,
TBLPROPERTIES(myKey = 'some value')
vs
TBLPROPERTIES('myKey' = 'someValue' )
Reason why this PR leans towards using string is
- this syntax are based from HIVE and HIVE uses string.
- Supporting string as key value seems reasonable to me and I don't think we need to deprecate this later.
- Supporting identifier as string seems to be a ergonomic feature and most likely we will process the identifier as string in subsequent process (as opposite to attempting to bind the identifier to a variable...). If you agree this is the case then we can always add the support later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also agree with the key being string
, but what I meant was do we need value=literal
:
literal
: NULL # LiteralNull
| MISSING # LiteralMissing
| TRUE # LiteralTrue
| FALSE # LiteralFalse
| LITERAL_STRING # LiteralString
| LITERAL_INTEGER # LiteralInteger
| LITERAL_DECIMAL # LiteralDecimal
| ION_CLOSURE # LiteralIon
| DATE LITERAL_STRING # LiteralDate
| TIME ( PAREN_LEFT LITERAL_INTEGER PAREN_RIGHT )? (WITH TIME ZONE)? LITERAL_STRING # LiteralTime
| TIMESTAMP ( PAREN_LEFT LITERAL_INTEGER PAREN_RIGHT )? (WITH TIME ZONE)? LITERAL_STRING # LiteralTimestamp
;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see.
I am not against limiting the scope to string value only, but I don't see any drawback to support all other values. It could be useful for customer who wants to specify like TBLPROPERTIES('isActive' = false)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps would be good for us to avoid widening the scope for now :)
SuccessTestCase( | ||
"CREATE TABLE with Partition by single attribute", | ||
""" | ||
CREATE TABLE tbl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be syntax error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be semantic error, i.e., referring a non-existing attribute in Partition By expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But is the following in a valid syntax considering it lacks the parenthesis and column definitions?
CREATE TABLE tbl PARTITION BY (a)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so because
CREATE TABLE tbl
has been a valid syntax for us.
Based on this I am treating the grammar as three parts:
CREATE TABLE tbl -- header
(a INT2) -- data type definition
PARTITION BY (a) -- table extension
@@ -136,6 +137,19 @@ uniqueConstraintDef | |||
// but we at least can eliminate SFW query here. | |||
searchCondition : exprOr; | |||
|
|||
// SQL Extension, Support additional table metadatas such as partition by, tblProperties, etc. | |||
tableExtension | |||
: PARTITION BY partitionExpr # PartitionBy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reasoning here on using PARTITION BY
compared to PARTITIONED BY
which was what HIVE DDL uses (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable)?
Was it to follow what postgresql does (https://www.postgresql.org/docs/current/sql-createtable.html#SQL-CREATETABLE-PARMS-PARTITION-BY)? I'm not sure if there's a standard syntax defined for DDL for this table extension.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also noticed that PARTITION
was already an existing keyword that's used for window functions:
windowPartitionList
: PARTITION BY expr (COMMA expr)*
;
Not sure if that factored into the decision here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A partitioned table is not in the SQL spec, hence the syntax is an extension made by various DB implementation.
The reason we choice to not following HIVE but PostgreSQL is to avoided the cognitive overhead.
PARTITION BY (a)
vs PARTITIONED BY (a INT2)
.
That is: The PARTITION BY
clause can take in a Partition Expression, and the PARTITIONED BY
clause (with is not supported yet) can take in attribute declaration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Thanks for the added context. Could be helpful that context in a comment for why we choose PostgreSQL's syntax here rather than HIVE's
partiql-parser/src/main/kotlin/org/partiql/parser/internal/PartiQLParserDefault.kt
Outdated
Show resolved
Hide resolved
partiql-parser/src/test/kotlin/org/partiql/parser/internal/PartiQLParserDDLTests.kt
Show resolved
Hide resolved
Co-authored-by: Arash Maymandi <[email protected]>
Co-authored-by: Alan Cai <[email protected]>
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## v1 #1455 +/- ##
=====================================
Coverage ? 72.11%
Complexity ? 2499
=====================================
Files ? 264
Lines ? 19802
Branches ? 3681
=====================================
Hits ? 14281
Misses ? 4510
Partials ? 1011
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
* Revert "V1 ddl extended keyword (#1455)" This reverts commit 2879f3a * Revert "struct subfield and list element type (#1449)" This reverts commit 23f6fee * Revert "run apiDump (#1447)" This reverts commit 607c4c0 * Revert "Support parsing for attribute and tuple level constraint (#1442)" This reverts commit abfc58d * fix post-revert build * submodule
Relevant Issues
Description
Other Information
Updated Unreleased Section in CHANGELOG: [YES/NO]
Any backward-incompatible changes? [YES/NO]
errors for users that are using our public APIs or the entities that have
public
visibility in our code-base. >Any new external dependencies? [YES/NO]
Do your changes comply with the Contributing Guidelines
and Code Style Guidelines? [YES/NO]
License Information
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.