V1 ddl extended keyword #1455

Merged
yliuuuu merged 4 commits into v1 from v1-ddl-extension-keyword
May 6, 2024

Contributor

@yliuuuu yliuuuu commented May 2, 2024

Relevant Issues

  • [Closes/Related To] Issue #XXX

Description

  • Extended DDL Features:
    • OPTIONAL Keyword: Declared after the attribute name to mark the attribute as optional.
    • COMMENT Keyword: Declared at the end of the attribute declaration to add a description to the attribute.
    • TBLPROPERTIES Keyword: Table-level properties (arbitrary key-value pairs).
    • PARTITION BY Keyword: Partition By clause. For now, only a list of attributes is supported as the partition expression.
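
To make the keyword placement concrete, here is a minimal Python sketch that models the four features and renders them as DDL text. It is an illustration only: `Attribute`, `TableDefinition`, `render_attribute`, and `render_ddl` are hypothetical names, not part of PartiQL, and emitting PARTITION BY before TBLPROPERTIES in one statement is an assumption, since the description does not say how table extensions combine.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Attribute:
    name: str
    type: str
    is_optional: bool = False      # OPTIONAL, written right after the attribute name
    comment: Optional[str] = None  # COMMENT, written last in the declaration


@dataclass
class TableDefinition:
    name: str
    attributes: List[Attribute]
    partition_by: List[str] = field(default_factory=list)         # attribute names only
    tbl_properties: Dict[str, str] = field(default_factory=dict)  # arbitrary key-value pairs


def quote(text: str) -> str:
    # SQL-style string literal: single quotes, embedded quotes doubled.
    return "'" + text.replace("'", "''") + "'"


def render_attribute(attr: Attribute) -> str:
    parts = [attr.name]
    if attr.is_optional:
        parts.append("OPTIONAL")
    parts.append(attr.type)
    if attr.comment is not None:
        parts.append("COMMENT " + quote(attr.comment))
    return " ".join(parts)


def render_ddl(table: TableDefinition) -> str:
    columns = ", ".join(render_attribute(a) for a in table.attributes)
    ddl = f"CREATE TABLE {table.name} ({columns})"
    # Assumption: extensions follow the column list, PARTITION BY first.
    if table.partition_by:
        ddl += " PARTITION BY (" + ", ".join(table.partition_by) + ")"
    if table.tbl_properties:
        pairs = ", ".join(
            f"{quote(k)} = {quote(v)}" for k, v in table.tbl_properties.items()
        )
        ddl += f" TBLPROPERTIES ({pairs})"
    return ddl


example = TableDefinition(
    name="tbl",
    attributes=[
        Attribute("a", "INT2"),
        Attribute("b", "STRING", is_optional=True, comment="free-form note"),
    ],
    partition_by=["a"],
    tbl_properties={"myKey": "someValue"},
)
print(render_ddl(example))
```

The `is_optional` flag mirrors the field of the same name added to the AST definition in this PR; everything else in the model is illustrative.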

Other Information

  • Updated Unreleased Section in CHANGELOG: [YES/NO]

    • < If NO, why? >
  • Any backward-incompatible changes? [YES/NO]

    • < If YES, why? >
    • < For this purpose, we define backward-incompatible changes as changes that—when consumed—can potentially result in
      errors for users that are using our public APIs or the entities that have public visibility in our code-base. >
  • Any new external dependencies? [YES/NO]

    • < If YES, which ones and why? >
    • < In addition, please also mention any other alternatives you've considered and the reason they've been discarded >
  • Do your changes comply with the Contributing Guidelines
    and Code Style Guidelines? [YES/NO]

License Information

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@yliuuuu yliuuuu requested review from alancai98 and am357 May 2, 2024 20:40
@yliuuuu yliuuuu marked this pull request as ready for review May 2, 2024 20:41

github-actions bot commented May 2, 2024

Conformance comparison report-Cross Engine

| | Base (legacy) | eval | +/- |
|---|---|---|---|
| % Passing | 92.52% | 90.70% | -1.82% |
| ✅ Passing | 5383 | 5278 | -105 |
| ❌ Failing | 435 | 541 | 106 |
| 🔶 Ignored | 0 | 0 | 0 |
| Total Tests | 5818 | 5819 | 1 |

Number passing in both: 5071

Number failing in both: 228

Number passing in legacy engine but fail in eval engine: 313

Number failing in legacy engine but pass in eval engine: 207
⁉️ CONFORMANCE REPORT REGRESSION DETECTED ⁉️
The complete list can be found in GitHub CI summary, either from Step Summary or in the Artifact.
207 test(s) were failing in legacy but now pass in eval. Before merging, confirm they are intended to pass.
The complete list can be found in GitHub CI summary, either from Step Summary or in the Artifact.

Conformance comparison report-Cross Commit-LEGACY

| | Base (28edbb7) | 4958cb1 | +/- |
|---|---|---|---|
| % Passing | 92.51% | 92.52% | 0.02% |
| ✅ Passing | 5382 | 5383 | 1 |
| ❌ Failing | 436 | 435 | -1 |
| 🔶 Ignored | 0 | 0 | 0 |
| Total Tests | 5818 | 5818 | 0 |

Number passing in both: 5382

Number failing in both: 435

Number passing in Base (28edbb7) but now fail: 0

Number failing in Base (28edbb7) but now pass: 1
The following test(s) were previously failing but now pass. Before merging, confirm they are intended to pass:

  • MYSQL_SELECT_29, compileOption: LEGACY

Conformance comparison report-Cross Commit-EVAL

| | Base (28edbb7) | 4958cb1 | +/- |
|---|---|---|---|
| % Passing | 90.70% | 90.70% | 0.00% |
| ✅ Passing | 5278 | 5278 | 0 |
| ❌ Failing | 541 | 541 | 0 |
| 🔶 Ignored | 0 | 0 | 0 |
| Total Tests | 5819 | 5819 | 0 |

Number passing in both: 5278

Number failing in both: 541

Number passing in Base (28edbb7) but now fail: 1

Number failing in Base (28edbb7) but now pass: 1
⁉️ CONFORMANCE REPORT REGRESSION DETECTED ⁉️. The following test(s) were previously passing but now fail:

  • Example 6 — Value Coercion, compileOption: LEGACY
The following test(s) were previously failing but now pass. Before merging, confirm they are intended to pass:
  • Example 6 — Value Coercion, compileOption: LEGACY

@@ -829,6 +837,9 @@ table_definition::{
name: '.identifier.symbol',
type: '.type',
constraints: list::[constraint],
// Should this be a constraint?
is_optional: bool,
Contributor

Maybe we can prepend this with a TODO to make it discoverable later. Also, on syntax, does it matter whether this is a constraint or not?

Contributor Author

@yliuuuu yliuuuu May 3, 2024

I don't think it matters for the syntax.

The only thing is that an attribute constraint in SQL usually comes after the data type declaration, but this is more of a terminology thing.

partiql-parser/src/main/antlr/PartiQL.g4
| TBLPROPERTIES PAREN_LEFT keyValuePair (COMMA keyValuePair)* PAREN_RIGHT # TblProperties
;

keyValuePair : key=LITERAL_STRING EQ value=literal;
Contributor

Not sure if the full literal is required. Maybe it can be limited to LITERAL_STRING for now?

Contributor Author

I do not think we should permit a full literal for the key; a key that is, say, a timestamp does not make much sense to me.

I think (and I forgot to leave a comment here) the question is more about identifier vs. string?

i.e.,

TBLPROPERTIES(myKey = 'some value')

vs

TBLPROPERTIES('myKey' = 'someValue' )

The reasons this PR leans towards using a string are:

  1. This syntax is based on HIVE, and HIVE uses strings.
  2. Supporting a string as the key seems reasonable to me, and I don't think we would need to deprecate it later.
  3. Supporting an identifier in place of the string seems to be an ergonomic feature, and most likely we would process the identifier as a string in subsequent processing (as opposed to attempting to bind the identifier to a variable...). If you agree this is the case, then we can always add that support later.

Contributor

I also agree with the key being a string, but what I meant was whether we need value=literal:

literal
    : NULL                                                                                # LiteralNull
    | MISSING                                                                             # LiteralMissing
    | TRUE                                                                                # LiteralTrue
    | FALSE                                                                               # LiteralFalse
    | LITERAL_STRING                                                                      # LiteralString
    | LITERAL_INTEGER                                                                     # LiteralInteger
    | LITERAL_DECIMAL                                                                     # LiteralDecimal
    | ION_CLOSURE                                                                         # LiteralIon
    | DATE LITERAL_STRING                                                                 # LiteralDate
    | TIME ( PAREN_LEFT LITERAL_INTEGER PAREN_RIGHT )? (WITH TIME ZONE)? LITERAL_STRING   # LiteralTime
    | TIMESTAMP ( PAREN_LEFT LITERAL_INTEGER PAREN_RIGHT )? (WITH TIME ZONE)? LITERAL_STRING   # LiteralTimestamp
    ;

Contributor Author

Oh I see.

I am not against limiting the scope to string values only, but I don't see any drawback in supporting all other values. It could be useful for a customer who wants to specify something like TBLPROPERTIES('isActive' = false).

Contributor

Perhaps it would be good for us to avoid widening the scope for now :)
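
As a rough illustration of the narrower option discussed in this thread (string keys and string values only), the following Python sketch accepts a TBLPROPERTIES clause and rejects identifier keys and non-string values. `parse_tbl_properties` is a hypothetical helper built on regular expressions; it is not the ANTLR rule from this PR, and it deliberately ignores doubled-quote escapes inside strings.

```python
import re
from typing import Dict

# One 'key' = 'value' pair; both sides must be single-quoted strings.
# Simplification: doubled-quote escapes ('') inside a string are not handled.
_PAIR = re.compile(r"\s*'([^']*)'\s*=\s*'([^']*)'\s*")
_CLAUSE = re.compile(r"\s*TBLPROPERTIES\s*\((.*)\)\s*", re.IGNORECASE | re.DOTALL)


def parse_tbl_properties(clause: str) -> Dict[str, str]:
    """Parse TBLPROPERTIES('k' = 'v', ...) into a dict, strings only."""
    outer = _CLAUSE.fullmatch(clause)
    if outer is None:
        raise ValueError("expected TBLPROPERTIES ( ... )")
    body = outer.group(1)
    props: Dict[str, str] = {}
    pos = 0
    while True:
        pair = _PAIR.match(body, pos)
        if pair is None:
            # Covers identifier keys, non-string values, and an empty list.
            raise ValueError(f"expected 'key' = 'value' at offset {pos}")
        props[pair.group(1)] = pair.group(2)
        pos = pair.end()
        if pos == len(body):
            return props
        if body[pos] != ",":
            raise ValueError(f"expected ',' at offset {pos}")
        pos += 1  # skip the comma and read the next pair


print(parse_tbl_properties("TBLPROPERTIES('myKey' = 'someValue')"))
```

Under this restriction, TBLPROPERTIES('isActive' = false) and TBLPROPERTIES(myKey = 'some value') both raise `ValueError`; widening the value side to the full `literal` rule later would not invalidate any clause accepted here.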

SuccessTestCase(
"CREATE TABLE with Partition by single attribute",
"""
CREATE TABLE tbl
Contributor

Shouldn't this be a syntax error?

Contributor Author

I think this should be a semantic error, i.e., referring to a non-existent attribute in the PARTITION BY expression.

Contributor

But is the following valid syntax, considering it lacks the parentheses and column definitions?

CREATE TABLE tbl PARTITION BY (a)

See:
https://www.db-fiddle.com/f/6PztiiVM6uEzpvdWnMyKX7/0

Contributor Author

I think so because

CREATE TABLE tbl

has been a valid syntax for us.

Based on this I am treating the grammar as three parts:

CREATE TABLE tbl  -- header
(a INT2) -- data type definition
PARTITION BY (a) -- table extension
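
The syntax-versus-semantics distinction in this thread can be sketched as a check that runs after parsing: the statement is accepted by the grammar, and a later pass reports PARTITION BY names that no column definition declares. This is a minimal Python sketch under stated assumptions: `undeclared_partition_attributes` is a hypothetical name, names are compared exactly (no case folding), and nothing here reflects how PartiQL actually implements the check.

```python
from typing import List


def undeclared_partition_attributes(
    declared: List[str], partition_by: List[str]
) -> List[str]:
    """Return the PARTITION BY names that no column definition declares."""
    known = set(declared)
    return [name for name in partition_by if name not in known]


# CREATE TABLE tbl (a INT2) PARTITION BY (a): every reference resolves.
print(undeclared_partition_attributes(["a"], ["a"]))

# CREATE TABLE tbl PARTITION BY (a): parses as header + table extension,
# but `a` is never declared, so a semantic pass would report it.
print(undeclared_partition_attributes([], ["a"]))
```

Keeping the check out of the grammar is what lets the header, the column definitions, and the table extensions stay independent parts, as described above.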

partiql-ast/src/main/resources/partiql_ast.ion
@@ -136,6 +137,19 @@ uniqueConstraintDef
// but we at least can eliminate SFW query here.
searchCondition : exprOr;

// SQL Extension, Support additional table metadatas such as partition by, tblProperties, etc.
tableExtension
: PARTITION BY partitionExpr # PartitionBy
Member

What's the reasoning here on using PARTITION BY compared to PARTITIONED BY which was what HIVE DDL uses (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable)?

Was it to follow what postgresql does (https://www.postgresql.org/docs/current/sql-createtable.html#SQL-CREATETABLE-PARMS-PARTITION-BY)? I'm not sure if there's a standard syntax defined for DDL for this table extension.

Member

@alancai98 alancai98 May 3, 2024

Also noticed that PARTITION was already an existing keyword that's used for window functions:

windowPartitionList
: PARTITION BY expr (COMMA expr)*
;

Not sure if that factored into the decision here.

Contributor Author

Partitioned tables are not in the SQL spec, hence the syntax is an extension made by various DB implementations.

The reason we chose to follow PostgreSQL rather than HIVE is to avoid cognitive overhead:

PARTITION BY (a) vs PARTITIONED BY (a INT2).

That is, the PARTITION BY clause takes a partition expression, while the PARTITIONED BY clause (which is not supported yet) takes attribute declarations.

Member

I see. Thanks for the added context. It could be helpful to add that context in a comment explaining why we chose PostgreSQL's syntax here rather than HIVE's.

partiql-parser/src/main/antlr/PartiQL.g4
@yliuuuu yliuuuu requested a review from alancai98 May 3, 2024 23:18
@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

❗ No coverage uploaded for pull request base (v1@28edbb7). Click here to learn what that means.

Additional details and impacted files
@@          Coverage Diff          @@
##             v1    #1455   +/-   ##
=====================================
  Coverage      ?   72.11%           
  Complexity    ?     2499           
=====================================
  Files         ?      264           
  Lines         ?    19802           
  Branches      ?     3681           
=====================================
  Hits          ?    14281           
  Misses        ?     4510           
  Partials      ?     1011           
| Flag | Coverage Δ |
|---|---|
| CLI | 13.71% <ø> (?) |
| LANG | 77.70% <100.00%> (?) |

Flags with carried forward coverage won't be shown. Click here to find out more.


@yliuuuu yliuuuu merged commit 2879f3a into v1 May 6, 2024
14 checks passed
@yliuuuu yliuuuu deleted the v1-ddl-extension-keyword branch May 6, 2024 18:25
yliuuuu added a commit that referenced this pull request Jul 25, 2024
yliuuuu added a commit that referenced this pull request Jul 26, 2024
* Revert "V1 ddl extended keyword (#1455)"

This reverts commit 2879f3a

* Revert "struct subfield and list element type (#1449)"

This reverts commit 23f6fee

* Revert "run apiDump (#1447)"

This reverts commit 607c4c0

* Revert "Support parsing for attribute and tuple level constraint (#1442)"

This reverts commit abfc58d

* fix post-revert build

* submodule

4 participants