Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for primary key parsing. #7152

Closed
mustafasrepo opened this issue Jul 31, 2023 · 3 comments · Fixed by #7160
Closed

Add support for primary key parsing. #7152

mustafasrepo opened this issue Jul 31, 2023 · 3 comments · Fixed by #7160
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@mustafasrepo
Copy link
Contributor

mustafasrepo commented Jul 31, 2023

Is your feature request related to a problem or challenge?

By using following syntax, we can define primary key.

CREATE TABLE sales_global_with_pk (zip_code INT,
          country VARCHAR(3),
          sn INT,
          ts TIMESTAMP,
          currency VARCHAR(3),
          amount FLOAT,
          primary key(sn)
        ) as VALUES
          (0, 'GRC', 0, '2022-01-01 06:00:00'::timestamp, 'EUR', 30.0),
          (1, 'FRA', 1, '2022-01-01 08:00:00'::timestamp, 'EUR', 50.0),
          (1, 'TUR', 2, '2022-01-01 11:30:00'::timestamp, 'TRY', 75.0),
          (1, 'FRA', 3, '2022-01-02 12:00:00'::timestamp, 'EUR', 200.0),
          (1, 'TUR', 4, '2022-01-03 10:00:00'::timestamp, 'TRY', 100.0)

However, following syntax is not supported.

CREATE TABLE sales_global_with_pk (zip_code INT,
          country VARCHAR(3),
          sn INT primary key,
          ts TIMESTAMP,
          currency VARCHAR(3),
          amount FLOAT
        ) as VALUES
          (0, 'GRC', 0, '2022-01-01 06:00:00'::timestamp, 'EUR', 30.0),
          (1, 'FRA', 1, '2022-01-01 08:00:00'::timestamp, 'EUR', 50.0),
          (1, 'TUR', 2, '2022-01-01 11:30:00'::timestamp, 'TRY', 75.0),
          (1, 'FRA', 3, '2022-01-02 12:00:00'::timestamp, 'EUR', 200.0),
          (1, 'TUR', 4, '2022-01-03 10:00:00'::timestamp, 'TRY', 100.0)

However, it doesn't indicate this also. Above query runs without error. However, there is no primary key defined for this table.
When primary key is supported, following query should succeed.

SELECT sn, ts, SUM(amount)
FROM sales_global_with_pk
GROUP BY sn;

Above query runs, for first version. However, for second version it is not supported. Because, primary key is not parsed.

Describe the solution you'd like

I would like to have support for alternative syntax

Describe alternatives you've considered

No response

Additional context

No response

@parkma99
Copy link
Contributor

Hello, sqlparser-rs supports second version syntax, but datafusion failed to support. I am so confusing so I take a lot time to find the reason. Below is I found:

In sqlparser-rs, the first version primary key(sn) is parsed in https://github.com/sqlparser-rs/sqlparser-rs/blob/10a6ec56371e32f5fa145074e797f8ec74f6d812/src/parser.rs#L3457, primary key(sn) parsed as constraints as a part of CreateTable Statement.

Then in datafusion, constraints will passed in https://github.com/apache/arrow-datafusion/blob/e02ed51b880d09f2156bd192d283aef75cab04b7/datafusion/sql/src/statement.rs#L125, then in https://github.com/apache/arrow-datafusion/blob/e02ed51b880d09f2156bd192d283aef75cab04b7/datafusion/sql/src/statement.rs#L166 or https://github.com/apache/arrow-datafusion/blob/e02ed51b880d09f2156bd192d283aef75cab04b7/datafusion/sql/src/statement.rs#L189 constraints passed in LogicalPlan.

The second version sn INT primary key is parsed in https://github.com/sqlparser-rs/sqlparser-rs/blob/10a6ec56371e32f5fa145074e797f8ec74f6d812/src/parser.rs#L3657C29-L3657C29 as a part of ColumnDef as a part of columns in CreateTable Statement.

Then in datafusion, columns passed in https://github.com/apache/arrow-datafusion/blob/e02ed51b880d09f2156bd192d283aef75cab04b7/datafusion/sql/src/planner.rs#L206-L223 . In there, the ColumnDef option only used NotNull, others option is discareded. The primary key does not pass into LogicalPlan.

Find the reason is soooo happy.

@parkma99
Copy link
Contributor

I will take this issue, add this enhancement

@mustafasrepo
Copy link
Contributor Author

Thanks @parkma99 for locating the root cause, and taking the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants