-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add show external tables #3279
Add show external tables #3279
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3279 +/- ##
==========================================
- Coverage 85.89% 85.47% -0.43%
==========================================
Files 294 294
Lines 53373 54099 +726
==========================================
+ Hits 45845 46241 +396
- Misses 7528 7858 +330
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
Thanks @psvri |
This dosent add new syntax. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason for the syntax being show create external table
vs just show external table
? Show Create
has two verbs in one sentence and seems like a confusing choice.
Also, I'm wary of passing duplicated information (the raw SQL + the result of that raw SQL) - it seems better to store the information once (in the registered table), and regenerate the SQL from that in a normalized fashion.
Oh, interesting, I see this follows the spark precedent. |
Can you add some SQL level test for this fix? @psvri |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay in reviewing @psvri -- I have been away. We are working on increasing the review capacity for this project as well as those who can commit PRs
All in all, I really like the feature -- I have one style suggestion
"+---------------+--------------+------------+-------------------------------------------------------------------------------------------------+", | ||
"| table_catalog | table_schema | table_name | definition |", | ||
"+---------------+--------------+------------+-------------------------------------------------------------------------------------------------+", | ||
"| datafusion | public | abc | CREATE EXTERNAL TABLE abc STORED AS CSV LOCATION ../../testing/data/csv/aggregate_test_100.csv |", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️ this is great
pub fn try_new(config: ListingTableConfig) -> Result<Self> { | ||
pub fn try_new( | ||
config: ListingTableConfig, | ||
definition: Option<String>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only thing I would prefer to change in this PR. Specifically, rather than change the API of try_new
I think we could add a function like
pub fn with_definition(mut self, defintion: Option<String>) -> Self {
...
}
This change I think would reduce the size of this PR significantly
Since I realize this PR has been waiting for a week for feedback I will propose this change via PR
Here is my proposed change (targeting this PR) : psvri#1 |
Use builder-style API to set ListingTable definition
NP @alamb I too was on a break for a while. Thanks for PR. I will merge it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @psvri
@avantgardnerio you had an outstanding comment on this PR, I believe -- can you please review again and see if this meets your expectations now? |
@alamb and @psvri , my concern was regarding the implementation based upon retaining the original raw SQL string. The issue this is resolving had a suggested implementation:
And that was more in line with my thoughts:
This seems strange to me, in that it will carry over irrelevant syntax differences, and also stores redundant information. When I ask postgres to show a table, I don't get my raw SQL back, I get back generated SQL that matches the definition postgres has an internal representation for. It bothers me, but I'm not sure if I'm being pedantic I'll leave it to you @alamb . |
I agree that most other systems store some sort of "canonicalized" view definition. The sqlparser-rs code offers that the output of
I think the question of "how do we store a view definition" is somewhat orthogonal to the feature in this PR to show that view definition (though they are of course related!) and I think we can handle it in a follow on Thus, I plan to file an issue where we can discus the "what to use / how to store a view definition" issue and then merge this PR. This may also be related to the discussion we are having about view implementation on #3249 (comment) |
🤔 so I actually tried out 'show create table' with a view and it already does canonicalize the SQL, it seems: DataFusion CLI v11.0.0
❯ create table t as select * from (values (1), (2), (3)) as sq;
+---------+
| column1 |
+---------+
| 1 |
| 2 |
| 3 |
+---------+
3 rows in set. Query took 0.050 seconds.
❯ create view v as select column1 /*my random comment*/ from t where column1 != 1;
0 rows in set. Query took 0.001 seconds.
❯ show create table v;
+---------------+--------------+------------+-----------------------------------------------------------+
| table_catalog | table_schema | table_name | definition |
+---------------+--------------+------------+-----------------------------------------------------------+
| datafusion | public | v | CREATE VIEW v AS SELECT column1 FROM t WHERE column1 <> 1 |
+---------------+--------------+------------+-----------------------------------------------------------+
1 row in set. Query took 0.015 seconds. (note the lack of I believe this means that DataFusion is storing the canonicalized SQL string from sqlparser |
Ok, so now I am really confused. When I tried to run ❯ CREATE EXTERNAL TABLE abc STORED AS /*random comment*/ CSV LOCATION '../testing/data/csv/aggregate_test_100.csv';
0 rows in set. Query took 0.063 seconds.
❯ show create table abc;
+---------------+--------------+------------+------------+
| table_catalog | table_schema | table_name | definition |
+---------------+--------------+------------+------------+
| datafusion | public | abc | |
+---------------+--------------+------------+------------+ |
@@ -443,6 +443,7 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { | |||
&self, | |||
statement: CreateExternalTable, | |||
) -> Result<LogicalPlan> { | |||
let definition = Some(statement.to_string()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This to_string()
call converts the parsed statement into a string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks @psvri. Could you rebase or upmerge to fix the conflicts? |
Benchmark runs are scheduled for baseline = 7c04964 and contender = 0084aeb. 0084aeb is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #2848.
Rationale for this change
Show create table works for external tables
What changes are included in this PR?
Show create table now works with external tables
Are there any user-facing changes?
Yes