Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add values list expression #1165

Merged
merged 2 commits into from
Oct 23, 2021
Merged

Conversation

jimexist
Copy link
Member

Which issue does this PR close?

Closes #1080

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added ballista datafusion Changes in the datafusion crate sql SQL Planner labels Oct 22, 2021
@jimexist jimexist force-pushed the add-values-list-impl branch 2 times, most recently from bb74f4d to 1216110 Compare October 23, 2021 06:53
@jimexist jimexist changed the title WIP add values list expression add values list expression Oct 23, 2021
@jimexist jimexist requested review from alamb and houqp and removed request for alamb October 23, 2021 06:54
@jimexist jimexist force-pushed the add-values-list-impl branch 2 times, most recently from 71c04fc to 7f2170e Compare October 23, 2021 08:21
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome @jimexist -- it will make creating small examples / demos much easier 🏅 🏅 🏅

The only thing I think that needs to be fixed prior to merge is the contents of from_plan in utils.rs where values is not recreated correctly from the input exprs.

The test cases are very thorough as is the implementation.

I took it for a spin locally:

> SELECT * FROM (VALUES (1, 'one'), (2, 'two'), (3, 'three'));
Plan("subquery in FROM must have an alias")
> 
> 
> 
> SELECT * FROM (VALUES (1, 'one'), (2, 'two'), (3, 'three')) as t;
+---------+---------+
| column1 | column2 |
+---------+---------+
| 1       | one     |
| 2       | two     |
| 3       | three   |
+---------+---------+
3 rows in set. Query took 0.007 seconds.
> 


SELECT * FROM (VALUES (1, 'one'), (2, 'two'), (3, 'three')) as t(num, letter);
+-----+--------+
| num | letter |
+-----+--------+
| 1   | one    |
| 2   | two    |
| 3   | three  |
+-----+--------+
3 rows in set. Query took 0.005 seconds.

Which is very cool (it is working the same as Postgres!)

assert!(plan.is_err());
}
{
let sql = "VALUES (1),('2')";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 for testing the negative case

datafusion/src/logical_plan/builder.rs Outdated Show resolved Hide resolved
datafusion/src/logical_plan/plan.rs Outdated Show resolved Hide resolved
if data.is_empty() {
return Err(DataFusionError::Plan("Values list cannot be empty".into()));
}
// we have this empty batch as a placeholder to satisfy evaluation argument
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder what you think about moving the creation of the actual RecordBatch from ValuesExec::try_new to execute -- the rationale would be to make PhysicalPlan creation faster and push the actual work into execute where if can potentially be run concurrently with other parts

Given the size of data in a VALUES statement, this is not likely to be any real difference so I am fine with leaving the creation in the same place too -- I just wanted to mention it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i can address this in a subsequent PR where more expr types are supported (e.g. CAST)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

datafusion/src/logical_plan/plan.rs Outdated Show resolved Hide resolved
@@ -77,6 +77,7 @@ impl OptimizerRule for ConstantFolding {
| LogicalPlan::Aggregate { .. }
| LogicalPlan::Repartition { .. }
| LogicalPlan::CreateExternalTable { .. }
| LogicalPlan::Values { .. }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect it is not likely to matter, but constant folding could be applied to the Exprs in values. As written this code will not apply constant folding to those expressions

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

address in #1170

@@ -124,6 +124,10 @@ pub fn from_plan(
schema: schema.clone(),
alias: alias.clone(),
}),
LogicalPlan::Values { schema, values } => Ok(LogicalPlan::Values {
schema: schema.clone(),
values: values.to_vec(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the values here should be derived from expr - the various optimizers call from_plan to create a new logical plan after potentially rewriting expressions returned from LogicalPlan::expressions.

something like (untested)

values : exprs.windows(values[0].len()).map(|w| w.to_vec()).collect()

I'll add some comments to make that clearer

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert_batches_eq!(expected, &actual);
}
{
let sql = "VALUES (NULL,'a'),(NULL,'b'),(3,'c')";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOL every case I could come up with to test you have already covered

@alamb
Copy link
Contributor

alamb commented Oct 23, 2021

It also would be cool to add VALUES to the list of supported features on https://github.com/apache/arrow-datafusion#sql-support

@jimexist
Copy link
Member Author

It also would be cool to add VALUES to the list of supported features on https://github.com/apache/arrow-datafusion#sql-support

let me track that here

@jimexist jimexist merged commit 3c1b807 into apache:master Oct 23, 2021
@jimexist jimexist deleted the add-values-list-impl branch October 23, 2021 13:10
@alamb
Copy link
Contributor

alamb commented Oct 23, 2021

🎉

@houqp houqp added the enhancement New feature or request label Nov 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion Changes in the datafusion crate enhancement New feature or request sql SQL Planner
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support creating in-memory table by using values lists
3 participants