
Implement Display for Expr, improve operator display #971

Merged
merged 4 commits on Sep 22, 2021


matthewmturner
Contributor

Which issue does this PR close?

Closes #347

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the datafusion Changes in the datafusion crate label Sep 6, 2021
@matthewmturner
Contributor Author

matthewmturner commented Sep 6, 2021

@alamb @Dandandan FYI, I just started on this. I added the impl for Display on BinaryExpr and updated some of the logical_plan tests to confirm it worked. If you think this is an OK approach, I can work on updating the other tests.

Contributor

@alamb alamb left a comment

Thank you @matthewmturner !

I think this looks like a good approach. Thank you

ref right,
ref op,
} => write!(f, "{} {} {}", left, op, right),
_ => write!(f, "{}", ""),
Contributor

Ideally, this code would cover all the variants of Expr -- however, that doesn't have to be done in one PR.

As a potential interim solution, how about we do something like the following and use the Debug display until someone has time to add a better Display implementation?

Suggested change
- _ => write!(f, "{}", ""),
+ _ => write!(f, "{:?}", self),
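A minimal sketch of the suggested interim approach, using hypothetical, heavily simplified `Expr` and `Operator` types rather than DataFusion's real ones: `BinaryExpr` gets a dedicated rendering, and every other variant falls back to its `Debug` output. The fallback is only safe because `Debug` is derived here and never calls back into `Display`.

```rust
use std::fmt;

// Hypothetical two-variant operator, not DataFusion's real `Operator`.
#[derive(Debug, Clone, Copy)]
enum Operator {
    Plus,
    Minus,
}

impl fmt::Display for Operator {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        let symbol = match self {
            Operator::Plus => "+",
            Operator::Minus => "-",
        };
        write!(f, "{}", symbol)
    }
}

// Hypothetical, simplified expression tree. `Debug` is derived, so it
// never routes back through `Display`.
#[derive(Debug)]
enum Expr {
    Column(String),
    Literal(i64),
    BinaryExpr {
        left: Box<Expr>,
        op: Operator,
        right: Box<Expr>,
    },
}

impl fmt::Display for Expr {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            // Dedicated rendering; operands are displayed recursively.
            Expr::BinaryExpr { left, op, right } => write!(f, "{} {} {}", left, op, right),
            // Interim fallback for variants without a Display rendering yet.
            _ => write!(f, "{:?}", self),
        }
    }
}

fn main() {
    let e = Expr::BinaryExpr {
        left: Box::new(Expr::Column("a".to_string())),
        op: Operator::Minus,
        right: Box::new(Expr::Literal(2)),
    };
    // Prints: Column("a") - Literal(2)
    println!("{}", e);
}
```

The unhandled variants still look like Rust debug output, which is the trade-off of the interim step: everything prints, and variants can be moved to a proper rendering one at a time.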

@@ -117,19 +117,19 @@ mod tests {
fn test_operators() {
assert_eq!(
format!("{:?}", lit(1u32) + lit(2u32)),
- "UInt32(1) Plus UInt32(2)"
+ "UInt32(1) + UInt32(2)"
Contributor

that sure looks better
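The updated `test_operators` expectation can be reproduced with a small self-contained sketch. All types and helpers here (`ScalarValue`, `Operator`, `Expr`, `lit`) are hypothetical simplifications, not DataFusion's definitions: `ScalarValue` derives `Debug` (so it prints as `UInt32(1)`), `Expr` has a hand-written `Debug`, and the operator is formatted with `{}` (its Display symbol) instead of `{:?}` (the derived variant name).

```rust
use std::fmt;
use std::ops::Add;

// Hypothetical scalar type; derived Debug prints e.g. `UInt32(1)`.
#[derive(Debug)]
enum ScalarValue {
    UInt32(u32),
}

// Hypothetical operator; derived Debug prints the variant name (`Plus`).
#[derive(Debug, Clone, Copy)]
enum Operator {
    Plus,
}

impl fmt::Display for Operator {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            Operator::Plus => write!(f, "+"),
        }
    }
}

enum Expr {
    Literal(ScalarValue),
    BinaryExpr {
        left: Box<Expr>,
        op: Operator,
        right: Box<Expr>,
    },
}

impl fmt::Debug for Expr {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            Expr::Literal(v) => write!(f, "{:?}", v),
            // Operands keep Debug; only the operator switches to Display.
            Expr::BinaryExpr { left, op, right } => write!(f, "{:?} {} {:?}", left, op, right),
        }
    }
}

// Hypothetical helper mirroring the shape of `lit(..)` in the test.
fn lit(n: u32) -> Expr {
    Expr::Literal(ScalarValue::UInt32(n))
}

// Lets `lit(1) + lit(2)` build a BinaryExpr, as in the test.
impl Add for Expr {
    type Output = Expr;
    fn add(self, rhs: Expr) -> Expr {
        Expr::BinaryExpr {
            left: Box::new(self),
            op: Operator::Plus,
            right: Box::new(rhs),
        }
    }
}

fn main() {
    println!("{:?}", lit(1) + lit(2)); // UInt32(1) + UInt32(2)
    println!("{:?}", Operator::Plus); // Plus, what `{:?}` on the operator gave before
}
```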

@alamb
Contributor

alamb commented Sep 9, 2021

Note: the Python test failure has been resolved on master, so if you rebase this branch against apache/master that test should pass.

@github-actions github-actions bot added ballista development-process Related to development process of DataFusion sql SQL Planner labels Sep 10, 2021
@@ -2343,8 +2347,8 @@ mod tests {
GROUP BY first_name
HAVING MAX(age) > 100 AND MIN(id - 2) < 50";
let expected = "Projection: #person.first_name, #MAX(person.age)\
- \n Filter: #MAX(person.age) Gt Int64(100) And #MIN(person.id Minus Int64(2)) Lt Int64(50)\
- \n Aggregate: groupBy=[[#person.first_name]], aggr=[[MAX(#person.age), MIN(#person.id Minus Int64(2))]]\
+ \n Filter: #MAX(person.age) > Int64(100) AND #MIN(person.id Minus Int64(2)) < Int64(50)\
Contributor Author

@alamb I'm a bit confused. The minus operator seems to be formatted differently depending on the part of the plan it's in; is that expected? The test passes right now, but I had to use Minus in the Filter section and "-" in the Aggregate section.

Contributor

One difference is that MIN is an aggregate function, which is a different type of Expr; perhaps it formats its arguments using {:?}.
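One way the same operator can end up rendered two ways in a single plan string: if one code path has been switched to `{}` while another still formats the operator with `{:?}`, both spellings coexist. The two functions below are hypothetical illustrations of such paths, not DataFusion code.

```rust
use std::fmt;

// Hypothetical single-variant operator; derived Debug prints `Minus`.
#[derive(Debug, Clone, Copy)]
enum Operator {
    Minus,
}

impl fmt::Display for Operator {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        let symbol = match self {
            Operator::Minus => "-",
        };
        write!(f, "{}", symbol)
    }
}

// Hypothetical path that has already been updated to use Display.
fn render_filter_term(left: &str, op: Operator, right: &str) -> String {
    format!("{} {} {}", left, op, right)
}

// Hypothetical path that still uses Debug, e.g. when deriving a name.
fn render_name_term(left: &str, op: Operator, right: &str) -> String {
    format!("{} {:?} {}", left, op, right)
}

fn main() {
    println!("{}", render_filter_term("person.id", Operator::Minus, "Int64(2)")); // person.id - Int64(2)
    println!("{}", render_name_term("person.id", Operator::Minus, "Int64(2)")); // person.id Minus Int64(2)
}
```

Making the output consistent means finding every path that still uses `{:?}` for the operator and switching it to `{}`.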

@@ -64,7 +64,7 @@ pub enum AggregateFunction {
impl fmt::Display for AggregateFunction {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
// uppercase of the debug.
- write!(f, "{}", format!("{:?}", self).to_uppercase())
+ write!(f, "{}", self)
Contributor Author

@alamb FYI, after making this update I get a stack overflow error when running the test select_aggregate_with_group_by_with_having_using_derived_column_aggreagate_not_in_select. I think I saw a stack-related issue on some other issues/PRs; I'm not sure whether it could be related to this, but wanted to run it by you.

Contributor

#910 (comment) is perhaps relevant: it adds RUST_MIN_STACK_SIZE when the tests are run. Perhaps you can try rebasing to pick up the changes in #910 and see whether the problem you were seeing is resolved?

ref left,
ref right,
ref op,
} => write!(f, "{} {} {}", left, op, right),
Contributor

This is not used at the moment, as the implementation uses debug.

@@ -64,7 +64,7 @@ pub enum AggregateFunction {
impl fmt::Display for AggregateFunction {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
// uppercase of the debug.
- write!(f, "{}", format!("{:?}", self).to_uppercase())
+ write!(f, "{}", self)
Contributor

Isn't this doing infinite recursion, as it will try to use the same method to display self?
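The recursion comes from formatting `self` with `{}` inside `Display::fmt`: that dispatches straight back into the same method, which is consistent with the stack overflow reported above, and a larger minimum stack size would only delay it. A minimal sketch, using a hypothetical four-variant enum rather than DataFusion's full `AggregateFunction`, contrasting that call with the original "uppercase of the debug" line, which terminates because it goes through the derived `Debug` impl instead.

```rust
use std::fmt;

// Hypothetical subset of aggregate functions; derived Debug yields `Max`, `Min`, ...
#[derive(Debug, Clone, Copy)]
enum AggregateFunction {
    Count,
    Max,
    Min,
    Avg,
}

impl fmt::Display for AggregateFunction {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        // Broken version: `write!(f, "{}", self)` formats `self` with Display,
        // which calls this same `fmt` again, endlessly, until the stack overflows.
        //
        // Terminating version: render through the *derived* Debug impl, which is
        // a different function, then uppercase the result.
        write!(f, "{}", format!("{:?}", self).to_uppercase())
    }
}

fn main() {
    let all = [
        AggregateFunction::Count,
        AggregateFunction::Max,
        AggregateFunction::Min,
        AggregateFunction::Avg,
    ];
    for agg in all.iter() {
        println!("{}", agg); // COUNT, MAX, MIN, AVG
    }
}
```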

Comment on lines 1613 to +1653
Expr::BinaryExpr { left, op, right } => {
- write!(f, "{:?} {:?} {:?}", left, op, right)
+ write!(f, "{:?} {} {:?}", left, op, right)
Contributor Author

@Dandandan FYI, I updated this to use Display, and I think that's what made it work. I was able to get the expected results on some tests after this as well.

Are you ok with this approach?

Contributor

I would be OK with this as an intermediate solution. Ideally we'll move the formatting to Display instead and use that for printing. @alamb wdyt?

Contributor

I agree with @Dandandan
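A minimal sketch of the "move the formatting to Display" direction, assuming `Debug` simply delegates to `Display` so that `{}` and `{:?}` print the same SQL-style text. The types, the `#` column prefix, and the `Int64(..)` literal rendering are hypothetical simplifications modeled on the plan strings in this thread; this is one possible layout, not what the PR implemented.

```rust
use std::fmt;

// Hypothetical operator subset; only Display is implemented.
#[derive(Clone, Copy)]
enum Operator {
    Gt,
    And,
}

impl fmt::Display for Operator {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        let symbol = match self {
            Operator::Gt => ">",
            Operator::And => "AND",
        };
        write!(f, "{}", symbol)
    }
}

// Hypothetical, simplified expression tree.
enum Expr {
    Column(String),
    Literal(i64),
    BinaryExpr {
        left: Box<Expr>,
        op: Operator,
        right: Box<Expr>,
    },
}

// All rendering logic lives in Display. No parentheses are emitted, so
// operator precedence is not preserved in this sketch.
impl fmt::Display for Expr {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            Expr::Column(name) => write!(f, "#{}", name),
            Expr::Literal(v) => write!(f, "Int64({})", v),
            Expr::BinaryExpr { left, op, right } => write!(f, "{} {} {}", left, op, right),
        }
    }
}

// Debug forwards to Display, so existing `{:?}` call sites print the same
// text. Display must never call Debug, or the two impls would recurse.
impl fmt::Debug for Expr {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        fmt::Display::fmt(self, f)
    }
}

fn main() {
    let having = Expr::BinaryExpr {
        left: Box::new(Expr::BinaryExpr {
            left: Box::new(Expr::Column("person.age".to_string())),
            op: Operator::Gt,
            right: Box::new(Expr::Literal(100)),
        }),
        op: Operator::And,
        right: Box::new(Expr::Column("person.active".to_string())),
    };
    // Both print: #person.age > Int64(100) AND #person.active
    println!("{}", having);
    println!("{:?}", having);
}
```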

@Dandandan
Contributor

Nice, some tests to go ;)

@matthewmturner
Contributor Author

@Dandandan @alamb I updated and got all tests to pass. Let me know if anything else is needed :)

Contributor

@Dandandan Dandandan left a comment

LGTM, great quality of life improvement 👍💯

@alamb alamb changed the title from "Improve operator display" to "Implement Display for Expr, improve operator display" Sep 18, 2021
Contributor

@alamb alamb left a comment

Looks much nicer to me. Thank you @matthewmturner !


@alamb
Contributor

alamb commented Sep 18, 2021

I am going to test this branch locally against main to make sure there are no logical conflicts in the tests after merging #792 and, if there are none, will merge it in.

@alamb
Contributor

alamb commented Sep 18, 2021

Unfortunately, when I merged apache/master into this branch locally, several tests failed. It looks like a few newly added tests also need to be updated.


failures:

---- execution::context::tests::window_partition_by_order_by stdout ----
thread 'execution::context::tests::window_partition_by_order_by' panicked at 'assertion failed: `(left == right)`
  left: `["+----+----+--------------+-----------------------------------+----------------------------------+------------------------------------------+--------------+----------------+--------------+--------------+--------------+", "| c1 | c2 | ROW_NUMBER() | FIRST_VALUE(test.c2 Plus test.c1) | LAST_VALUE(test.c2 Plus test.c1) | NTH_VALUE(test.c2 Plus test.c1,Int64(1)) | SUM(test.c2) | COUNT(test.c2) | MAX(test.c2) | MIN(test.c2) | AVG(test.c2) |", "+----+----+--------------+-----------------------------------+----------------------------------+------------------------------------------+--------------+----------------+--------------+--------------+--------------+", "| 0  | 1  | 1            | 1                                 | 1                                | 1                                        | 1            | 1              | 1            | 1            | 1            |", "| 0  | 2  | 1            | 2                                 | 2                                | 2                                        | 2            | 1              | 2            | 2            | 2            |", "| 0  | 3  | 1            | 3                                 | 3                                | 3                                        | 3            | 1              | 3            | 3            | 3            |", "| 0  | 4  | 1            | 4                                 | 4                                | 4                                        | 4            | 1              | 4            | 4            | 4            |", "| 0  | 5  | 1            | 5                                 | 5                                | 5                                        | 5            | 1              | 5            | 5            | 5            |", "+----+----+--------------+-----------------------------------+----------------------------------+------------------------------------------+--------------+----------------+--------------+--------------+--------------+"]`,
 right: `["+----+----+--------------+--------------------------------+-------------------------------+---------------------------------------+--------------+----------------+--------------+--------------+--------------+", "| c1 | c2 | ROW_NUMBER() | FIRST_VALUE(test.c2 + test.c1) | LAST_VALUE(test.c2 + test.c1) | NTH_VALUE(test.c2 + test.c1,Int64(1)) | SUM(test.c2) | COUNT(test.c2) | MAX(test.c2) | MIN(test.c2) | AVG(test.c2) |", "+----+----+--------------+--------------------------------+-------------------------------+---------------------------------------+--------------+----------------+--------------+--------------+--------------+", "| 0  | 1  | 1            | 1                              | 1                             | 1                                     | 1            | 1              | 1            | 1            | 1            |", "| 0  | 2  | 1            | 2                              | 2                             | 2                                     | 2            | 1              | 2            | 2            | 2            |", "| 0  | 3  | 1            | 3                              | 3                             | 3                                     | 3            | 1              | 3            | 3            | 3            |", "| 0  | 4  | 1            | 4                              | 4                             | 4                                     | 4            | 1              | 4            | 4            | 4            |", "| 0  | 5  | 1            | 5                              | 5                             | 5                                     | 5            | 1              | 5            | 5            | 5            |", "+----+----+--------------+--------------------------------+-------------------------------+---------------------------------------+--------------+----------------+--------------+--------------+--------------+"]`: 

expected:

[
    "+----+----+--------------+-----------------------------------+----------------------------------+------------------------------------------+--------------+----------------+--------------+--------------+--------------+",
    "| c1 | c2 | ROW_NUMBER() | FIRST_VALUE(test.c2 Plus test.c1) | LAST_VALUE(test.c2 Plus test.c1) | NTH_VALUE(test.c2 Plus test.c1,Int64(1)) | SUM(test.c2) | COUNT(test.c2) | MAX(test.c2) | MIN(test.c2) | AVG(test.c2) |",
    "+----+----+--------------+-----------------------------------+----------------------------------+------------------------------------------+--------------+----------------+--------------+--------------+--------------+",
    "| 0  | 1  | 1            | 1                                 | 1                                | 1                                        | 1            | 1              | 1            | 1            | 1            |",
    "| 0  | 2  | 1            | 2                                 | 2                                | 2                                        | 2            | 1              | 2            | 2            | 2            |",
    "| 0  | 3  | 1            | 3                                 | 3                                | 3                                        | 3            | 1              | 3            | 3            | 3            |",
    "| 0  | 4  | 1            | 4                                 | 4                                | 4                                        | 4            | 1              | 4            | 4            | 4            |",
    "| 0  | 5  | 1            | 5                                 | 5                                | 5                                        | 5            | 1              | 5            | 5            | 5            |",
    "+----+----+--------------+-----------------------------------+----------------------------------+------------------------------------------+--------------+----------------+--------------+--------------+--------------+",
]
actual:

[
    "+----+----+--------------+--------------------------------+-------------------------------+---------------------------------------+--------------+----------------+--------------+--------------+--------------+",
    "| c1 | c2 | ROW_NUMBER() | FIRST_VALUE(test.c2 + test.c1) | LAST_VALUE(test.c2 + test.c1) | NTH_VALUE(test.c2 + test.c1,Int64(1)) | SUM(test.c2) | COUNT(test.c2) | MAX(test.c2) | MIN(test.c2) | AVG(test.c2) |",
    "+----+----+--------------+--------------------------------+-------------------------------+---------------------------------------+--------------+----------------+--------------+--------------+--------------+",
    "| 0  | 1  | 1            | 1                              | 1                             | 1                                     | 1            | 1              | 1            | 1            | 1            |",
    "| 0  | 2  | 1            | 2                              | 2                             | 2                                     | 2            | 1              | 2            | 2            | 2            |",
    "| 0  | 3  | 1            | 3                              | 3                             | 3                                     | 3            | 1              | 3            | 3            | 3            |",
    "| 0  | 4  | 1            | 4                              | 4                             | 4                                     | 4            | 1              | 4            | 4            | 4            |",
    "| 0  | 5  | 1            | 5                              | 5                             | 5                                     | 5            | 1              | 5            | 5            | 5            |",
    "+----+----+--------------+--------------------------------+-------------------------------+---------------------------------------+--------------+----------------+--------------+--------------+--------------+",
]

', datafusion/src/execution/context.rs:1688:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- optimizer::common_subexpr_eliminate::test::subexpr_in_different_order stdout ----
thread 'optimizer::common_subexpr_eliminate::test::subexpr_in_different_order' panicked at 'assertion failed: `(left == right)`
  left: `"Projection: Int32(1) + #test.a, #test.a + Int32(1)\n  TableScan: test projection=None"`,
 right: `"Projection: Int32(1) Plus #test.a, #test.a Plus Int32(1)\n  TableScan: test projection=None"`', datafusion/src/optimizer/common_subexpr_eliminate.rs:647:9

---- optimizer::common_subexpr_eliminate::test::cross_plans_subexpr stdout ----
thread 'optimizer::common_subexpr_eliminate::test::cross_plans_subexpr' panicked at 'assertion failed: `(left == right)`
  left: `"Projection: #Int32(1) + test.a\n  Projection: Int32(1) + #test.a\n    TableScan: test projection=None"`,
 right: `"Projection: #Int32(1) Plus test.a\n  Projection: Int32(1) Plus #test.a\n    TableScan: test projection=None"`', datafusion/src/optimizer/common_subexpr_eliminate.rs:647:9

---- optimizer::common_subexpr_eliminate::test::subexpr_in_same_order stdout ----
thread 'optimizer::common_subexpr_eliminate::test::subexpr_in_same_order' panicked at 'assertion failed: `(left == right)`
  left: `"Projection: #BinaryExpr-+Column-test.aLiteral1 AS Int32(1) + test.a AS first, #BinaryExpr-+Column-test.aLiteral1 AS Int32(1) + test.a AS second\n  Projection: Int32(1) + #test.a AS BinaryExpr-+Column-test.aLiteral1, #test.a, #test.b, #test.c\n    TableScan: test projection=None"`,
 right: `"Projection: #BinaryExpr-+Column-test.aLiteral1 AS Int32(1) Plus test.a AS first, #BinaryExpr-+Column-test.aLiteral1 AS Int32(1) Plus test.a AS second\n  Projection: Int32(1) Plus #test.a AS BinaryExpr-+Column-test.aLiteral1, #test.a, #test.b, #test.c\n    TableScan: test projection=None"`', datafusion/src/optimizer/common_subexpr_eliminate.rs:647:9

---- optimizer::common_subexpr_eliminate::test::aggregate stdout ----
thread 'optimizer::common_subexpr_eliminate::test::aggregate' panicked at 'assertion failed: `(left == right)`
  left: `"Aggregate: groupBy=[[]], aggr=[[Int32(1) + #AggregateFunction-AVGfalseColumn-test.a AS AVG(test.a), Int32(1) - #AggregateFunction-AVGfalseColumn-test.a AS AVG(test.a)]]\n  Projection: AVG(#test.a) AS AggregateFunction-AVGfalseColumn-test.a, #test.a, #test.b, #test.c\n    TableScan: test projection=None"`,
 right: `"Aggregate: groupBy=[[]], aggr=[[Int32(1) Plus #AggregateFunction-AVGfalseColumn-test.a AS AVG(test.a), Int32(1) Minus #AggregateFunction-AVGfalseColumn-test.a AS AVG(test.a)]]\n  Projection: AVG(#test.a) AS AggregateFunction-AVGfalseColumn-test.a, #test.a, #test.b, #test.c\n    TableScan: test projection=None"`', datafusion/src/optimizer/common_subexpr_eliminate.rs:647:9

---- optimizer::common_subexpr_eliminate::test::tpch_q1_simplified stdout ----
thread 'optimizer::common_subexpr_eliminate::test::tpch_q1_simplified' panicked at 'assertion failed: `(left == right)`
  left: `"Aggregate: groupBy=[[]], aggr=[[SUM(#BinaryExpr-*BinaryExpr--Column-test.bLiteral1Column-test.a AS test.a * Int32(1) - test.b), SUM(#BinaryExpr-*BinaryExpr--Column-test.bLiteral1Column-test.a AS test.a * Int32(1) - test.b * Int32(1) + #test.c)]]\n  Projection: #test.a * Int32(1) - #test.b AS BinaryExpr-*BinaryExpr--Column-test.bLiteral1Column-test.a, #test.a, #test.b, #test.c\n    TableScan: test projection=None"`,
 right: `"Aggregate: groupBy=[[]], aggr=[[SUM(#BinaryExpr-*BinaryExpr--Column-test.bLiteral1Column-test.a AS test.a Multiply Int32(1) Minus test.b), SUM(#BinaryExpr-*BinaryExpr--Column-test.bLiteral1Column-test.a AS test.a Multiply Int32(1) Minus test.b Multiply Int32(1) Plus #test.c)]]\n  Projection: #test.a Multiply Int32(1) Minus #test.b AS BinaryExpr-*BinaryExpr--Column-test.bLiteral1Column-test.a, #test.a, #test.b, #test.c\n    TableScan: test projection=None"`', datafusion/src/optimizer/common_subexpr_eliminate.rs:647:9


failures:
    execution::context::tests::window_partition_by_order_by
    optimizer::common_subexpr_eliminate::test::aggregate
    optimizer::common_subexpr_eliminate::test::cross_plans_subexpr
    optimizer::common_subexpr_eliminate::test::subexpr_in_different_order
    optimizer::common_subexpr_eliminate::test::subexpr_in_same_order
    optimizer::common_subexpr_eliminate::test::tpch_q1_simplified

@Dandandan
Contributor

@matthewmturner let us know when you can have a look at the tests, I think only a rebase and update of the tests is what's needed here

@matthewmturner
Contributor Author

@matthewmturner let us know when you can have a look at the tests, I think only a rebase and update of the tests is what's needed here

sry, somehow missed the prior comments. will work on this.

@matthewmturner
Contributor Author

@alamb @Dandandan

As I'm relatively new to rebasing, and I've had some issues with it in the past, I just want to confirm I'm doing it the right way.

I was going to do git pull --rebase=interactive upstream master and then squash all commits except the first so that the other commits aren't included.

can you confirm if this is ok or if there is a better way?

thanks in advance!

@alamb
Contributor

alamb commented Sep 20, 2021

can you confirm if this is ok or if there is a better way?

I normally do it by:

git fetch apache
git rebase -i apache/master

where I have previously set apache to be

git remote add apache git@github.com:apache/arrow-datafusion.git

But I suspect there are other workflows that work too

@houqp
Member

houqp commented Sep 21, 2021

@matthewmturner if you are not comfortable rebasing, you can also do a simple git pull to pull in the latest master changes and resolve the conflicts locally in the merge. We squash-merge every PR, so it doesn't really matter whether you do a rebase pull or a merge pull in your local branch.

@houqp houqp added enhancement New feature or request and removed sql SQL Planner development-process Related to development process of DataFusion python labels Sep 21, 2021
@matthewmturner
Contributor Author

@alamb thank you
@houqp I should be OK with rebasing; I just saw there were a bunch of options for it and wanted to make sure I wasn't messing up any commit history, etc. Thanks! Will have this done shortly.

# This is the 1st commit message:

Add Display for Expr::BinaryExpr

# This is the commit message #2:

Update logical_plan/operators tests

# This is the commit message #3:

rebase and debug display for non binary expr
Update logical_plan/operators tests

rebase and debug display for non binary expr

Add Display for Expr::BinaryExpr

Update logical_plan/operators tests

Updating tests

Update aggregate display

Updating tests without aggregate

More tests

Working on agg/scalar functions

Fix binary_expr in create_name function and attendant tests

More tests

More tests

Doc tests

Rebase and update new tests
@github-actions github-actions bot added the sql SQL Planner label Sep 21, 2021
@matthewmturner
Contributor Author

@alamb @houqp I'm failing a number of Avro-related tests now that don't seem to be affected by this PR. Any thoughts?

@alamb
Contributor

alamb commented Sep 21, 2021

@alamb @houqp im failing on a number of avro related tests now that dont seem impacted by this PR. any thoughts?

It looks to me like the pin of the testing submodule somehow changed (likely inadvertently) as part of the PR:
[Screenshot, 2021-09-21 1:45:47 PM]

@matthewmturner
Contributor Author


Gah, I guess I messed up the rebase, although I'm confused about how the submodule was affected; I didn't do anything to it (maybe that was the issue). I'll look into it.

@matthewmturner
Contributor Author

@alamb all good now, I think!

Contributor

@alamb alamb left a comment

Thanks @matthewmturner -- something is still messed up with the submodules, as this PR shows a change to them.

[Screenshot, 2021-09-22 11:35:58 AM]

Let me take a shot at cleaning up the PR and see if I can get rid of them...

@matthewmturner
Copy link
Contributor Author

@alamb ugh, sorry for the trouble. It's not clear to me what the right flow would have been. My understanding was that I was missing the updates to the submodules (Avro in particular), so I updated them to the latest. What is the correct way to update submodules, then?

Contributor

@alamb alamb left a comment

I pushed a fix in af86048. For the record, here are the commands I used (cargo-culted from past experience):

git reset apache/master testing
git restore testing

git reset apache/master parquet-testing
git restore parquet-testing

git commit -a -m 'Restore submodule references from master'
git push

Thanks for sticking with this @matthewmturner -- I plan to merge this PR when the tests have completed

@matthewmturner
Contributor Author


Ok - thank you for providing that.

@alamb
Contributor

alamb commented Sep 22, 2021

ugh sry for the trouble. its not clear to me the right flow i should have performed

No worries! I personally find working with git submodules quite confusing too -- part of the problem is that changes to them are picked up if you do git commit -a -m, even if you didn't explicitly want to change the references. 🤷 Hopefully it won't hit us again.

@alamb alamb merged commit a02d5e1 into apache:master Sep 22, 2021
@alamb
Contributor

alamb commented Sep 22, 2021

Thanks again @matthewmturner !

@matthewmturner
Contributor Author

@alamb np. appreciate your guidance and patience.

Labels
datafusion Changes in the datafusion crate enhancement New feature or request sql SQL Planner
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve display of operators in Explain output
4 participants