Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand wildcard expressions in distinct on #12941

Merged
merged 1 commit into from
Oct 16, 2024

Conversation

epsio-banay
Copy link
Contributor

Closes #12940

Rationale for this change

What changes are included in this PR?

Are these changes tested?

yes

Are there any user-facing changes?

No

@github-actions github-actions bot added optimizer Optimizer rules proto Related to proto crate labels Oct 15, 2024
@epsio-banay epsio-banay force-pushed the fix-distinct-on-with-wildcard branch 2 times, most recently from 1d53a7a to d237a93 Compare October 15, 2024 14:44
@epsio-banay epsio-banay force-pushed the fix-distinct-on-with-wildcard branch from d237a93 to 4f73eec Compare October 15, 2024 14:45
@github-actions github-actions bot removed the proto Related to proto crate label Oct 15, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

Thank you @epsio-banay

Copy link
Contributor

@goldmedal goldmedal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @epsio-banay it looks to me. Just one suggestion:
How about adding an SQL test in datafusion/sqllogictest/test_files/distinct_on.slt?
It would be nice to ensure the plan won't be affected by other changes.

@alamb
Copy link
Contributor

alamb commented Oct 16, 2024

I verified that before this PR this query fails like this:

> create table foo as values (1), (2);
0 row(s) fetched.

> select distinct on (column1) * from foo;
Optimizer rule 'replace_distinct_aggregate' failed
caused by
replace_distinct_aggregate
caused by
Internal error: Failed due to a difference in schemas, original schema: DFSchema { inner: Schema { fields: [Field { name: "column1", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, field_qualifiers: [Some(Bare { table: "foo" })], functional_dependencies: FunctionalDependencies { deps: [] } }, new schema: DFSchema { inner: Schema { fields: [Field { name: "column1", data_type: Null, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, field_qualifiers: [Some(Bare { table: "foo" })], functional_dependencies: FunctionalDependencies { deps: [] } }.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

And after this PR it works

DataFusion CLI v42.0.0
> create table foo as values (1), (2);
0 row(s) fetched.
Elapsed 0.024 seconds.

> select distinct on (column1) * from foo;
+---------+
| column1 |
+---------+
| 1       |
| 2       |
+---------+
2 row(s) fetched.
Elapsed 0.026 seconds.

How about adding an SQL test in datafusion/sqllogictest/test_files/distinct_on.slt?

this is an excellent call @goldmedal -- I will make a follow on PR

@alamb alamb merged commit 9663141 into apache:main Oct 16, 2024
24 checks passed
@alamb
Copy link
Contributor

alamb commented Oct 16, 2024

PR with SLT tests: #12968

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
optimizer Optimizer rules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Distinct on with wildcard is not transformed correctly by ExpandWildcardRule
3 participants