Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Add CI check for query logical plan regressions #911

Open
charlesbluca opened this issue Nov 9, 2022 · 0 comments
Open

[ENH] Add CI check for query logical plan regressions #911

charlesbluca opened this issue Nov 9, 2022 · 0 comments
Labels
datafusion Related to work in DataFusion enhancement New feature or request

Comments

@charlesbluca
Copy link
Collaborator

Is your feature request related to a problem? Please describe.
Frequently, when bumping our DataFusion version, we run into failures in test_queries.py as a result of "regressions" introduced through the upstream changes.

In actuality, these "regressions" often end up being optimal changes to the generated logical plans that expose issues with the Python API. However, in some cases, the changes to the logical plans are suboptimal, and it would be good to identify them (even if they don't result in failures) before merging in a version bump.

Describe the solution you'd like
It would be nice if there was some element to CI that clearly identified how a PR changed the logical plans associated with each query.

This could look like a CI check that raises a failure if the plans are different between main and the PR branch, or a comment that gets posted on PRs with a summary of the diff between the old and new plans. Don't have a strong preference right now, though I would like something that doesn't cause too much confusion to newer contributors who are unfamiliar with the query testing.

Describe alternatives you've considered
An alternative that should be pursued in parallel to this is improving the coverage of our Rust API testing; this would allow us to have a better idea of what more granular queries should produce, which in general will make it easier to identify why a change in the Rust code causes a regression in the query plans.

Additional context
Thinking about this because of #903, which caused several passing queries to start failing again.

@charlesbluca charlesbluca added enhancement New feature or request datafusion Related to work in DataFusion labels Nov 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion Related to work in DataFusion enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant