Is your feature request related to a problem? Please describe.
Frequently, when bumping our DataFusion version, we run into failures in test_queries.py as a result of "regressions" introduced by the upstream changes.
In reality, these "regressions" often turn out to be improvements to the generated logical plans that expose issues with the Python API. However, in some cases the changes to the logical plans are genuinely suboptimal, and it would be good to identify them (even if they don't result in test failures) before merging a version bump.
Describe the solution you'd like
It would be nice if there were some element of CI that clearly identified how a PR changes the logical plans associated with each query.
This could look like a CI check that fails if the plans differ between main and the PR branch, or a comment posted on PRs summarizing the diff between the old and new plans. I don't have a strong preference right now, though I would like something that doesn't cause too much confusion for newer contributors who are unfamiliar with the query testing. A rough sketch of what the plan-snapshotting step could look like is below.
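A minimal sketch, assuming the queries can be planned through the DataFusion Python bindings; the query list, context setup, and output path are placeholders rather than our actual test harness. CI could run something like this on both main and the PR branch, then either fail on a non-empty diff or post the diff as a PR comment.

```python
# Sketch: dump each query's logical plan to a text file so CI can diff the
# output between main and the PR branch. Query list and paths are hypothetical.
from datafusion import SessionContext

# Placeholder query set; in practice these would come from the queries
# exercised by test_queries.py.
QUERIES = {
    "q1": "SELECT 1 AS x",
}


def dump_plans(path: str) -> None:
    ctx = SessionContext()
    with open(path, "w") as f:
        for name, sql in sorted(QUERIES.items()):
            # optimized_logical_plan() could be used instead if the optimized
            # form is what we want to track.
            plan = ctx.sql(sql).logical_plan()
            f.write(f"-- {name}\n{plan}\n\n")


if __name__ == "__main__":
    dump_plans("plans.txt")
```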
Describe alternatives you've considered
An alternative that should be pursued in parallel is improving the coverage of our Rust API testing; this would give us a better idea of what more granular queries should produce, which in general makes it easier to identify why a change in the Rust code causes a regression in the query plans.
Additional context
Thinking about this because of #903, which caused several passing queries to start failing again.