Option to disable "Physical input schema should be the same as the one converted from logical schema" error
#13065
Comments
I should clarify: ideally this check should be enabled by default, and that is a goal we should shoot for. However, as there are clearly bugs in the code that currently prevent it from passing cleanly in all cases (bugs that were already in the code before the check existed), I think it is better to relax the check and sort out the errors rather than hard-fail plans.
Some additional context: we have definitely not isolated all the bugs this check uncovered. Even with this known bug (#13010) patched for us, we still encounter this failed check every few minutes. Changing this check to a warning, not an error, has been necessary for us. Assuming that we are not the only ones, having the feature @alamb proposed here (to turn the error into a warning based on configuration) would help unblock others from upgrading DataFusion, IMO.
We (InfluxData) will likely contribute fixes back upstream into DataFusion as we find issues as well.
Could it be that this happens when users use only the physical planner, even though the schema alignment happens in the logical planner?
I remember a bunch of issues with schema comparison where a different null flag on a column caused a problem. Since DF reworked the
@wiedld are you able to capture failing queries (or rather, queries giving a warning) and maybe turn some into actionable issues for the community?
The question I'd want answered is: does disabling this check hide an actual problem, or are we confident the problem exists within the check itself? That is, do we know for certain that disabling the check isn't allowing faulty output? In general I recommend against features to disable checks, but I also recognize that unblocking downstream users may be more important.
I believe disabling the check hides actual problems; a list of such problems is in #12733. However, those plans ran in DataFusion 41 (sometimes with metadata dropped from the output schema), and now the plans refuse to run. So depending on your perspective, this is either an improvement or a regression.
So I think having a flag to change the behavior based on your point of view makes sense.
No, we do not know this for certain. My point of view is that running with the check disabled is no worse than DataFusion 41, so while not ideal, it isn't worse.
There are two classes of errors we have seen so far in InfluxDB:
I believe delta-rs is also seeing field name mismatches (e.g. the embedded fieldname on
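To turn warnings like these into actionable issues, it helps to report which attribute differs rather than only that the schemas differ. The following is a minimal sketch, assuming a simplified hypothetical Field type (name, data type as a string, nullability, metadata) rather than the real arrow-rs Field, and a hypothetical diff_schemas helper that is not part of DataFusion:

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for an Arrow field (not the arrow-rs type).
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: String,
    pub nullable: bool,
    pub metadata: BTreeMap<String, String>,
}

/// One kind of difference between the schema derived from the logical plan
/// and the schema produced by the physical plan.
#[derive(Debug, Clone, PartialEq)]
pub enum Mismatch {
    FieldCount { logical: usize, physical: usize },
    Name { index: usize, logical: String, physical: String },
    DataType { index: usize, logical: String, physical: String },
    Nullability { index: usize, logical: bool, physical: bool },
    Metadata { index: usize },
}

/// Lists every difference between the two schemas, position by position,
/// so a warning can say why the check failed instead of only that it failed.
pub fn diff_schemas(logical: &[Field], physical: &[Field]) -> Vec<Mismatch> {
    let mut out = Vec::new();
    if logical.len() != physical.len() {
        out.push(Mismatch::FieldCount { logical: logical.len(), physical: physical.len() });
    }
    // zip stops at the shorter schema; extra fields are reported via FieldCount.
    for (index, (l, p)) in logical.iter().zip(physical).enumerate() {
        if l.name != p.name {
            out.push(Mismatch::Name { index, logical: l.name.clone(), physical: p.name.clone() });
        }
        if l.data_type != p.data_type {
            out.push(Mismatch::DataType {
                index,
                logical: l.data_type.clone(),
                physical: p.data_type.clone(),
            });
        }
        if l.nullable != p.nullable {
            out.push(Mismatch::Nullability { index, logical: l.nullable, physical: p.nullable });
        }
        if l.metadata != p.metadata {
            out.push(Mismatch::Metadata { index });
        }
    }
    out
}
```

Each entry names the field position and the kind of mismatch, so a log line can distinguish, for example, a nullability flip from a renamed field.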
Before DF 41, schema comparison relied on helper methods that excluded nullability and metadata. I suppose this method is gone after the refactoring, and we need to revive it and use it for schema comparison.
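The relaxed comparison described above could look roughly like the following sketch. It uses a simplified, hypothetical Field type (not the arrow-rs or DataFusion types), and both function names are illustrative only, not existing DataFusion APIs:

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for an Arrow field.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: String,
    pub nullable: bool,
    pub metadata: BTreeMap<String, String>,
}

/// Strict comparison: every attribute must match, including nullability and metadata.
pub fn schemas_equal_strict(a: &[Field], b: &[Field]) -> bool {
    a == b
}

/// Relaxed comparison: field count, names and data types must match position
/// by position, but nullability and metadata are ignored.
pub fn schemas_equal_names_and_types(a: &[Field], b: &[Field]) -> bool {
    a.len() == b.len()
        && a.iter().zip(b).all(|(x, y)| x.name == y.name && x.data_type == y.data_type)
}
```

The relaxed variant still rejects a missing, renamed, reordered or retyped column, so it would keep catching the data/ordering problems raised in this discussion while tolerating nullability and metadata drift.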
@findepi I'll try to get us actionable issues this week. Specifically, I need to isolate reproducers and determine whether each problem is in our code base (e.g. our physical optimizers) or in DataFusion.
I think it is more accurate to say "all currently filed issues have been resolved", @wiedld, and I think we have at least one more issue that is as yet unfiled (we see it occurring in our logs after we upgraded DataFusion and turned this error into a warning). More to come: we are working on getting a self-contained reproducer.
known / filed: it seems like we mean mostly the same thing 👍
Thanks!
Anyone willing to help make a PR with this change?
I am making a PR for this.
Sorry, what PR, @alamb? I'm a bit lost. Is it to fix the schema equality so that it does not rely on nullability/metadata?
Do we have a stable repro case for this in DF alone? I'm thinking of fixing the schema equality, as that would fix the issue without adding a param.
We have fixed all (known) DF bugs on main. The known ones are listed in #12733, I believe. However, @wiedld and I are quite confident there is at least one more (as yet unfiled) that we are seeing in our production system. Even once we file and fix that, I am not at all confident there won't be others. I view adding this configuration option as an insurance policy.
The other side of the coin is that if there is a critical issue, such as a schema mismatch caused by data/ordering, users won't catch it because the param disables the check, and they can eventually end up with corrupted data. My vision is that we should correctly implement
I think it is somewhat debatable whether we should or shouldn't be checking that the nullability and metadata match. I actually think the invariant that the physical plan created for a LogicalPlan has exactly the same schema seems quite reasonable 🤔
So far we have found 1 additional schema mishandling bug in DataFusion that triggers this check. If I encounter more bugs in DF code, I'll add them to the Epic #12733.
Is your feature request related to a problem or challenge?
This change, released in DataFusion 42.0.0 (AggregateUDFImpl::is_null, #11989), added a new check in the DefaultPhysicalPlanner that the schema of the output plan is the same as the input plan:
datafusion/datafusion/core/src/physical_planner.rs, lines 660 to 662 in 818ce3f
While @jayzhan211's heroic efforts have this passing in all the DataFusion tests, it turned out this check fails in many downstream implementations:
Downstream in InfluxDB 3.0 we turned the check into a warning in our fork to unblock our upgrade
We even made a patch release to try and get the delta-rs upgrade working:
But it is still failing as I write this (see delta-io/delta-rs#2886 (comment))
Describe the solution you'd like
Note that there is at least one outstanding open bug: #13010
I would like some way to disable this check to unblock upgrades in downstream crates.
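One way such a switch could behave is sketched below, assuming a hypothetical PlannerOptions struct with a validate_schema flag (modeled on the config value proposed in this issue; it is not an existing DataFusion setting or API). With the flag on, a mismatch fails planning as the current check does; with it off, the mismatch is logged and planning continues:

```rust
use std::fmt::Debug;

/// Hypothetical planner options; `validate_schema` mirrors the config value
/// proposed in this issue but is NOT an existing DataFusion setting.
#[derive(Debug, Clone, Copy)]
pub struct PlannerOptions {
    pub validate_schema: bool,
}

/// Outcome of the schema check when it does not fail the plan.
#[derive(Debug, PartialEq)]
pub enum CheckOutcome {
    Passed,
    Warned(String),
}

/// Compares the schema converted from the logical plan with the physical one.
/// With `validate_schema = true` a mismatch is a hard error;
/// with `false` the mismatch is reported as a warning and planning continues.
pub fn check_schema<S: PartialEq + Debug>(
    from_logical: &S,
    physical: &S,
    opts: PlannerOptions,
) -> Result<CheckOutcome, String> {
    if from_logical == physical {
        return Ok(CheckOutcome::Passed);
    }
    let msg = format!(
        "Physical input schema should be the same as the one converted from logical schema: \
         logical={from_logical:?}, physical={physical:?}"
    );
    if opts.validate_schema {
        Err(msg)
    } else {
        // A real planner would route this through its logging facade instead.
        eprintln!("warning: {msg}");
        Ok(CheckOutcome::Warned(msg))
    }
}
```

Keeping the flag on by default would preserve the invariant discussed above while giving downstream crates an explicit opt-out.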
Describe alternatives you've considered
I propose we add a new config value that lets downstream crates opt in / out of this check, similarly to datafusion.optimizer.skip_failed_rules (see Config Docs). Something like:
datafusion.execution.validate_schema: If true, the DefaultPhysicalPlanner will error if the input plan's schema does not exactly match the output plan.
Additional context
No response