-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix bug in projection: "column types must match schema types, expected XXX but found YYY" #1448
Conversation
@@ -63,13 +63,22 @@ impl ProjectionExec { | |||
|
|||
let fields: Result<Vec<Field>> = expr | |||
.iter() | |||
.map(|(e, name)| match input_schema.field_with_name(name) { | |||
Ok(f) => Ok(f.clone()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I totally missed this in my review -- basically the projection's output is not the same for its input even if the field name matches (due to aliases). The actual output type needs to be calculated from the expr for all cases.
let nullable = e.nullable(&input_schema)?; | ||
Ok(Field::new(name, dt, nullable)) | ||
} | ||
.map(|(e, name)| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not convinced it is a great idea to copy field metadata from the input to the output based on the field name alone... As that would mean metadata on a field named a
from the input could end up on the output field a
even if the output was not related at all to a
.
Thus I will make made this code only copy field level metadata for a direct input column reference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the question I had asked previously about when projections created new columns, basically, it doesn't seem to make sense to just blindly carry over metadata because it's about a column that may no longer exist.
I like what you did there :) I have a perhaps naive question, shouldn't aliasing a query to an already existing column just fail outright? Wouldn't that reasonably be something of a "which column do you really want here?" confusion? |
@hntd187 -- I agree that aliasing one column to another column is unlikely to be actually useful 😆 it was just the minimal reproducer I could come up with. Another use of Specifically, the plan in the IOx test was:
Which was built programatically, but is approximately what would come out of this query SELECT
cpu,
host,
count(usage_system) as usage_system,
count(usage_user) as usage_user
max(time) as time,
FROM
cpu
WHERE
0 < time
AND time < 2001 In this case, the consumer of the output expect the columns named a certain way, and without the alias |
Ahh this fails because of the count changing the type of usage_system and usage_user correct? |
Yes, I think the type of |
Fix bug in projection: "column types must match schema types, expected XXX but found YYY" (apache#1448) See merge request noah/arrow-datafusion!2
Which issue does this PR close?
Closes #1447
Rationale for this change
There was a bug introduced in #1378
What changes are included in this PR?
Do not copy the field from the input schema, but explicitly copy the metadata
Are there any user-facing changes?
Queries that should work will once again do so
cc @hntd187