Fix bug in projection: "column types must match schema types, expected XXX but found YYY" #1448
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -21,6 +21,7 @@ | |
//! projection expressions. `SELECT` without `FROM` will only evaluate expressions. | ||
|
||
use std::any::Any; | ||
use std::collections::BTreeMap; | ||
use std::pin::Pin; | ||
use std::sync::Arc; | ||
use std::task::{Context, Poll}; | ||
|
@@ -63,13 +64,15 @@ impl ProjectionExec { | |
|
||
let fields: Result<Vec<Field>> = expr | ||
.iter() | ||
.map(|(e, name)| match input_schema.field_with_name(name) { | ||
Ok(f) => Ok(f.clone()), | ||
Err(_) => { | ||
let dt = e.data_type(&input_schema)?; | ||
let nullable = e.nullable(&input_schema)?; | ||
Ok(Field::new(name, dt, nullable)) | ||
} | ||
.map(|(e, name)| { | ||
Review comment: I am not convinced it is a great idea to copy field metadata from the input to the output based on the field name alone... As that would mean metadata on a field named [...] Thus I [...]
Review comment: This is the question I asked previously about projections that create new columns: it doesn't seem to make sense to blindly carry over metadata, because it describes a column that may no longer exist.
||
let mut field = Field::new( | ||
name, | ||
e.data_type(&input_schema)?, | ||
e.nullable(&input_schema)?, | ||
); | ||
field.set_metadata(get_field_metadata(e, &input_schema)); | ||
|
||
Ok(field) | ||
}) | ||
.collect(); | ||
|
||
|
@@ -179,6 +182,24 @@ impl ExecutionPlan for ProjectionExec { | |
} | ||
} | ||
|
||
/// If e is a direct column reference, returns the field level | ||
/// metadata for that field, if any. Otherwise returns None | ||
fn get_field_metadata( | ||
e: &Arc<dyn PhysicalExpr>, | ||
input_schema: &Schema, | ||
) -> Option<BTreeMap<String, String>> { | ||
let name = if let Some(column) = e.as_any().downcast_ref::<Column>() { | ||
column.name() | ||
} else { | ||
return None; | ||
}; | ||
|
||
input_schema | ||
.field_with_name(name) | ||
.ok() | ||
.and_then(|f| f.metadata().as_ref().cloned()) | ||
} | ||
|
||
fn stats_projection( | ||
stats: Statistics, | ||
exprs: impl Iterator<Item = Arc<dyn PhysicalExpr>>, | ||
|
I totally missed this in my review -- basically the projection's output field is not the same as its input field even if the field name matches (due to aliases). The actual output type needs to be calculated from the expr in all cases.
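To make the alias problem concrete, here is a minimal sketch that compares the two strategies in the diff. It does not use the real Arrow or DataFusion types: `DataType`, `Field`, `Expr`, `find`, `project_by_name`, `project_from_expr`, and `sample_schema` are all hypothetical, simplified stand-ins invented for illustration. The sketch assumes a projection such as `CAST(b AS Utf8) AS a` over an input that already has an `Int32` column named `a`.

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified stand-ins for Arrow's DataType/Field and
// DataFusion's PhysicalExpr. These are NOT the real APIs.
#[derive(Clone, Debug, PartialEq)]
enum DataType {
    Int32,
    Utf8,
}

#[derive(Clone, Debug, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
    metadata: Option<BTreeMap<String, String>>,
}

enum Expr {
    /// Direct reference to an input column.
    Column(String),
    /// Cast of an inner expression to another type.
    Cast(Box<Expr>, DataType),
}

/// Look up a field by name in a schema modeled as a slice of fields.
fn find<'a>(schema: &'a [Field], name: &str) -> Option<&'a Field> {
    schema.iter().find(|f| f.name == name)
}

impl Expr {
    fn data_type(&self, schema: &[Field]) -> Option<DataType> {
        match self {
            Expr::Column(name) => find(schema, name).map(|f| f.data_type.clone()),
            Expr::Cast(_, to) => Some(to.clone()),
        }
    }

    fn nullable(&self, schema: &[Field]) -> Option<bool> {
        match self {
            Expr::Column(name) => find(schema, name).map(|f| f.nullable),
            Expr::Cast(inner, _) => inner.nullable(schema),
        }
    }
}

/// Old strategy: reuse the input field whenever the output name matches one.
fn project_by_name(expr: &Expr, alias: &str, schema: &[Field]) -> Option<Field> {
    match find(schema, alias) {
        Some(f) => Some(f.clone()),
        None => project_from_expr(expr, alias, schema),
    }
}

/// Fixed strategy: always derive type and nullability from the expression,
/// and copy metadata only when the expression is a direct column reference.
fn project_from_expr(expr: &Expr, alias: &str, schema: &[Field]) -> Option<Field> {
    let metadata = match expr {
        Expr::Column(name) => find(schema, name).and_then(|f| f.metadata.clone()),
        _ => None,
    };
    Some(Field {
        name: alias.to_string(),
        data_type: expr.data_type(schema)?,
        nullable: expr.nullable(schema)?,
        metadata,
    })
}

/// Input schema used for illustration: `a` carries metadata, `b` does not.
fn sample_schema() -> Vec<Field> {
    let mut meta = BTreeMap::new();
    meta.insert("unit".to_string(), "ms".to_string());
    vec![
        Field { name: "a".into(), data_type: DataType::Int32, nullable: false, metadata: Some(meta) },
        Field { name: "b".into(), data_type: DataType::Int32, nullable: true, metadata: None },
    ]
}
```

With `Expr::Cast(Box::new(Expr::Column("b".into())), DataType::Utf8)` aliased as `a`, `project_by_name` hands back the input's `Int32` field for `a`, which is the kind of mismatch behind the "column types must match schema types" error in the PR title. `project_from_expr` reports `Utf8` and carries no metadata, while a plain `Expr::Column("a")` projected under a new name still keeps the metadata of `a`.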