-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added composite identifiers to get field of struct. #628
Conversation
This does not work for scalar values yet, #602 tracks that. |
1a8203b
to
181a510
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am glad you got this implemented before I started working on it. The design I was going to try was much less elegant than yours :)
fn plan_compound(mut identifiers: Vec<String>) -> Expr { | ||
if &identifiers[0][0..1] == "@" { | ||
Expr::ScalarVariable(identifiers) | ||
} else if identifiers.len() == 2 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
worth considering a follow up PR to handle an edge case where user tries to access nested field with unqualified column like column.field1.field2
. for compound identifiers, we should probably check the first identifier to see if it's a valid relation name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jorgecarleitao -- this is really cool
.as_any() | ||
.downcast_ref::<StructArray>() | ||
.unwrap() | ||
.column_by_name(&self.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool -- I was wondering if column_by_name
returning None
could be a legit error. For example what happens when trying to select a non existent field? it seems like the calls to get_field
will have failed earlier with a real error so it is probably fine to treat this as infallible.
let sql = "SELECT test.c1.inner FROM test"; | ||
let actual = execute(&mut ctx, sql).await; | ||
let expected = vec![vec!["1.1"], vec!["NULL"]]; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would probably be good to add a negative test here too (e.g. SELECT test.c1.nonexistent FROM test
) to ensure we get an error (and not a panic!
)
@@ -1201,7 +1201,7 @@ impl TryInto<protobuf::LogicalExprNode> for &Expr { | |||
Expr::Wildcard => Ok(protobuf::LogicalExprNode { | |||
expr_type: Some(protobuf::logical_expr_node::ExprType::Wildcard(true)), | |||
}), | |||
Expr::TryCast { .. } => unimplemented!(), | |||
_ => unimplemented!(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andygrove , this is not ideal; I have to admit I am not very familiar with proto and this part yet. Could be relevant to add a small guide in the readme like "how to add a new logical node to ballista"? It could reduce the barrier?
@jorgecarleitao shall we merge this? Do you need some help wrapping up the additional tests? |
@jorgecarleitao i am available to help finishing up whatever is blocking this PR from getting merged. |
Closing stale PRs to keep the PR list manageable. Please reopen if this is a mistake (it seems like maybe someone else could pick up this work and drive it forward when needed). I may do so if/when we find calculating some of our timeseries analytic functions in IOx are taking a long time, but we haven't seen that yet |
* Add avro as a datasource, file and table provider * wip * Added support composite identifiers for struct type. * Fixed build. * cheat and add unions to valid composite column types * Implement the AvroArrayReader * Add binary types * Enable Avro as a FileType * Enable registering an avro table in the sql parsing * Change package name for datafusion/avro * Implement Avro datasource tests and fix avro_rs::Value resolution to Arrow types * Test for AvroExec::try_from_path * external table avro test * Basic schema conversion tests * Complete test for avro_to_arrow_reader on alltypes_dictionnary * fix_stable: .rewind is 'unstable' * Fix license files and remove the unused avro-converter crate * fix example test in avro_to_arrow * add avro_sql test to default workflow * Adress clippies * Enable avro as a valid datasource for client execution * Add avro to available logical plan nodes * Add ToTimestampMillis as a scalar function in protos * Allow Avro in PhysicalPlan nodes * Remove remaining confusing references to 'json' in avro mod * rename 'parquet' words in avro test and examples * Handle Union of nested lists in arrow reader * test timestamp arrays * remove debug statement * Make avro optional * Remove debug statement * Remove GetField usage (see #628) * Fix docstring in parser tests * Test batch output rather than just rows individually * Remove 'csv' from error strings in physical_plan::avro * Avro sample sql and explain queries tests in sql.rs * Activate avro feature for cargo tests in github workflow * Add a test for avro registering multiple files in a single table * Switch to Result instead of Option for resolve_string * Address missing clippy warning should_implement_trait in arrow_to_avro/reader * Add fmt display implementation for AvroExec * ci: fix cargo sql run example, use datafusion/avro feature instead of 'avro' * license: missing license file for avro_to_arrow/schema.rs * only run avro datasource tests if features have 'avro' * refactor: rename infer_avro_schema_from_reader to read_avro_schema_from_reader * Pass None as props to avro schema schema_to_field_with_props until further notice * Change schema inferance to FixedSizeBinary(16) for Uuid * schema: prefix metadata coming from avro with 'avro' * make num traits optional and part of the avro feature flag * Fix avro schema tests regarding external props * split avro physical plan test feature wise and add a non-implemented test * submodule: switch back to apache/arrow-testing * fix_test: columns are now prefixed in the plan * avro_test: fix clippy warning cmp-owned * avro: move statistics to the physical plan * Increase min stack size for cargo tests Co-authored-by: Jorge C. Leitao <[email protected]>
Closes #603
Note that this only works with full identifiers, i.e.
SELECT test.c1.field FROM test
.The DataFrame API is named
Expr::get_field
, just to make it explicit. We could addcol("c1")["field"]
or something.