Added composite identifiers to get field of struct. #628

Closed · wants to merge 2 commits

Conversation

jorgecarleitao (Member)

Closes #603

Note that this only works with full identifiers, i.e. SELECT test.c1.field FROM test.

The DataFrame API is named Expr::get_field, just to make it explicit. We could add col("c1")["field"] or something.
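To illustrate the proposed API shape, here is a minimal self-contained sketch on a toy `Expr` type (hypothetical stand-ins, not DataFusion's actual definitions) showing how an explicit `get_field` builder composes with `col`:

```rust
// Toy stand-in for an expression enum; only the shape of the accessor matters.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    GetField { expr: Box<Expr>, name: String },
}

impl Expr {
    /// Explicit field access: col("c1").get_field("field")
    fn get_field(self, name: &str) -> Expr {
        Expr::GetField {
            expr: Box::new(self),
            name: name.to_string(),
        }
    }
}

fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

fn main() {
    let e = col("c1").get_field("field");
    // The expression tree records both the column and the struct field.
    assert!(matches!(&e, Expr::GetField { name, .. } if name == "field"));
    println!("{:?}", e);
}
```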

@github-actions github-actions bot added the datafusion Changes in the datafusion crate label Jun 26, 2021
@jorgecarleitao jorgecarleitao changed the title Added support composite identifiers for struct type. Added composite identifiers to get field of struct. Jun 26, 2021
@jorgecarleitao (Member Author)

This does not work for scalar values yet, #602 tracks that.

@houqp (Member) left a comment:

I am glad you got this implemented before I started working on it. The design I was going to try was much less elegant than yours :)

fn plan_compound(mut identifiers: Vec<String>) -> Expr {
    if &identifiers[0][0..1] == "@" {
        Expr::ScalarVariable(identifiers)
    } else if identifiers.len() == 2 {
@houqp (Member) commented on Jun 26, 2021:

Worth considering a follow-up PR to handle an edge case where the user tries to access a nested field with an unqualified column, like column.field1.field2. For compound identifiers, we should probably check the first identifier to see if it is a valid relation name.
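One possible shape for that check, as a hedged sketch (the `resolve_compound` function and `Resolved` type are hypothetical, not part of this PR): treat the first identifier as a relation only when it names a known table, otherwise interpret the whole chain as a column plus nested fields.

```rust
use std::collections::HashSet;

// Hypothetical result of resolving a compound identifier.
#[derive(Debug, PartialEq)]
enum Resolved {
    // e.g. test.c1.field -> relation "test", column "c1", fields ["field"]
    Qualified { relation: String, column: String, fields: Vec<String> },
    // e.g. column.field1.field2 where "column" is not a known relation
    Unqualified { column: String, fields: Vec<String> },
}

fn resolve_compound(idents: &[String], relations: &HashSet<String>) -> Resolved {
    // Only treat the first identifier as a relation if it names a known table.
    if idents.len() >= 2 && relations.contains(&idents[0]) {
        Resolved::Qualified {
            relation: idents[0].clone(),
            column: idents[1].clone(),
            fields: idents[2..].to_vec(),
        }
    } else {
        Resolved::Unqualified {
            column: idents[0].clone(),
            fields: idents[1..].to_vec(),
        }
    }
}

fn main() {
    let relations: HashSet<String> = ["test".to_string()].into_iter().collect();
    let q = resolve_compound(&["test".into(), "c1".into(), "field".into()], &relations);
    assert!(matches!(q, Resolved::Qualified { .. }));
    // First identifier is not a known relation: whole chain is column + fields.
    let u = resolve_compound(&["column".into(), "field1".into(), "field2".into()], &relations);
    assert!(matches!(u, Resolved::Unqualified { .. }));
}
```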

@alamb (Contributor) left a comment:

Thank you @jorgecarleitao -- this is really cool

    .as_any()
    .downcast_ref::<StructArray>()
    .unwrap()
    .column_by_name(&self.name)
Comment (Contributor):
Very cool. I was wondering if column_by_name returning None could be a legitimate error: for example, what happens when trying to select a non-existent field? It seems like the calls to get_field will have failed earlier with a real error, so it is probably fine to treat this as infallible.
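If one did want the lookup to be fallible rather than panic, the Option could be mapped into an error. A minimal self-contained sketch with toy types (arrow's real `StructArray::column_by_name` similarly returns an `Option`; `StructColumns` and `FieldError` are hypothetical here):

```rust
// Toy stand-ins; the point is only the Option -> Result mapping vs. unwrap().
#[derive(Debug, PartialEq)]
struct FieldError(String);

struct StructColumns<'a> {
    columns: Vec<(&'a str, Vec<f64>)>,
}

impl<'a> StructColumns<'a> {
    // Mirrors the Option-returning shape of a by-name column lookup.
    fn column_by_name(&self, name: &str) -> Option<&Vec<f64>> {
        self.columns.iter().find(|(n, _)| *n == name).map(|(_, v)| v)
    }

    // Fallible accessor: a missing field becomes an error, not a panic.
    fn get_field(&self, name: &str) -> Result<&Vec<f64>, FieldError> {
        self.column_by_name(name)
            .ok_or_else(|| FieldError(format!("field '{}' not found in struct", name)))
    }
}

fn main() {
    let s = StructColumns { columns: vec![("inner", vec![1.1])] };
    assert_eq!(s.get_field("inner").unwrap(), &vec![1.1]);
    assert!(s.get_field("nonexistent").is_err());
}
```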

let sql = "SELECT test.c1.inner FROM test";
let actual = execute(&mut ctx, sql).await;
let expected = vec![vec!["1.1"], vec!["NULL"]];

Comment (Contributor):

It would probably be good to add a negative test here too (e.g. SELECT test.c1.nonexistent FROM test) to ensure we get an error (and not a panic!)

@@ -1201,7 +1201,7 @@ impl TryInto<protobuf::LogicalExprNode> for &Expr {
Expr::Wildcard => Ok(protobuf::LogicalExprNode {
expr_type: Some(protobuf::logical_expr_node::ExprType::Wildcard(true)),
}),
Expr::TryCast { .. } => unimplemented!(),
_ => unimplemented!(),
@jorgecarleitao (Member Author) commented:

@andygrove, this is not ideal; I have to admit I am not very familiar with proto and this part yet. It could be worth adding a small guide to the readme, e.g. "how to add a new logical node to Ballista", to reduce the barrier to entry.

@alamb (Contributor) commented on Jul 5, 2021:

@jorgecarleitao shall we merge this? Do you need some help wrapping up the additional tests?

@houqp (Member) commented on Jul 20, 2021:

@jorgecarleitao I am available to help finish up whatever is blocking this PR from getting merged.

@alamb alamb added the stale-pr label Aug 17, 2021
@alamb (Contributor) commented on Aug 20, 2021:

Closing stale PRs to keep the PR list manageable. Please reopen if this is a mistake (it seems like maybe someone else could pick up this work and drive it forward when needed).

I may do so if/when we find that calculating some of our timeseries analytic functions in IOx is taking a long time, but we haven't seen that yet.

@alamb alamb closed this Aug 20, 2021
@alamb alamb mentioned this pull request Aug 25, 2021
Igosuki added a commit to Igosuki/arrow-datafusion that referenced this pull request Aug 26, 2021
Igosuki added a commit to Igosuki/arrow-datafusion that referenced this pull request Aug 30, 2021
Igosuki added a commit to Igosuki/arrow-datafusion that referenced this pull request Sep 1, 2021
Igosuki added a commit to Igosuki/arrow-datafusion that referenced this pull request Sep 10, 2021
Igosuki added a commit to Igosuki/arrow-datafusion that referenced this pull request Sep 12, 2021
Igosuki added a commit to Igosuki/arrow-datafusion that referenced this pull request Sep 14, 2021
alamb pushed a commit that referenced this pull request Sep 15, 2021
* Add avro as a datasource, file and table provider

* wip

* Added support composite identifiers for struct type.

* Fixed build.

* cheat and add unions to valid composite column types

* Implement the AvroArrayReader

* Add binary types

* Enable Avro as a FileType

* Enable registering an avro table in the sql parsing

* Change package name for datafusion/avro

* Implement Avro datasource tests and fix avro_rs::Value resolution to Arrow types

* Test for AvroExec::try_from_path

* external table avro test

* Basic schema conversion tests

* Complete test for avro_to_arrow_reader on alltypes_dictionnary

* fix_stable: .rewind is 'unstable'

* Fix license files and remove the unused avro-converter crate

* fix example test in avro_to_arrow

* add avro_sql test to default workflow

* Address clippy lints

* Enable avro as a valid datasource for client execution

* Add avro to available logical plan nodes

* Add ToTimestampMillis as a scalar function in protos

* Allow Avro in PhysicalPlan nodes

* Remove remaining confusing references to 'json' in avro mod

* rename 'parquet' words in avro test and examples

* Handle Union of nested lists in arrow reader

* test timestamp arrays

* remove debug statement

* Make avro optional

* Remove debug statement

* Remove GetField usage (see #628)

* Fix docstring in parser tests

* Test batch output rather than just rows individually

* Remove 'csv' from error strings in physical_plan::avro

* Avro sample sql and explain queries tests in sql.rs

* Activate avro feature for cargo tests in github workflow

* Add a test for avro registering multiple files in a single table

* Switch to Result instead of Option for resolve_string

* Address missing clippy warning should_implement_trait in arrow_to_avro/reader

* Add fmt display implementation for AvroExec

* ci: fix cargo sql run example, use datafusion/avro feature instead of 'avro'

* license: missing license file for avro_to_arrow/schema.rs

* only run avro datasource tests if features have 'avro'

* refactor: rename infer_avro_schema_from_reader to read_avro_schema_from_reader

* Pass None as props to avro schema schema_to_field_with_props until further notice

* Change schema inference to FixedSizeBinary(16) for Uuid

* schema: prefix metadata coming from avro with 'avro'

* make num traits optional and part of the avro feature flag

* Fix avro schema tests regarding external props

* split avro physical plan test feature wise and add a non-implemented test

* submodule: switch back to apache/arrow-testing

* fix_test: columns are now prefixed in the plan

* avro_test: fix clippy warning cmp-owned

* avro: move statistics to the physical plan

* Increase min stack size for cargo tests

Co-authored-by: Jorge C. Leitao <[email protected]>
Successfully merging this pull request may close these issues.

Add support for getter for StructArray
4 participants