However, this fix will not catch mistakes like reordered columns. For example, if table A has columns a, b and table B has columns b, a, then DataFusion will happily compute the union, with the wrong values in the wrong columns.
So why not just compare the entire schema? Or at least the column names and types (i.e. ignoring metadata)? The docs explicitly say that the schemas must be equal.
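The check proposed above (same column names and types, metadata ignored) could look roughly like the following. This is only a sketch using the standard library: `Field`, `field`, and `check_union_schemas` are hypothetical names made up for illustration, and the types are plain stand-ins rather than DataFusion's or Arrow's schema types.

```rust
// Hypothetical stand-in for a schema field; not DataFusion's or Arrow's type.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
    // Present only to show that the check below deliberately ignores it.
    metadata: Vec<(String, String)>,
}

// Convenience constructor for a field without metadata.
fn field(name: &str, data_type: &str) -> Field {
    Field {
        name: name.to_string(),
        data_type: data_type.to_string(),
        metadata: Vec::new(),
    }
}

/// Returns Ok(()) when both schemas have the same column names and types in
/// the same order; otherwise returns a description of the first mismatch.
fn check_union_schemas(left: &[Field], right: &[Field]) -> Result<(), String> {
    if left.len() != right.len() {
        return Err(format!(
            "column count differs: {} vs {}",
            left.len(),
            right.len()
        ));
    }
    for (i, (l, r)) in left.iter().zip(right).enumerate() {
        // Compare name and type only; metadata is not part of the check.
        if l.name != r.name || l.data_type != r.data_type {
            return Err(format!(
                "column {} differs: {} ({}) vs {} ({})",
                i, l.name, l.data_type, r.name, r.data_type
            ));
        }
    }
    Ok(())
}
```

With schemas `[a, b]` and `[b, a]` (both string-typed), this check reports a mismatch at column 0, which is the reordering case described in this report.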
To Reproduce
#[tokio::test]
async fn test_union() {
    // Module from the reporter's own crate, kept as in the original report.
    use crate::data_frame;
    use datafusion::assert_batches_sorted_eq;
    use datafusion::common::arrow::array::{ArrayRef, StringArray};
    use datafusion::common::arrow::record_batch::RecordBatch;
    use datafusion::prelude::SessionContext;
    use std::sync::Arc;

    let ctx = SessionContext::new();

    // Table A: columns a, b.
    let a = ctx
        .read_batch(
            RecordBatch::try_from_iter([
                ("a", Arc::new(StringArray::from(vec!["a"])) as ArrayRef),
                ("b", Arc::new(StringArray::from(vec!["b"])) as ArrayRef),
            ])
            .unwrap(),
        )
        .unwrap();

    // Table B: the same columns in the opposite order (b, a).
    let b = ctx
        .read_batch(
            RecordBatch::try_from_iter([
                ("b", Arc::new(StringArray::from(vec!["b"])) as ArrayRef),
                ("a", Arc::new(StringArray::from(vec!["a"])) as ArrayRef),
            ])
            .unwrap(),
        )
        .unwrap();

    let union = a.union(b).unwrap();

    assert_batches_sorted_eq!(
        [
            "+---+---+",
            "| a | b |",
            "+---+---+",
            "| a | b |",
            "| a | b |",
            "+---+---+",
        ],
        &union.collect().await.unwrap()
    );
}
Expected behavior
Test passes.
Actual behavior
assertion `left == right` failed:
expected:
[
"+---+---+",
"| a | b |",
"+---+---+",
"| a | b |",
"| a | b |",
"+---+---+",
]
actual:
[
"+---+---+",
"| a | b |",
"+---+---+",
"| a | b |",
"| b | a |",
"+---+---+",
]
I think this is actually working correctly and the docs might need to be updated. I do not think union requires the fields to be the same nor have the same names. It just requires the same number of fields and each corresponding field to be coercible to a common type.
create table t1 (a, b) AS
VALUES
('a'::varchar, 'b'::varchar),
('c', 'd') ;
create table t2 (c, d) AS select b, a from t1;
select a, b from t1;
select c, d from t2;
select a, b from t1 union select c, d from t2;
output:
SELECT 2
SELECT 2
a | b
---+---
a | b
c | d
(2 rows)
c | d
---+---
b | a
d | c
(2 rows)
a | b
---+---
d | c
a | b
b | a
c | d
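If union is positional, as the SQL example above suggests, a caller who wants columns matched by name has to put the right-hand side into the left-hand side's column order first. The sketch below models that idea with standard-library types only: `Table`, `col`, `union_positional`, and `align_by_name` are hypothetical names for illustration, not DataFusion APIs, and the union here keeps duplicate rows. In DataFusion itself the analogous step would presumably be to select the second DataFrame's columns in the first one's order (for example with `DataFrame::select_columns`) before calling `union`.

```rust
use std::collections::HashMap;

// Hypothetical in-memory table: ordered (column name, values) pairs.
type Table = Vec<(String, Vec<String>)>;

// Convenience constructor for one column.
fn col(name: &str, values: &[&str]) -> (String, Vec<String>) {
    (
        name.to_string(),
        values.iter().map(|v| v.to_string()).collect(),
    )
}

/// Positional union without deduplication: appends the i-th column of
/// `right` to the i-th column of `left` and keeps `left`'s column names.
fn union_positional(left: &Table, right: &Table) -> Result<Table, String> {
    if left.len() != right.len() {
        return Err(format!(
            "column count differs: {} vs {}",
            left.len(),
            right.len()
        ));
    }
    Ok(left
        .iter()
        .zip(right)
        .map(|((name, l), (_, r))| {
            let mut values = l.clone();
            values.extend(r.iter().cloned());
            (name.clone(), values)
        })
        .collect())
}

/// Reorders the columns of `right` to follow `left`'s column order, by name.
fn align_by_name(left: &Table, right: &Table) -> Result<Table, String> {
    if left.len() != right.len() {
        return Err(format!(
            "column count differs: {} vs {}",
            left.len(),
            right.len()
        ));
    }
    let by_name: HashMap<&str, &Vec<String>> =
        right.iter().map(|(n, v)| (n.as_str(), v)).collect();
    left.iter()
        .map(|(name, _)| {
            by_name
                .get(name.as_str())
                .map(|v| (name.clone(), (*v).clone()))
                .ok_or_else(|| format!("right side has no column named {}", name))
        })
        .collect()
}
```

Calling `union_positional(&a, &b)` directly on tables with columns `a, b` and `b, a` mixes the values, as in the report. Calling `union_positional(&a, &align_by_name(&a, &b)?)` keeps each value under its own column name.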
Describe the bug
Using datafusion version 42.2.0. Follow up to #13092, which was fixed by #13117 thanks to @Omega359.