Describe the bug
This may be expected behavior, so feel free to close this if it is intended. However, it (potentially) differs from Postgres behavior, so I figured I would mention it. The reproducer can probably explain the issue better than I can.
I can work around the issue by renaming all fields on one of the inputs with a prefix, but I didn't have to do this before, so I figured I'd report it and make sure the change is intentional.
To Reproduce
use arrow::array::{ArrayRef, Int32Array, RecordBatch};
use datafusion::prelude::*;
use std::sync::Arc;

#[tokio::main]
async fn main() {
    let ctx = SessionContext::new();

    let id: ArrayRef = Arc::new(Int32Array::from(vec![0, 1, 2]));
    let value: ArrayRef = Arc::new(Int32Array::from(vec![0, 1, 2]));
    let batch = RecordBatch::try_from_iter(vec![("id", id), ("value", value)]).unwrap();
    ctx.register_batch("tes", batch).unwrap();

    let id: ArrayRef = Arc::new(Int32Array::from(vec![1, 2, 3]));
    let value: ArrayRef = Arc::new(Int32Array::from(vec![1, 2, 3]));
    let batch = RecordBatch::try_from_iter(vec![("id", id), ("value", value)]).unwrap();
    ctx.register_batch("tes2", batch).unwrap();

    let tes = ctx.table("tes").await.unwrap();
    let tes2 = ctx.table("tes2").await.unwrap();

    // This succeeds (the two tables have different names and so the qualified names of the columns differ)
    let joined = tes
        .clone()
        .join(tes2, JoinType::Full, &["id"], &["id"], None)
        .unwrap();
    joined.show().await.unwrap();

    // This fails with the error:
    //
    // SchemaError(DuplicateQualifiedField { qualifier: Bare { table: "tes" }, name: "id" }, Some(""))
    let tes_clone = tes.clone();
    let joined = tes
        .join(tes_clone, JoinType::Full, &["id"], &["id"], None)
        .unwrap();
    joined.show().await.unwrap();
}
Expected behavior
I would expect both joins to succeed (they did in version 42).
Additional context
In Postgres, the closest I can get is:
CREATE TABLE tes (id int, val int);
INSERT INTO tes (id, val) VALUES (0, 0), (1, 1), (2, 2);
CREATE TABLE tes2 (id int, val int);
INSERT INTO tes2 (id, val) VALUES (1, 1), (2, 2), (3, 3);
SELECT * FROM tes FULL OUTER JOIN tes2 ON tes.id = tes2.id;
SELECT * FROM tes AS t1 FULL OUTER JOIN tes AS t2 ON t1.id = t2.id;
It's not exactly the same, since I have to alias tes. My full motivation here is to support a join we do in lance during a merge_insert. We do a full outer join between the existing data (target table) and the new data (source table). Since these tables have the same schema and are both created with SessionContext::read_table, they have the same name.
An alternative (and maybe simpler) fix would be to introduce a SessionContext::read_table_with_alias function that takes an optional table name.
This has been disallowed since #12608, because the join's output schema would contain duplicate names, which can lead to ambiguous references.
Postgres also does not allow a self-join unless a different table alias is specified.
psql (16.6 (Ubuntu 16.6-0ubuntu0.24.04.1))
Type "help" for help.

psql=> select * from t1 cross join t1;
ERROR:  table name "t1" specified more than once
psql=> select * from t1 cross join t1 t2;
 a | b | a | b
---+---+---+---
 1 | 1 | 1 | 1
(1 row)
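To make the duplicate-name argument concrete, here is a minimal std-only Rust sketch of the idea. It is not DataFusion's implementation: QualifiedField, table_schema, with_alias, and join_schema are hypothetical names invented for this illustration, and the check is only assumed to resemble what the real schema validation does. It models a join's output schema as the concatenation of both inputs and rejects any repeated (qualifier, name) pair.

```rust
use std::collections::HashSet;

/// A column identified by an optional table qualifier plus a field name.
/// (Hypothetical type for illustration only, not a DataFusion type.)
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct QualifiedField {
    qualifier: Option<String>,
    name: String,
}

/// Build a toy schema in which every column is qualified by the table name.
fn table_schema(table: &str, columns: &[&str]) -> Vec<QualifiedField> {
    columns
        .iter()
        .map(|c| QualifiedField {
            qualifier: Some(table.to_string()),
            name: c.to_string(),
        })
        .collect()
}

/// Re-qualify every column with a new alias, like `tes AS t2` in SQL.
fn with_alias(schema: &[QualifiedField], alias: &str) -> Vec<QualifiedField> {
    schema
        .iter()
        .map(|f| QualifiedField {
            qualifier: Some(alias.to_string()),
            name: f.name.clone(),
        })
        .collect()
}

/// Concatenate both input schemas, as a join's output would, and reject
/// any (qualifier, name) pair that appears more than once.
fn join_schema(
    left: &[QualifiedField],
    right: &[QualifiedField],
) -> Result<Vec<QualifiedField>, String> {
    let mut seen = HashSet::new();
    let mut out = Vec::with_capacity(left.len() + right.len());
    for field in left.iter().chain(right) {
        if !seen.insert(field) {
            return Err(format!(
                "duplicate qualified field {}.{}",
                field.qualifier.as_deref().unwrap_or(""),
                field.name
            ));
        }
        out.push(field.clone());
    }
    Ok(out)
}

fn main() {
    let tes = table_schema("tes", &["id", "value"]);
    let tes2 = table_schema("tes2", &["id", "value"]);

    // Different qualifiers: tes.id and tes2.id are distinct columns.
    assert!(join_schema(&tes, &tes2).is_ok());

    // Self-join without an alias: tes.id would appear twice.
    assert!(join_schema(&tes, &tes).is_err());

    // Self-join with one side aliased: tes.id and t2.id are distinct again.
    assert!(join_schema(&tes, &with_alias(&tes, "t2")).is_ok());
}
```

In this model, aliasing one side plays the same role as `t1 t2` in the psql session above: the column names stay the same and only the qualifier changes, which is enough to make every output column unambiguous.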