Describe the bug
Two dictionary array columns cannot be compared with each other (comparisons to a constant do work, however).
To Reproduce
#[tokio::test]
async fn query_on_string_dictionary() -> Result<()> {
    // Test to ensure DataFusion can operate on dictionary types
    // Use StringDictionary (32 bit indexes = keys)
    let d1: DictionaryArray<Int32Type> =
        vec![Some("one"), None, Some("three")].into_iter().collect();
    let d2: DictionaryArray<Int32Type> =
        vec![Some("blarg"), None, Some("three")].into_iter().collect();
    let d3: StringArray = vec![Some("XYZ"), None, Some("three")].into_iter().collect();

    let batch = RecordBatch::try_from_iter(vec![
        ("d1", Arc::new(d1) as ArrayRef),
        ("d2", Arc::new(d2) as ArrayRef),
        ("d3", Arc::new(d3) as ArrayRef),
    ])
    .unwrap();

    let table = MemTable::try_new(batch.schema(), vec![vec![batch]])?;
    let mut ctx = ExecutionContext::new();
    ctx.register_table("test", Arc::new(table))?;

    // comparison with another dictionary column
    let sql = "SELECT d1 FROM test WHERE d1 = d2";
    let actual = execute_to_batches(&mut ctx, sql).await;
    let expected = vec![
        "+-------+",
        "| d1    |",
        "+-------+",
        "| three |",
        "+-------+",
    ];
    assert_batches_eq!(expected, &actual);

    // comparison with a non dictionary column
    let sql = "SELECT d1 FROM test WHERE d1 = d3";
    let actual = execute_to_batches(&mut ctx, sql).await;
    let expected = vec![
        "+-------+",
        "| d1    |",
        "+-------+",
        "| three |",
        "+-------+",
    ];
    assert_batches_eq!(expected, &actual);

    Ok(())
}
The core issue here is that the arrow comparison kernels don't support DictionaryArray comparisons. Likewise, the datafusion dispatch code doesn't support DictionaryArray comparisons either.
The expedient thing to do here is to make datafusion cast the DictionaryArray to a StringArray prior to comparison.
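To make the cast-then-compare idea concrete, here is a minimal std-only sketch. It deliberately does not use the arrow crate: `Dict`, `unpack`, and `eq_plain` are hypothetical stand-ins for a DictionaryArray, the cast to a StringArray, and an element-wise equality kernel, with `None` modelling a null slot. It only illustrates the shape of the workaround, not how datafusion would implement it.

```rust
/// Hypothetical stand-in for a string dictionary array: each key indexes
/// into `values`; a `None` key models a null slot.
struct Dict {
    keys: Vec<Option<i32>>,
    values: Vec<String>,
}

/// Models the cast from a dictionary array to a plain string array by
/// looking up every key in the dictionary's values.
fn unpack(d: &Dict) -> Vec<Option<String>> {
    d.keys
        .iter()
        .map(|&k| k.map(|i| d.values[i as usize].clone()))
        .collect()
}

/// Element-wise equality on plain string columns; a null on either side
/// yields a null result, as in SQL.
fn eq_plain(a: &[Option<String>], b: &[Option<String>]) -> Vec<Option<bool>> {
    a.iter()
        .zip(b)
        .map(|(x, y)| match (x, y) {
            (Some(x), Some(y)) => Some(x == y),
            _ => None,
        })
        .collect()
}

fn main() {
    // Same logical contents as d1 and d2 in the reproduction above.
    let d1 = Dict {
        keys: vec![Some(0), None, Some(1)],
        values: vec!["one".to_string(), "three".to_string()],
    };
    let d2 = Dict {
        keys: vec![Some(0), None, Some(1)],
        values: vec!["blarg".to_string(), "three".to_string()],
    };
    // Unpack both sides first, then compare as ordinary string columns.
    let mask = eq_plain(&unpack(&d1), &unpack(&d2));
    println!("{:?}", mask);
}
```

The cost of this approach is that every row's string is materialized before the comparison, which is why it is the expedient option rather than the long-term one.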
The longer term thing to do is to properly support DictionaryArray comparisons in the arrow comparison kernels (eq_dyn and friends), as @matthewmturner has done for eq_dyn_scalar et al., and to teach datafusion how to use them.
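For contrast, a direct comparison can avoid materializing the strings by resolving keys through each side's own dictionary on the fly. The sketch below is std-only and purely illustrative: `DictCol` and `eq_dict` are hypothetical names, and this is not how the arrow `eq_dyn` kernels are implemented. Note that the keys themselves cannot be compared, because the two columns may assign different keys to the same string.

```rust
/// Hypothetical stand-in for a string dictionary column; a `None` key
/// models a null slot.
struct DictCol {
    keys: Vec<Option<i32>>,
    values: Vec<String>,
}

/// Compares two dictionary columns row by row without building
/// intermediate string columns. Each key is resolved through its own
/// dictionary, since key 0 in `a` and key 0 in `b` may name different strings.
fn eq_dict(a: &DictCol, b: &DictCol) -> Vec<Option<bool>> {
    a.keys
        .iter()
        .zip(&b.keys)
        .map(|(ka, kb)| match (ka, kb) {
            (Some(i), Some(j)) => Some(a.values[*i as usize] == b.values[*j as usize]),
            _ => None, // null on either side gives a null result
        })
        .collect()
}

fn main() {
    let d1 = DictCol {
        keys: vec![Some(0), None, Some(1)],
        values: vec!["one".to_string(), "three".to_string()],
    };
    // "three" has key 0 here but key 1 in d1.
    let d2 = DictCol {
        keys: vec![Some(1), None, Some(0)],
        values: vec!["three".to_string(), "blarg".to_string()],
    };
    println!("{:?}", eq_dict(&d1, &d2));
}
```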
The test above fails.
Expected behavior
Expected result: test passes