Concating dictionary array leads to duplicated dict values. #3837

Closed
eddyxu opened this issue Mar 10, 2023 · 1 comment
Labels: development-process, enhancement

Comments

eddyxu (Member) commented Mar 10, 2023

Describe the bug

I was trying to concatenate a few DictionaryArrays using arrow_select::concat::concat. Although every DictionaryArray shares the same value strings, the resulting array contains the values from each input DictionaryArray copied (duplicated) into the final output.

To Reproduce

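```rust
use std::sync::Arc;

use arrow_array::{builder::StringDictionaryBuilder, types::Int32Type, Array, DictionaryArray};
use arrow_select::concat::concat;

fn main() {
    // Two identical dictionary arrays, each [null, "a", "b", "c"]
    // (two inputs match the output shown below)
    let arrs: Vec<Arc<DictionaryArray<Int32Type>>> = (0..2)
        .map(|_| {
            let mut dict_builder = StringDictionaryBuilder::<Int32Type>::new();
            dict_builder.append_null();
            dict_builder.append("a").unwrap();
            dict_builder.append("b").unwrap();
            dict_builder.append("c").unwrap();
            Arc::new(dict_builder.finish())
        })
        .collect();

    // concat takes a slice of `&dyn Array`
    let mut arrays: Vec<&dyn Array> = vec![];
    for b in arrs.iter() {
        arrays.push(b.as_ref());
    }
    let b = concat(arrays.as_slice()).unwrap();
    println!("Batch is: {:?}", b);
}
```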

Output

```
Batch is: DictionaryArray {keys: PrimitiveArray<Int32>
[
  null,
  0,
  1,
  2,
  null,
  3,
  4,
  5,
] values: StringArray
[
  "a",
  "b",
  "c",
  "a",
  "b",
  "c",
]}
```
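The behavior in this output can be modeled in a few lines of plain Rust. The sketch below is a hypothetical, std-only model (the `Dict` struct and `concat_naive` function are illustrative names, not arrow-rs APIs): it appends every input's values unchanged and shifts that input's keys by the number of values already emitted. That is enough to reproduce the keys `[null, 0, 1, 2, null, 3, 4, 5]` and the duplicated `"a", "b", "c"` values above.

```rust
/// Hypothetical model of a dictionary array: each key indexes into `values`,
/// and `None` marks a null slot.
#[derive(Debug, Clone, PartialEq)]
pub struct Dict {
    pub keys: Vec<Option<i32>>,
    pub values: Vec<String>,
}

/// Concatenate without deduplication: append each input's `values` as-is and
/// offset its keys by the count of values already written, so every key
/// still points at the same string it did in its input.
pub fn concat_naive(inputs: &[Dict]) -> Dict {
    let mut keys: Vec<Option<i32>> = Vec::new();
    let mut values: Vec<String> = Vec::new();
    for d in inputs {
        let offset = values.len() as i32;
        // Nulls stay null; valid keys are shifted past earlier dictionaries
        keys.extend(d.keys.iter().map(|k| k.map(|k| k + offset)));
        values.extend(d.values.iter().cloned());
    }
    Dict { keys, values }
}
```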

Expected behavior

I'd expect the output to be:

```
DictionaryArray {keys: PrimitiveArray<Int32>
[
  null,
  0,
  1,
  2,
  null,
  0,
  1,
  2,
] values: StringArray
[
  "a",
  "b",
  "c",
]}
```
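Producing this expected output requires merging the dictionaries instead of appending them: each distinct value gets exactly one slot, and every input key is remapped to that slot. A minimal std-only sketch, again using a hypothetical `Dict` model rather than arrow-rs types, and assuming string values and `i32` keys that never overflow:

```rust
use std::collections::HashMap;

/// Hypothetical model of a dictionary array: each key indexes into `values`,
/// and `None` marks a null slot.
#[derive(Debug, Clone, PartialEq)]
pub struct Dict {
    pub keys: Vec<Option<i32>>,
    pub values: Vec<String>,
}

/// Concatenate while merging dictionaries: the first occurrence of each
/// distinct value claims the next output slot, and later occurrences reuse it.
/// Assumes the merged dictionary fits in i32 keys.
pub fn concat_merged(inputs: &[Dict]) -> Dict {
    let mut slot_of: HashMap<String, i32> = HashMap::new();
    let mut values: Vec<String> = Vec::new();
    let mut keys: Vec<Option<i32>> = Vec::new();
    for d in inputs {
        // remap[i] is the output key for this input's values[i]
        let remap: Vec<i32> = d
            .values
            .iter()
            .map(|v| {
                *slot_of.entry(v.clone()).or_insert_with(|| {
                    values.push(v.clone());
                    (values.len() - 1) as i32
                })
            })
            .collect();
        // Translate this input's keys through the remap table; nulls stay null
        keys.extend(d.keys.iter().map(|k| k.map(|k| remap[k as usize])));
    }
    Dict { keys, values }
}
```

In this sketch, merging hashes every value of every input, whereas the appending approach only copies values and adds an offset to the keys.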

Additional context

N/A

tustvold (Contributor) commented

This is intentional; #506 tracks doing something different here.

tustvold added the enhancement and development-process labels and removed the bug label on Mar 10, 2023.
tustvold closed this as not planned on Mar 10, 2023.