-
Notifications
You must be signed in to change notification settings - Fork 841
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support casting from integer to binary #5015
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is perhaps worth noting that ToByteSlice returns the native endian encoding, but frankly arrow-rs probably has all sorts of problems on big endian architectures so I'm inclined to not hold this back on this
arrow-cast/src/cast.rs
Outdated
) -> Result<ArrayRef, ArrowError> { | ||
let array = array.as_primitive::<FROM>(); | ||
|
||
let mut builder = GenericBinaryBuilder::<O>::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could possibly initialize the correct capacity here
arrow-cast/src/cast.rs
Outdated
for i in 0..array.len() { | ||
if array.is_null(i) { | ||
builder.append_null(); | ||
} else { | ||
builder.append_value(array.value(i).to_byte_slice()); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for i in 0..array.len() { | |
if array.is_null(i) { | |
builder.append_null(); | |
} else { | |
builder.append_value(array.value(i).to_byte_slice()); | |
} | |
} | |
for v in array { | |
builder.append_option(v); | |
} |
Perhaps
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, v
is not AsRef<[u8]>
though.
2352 | builder.append_option(v);
| ------------- ^ the trait `AsRef<[u8]>` is not implemented for `<FROM as arrow_array::ArrowPrimitiveType>::Native`
| |
| required by a bound introduced by this call
But I also cannot do builder.append_option(v.map(|v| v.to_byte_slice()));
too as it complains returns a value referencing data owned by the current function
.
I keep it unchanged so.
Yea, I noticed native endian encoding is used when looking into |
arrow-cast/src/cast.rs
Outdated
let array = array.as_primitive::<FROM>(); | ||
|
||
let mut builder = | ||
GenericBinaryBuilder::<O>::with_capacity(array.len(), array.values().inner().capacity()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This actually made me wonder if we could not do something even simpler like, i.e. just reuse the buffer
let array = array.as_primitive::<FROM>();
let size = std::mem::size_of::<FROM::Native>().usize_as();
let offsets = OffsetBuffer::from_lengths(std::iter::repeat(size).take(array.len());
Ok(GenericBinaryArray::<O>::new(offsets, array.values().inner().clone(), array.nulls().cloned()))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, my concern is that other type casting doesn't reuse buffer (or do we?), users expect that cast kernel returns an array with new buffer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We reuse the buffer when converting between StringArray and BinaryArray, and possibly some other transformations. I'm not sure why you wouldn't want to do so if you're able to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we already did it, it is okay. Let me update this.
@@ -2317,6 +2341,19 @@ fn value_to_string<O: OffsetSizeTrait>( | |||
Ok(Arc::new(builder.finish())) | |||
} | |||
|
|||
fn cast_numeric_to_binary<FROM: ArrowPrimitiveType, O: OffsetSizeTrait>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, actually I also want to support non native endian encoding configured from cast options. But I will put it into follow ups.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might be better as a custom kernel then? Perhaps kernels mirroring https://doc.rust-lang.org/std/primitive.i32.html#method.to_be and https://doc.rust-lang.org/std/primitive.i32.html#method.to_le ?
I think it would be good to avoid adding data type specific options to CastOptions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean to have a kernel doing to_be
/to_le
? It sounds good.
Thank you @tustvold |
Which issue does this PR close?
Closes #5014.
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?