feat(rust): Add support for arrow binary type #4935
@ritchie46 I tested this using code I already had, and it works for querying parquet files, filtering, reading data, etc. I just can't get the tests to compile. It probably also needs some tests; can you point me to where I should add them?
@ozgrakkurt thanks for the initial push on this. Can you feature gate this functionality? I only want this compiled if Tests can be placed under
@ritchie46 could you do the feature gating, if that is possible? I'm not sure exactly how to do it.
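For readers following along, feature gating in Rust can be sketched roughly as below. The feature name `dtype-binary` is the one that appears in the diff hunks of this PR; the `binary` module and the `describe` and `binary_enabled` functions are hypothetical and exist only for illustration. The feature itself would also need to be declared in the `[features]` table of the crate's `Cargo.toml`.

```rust
// A minimal sketch of feature gating, not the actual Polars code.
// `binary`, `describe` and `binary_enabled` are hypothetical names.

// Compiled only when the crate is built with `--features dtype-binary`.
#[cfg(feature = "dtype-binary")]
mod binary {
    pub fn describe() -> &'static str {
        "binary support compiled in"
    }
}

// Fallback compiled when the feature is disabled, so callers still build.
#[cfg(not(feature = "dtype-binary"))]
mod binary {
    pub fn describe() -> &'static str {
        "binary support compiled out"
    }
}

// `cfg!` evaluates to a compile-time boolean for the same condition.
fn binary_enabled() -> bool {
    cfg!(feature = "dtype-binary")
}
```

Putting `#[cfg(feature = "dtype-binary")]` directly on an `impl` block, as the hunks in this PR do, has the same effect: the item is dropped from compilation unless the feature is enabled.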
@ritchie46 can you check again?
Also closes #4877.
Thanks a lot for this @ozgrakkurt. I know this is a lot of work. I have left a few remarks on places where I think we can remove the code and let the `Series` simply return `null`. The `Series` often does this automatically already, so we can probably just remove the code.
The only thing I am not sure about is whether we should implement `Ord` for binary, e.g. `min`, `max`, `sort`. Are byte slices `Ord` in Rust? If so, we can follow.
@@ -804,6 +815,21 @@ impl QuantileAggSeries for Utf8Chunked {
    }
}

#[cfg(feature = "dtype-binary")]
impl QuantileAggSeries for BinaryChunked {
These can be removed. The `Series` can just return `null` for the Binary dtype.
I get this compilation error when I remove this:
error[E0277]: the trait bound `datatypes::BinaryType: datatypes::PolarsIntegerType` is not satisfied
   --> polars/polars-core/src/series/implementations/binary.rs:326:45
    |
326 |         QuantileAggSeries::median_as_series(&self.0)
    |         ----------------------------------- ^^^^^^^ the trait `datatypes::PolarsIntegerType` is not implemented for `datatypes::BinaryType`
    |         |
    |         required by a bound introduced by this call
    |
    = help: the following other types implement trait `datatypes::PolarsIntegerType`:
              datatypes::Int16Type
              datatypes::Int32Type
              datatypes::Int64Type
              datatypes::Int8Type
              datatypes::UInt16Type
              datatypes::UInt32Type
              datatypes::UInt64Type
              datatypes::UInt8Type
note: required for `chunked_array::ChunkedArray<datatypes::BinaryType>` to implement `chunked_array::ops::aggregate::QuantileAggSeries`
   --> polars/polars-core/src/chunked_array/ops/aggregate.rs:715:9
    |
715 | impl<T> QuantileAggSeries for ChunkedArray<T>
    |         ^^^^^^^^^^^^^^^^^     ^^^^^^^^^^^^^^^

error[E0277]: the trait bound `datatypes::BinaryType: datatypes::PolarsIntegerType` is not satisfied
   --> polars/polars-core/src/series/implementations/binary.rs:339:47
    |
339 |         QuantileAggSeries::quantile_as_series(&self.0, quantile, interpol)
    |         ------------------------------------- ^^^^^^^ the trait `datatypes::PolarsIntegerType` is not implemented for `datatypes::BinaryType`
    |         |
    |         required by a bound introduced by this call
    |
    = help: the following other types implement trait `datatypes::PolarsIntegerType`:
              datatypes::Int16Type
              datatypes::Int32Type
              datatypes::Int64Type
              datatypes::Int8Type
              datatypes::UInt16Type
              datatypes::UInt32Type
              datatypes::UInt64Type
              datatypes::UInt8Type
note: required for `chunked_array::ChunkedArray<datatypes::BinaryType>` to implement `chunked_array::ops::aggregate::QuantileAggSeries`
   --> polars/polars-core/src/chunked_array/ops/aggregate.rs:715:9
    |
715 | impl<T> QuantileAggSeries for ChunkedArray<T>
    |         ^^^^^^^^^^^^^^^^^     ^^^^^^^^^^^^^^^
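The error above appears to come from a blanket impl that only covers integer dtypes: once the dedicated impl for the binary type is removed, nothing implements the trait for it, while the wrapper in `binary.rs` still calls the trait method. The toy model below reproduces that shape under stated assumptions. All names in it (`DType`, `IntegerType`, `Chunked`, `QuantileAgg`) are simplified, hypothetical stand-ins and not the real Polars definitions.

```rust
// Toy model of the trait layout hinted at by the error; names are hypothetical.
trait DType {
    type Native;
}
trait IntegerType: DType {}

struct Int32Type;
struct BinaryType;

impl DType for Int32Type {
    type Native = i32;
}
impl DType for BinaryType {
    type Native = Vec<u8>;
}
impl IntegerType for Int32Type {}

struct Chunked<T: DType> {
    values: Vec<T::Native>,
}

trait QuantileAgg {
    fn median(&self) -> Option<f64>;
}

// Blanket impl: applies only to dtypes that implement `IntegerType`.
impl<T> QuantileAgg for Chunked<T>
where
    T: IntegerType,
    T::Native: Copy + Ord + Into<f64>,
{
    fn median(&self) -> Option<f64> {
        if self.values.is_empty() {
            return None;
        }
        let mut sorted = self.values.clone();
        sorted.sort();
        let mid = sorted.len() / 2;
        let hi: f64 = sorted[mid].into();
        if sorted.len() % 2 == 1 {
            Some(hi)
        } else {
            let lo: f64 = sorted[mid - 1].into();
            Some((lo + hi) / 2.0)
        }
    }
}

// Dedicated impl for the binary dtype that just reports "no value".
// Deleting it makes `median` on `Chunked<BinaryType>` fail with E0277,
// because `BinaryType` does not satisfy the blanket impl's bound.
impl QuantileAgg for Chunked<BinaryType> {
    fn median(&self) -> Option<f64> {
        None
    }
}
```

In this model there are two ways to get rid of the dedicated impl: keep a trivial impl like the one above, or stop calling the trait method from the binary wrapper so the bound is never required.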
@@ -692,6 +692,17 @@ impl VarAggSeries for Utf8Chunked {
    }
}

#[cfg(feature = "dtype-binary")]
impl VarAggSeries for BinaryChunked {
These can be removed. The `Series` can just return `null` for the Binary dtype.
@@ -849,6 +875,31 @@ impl ChunkAggSeries for Utf8Chunked {
    }
}

#[cfg(feature = "dtype-binary")]
impl ChunkAggSeries for BinaryChunked {
These can be removed. The `Series` can just return `null` for the Binary dtype.
@@ -508,6 +508,137 @@ impl ChunkSort<Utf8Type> for Utf8Chunked {
    }
}

#[cfg(feature = "dtype-binary")]
impl ChunkSort<BinaryType> for BinaryChunked {
Should binary implement sort?
Yes, IMO, since string has it. Sorting binary is useful as well.
Yes, if `T` is `Ord`, any slice of `T` is `Ord` as well, I think. Those operations are useful for binary as well.
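To make the point about `Ord` concrete: the standard library implements `Ord` for `[T]` and `Vec<T>` whenever `T: Ord`, comparing lexicographically, so byte slices can be sorted and reduced with `min` and `max` directly. The helper names `sorted_binary` and `min_max` below are hypothetical and only illustrate the standard-library behaviour, not the Polars implementation.

```rust
// Sort owned byte strings; `Vec<u8>` is `Ord` because `u8` is `Ord`.
fn sorted_binary(mut values: Vec<Vec<u8>>) -> Vec<Vec<u8>> {
    values.sort(); // lexicographic, byte by byte
    values
}

// Smallest and largest byte slice, or `None` for empty input.
fn min_max<'a>(values: &'a [&'a [u8]]) -> Option<(&'a [u8], &'a [u8])> {
    let min = values.iter().copied().min()?;
    let max = values.iter().copied().max()?;
    Some((min, max))
}
```

Because the comparison is lexicographic, a slice that is a strict prefix of another sorts before it: `b"a"` sorts before `b"ab"`, which sorts before `b"b"`.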
Working on it!
Something went wrong. Reopened on #5122.
closes #4903
@ritchie46 can you check if this makes sense when you have time?
I redirected the arrow dep to my fork temporarily.