Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(rust): Add support for arrow binary type #4935

Closed
wants to merge 0 commits into from
Closed

feat(rust): Add support for arrow binary type #4935

wants to merge 0 commits into from

Conversation

ozgrakkurt
Copy link
Contributor

closes #4903

@ritchie46 can you check if this makes sense when you have time?

I redirected arrow dep to my fork temporarily

@github-actions github-actions bot added the rust Related to Rust Polars label Sep 22, 2022
@ozgrakkurt
Copy link
Contributor Author

@ritchie46 tested this using code I already had and it works for querying parquet files, filtering reading data etc. Just can't get the tests to compile. Also probably need to add some tests, can you point where I should add tests?

@ozgrakkurt ozgrakkurt marked this pull request as ready for review September 23, 2022 16:26
@ritchie46
Copy link
Member

@ozgrakkurt thanks for the initial push on this.

Can you feature gate this functionality? I only want this compiled if dtype-binary is activated.

Tests can be placed under polars/tests/it/.

@ozgrakkurt
Copy link
Contributor Author

ozgrakkurt commented Sep 26, 2022

@ritchie46 could you do the feature gating if it is possible? Not sure how exactly to do it

@github-actions github-actions bot added the python Related to Python Polars label Sep 29, 2022
@ozgrakkurt
Copy link
Contributor Author

@ritchie46 can you check again?

@ozgrakkurt
Copy link
Contributor Author

also closes #4877

@stinodego stinodego changed the title feat: add support for arrow binary type feat(rust): add support for arrow binary type Oct 1, 2022
@stinodego stinodego changed the title feat(rust): add support for arrow binary type feat(rust): Add support for arrow binary type Oct 1, 2022
@github-actions github-actions bot added the enhancement New feature or an improvement of an existing feature label Oct 1, 2022
@stinodego stinodego removed the python Related to Python Polars label Oct 1, 2022
Copy link
Member

@ritchie46 ritchie46 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this @ozgrakkurt. I know this is a lot of work. I have left a few remarks on places where I think we can remove the code and let the Series simply return null. This often does this already automatically, so we can probably just remove the code.

I am only not sure we should implement Ord for binary? E.g. min, max, sort? Are byte slices Ord in rust? If so, we can follow.

@@ -804,6 +815,21 @@ impl QuantileAggSeries for Utf8Chunked {
}
}

#[cfg(feature = "dtype-binary")]
impl QuantileAggSeries for BinaryChunked {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These can be removed. The Series can just return null for the Binary dtype

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get this compilation error when I remove this:

error[E0277]: the trait bound `datatypes::BinaryType: datatypes::PolarsIntegerType` is not satisfied
   --> polars/polars-core/src/series/implementations/binary.rs:326:45
    |
326 |         QuantileAggSeries::median_as_series(&self.0)
    |         ----------------------------------- ^^^^^^^ the trait `datatypes::PolarsIntegerType` is not implemented for `datatypes::BinaryType`
    |         |
    |         required by a bound introduced by this call
    |
    = help: the following other types implement trait `datatypes::PolarsIntegerType`:
              datatypes::Int16Type
              datatypes::Int32Type
              datatypes::Int64Type
              datatypes::Int8Type
              datatypes::UInt16Type
              datatypes::UInt32Type
              datatypes::UInt64Type
              datatypes::UInt8Type
note: required for `chunked_array::ChunkedArray<datatypes::BinaryType>` to implement `chunked_array::ops::aggregate::QuantileAggSeries`
   --> polars/polars-core/src/chunked_array/ops/aggregate.rs:715:9
    |
715 | impl<T> QuantileAggSeries for ChunkedArray<T>
    |         ^^^^^^^^^^^^^^^^^     ^^^^^^^^^^^^^^^

error[E0277]: the trait bound `datatypes::BinaryType: datatypes::PolarsIntegerType` is not satisfied
   --> polars/polars-core/src/series/implementations/binary.rs:339:47
    |
339 |         QuantileAggSeries::quantile_as_series(&self.0, quantile, interpol)
    |         ------------------------------------- ^^^^^^^ the trait `datatypes::PolarsIntegerType` is not implemented for `datatypes::BinaryType`
    |         |
    |         required by a bound introduced by this call
    |
    = help: the following other types implement trait `datatypes::PolarsIntegerType`:
              datatypes::Int16Type
              datatypes::Int32Type
              datatypes::Int64Type
              datatypes::Int8Type
              datatypes::UInt16Type
              datatypes::UInt32Type
              datatypes::UInt64Type
              datatypes::UInt8Type
note: required for `chunked_array::ChunkedArray<datatypes::BinaryType>` to implement `chunked_array::ops::aggregate::QuantileAggSeries`
   --> polars/polars-core/src/chunked_array/ops/aggregate.rs:715:9
    |
715 | impl<T> QuantileAggSeries for ChunkedArray<T>
    |         ^^^^^^^^^^^^^^^^^     ^^^^^^^^^^^^^^^


@@ -692,6 +692,17 @@ impl VarAggSeries for Utf8Chunked {
}
}

#[cfg(feature = "dtype-binary")]
impl VarAggSeries for BinaryChunked {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These can be removed. The Series can just return null for the Binary dtype

@@ -849,6 +875,31 @@ impl ChunkAggSeries for Utf8Chunked {
}
}

#[cfg(feature = "dtype-binary")]
impl ChunkAggSeries for BinaryChunked {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These can be removed. The Series can just return null for the Binary dtype

@@ -508,6 +508,137 @@ impl ChunkSort<Utf8Type> for Utf8Chunked {
}
}

#[cfg(feature = "dtype-binary")]
impl ChunkSort<BinaryType> for BinaryChunked {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should binary implement sort?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes imo, since string has it. Sorting binary is useful as well

@ozgrakkurt
Copy link
Contributor Author

Thanks a lot for this @ozgrakkurt. I know this is a lot of work. I have left a few remarks on places where I think we can remove the code and let the Series simply return null. This often does this already automatically, so we can probably just remove the code.

I am only not sure we should implement Ord for binary? E.g. min, max, sort? Are byte slices Ord in rust? If so, we can follow.

Yes if T is Ord any slice of T is ord as well I think. Those operations are useful in binary as well

@ozgrakkurt ozgrakkurt requested a review from ritchie46 October 3, 2022 13:22
@ritchie46
Copy link
Member

Working on it!

@ritchie46 ritchie46 closed this Oct 6, 2022
@ritchie46
Copy link
Member

Something went wrong. Reopend on #5122

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or an improvement of an existing feature rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for Arrow Binary data type
3 participants