Skip to content
This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Prepare APIs for extension types #357

Merged
merged 12 commits into from
Aug 31, 2021
Merged

Prepare APIs for extension types #357

merged 12 commits into from
Aug 31, 2021

Conversation

jorgecarleitao
Copy link
Owner

@jorgecarleitao jorgecarleitao commented Aug 30, 2021

This PR is a refactor to enable (logical) extension types.

It is motivated by the design described by @sundy-li in #350.

backward incompatible changes:

  • all arrays (e.g. Utf8Array) now require DataType as argument of from_data, new_null and new_empty

@jorgecarleitao
Copy link
Owner Author

@sundy-li , I kept the DataType on all arrays for now since it is a non-trivial change to decouple arrays from DataType because the logical information is currently necessary in many compute, io and display kernels.

@codecov
Copy link

codecov bot commented Aug 30, 2021

Codecov Report

Merging #357 (082e615) into main (640ec4a) will increase coverage by 0.35%.
The diff coverage is 78.96%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #357      +/-   ##
==========================================
+ Coverage   80.88%   81.23%   +0.35%     
==========================================
  Files         326      326              
  Lines       21043    20969      -74     
==========================================
+ Hits        17021    17035      +14     
+ Misses       4022     3934      -88     
Impacted Files Coverage Δ
src/array/boolean/ffi.rs 0.00% <0.00%> (ø)
src/compute/like.rs 0.00% <0.00%> (ø)
src/io/ipc/read/array/list.rs 100.00% <ø> (ø)
src/io/ipc/read/array/union.rs 75.00% <ø> (ø)
src/io/ipc/read/common.rs 92.47% <ø> (ø)
src/io/parquet/read/primitive/dictionary.rs 78.84% <ø> (ø)
src/record_batch.rs 44.44% <0.00%> (ø)
src/compute/filter.rs 75.37% <10.52%> (+12.18%) ⬆️
src/array/fixed_size_list/mod.rs 54.00% <25.00%> (-4.34%) ⬇️
src/array/fixed_size_binary/mod.rs 44.82% <33.33%> (-0.94%) ⬇️
... and 75 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 640ec4a...082e615. Read the comment docs.

@sundy-li
Copy link
Collaborator

Ok, that's reasonable.

@jorgecarleitao
Copy link
Owner Author

@sundy-li @ritchie46 , I found a beautiful representation of the primitive types that simplifies a lot of code (-400 LOC). I pushed to this PR and updated the user guide accordingly.

Something like

enum PrimitiveType {
    Int8,
    Int16,
    Int32,
    Int64,
    Int128,
    ...
    DaysMs,
}

enum PhysicalType {
    /// A Null with no allocation.
    Null,
    /// A boolean represented as a single bit.
    Boolean,
    /// An array where each slot has a known compile-time size.
    Primitive(PrimitiveType),
    ...
}

this allow us to write things like

    match values.data_type().to_physical_type() {
        Primitive(primitive) => with_match_primitive_type!(primitive, |$T| {
            let values = values.as_any().downcast_ref().unwrap();
            Ok(Box::new(primitive::take::<$T, _>(&values, indices)))
        }),
        Utf8 => {
            let values = values.as_any().downcast_ref().unwrap();
            Ok(Box::new(utf8::take::<i32, _>(values, indices)))
        }
        ...

which shaved LOC via macros (and will likely do the same for dependents).

Basically, instead of matching PhysicalType::Int32, match PhysicalType::Primitive(PrimitiveType::Int32).

@sundy-li
Copy link
Collaborator

sundy-li commented Aug 31, 2021

Great, we have a similar macro like with_match_primitive_type in https://github.com/datafuselabs/datafuse/blob/7a3bfc5620e545333ca4e35c71ac849b421b74fc/common/functions/src/aggregates/macros.rs#L15-L28

But your implementation is more graceful because the __with_ty__ is just a private macro.

@jorgecarleitao jorgecarleitao merged commit 145b7c8 into main Aug 31, 2021
@jorgecarleitao jorgecarleitao deleted the prep_struct branch August 31, 2021 16:59
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants