-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add substrait support for Interval types and literals #10646
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -25,6 +25,7 @@ | |
//! - Default type reference is 0. It is used when the actual type is the same with the original type. | ||
//! - Extended variant type references start from 1, and ususlly increase by 1. | ||
|
||
// For type variations | ||
pub const DEFAULT_TYPE_REF: u32 = 0; | ||
pub const UNSIGNED_INTEGER_TYPE_REF: u32 = 1; | ||
pub const TIMESTAMP_SECOND_TYPE_REF: u32 = 0; | ||
|
@@ -37,3 +38,58 @@ pub const DEFAULT_CONTAINER_TYPE_REF: u32 = 0; | |
pub const LARGE_CONTAINER_TYPE_REF: u32 = 1; | ||
pub const DECIMAL_128_TYPE_REF: u32 = 0; | ||
pub const DECIMAL_256_TYPE_REF: u32 = 1; | ||
|
||
// For custom types | ||
/// For [`DataType::Interval`] with [`IntervalUnit::YearMonth`]. | ||
/// | ||
/// An `i32` for elapsed whole months. See also [`ScalarValue::IntervalYearMonth`] | ||
/// for the literal definition in DataFusion. | ||
/// | ||
/// [`DataType::Interval`]: datafusion::arrow::datatypes::DataType::Interval | ||
/// [`IntervalUnit::YearMonth`]: datafusion::arrow::datatypes::IntervalUnit::YearMonth | ||
/// [`ScalarValue::IntervalYearMonth`]: datafusion::common::ScalarValue::IntervalYearMonth | ||
pub const INTERVAL_YEAR_MONTH_TYPE_REF: u32 = 1; | ||
|
||
/// For [`DataType::Interval`] with [`IntervalUnit::DayTime`]. | ||
/// | ||
/// An `i64` as: | ||
/// - days: `i32` | ||
/// - milliseconds: `i32` | ||
/// | ||
/// See also [`ScalarValue::IntervalDayTime`] for the literal definition in DataFusion. | ||
/// | ||
/// [`DataType::Interval`]: datafusion::arrow::datatypes::DataType::Interval | ||
/// [`IntervalUnit::DayTime`]: datafusion::arrow::datatypes::IntervalUnit::DayTime | ||
/// [`ScalarValue::IntervalDayTime`]: datafusion::common::ScalarValue::IntervalDayTime | ||
pub const INTERVAL_DAY_TIME_TYPE_REF: u32 = 2; | ||
|
||
/// For [`DataType::Interval`] with [`IntervalUnit::MonthDayNano`]. | ||
/// | ||
/// An `i128` as: | ||
/// - months: `i32` | ||
/// - days: `i32` | ||
/// - nanoseconds: `i64` | ||
/// | ||
/// See also [`ScalarValue::IntervalMonthDayNano`] for the literal definition in DataFusion. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. FYI the interval implementation is changing in the next arrow I think: apache/arrow-rs#5769 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for informing, I'd like to help migrate to the new arrow version. BTW, is there any plan for next bump? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think ETA is next week sometime apache/arrow-rs#5688 Maybe you can make a "pre-release" of DataFusion against the un-released version of arrow-rs (which @tustvold often does to make sure as a way to sanity check the release) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good 👍 The "pre-update" is #10765 (still WIP |
||
/// | ||
/// [`DataType::Interval`]: datafusion::arrow::datatypes::DataType::Interval | ||
/// [`IntervalUnit::MonthDayNano`]: datafusion::arrow::datatypes::IntervalUnit::MonthDayNano | ||
/// [`ScalarValue::IntervalMonthDayNano`]: datafusion::common::ScalarValue::IntervalMonthDayNano | ||
pub const INTERVAL_MONTH_DAY_NANO_TYPE_REF: u32 = 3; | ||
|
||
// For User Defined URLs | ||
/// For [`DataType::Interval`] with [`IntervalUnit::YearMonth`]. | ||
/// | ||
/// [`DataType::Interval`]: datafusion::arrow::datatypes::DataType::Interval | ||
/// [`IntervalUnit::YearMonth`]: datafusion::arrow::datatypes::IntervalUnit::YearMonth | ||
pub const INTERVAL_YEAR_MONTH_TYPE_URL: &str = "interval-year-month"; | ||
/// For [`DataType::Interval`] with [`IntervalUnit::DayTime`]. | ||
/// | ||
/// [`DataType::Interval`]: datafusion::arrow::datatypes::DataType::Interval | ||
/// [`IntervalUnit::DayTime`]: datafusion::arrow::datatypes::IntervalUnit::DayTime | ||
pub const INTERVAL_DAY_TIME_TYPE_URL: &str = "interval-day-time"; | ||
/// For [`DataType::Interval`] with [`IntervalUnit::MonthDayNano`]. | ||
/// | ||
/// [`DataType::Interval`]: datafusion::arrow::datatypes::DataType::Interval | ||
/// [`IntervalUnit::MonthDayNano`]: datafusion::arrow::datatypes::IntervalUnit::MonthDayNano | ||
pub const INTERVAL_MONTH_DAY_NANO_TYPE_URL: &str = "interval-month-day-nano"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may be totally wrong, but are you sure this is how
type_reference
is supposed to work? I'd kind of expect the type_reference to map to an extension / MappingType::ExtensionType, kinda like function_reference.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not at all sure how this is supposed to work -- when I was reviewing this PR I think I concluded it followed the existing patterns.
If you have additional information / knowledge that would help improve things I think we would welcome that help
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two "references":
https://github.com/apache/datafusion/pull/10646/files#diff-d1c5f4c37ac8286d2045acb61bee17382179469557132eb02844413b260ae41bR1440-R1441
To my understanding, type variation (like these) are for different types from one base type. And type references (like these) are for different base types, those user-defined types.
I'll submit a patch to add document for the usage tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, actually I think both references (type_variation_reference and type_reference) have the same problem - I hadn't realized it affects the type_variation_reference too. Now, if each system defines the type/variation references as consts, then plans will look compatible, but may have different meaning (nothing tells another Substrait producer/consumer that type_variation_reference = 1 on a list means a large list, for example).
I believe both should be defined as SimpleExtension files, following the schema (like here https://github.com/apache/arrow/blob/main/format/substrait/extension_types.yaml) rather than as constants (kinda what
datafusion/datafusion/substrait/src/variation_const.rs
Line 21 in 0905426
That way one can (in theory, at least) teach another Substrait producer/consumer about the DataFusion extensions and keep plans compatible, or at least the systems will recognize that the plans are incompatible as it refers to extension URIs that the other system doesn't know about.
Does that make sense? I'm not 100% sure of anything I'm saying here, so I may as be understanding something wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's right. The last (I'm aware of, at least) big piece we are missing in substrait is those *.yaml spec for all extended things and related URL settings. At present, all the things are defined in the document.
From the substrait website, we'll need a yaml parsing component to support extensions from other systems as well, if we are going to implement the ability to consume plans from external systems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
going to file a tracking issue for tasks related to substrait support
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated at #8149
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah cool, so it was the plan all along :) Sg!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Document PR: #10719