-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Support Substrait's IntervalCompound type/literal instead of interval-month-day-nano UDT #12112
Conversation
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days. |
d89da8e
to
66cb65a
Compare
)] | ||
pub const INTERVAL_MONTH_DAY_NANO_TYPE_REF: u32 = 3; | ||
|
||
/// For [`DataType::Interval`] with [`IntervalUnit::MonthDayNano`]. | ||
/// | ||
/// [`DataType::Interval`]: datafusion::arrow::datatypes::DataType::Interval | ||
/// [`IntervalUnit::MonthDayNano`]: datafusion::arrow::datatypes::IntervalUnit::MonthDayNano | ||
#[deprecated( | ||
since = "42.1.0", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that should be next version, what will it be? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
43 I believe: #12470
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated in d527b4e
@@ -2310,39 +2306,6 @@ mod test { | |||
Ok(()) | |||
} | |||
|
|||
#[test] | |||
fn custom_type_literal_extensions() -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sadly as we no longer produce custom types, there's no easy way to test for them either :/
maybe if we one day implement some UDT support then that can be used for testing here instead. But for now there's no good replacement.
66cb65a
to
f1e5176
Compare
f1e5176
to
e18ec17
Compare
this should be good to review, fyi @alamb @vbarua @westonpace @tokoko (feel free to lmk if you'd prefer not to be tagged :)) |
p | ||
); | ||
} | ||
let nanos = *subseconds * i64::pow(10, (*p - 9) as u32); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't precision handling here and on L2294 for IntervalDayToSecond be similar? Seems like only 0, 3, 6, 9 are supported over there. This one makes more sense to me, just curious why they are different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a very fair question! I think in general the answer is "yes", but there's a catch - the IntervalDayTime in Arrow only supports millisecond precision, so it cannot be done with a single i64::pow. It could be done with two (making it more complicated), or with a fp64::pow (leading to floating point ops).
But there is an additional question of should we allow the silent precision-losing operations (and the answer is probably no, at least not w/o some user config option) so I think that parts needs a better look anyways and I was thinking of doing it separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good. I think there are two questions (probably unrelated?) to explore here: 1) precision-losing (for p>3) and 2) whether in-between values are valid (p==1, p==2)
Please do continue to tag me -- thank you very much |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me -- thanks @Blizzara
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 comments but nothing blocking from me
@@ -1842,7 +1848,7 @@ fn from_substrait_type( | |||
), | |||
} | |||
} else { | |||
// Kept for backwards compatibility, new plans should include the extension instead |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: I would suggest updating this warning here and further down to something like
// Kept for backwards compatibility
// Producers should use IntervalCompound instead
to make it clearer what/who should be using the new type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -1831,8 +1832,13 @@ fn from_substrait_type( | |||
Ok(DataType::Interval(IntervalUnit::YearMonth)) | |||
} | |||
r#type::Kind::IntervalDay(_) => Ok(DataType::Interval(IntervalUnit::DayTime)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly out of scope of this PR, but do we need to checking the precision on this type now?
Based on https://github.com/substrait-io/substrait/blob/683f4179a058c2c99c04501b920a48ff372356ff/proto/substrait/type.proto#L132-L140 DataFusion can only handle this type with precision loss when precision
is set to 0 or 3 (seconds or milliseconds).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup see #12112 (comment). I think we should change it, maybe with some option so that user can decide if they want to throw or accept the precision loss. But will be a separate PR :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks awesome! Only one small question.
p | ||
); | ||
} | ||
let nanos = *subseconds * i64::pow(10, (*p - 9) as u32); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm probably butchering this but should this be 9 - *p
instead of *p - 9
? If precision is 0 then we want 10^9 right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No I think you’re right 🙈 math is hard.. I’ll fix this tomorrow. Thanks for catching it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No I think you’re right 🙈 math is hard.. I’ll fix this tomorrow. Thanks for catching it!
Marking this PR as draft so we don't accidentally merge it until it is fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in 79f6341 and added a test in ea8bc4a, thanks again for catching this @westonpace!
Thanks @alamb, I converted back to PR now and I think it's ready to merge :)
I'll plan to merge this PR once the CI checks have completed |
🚀 |
Thanks again @Blizzara @vbarua @westonpace and @tokoko 👥 |
Which issue does this PR close?
Rationale for this change
Previously, Substrait didn't have a type & literal for the intervals that combine year/month and day/time, so DF had to use a custom type to support those. substrait-io/substrait#665 however added support for IntervalCompound, which is a superset of DF's IntervalMonthDayNano - so we can switch over.
What changes are included in this PR?
Add support for consuming Substrait's IntervalCompound type/literals (in addition to existing UDTs, for backwards compatibility)
Switch Substrait producing to produce IntervalCompound type/literals
Are these changes tested?
Tested through existing unit tests.
However, as we now don't produce any UDTs, we cannot easily test the UDT-producing code anymore.
Are there any user-facing changes?
Substrait now produces IntervalCompound types instead of interval-month-day-nano UDTs. Both types can be consumed.