Support cast timestamp to time #3016
Thank you @naosense
It looks great. I've left some suggestions for consolidating the code
and some test cases that could cover more use cases.
Thank you for working on this, I've left some suggestions. I think we can use chrono to do more of the heavy lifting here, especially w.r.t. timezones. Some more tests, especially of overflow behaviour, would be awesome.
Thanks for your suggestions, I'll try to find out
d722def
to
7ca1e99
Getting really close - thank you for sticking with it 👍
LGTM (assuming the CI passes)
Thank you @naosense @tustvold . LGTM
in this version, the behavior is inconsistent with casting timestamp
this is the behavior for timestamptz::timestamp
❯ set time zone to '+01:00';
0 rows in set. Query took 0.000 seconds.
❯ select (timestamp '1969-12-31T23:00:01')::timestamptz;
+-----------------------------+
| Utf8("1969-12-31T23:00:01") |
+-----------------------------+
| 1970-01-01T00:00:01+01:00 |
+-----------------------------+
1 row in set. Query took 0.003 seconds.
and this is the result for casting it to timestamp
❯ select (timestamp '1969-12-31T23:00:01')::timestamptz::timestamp;
+-----------------------------+
| Utf8("1969-12-31T23:00:01") |
+-----------------------------+
| 1969-12-31T23:00:01 |
+-----------------------------+
1 row in set. Query took 0.003 seconds.
i think we could merge this pr and get back to this discussion
fn test_cast_timestamp_to_time64() {
    // test timestamp secs
    let a = TimestampSecondArray::from(vec![Some(86405), Some(1), None])
        .with_timezone("+01:00".to_string());
    let array = Arc::new(a) as ArrayRef;
    let b = cast(&array, &DataType::Time64(TimeUnit::Microsecond)).unwrap();
    let c = b.as_any().downcast_ref::<Time64MicrosecondArray>().unwrap();
    assert_eq!(3605000000, c.value(0));
    assert_eq!(3601000000, c.value(1));
    assert!(c.is_null(2));
    let b = cast(&array, &DataType::Time64(TimeUnit::Nanosecond)).unwrap();
    let c = b.as_any().downcast_ref::<Time64NanosecondArray>().unwrap();
    assert_eq!(3605000000000, c.value(0));
    assert_eq!(3601000000000, c.value(1));
    assert!(c.is_null(2));
According to these test cases, I think the current behavior is that
casting timestamptz 1970-01-01T00:00:01+01:00
to time64
becomes 01:00:01
Yeah, this doesn't seem to be right 😢
Taking a look now
I think this is actually correct, the array contents are
"1970-01-02 01:00:05 +01:00",
"1970-01-01 01:00:01 +01:00",
null,
And therefore timestamptz 1970-01-01T01:00:01+01:00
correctly becomes 01:00:01
I have filed #3069 as this debug output confused me.
Can you confirm @waitingkuo ?
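The arithmetic behind the asserted values can be checked without arrow at all. The following is a minimal sketch using only the standard library; `local_time_of_day_micros` is a hypothetical helper written for illustration, not a function from arrow-rs or this PR, and it assumes the timezone is a fixed offset given in seconds.

```rust
/// Hypothetical helper (not part of arrow-rs): microseconds since local
/// midnight for a UTC epoch timestamp in seconds and a fixed UTC offset
/// in seconds.
fn local_time_of_day_micros(utc_secs: i64, offset_secs: i64) -> i64 {
    const SECS_PER_DAY: i64 = 86_400;
    // Shift the stored UTC value into local wall-clock seconds.
    let local_secs = utc_secs + offset_secs;
    // rem_euclid keeps the result in 0..86_400 even for negative inputs.
    local_secs.rem_euclid(SECS_PER_DAY) * 1_000_000
}
```

With an offset of 3600 seconds for "+01:00", the stored values 86405 and 1 give 3605000000 and 3601000000 microseconds, i.e. 01:00:05 and 01:00:01, which matches the assertions in the test.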
I used @naosense's branch, and this is what I get:
let a = TimestampSecondArray::from(vec![Some(86405), Some(1), None])
.with_timezone("+01:00".to_string());
let array = Arc::new(a) as ArrayRef;
let b = cast(&array, &DataType::Time64(TimeUnit::Microsecond)).unwrap();
let c = b.as_any().downcast_ref::<Time64MicrosecondArray>().unwrap();
println!("{:?}", c);
outputs
PrimitiveArray<Time64(Microsecond)>
[
01:00:05,
01:00:01,
null,
]
while
let a = TimestampSecondArray::from(vec![Some(86405), Some(1), None])
.with_timezone("+01:00".to_string());
let array = Arc::new(a) as ArrayRef;
let b = cast(&array, &DataType::Timestamp(TimeUnit::Microsecond, None)).unwrap();
let c = b.as_any().downcast_ref::<TimestampMicrosecondArray>().unwrap();
//println!("{:?}", b);
println!("{:?}", c);
outputs
PrimitiveArray<Timestamp(Microsecond, None)>
[
1970-01-02T00:00:05,
1970-01-01T00:00:01,
null,
]
Casting timestamp<+1> to timestamp simply drops the time zone from the data_type,
while
casting timestamp<+1> to time adds this 1 hour to the underlying UTC timestamp.
I don't think we have consensus yet on which of these to do:
casting 2000-01-01T08:00:00+08:00
to Timestamp becomes
1. 2000-01-01T08:00:00 (which is what PostgreSQL does, and is also consistent with this PR)
2. or 2000-01-01T00:00:00 (which doesn't change the underlying values, i.e. the number of seconds/millis/micros/nanos from 1970)
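To make the two candidates concrete, here is a rough sketch in plain Rust of what each would store for that instant. The function names are hypothetical and chosen for illustration only; neither is an arrow-rs API, and the sketch assumes second resolution and a fixed offset.

```rust
/// First candidate (hypothetical name): keep the local wall-clock reading,
/// so the stored value is shifted by the offset.
fn keep_wall_clock(utc_secs: i64, offset_secs: i64) -> i64 {
    utc_secs + offset_secs
}

/// Second candidate (hypothetical name): keep the stored value untouched
/// and only drop the timezone from the data type.
fn keep_stored_value(utc_secs: i64, _offset_secs: i64) -> i64 {
    utc_secs
}

/// Hour of day when a stored value is read back as a timezone-less timestamp.
fn hour_of_day(secs: i64) -> i64 {
    secs.rem_euclid(86_400) / 3_600
}
```

The instant 2000-01-01T08:00:00+08:00 is stored as 946684800 seconds since the epoch. With an offset of 28800 seconds, `keep_wall_clock` returns 946713600, which reads back as hour 8, while `keep_stored_value` returns 946684800, which reads back as hour 0.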
- is also inconsistent with the arrow specification?
@tustvold i'm not sure whether arrow has this specification or not.
i tried pyarrow before, it acted like 2
Quite possibly, that doesn't mean it is actually correct 😆
Let's move the discussion to the ticket - #1936 (comment)
@waitingkuo thanks for your detailed explanation, may I ask which way is more reasonable?
There's an open issue here: #1936. I might submit an RFC to discuss these things soon.
Benchmark runs are scheduled for baseline = f596209 and contender = d76a0d6. d76a0d6 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Absolutely, couldn't agree more!
Hi @waitingkuo, I don't know if my PR is correct, could you review it?
Which issue does this PR close?
Related to apache/datafusion#4054
Rationale for this change
Support something like this
What changes are included in this PR?
Are there any user-facing changes?