Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support cast timestamp to time #3016

Merged
merged 8 commits into from
Nov 10, 2022
Merged

Conversation

naosense
Copy link
Contributor

@naosense naosense commented Nov 4, 2022

HI @waitingkuo, I dont know if my PR is correct, could you review it?

Which issue does this PR close?

Related to apache/datafusion#4054

Rationale for this change

Support something like this

select case when current_time() = cast(now() as time) then 'OK' else 'FAIL' end result

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Nov 4, 2022
Copy link
Contributor

@waitingkuo waitingkuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you @naosense
it looks great, leave some suggestions for consolidating codes
and some test cases that could cover more use cases

arrow/src/compute/kernels/cast.rs Outdated Show resolved Hide resolved
arrow/src/compute/kernels/cast.rs Outdated Show resolved Hide resolved
arrow/src/compute/kernels/cast.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this, I've left some suggestions. I think we can use chrono to do more of the heavy lifting here, especially w.r.t timezones. Some more tests, especially of overflow behaviour would be awesome

arrow/src/compute/kernels/cast.rs Outdated Show resolved Hide resolved
arrow/src/compute/kernels/cast.rs Outdated Show resolved Hide resolved
arrow/src/compute/kernels/cast.rs Outdated Show resolved Hide resolved
arrow/src/compute/kernels/cast.rs Outdated Show resolved Hide resolved
@naosense
Copy link
Contributor Author

naosense commented Nov 4, 2022

Thanks for your suggestions, I'll try to find out

@naosense naosense force-pushed the cast_timestamp_to_time branch from d722def to 7ca1e99 Compare November 8, 2022 05:02
@naosense naosense requested review from waitingkuo and tustvold and removed request for waitingkuo and tustvold November 8, 2022 05:07
arrow-cast/src/cast.rs Outdated Show resolved Hide resolved
@naosense naosense requested review from tustvold and removed request for waitingkuo November 9, 2022 08:27
arrow-cast/src/cast.rs Outdated Show resolved Hide resolved
@tustvold
Copy link
Contributor

tustvold commented Nov 9, 2022

Getting really close - thank you for sticking with it 👍

@naosense naosense requested a review from tustvold November 9, 2022 10:19
@tustvold
Copy link
Contributor

tustvold commented Nov 9, 2022

LGTM (assuming the CI passes)

Copy link
Contributor

@waitingkuo waitingkuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @naosense @tustvold . LGTM

in this version, the behavior is inconsistent with casting timestamp

this is the behavior for timestamptz::timestamp

set time zone to '+01:00';
0 rows in set. Query took 0.000 seconds.
❯ select (timestamp '1969-12-31T23:00:01')::timestamptz;
+-----------------------------+
| Utf8("1969-12-31T23:00:01") |
+-----------------------------+
| 1970-01-01T00:00:01+01:00   |
+-----------------------------+
1 row in set. Query took 0.003 seconds.

and this is the result for casting it to timestamp

select (timestamp '1969-12-31T23:00:01')::timestamptz::timestamp;
+-----------------------------+
| Utf8("1969-12-31T23:00:01") |
+-----------------------------+
| 1969-12-31T23:00:01         |
+-----------------------------+
1 row in set. Query took 0.003 seconds.

i think we could merge this pr and get back to this discussion

Comment on lines +4295 to +4309
fn test_cast_timestamp_to_time64() {
// test timestamp secs
let a = TimestampSecondArray::from(vec![Some(86405), Some(1), None])
.with_timezone("+01:00".to_string());
let array = Arc::new(a) as ArrayRef;
let b = cast(&array, &DataType::Time64(TimeUnit::Microsecond)).unwrap();
let c = b.as_any().downcast_ref::<Time64MicrosecondArray>().unwrap();
assert_eq!(3605000000, c.value(0));
assert_eq!(3601000000, c.value(1));
assert!(c.is_null(2));
let b = cast(&array, &DataType::Time64(TimeUnit::Nanosecond)).unwrap();
let c = b.as_any().downcast_ref::<Time64NanosecondArray>().unwrap();
assert_eq!(3605000000000, c.value(0));
assert_eq!(3601000000000, c.value(1));
assert!(c.is_null(2));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

according to this test cases, i think the behavior now is

casting timestamptz 1970-01-01T00:00:01+01:00 to time64
becomes 01:00:01

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this doesn't seem to be right 😢

Taking a look now

Copy link
Contributor

@tustvold tustvold Nov 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is actually correct, the array contents are

  "1970-01-02 01:00:05 +01:00",
  "1970-01-01 01:00:01 +01:00",
  null,

And therefore timestamptz 1970-01-01T01:00:01+01:00 correctly becomes 01:00:01

I have filed #3069 as this debug ouput confused me.

Can you confirm @waitingkuo ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tustvold

i used @naosense 's branch this is what i have

    let a = TimestampSecondArray::from(vec![Some(86405), Some(1), None])
            .with_timezone("+01:00".to_string());

        let array = Arc::new(a) as ArrayRef;
        let b = cast(&array, &DataType::Time64(TimeUnit::Microsecond)).unwrap();
        let c = b.as_any().downcast_ref::<Time64MicrosecondArray>().unwrap();

        println!("{:?}", c);

outputs

PrimitiveArray<Time64(Microsecond)>
[
  01:00:05,
  01:00:01,
  null,
]

while

    let a = TimestampSecondArray::from(vec![Some(86405), Some(1), None])
        .with_timezone("+01:00".to_string());

    let array = Arc::new(a) as ArrayRef;
    let b = cast(&array, &DataType::Timestamp(TimeUnit::Microsecond, None)).unwrap();
    let c = b.as_any().downcast_ref::<TimestampMicrosecondArray>().unwrap();
    
   //println!("{:?}", b);
    println!("{:?}", c);

outputs

PrimitiveArray<Timestamp(Microsecond, None)>
[
  1970-01-02T00:00:05,
  1970-01-01T00:00:01,
  null,
]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

casting timestamp<+1> to timestamp simply drop the time zone from data_type
while
casting timestamp<+1> to time add this 1 hour into underline utc timestamp

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we not yet have the consensus to do either
casting 2000-01-01T08:00:00+08:00 to Timestamp becomes

  1. 2000-01-01T08:00:00 (which is what postgrseql has, also is consistent with this pr)
  2. or 2000-01-01T00:00:00 (which doesn't change the underline values, number of seconds/milis/micros/nanos fro 1970

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. is also inconsistent with the arrow specification?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tustvold i'm not sure whether arrow has this specification or not.
i tried pyarrow before, it acted like 2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite possibly, that doesn't mean it is actually correct 😆

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move the discussion to the ticket - #1936 (comment)

@naosense
Copy link
Contributor Author

naosense commented Nov 9, 2022

@waitingkuo thanks for your detailed explanation, may I ask which way is more reasonable?

@waitingkuo
Copy link
Contributor

there's an opened issue here #1936
I don't have strong opinions for now.

i might submit a RFC to discuss these things soon.
feel free to leave your comments. time zone is hard!

@tustvold tustvold merged commit d76a0d6 into apache:master Nov 10, 2022
@ursabot
Copy link

ursabot commented Nov 10, 2022

Benchmark runs are scheduled for baseline = f596209 and contender = d76a0d6. d76a0d6 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@naosense naosense deleted the cast_timestamp_to_time branch November 10, 2022 01:22
@naosense
Copy link
Contributor Author

there's an opened issue here #1936
I don't have strong opinions for now.

i might submit a RFC to discuss these things soon.
feel free to leave your comments. time zone is hard!

Absolutely, can't agree any more!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants