Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

to_timestamp timeunit to be consistent with postgresql's #2979

Closed
Tracked by #3148
waitingkuo opened this issue Jul 28, 2022 · 13 comments · Fixed by #7844
Closed
Tracked by #3148

to_timestamp timeunit to be consistent with postgresql's #2979

waitingkuo opened this issue Jul 28, 2022 · 13 comments · Fixed by #7844
Assignees
Labels
enhancement New feature or request

Comments

@waitingkuo
Copy link
Contributor

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Current to_timestamp assume the input number is nanoseconds from 1970

select to_timestamp(10);
+-------------------------------+
| totimestamp(Int64(10))        |
+-------------------------------+
| 1970-01-01 00:00:00.000000010 |
+-------------------------------+
1 row in set. Query took 0.000 seconds.

while postgresql's is to assume the number is seonds from 1970

willy=# select to_timestamp(10) at time zone 'utc';
      timezone       
---------------------
 1970-01-01 00:00:10
(1 row)

Should we align with postgrseql's?

Describe the solution you'd like

  1. rename the original to_timestamp as to_timestamp_nanoseconds
  2. we do have a to_timestamp_seconds for now. we could either rename it to to_timestamp or make a new to_timestamp as to_timestamp_seconds's alias
@waitingkuo waitingkuo added the enhancement New feature or request label Jul 28, 2022
@waitingkuo waitingkuo changed the title default to_timestamp timeunit to be consistent with postgrseql's to_timestamp timeunit to be consistent with postgrseql's Jul 28, 2022
@alamb
Copy link
Contributor

alamb commented Jul 30, 2022

Hi @waitingkuo -- I think given the desire to make datafusion "A Declarative SQL query interface compatible with PostgreSQL" as described on https://arrow.apache.org/datafusion/specification/roadmap.html#datafusion compatible changing the current to_timestamp to be to_timestamp_nanos() and changing to_timestamp() to be the same as the current to_timestamp_seconds would be reasonable

Here are the docs, in case anyone is interested

https://github.com/apache/arrow-datafusion/blob/2598893ac7737e92b3be7b74f606280b32264e0e/docs/source/user-guide/sql/datafusion-functions.md#to_timestamp

@waitingkuo
Copy link
Contributor Author

i'm interested in this, please assign me, thanks~

@waitingkuo
Copy link
Contributor Author

waitingkuo commented Aug 8, 2022

@alamb after investigating more on postgresql's, i found that we have following differences:

Differences

  1. postgresql is using microsecond for the resolution of timestamp, datafusion is using nanoseconds

This is postgresql's
image

  1. postgresql to_timestamp have 2 kinds of signatures:
  • 2.1 to_timestamp(double precision) -> timestamp with time zone

https://www.postgresql.org/docs/14/functions-datetime.html

image

the input for to_timestamp in posgresql has the unit second. i.e. to_timestamp(1) means 1 second after 1970 but not 1 microconds.
Current datafusion supports Int64, TimeUnit(Second), TimeUnit(MicroSecond), TimeUnit(MilliSecond), TimeUnit(NanoSeconds), and UTF-8. Our current to_timestamp(int64) deems the input as nanoseconds instead of seconds which is different from postgresql's. Postresql doesnt support to_timestamp(TimeUnit). I suggest to align to_timestamp(i64) to posgresql's; and add another to_timestamp(float64). And leave the to_timestamp(TimeUnit) behave the same as what we have now (conversion between units). to_timestamp(UTF-8) is in 2.2

image

datafusion currently supports to_timestamp(utf-8) without the 2nd argument datafusion uses chrono
) to parse the time string without specify the format (while posgresql's need us to specify the format)

  1. postgresql's to_timestamp output timestamp with timezone, while datafusion outputs TimeUnit(UNIT, None). The second argument for TimeUnit in arrow in implements timezone, just we don't use it in to_timestamp yet

Summary & My proposal

  1. keep using nanosecond as default while there's no specific timeunit set
  2. make follwings:
  • to_timestamp(i64) recognize input as second, output Timestamp(NanoSecond, None)
    • (e.g. to_timestamp(1) -> 1970-01-01:00:00:01)
  • to_timestamp_nanos(i64) recognize input as nanoseconds, output Timestamp(NanoSecond, None)
    • (e.g. to_timestamp_nanos(1) -> 1970-01-01:00:00:00.000000001)
  • to_timestamp_millis(I64) recognize input as milliseconds, output Timestamp(MilliSecond, None)
    • (e.g. to_timestamp_millis(1) -> 1970-01-01:00:00:01.000001)
  • to_timestamp_micros(I64) recognize input as microsecond, output Timestamp(MicroSecond, None)
    • (e.g. to_timestamp_micros(1) -> 1970-01-01:00:00:00.001)
  • to_timestamp_seconds(I64) recognize input as second, output Timestamp(Second, None)
    • (e.g. to_timestamp_seconds(1) -> 1970-01-01:00:00:01)
  1. support to_timestamp(float64) and `to_timestamp_xx(float64)
  2. keep using chorno's time string format for now
  3. start another thread for dicussing timezone implementation, I feel like this might break many things. And we also need to implement something like SET TIME ZONE AS xxx like what posgresql has

Please leave your comments. I'll submit another issue for tracking all of these.

@alamb
Copy link
Contributor

alamb commented Aug 9, 2022

Summary & My proposal

Makes sense to me. 👍

cc @ovr @andygrove @seddonm1 @stuartcarnie (not sure if anyone else is interested in this discussion too)

start another thread for dicussing timezone implementation, I feel like this might break many things. And we also need to implement something like SET TIME ZONE AS xxx like what posgresql has

There are a variety of open issues known about timestamp handling -- for example apache/arrow-rs#1936 apache/arrow-rs#1380 apache/arrow-rs#597 and others

It would be great to figure out how best to support these and start fixing it up.

@alamb alamb changed the title to_timestamp timeunit to be consistent with postgrseql's to_timestamp timeunit to be consistent with postgrwsql's Aug 9, 2022
@alamb alamb changed the title to_timestamp timeunit to be consistent with postgrwsql's to_timestamp timeunit to be consistent with postgresql's Aug 9, 2022
@waitingkuo
Copy link
Contributor Author

@alamb should we consider to_timestamp() to output timestamp with time zone?

in postgresql select timestamp '2020-01-01' output timestamp without time zone but to_timestamp() output timestamp with time zone.

@alamb
Copy link
Contributor

alamb commented Aug 9, 2022

@alamb should we consider to_timestamp() to output timestamp with time zone?

I think that can be a long term goal -- I suspect making that change will be hard / impossible without improving the timezone support in general across DataFusion

@waitingkuo
Copy link
Contributor Author

waitingkuo commented Aug 9, 2022

@alamb i could image that might breaks lots of thing
there're some time zone issues in datafusion for now as well e.g. #3080

@waitingkuo
Copy link
Contributor Author

@alamb
after spend time figuring out how postgresql is working, i submitted a proposal here
#3100

probably we need to deal with datatype date / time / timestamp / time zone first and then move to this timestamp related functions

@alamb
Copy link
Contributor

alamb commented Aug 11, 2022

probably we need to deal with datatype date / time / timestamp / time zone first and then move to this timestamp related functions

I agree that sounds wise

@waitingkuo
Copy link
Contributor Author

@alamb it seems like now is a great time to align time-related functions with PostgreSQL, as most of the fundamental work has already been completed.

@alamb
Copy link
Contributor

alamb commented Oct 20, 2023

@alamb it seems like now is a great time to align time-related functions with PostgreSQL, as most of the fundamental work has already been completed.

I agree -- it would be a good time.

@waitingkuo would you be willing to make a new "date/time" epic and move all the unfinished stuff from #3148 to that new epic as a way to figure out what else is needed?

@waitingkuo
Copy link
Contributor Author

@alamb
Sure, I am currently on vacation, and I will do it a little later.

@alamb
Copy link
Contributor

alamb commented Oct 21, 2023

Thanks @waitingkuo -- no rush!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
2 participants