
feat: add new UDF to_local_time() #11

Closed
wants to merge 10 commits into from


@appletreeisyellow (Owner) commented Jul 3, 2024

Which issue does this PR close?

Help with

Rationale for this change

This PR adds a ScalarUDF function to_local_time():

  • this function converts a timezone-aware timestamp to local time (with no offset or timezone information). In other words, it strips the timezone from the timestamp while keeping the displayed wall-clock value the same. See the examples below.
  • accepts exactly one argument, of type Timestamp(..., *)
  • returns a value of type Timestamp(..., None)
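Arrow stores timezone-aware timestamps as instants relative to the UTC epoch. Stripping the timezone while keeping the displayed value therefore amounts to shifting the stored value by the zone's UTC offset at that instant. Below is a minimal Rust sketch of that arithmetic, not the PR's implementation. It assumes the offset (`offset_secs`) has already been looked up; a real implementation would consult a time-zone database. The name `to_local_nanos` is hypothetical.

```rust
/// Nanoseconds per second.
const NANOS_PER_SEC: i64 = 1_000_000_000;

/// Shift a UTC instant (nanoseconds since the Unix epoch, as Arrow stores
/// timezone-aware timestamps) so that, read as a naive timestamp, it shows
/// the same wall-clock value it had in its original time zone.
///
/// `offset_secs` is the zone's UTC offset at that instant, e.g. +7200 for
/// Europe/Brussels in summer. It is supplied by the caller here (assumption);
/// a real implementation would look it up in a time-zone database.
/// Returns `None` if the shifted value would overflow an i64.
fn to_local_nanos(utc_nanos: i64, offset_secs: i32) -> Option<i64> {
    let shift = i64::from(offset_secs).checked_mul(NANOS_PER_SEC)?;
    utc_nanos.checked_add(shift)
}
```

For the first example below, 2024-04-01T00:00:20+02:00 is stored as 2024-03-31T22:00:20Z (1,711,922,420 seconds since the epoch). Adding the +02:00 offset gives 1,711,929,620 seconds, which renders as 2024-04-01T00:00:20 once the timezone is dropped.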

Example

This is how to use it in datafusion-cli:

> select to_local_time('2024-04-01T00:00:20Z'::timestamp AT TIME ZONE 'Europe/Brussels');
+---------------------------------------------+
| to_local_time(Utf8("2024-04-01T00:00:20Z")) |
+---------------------------------------------+
| 2024-04-01T00:00:20                         |
+---------------------------------------------+
1 row(s) fetched.
Elapsed 0.010 seconds.

> select to_local_time('2024-04-01T00:00:20'::timestamp AT TIME ZONE 'Europe/Brussels');
+--------------------------------------------+
| to_local_time(Utf8("2024-04-01T00:00:20")) |
+--------------------------------------------+
| 2024-04-01T00:00:20                        |
+--------------------------------------------+
1 row(s) fetched.
Elapsed 0.008 seconds.

> select
  time,
  arrow_typeof(time) as type,
  to_local_time(time) as to_local_time,
  arrow_typeof(to_local_time(time)) as to_local_time_type
from (
  select '2024-04-01T00:00:20Z'::timestamp AT TIME ZONE 'Europe/Brussels' as time
);
+---------------------------+------------------------------------------------+---------------------+-----------------------------+
| time                      | type                                           | to_local_time       | to_local_time_type          |
+---------------------------+------------------------------------------------+---------------------+-----------------------------+
| 2024-04-01T00:00:20+02:00 | Timestamp(Nanosecond, Some("Europe/Brussels")) | 2024-04-01T00:00:20 | Timestamp(Nanosecond, None) |
+---------------------------+------------------------------------------------+---------------------+-----------------------------+
1 row(s) fetched.
Elapsed 0.017 seconds.

Example of using to_local_time() in date_bin()

Combining to_local_time() with date_bin() looks like this:

> select date_bin(interval '1 day', to_local_time('2024-04-01T00:00:20Z'::timestamp AT TIME ZONE 'Europe/Brussels'));
+----------------------------------------------------------------------------------------------------+
| date_bin(IntervalMonthDayNano("18446744073709551616"),to_local_time(Utf8("2024-04-01T00:00:20Z"))) |
+----------------------------------------------------------------------------------------------------+
| 2024-04-01T00:00:00                                                                                |
+----------------------------------------------------------------------------------------------------+


> select date_bin(interval '1 day', to_local_time('2024-04-01T00:00:20Z'::timestamp AT TIME ZONE 'Europe/Brussels')) AT TIME ZONE 'Europe/Brussels';
+----------------------------------------------------------------------------------------------------+
| date_bin(IntervalMonthDayNano("18446744073709551616"),to_local_time(Utf8("2024-04-01T00:00:20Z"))) |
+----------------------------------------------------------------------------------------------------+
| 2024-04-01T00:00:00+02:00                                                                          |
+----------------------------------------------------------------------------------------------------+
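The date_bin() pattern above can be read as three steps: shift to local wall-clock time, floor to the day boundary, and reinterpret the naive result as wall-clock time in the zone with AT TIME ZONE. Here is a rough Rust sketch of those steps on nanosecond values. It assumes date_bin's default origin (the Unix epoch) and offsets supplied by the caller; all function names are hypothetical. The offset at local midnight is passed separately because on a DST transition day it can differ from the offset at the original instant.

```rust
const NANOS_PER_SEC: i64 = 1_000_000_000;
const NANOS_PER_DAY: i64 = 86_400 * NANOS_PER_SEC;

/// Step 1: shift a UTC instant to wall-clock time (the to_local_time() step).
fn to_local_nanos(utc_nanos: i64, offset_secs: i32) -> i64 {
    utc_nanos + i64::from(offset_secs) * NANOS_PER_SEC
}

/// Step 2: floor to a multiple of `stride` counted from the Unix epoch,
/// like date_bin(stride, ts) with its default origin. `rem_euclid` keeps
/// the result correct for timestamps before 1970.
fn bin(ts: i64, stride: i64) -> i64 {
    ts - ts.rem_euclid(stride)
}

/// Step 3: treat a naive local timestamp as wall-clock time in a zone whose
/// offset at that local time is `offset_secs`. Ambiguous or nonexistent
/// local times (around DST switches) are not handled in this sketch.
fn from_local_nanos(local_nanos: i64, offset_secs: i32) -> i64 {
    local_nanos - i64::from(offset_secs) * NANOS_PER_SEC
}

/// Start of the local calendar day containing `utc_nanos`, as a UTC instant.
fn local_day_start_utc(utc_nanos: i64, offset_now: i32, offset_at_midnight: i32) -> i64 {
    let local = to_local_nanos(utc_nanos, offset_now);
    let local_midnight = bin(local, NANOS_PER_DAY);
    from_local_nanos(local_midnight, offset_at_midnight)
}
```

For the query above, 2024-03-31T22:00:20Z with a +02:00 offset bins to local midnight 2024-04-01T00:00:00, which maps back to 2024-03-31T22:00:00Z and displays as 2024-04-01T00:00:00+02:00.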
More examples of applying to_local_time() to array values:
1. Write sample data
create or replace table t AS
VALUES
  ('2024-01-01T00:00:01Z'),
  ('2024-02-01T00:00:01Z'),
  ('2024-03-01T00:00:01Z'),
  ('2024-04-01T00:00:01Z'),
  ('2024-05-01T00:00:01Z'),
  ('2024-06-01T00:00:01Z'),
  ('2024-07-01T00:00:01Z'),
  ('2024-08-01T00:00:01Z'),
  ('2024-09-01T00:00:01Z'),
  ('2024-10-01T00:00:01Z'),
  ('2024-11-01T00:00:01Z'),
  ('2024-12-01T00:00:01Z')
;

create or replace view t_utc as
select column1::timestamp AT TIME ZONE 'UTC' as "column1"
from t;

create or replace view t_timezone 
as 
select column1::timestamp AT TIME ZONE 'Europe/Brussels' as "column1" 
from t;
2. See what the tables look like
> select column1, arrow_typeof(column1) from t;
+----------------------+-------------------------+
| column1              | arrow_typeof(t.column1) |
+----------------------+-------------------------+
| 2024-01-01T00:00:01Z | Utf8                    |
| 2024-02-01T00:00:01Z | Utf8                    |
| 2024-03-01T00:00:01Z | Utf8                    |
| 2024-04-01T00:00:01Z | Utf8                    |
| 2024-05-01T00:00:01Z | Utf8                    |
| 2024-06-01T00:00:01Z | Utf8                    |
| 2024-07-01T00:00:01Z | Utf8                    |
| 2024-08-01T00:00:01Z | Utf8                    |
| 2024-09-01T00:00:01Z | Utf8                    |
| 2024-10-01T00:00:01Z | Utf8                    |
| 2024-11-01T00:00:01Z | Utf8                    |
| 2024-12-01T00:00:01Z | Utf8                    |
+----------------------+-------------------------+
12 row(s) fetched.
Elapsed 0.009 seconds.

> select column1, arrow_typeof(column1) from t_utc;
+----------------------+------------------------------------+
| column1              | arrow_typeof(t_utc.column1)        |
+----------------------+------------------------------------+
| 2024-01-01T00:00:01Z | Timestamp(Nanosecond, Some("UTC")) |
| 2024-02-01T00:00:01Z | Timestamp(Nanosecond, Some("UTC")) |
| 2024-03-01T00:00:01Z | Timestamp(Nanosecond, Some("UTC")) |
| 2024-04-01T00:00:01Z | Timestamp(Nanosecond, Some("UTC")) |
| 2024-05-01T00:00:01Z | Timestamp(Nanosecond, Some("UTC")) |
| 2024-06-01T00:00:01Z | Timestamp(Nanosecond, Some("UTC")) |
| 2024-07-01T00:00:01Z | Timestamp(Nanosecond, Some("UTC")) |
| 2024-08-01T00:00:01Z | Timestamp(Nanosecond, Some("UTC")) |
| 2024-09-01T00:00:01Z | Timestamp(Nanosecond, Some("UTC")) |
| 2024-10-01T00:00:01Z | Timestamp(Nanosecond, Some("UTC")) |
| 2024-11-01T00:00:01Z | Timestamp(Nanosecond, Some("UTC")) |
| 2024-12-01T00:00:01Z | Timestamp(Nanosecond, Some("UTC")) |
+----------------------+------------------------------------+
12 row(s) fetched.
Elapsed 0.011 seconds.

> select column1, arrow_typeof(column1) from t_timezone;
+---------------------------+------------------------------------------------+
| column1                   | arrow_typeof(t_timezone.column1)               |
+---------------------------+------------------------------------------------+
| 2024-01-01T00:00:01+01:00 | Timestamp(Nanosecond, Some("Europe/Brussels")) |
| 2024-02-01T00:00:01+01:00 | Timestamp(Nanosecond, Some("Europe/Brussels")) |
| 2024-03-01T00:00:01+01:00 | Timestamp(Nanosecond, Some("Europe/Brussels")) |
| 2024-04-01T00:00:01+02:00 | Timestamp(Nanosecond, Some("Europe/Brussels")) |
| 2024-05-01T00:00:01+02:00 | Timestamp(Nanosecond, Some("Europe/Brussels")) |
| 2024-06-01T00:00:01+02:00 | Timestamp(Nanosecond, Some("Europe/Brussels")) |
| 2024-07-01T00:00:01+02:00 | Timestamp(Nanosecond, Some("Europe/Brussels")) |
| 2024-08-01T00:00:01+02:00 | Timestamp(Nanosecond, Some("Europe/Brussels")) |
| 2024-09-01T00:00:01+02:00 | Timestamp(Nanosecond, Some("Europe/Brussels")) |
| 2024-10-01T00:00:01+02:00 | Timestamp(Nanosecond, Some("Europe/Brussels")) |
| 2024-11-01T00:00:01+01:00 | Timestamp(Nanosecond, Some("Europe/Brussels")) |
| 2024-12-01T00:00:01+01:00 | Timestamp(Nanosecond, Some("Europe/Brussels")) |
+---------------------------+------------------------------------------------+
12 row(s) fetched.
Elapsed 0.012 seconds.
3. Query using to_local_time()
> select column1, to_local_time(column1), arrow_typeof(to_local_time(column1)) from t_utc;
+----------------------+------------------------------+--------------------------------------------+
| column1              | to_local_time(t_utc.column1) | arrow_typeof(to_local_time(t_utc.column1)) |
+----------------------+------------------------------+--------------------------------------------+
| 2024-01-01T00:00:01Z | 2024-01-01T00:00:01          | Timestamp(Nanosecond, None)                |
| 2024-02-01T00:00:01Z | 2024-02-01T00:00:01          | Timestamp(Nanosecond, None)                |
| 2024-03-01T00:00:01Z | 2024-03-01T00:00:01          | Timestamp(Nanosecond, None)                |
| 2024-04-01T00:00:01Z | 2024-04-01T00:00:01          | Timestamp(Nanosecond, None)                |
| 2024-05-01T00:00:01Z | 2024-05-01T00:00:01          | Timestamp(Nanosecond, None)                |
| 2024-06-01T00:00:01Z | 2024-06-01T00:00:01          | Timestamp(Nanosecond, None)                |
| 2024-07-01T00:00:01Z | 2024-07-01T00:00:01          | Timestamp(Nanosecond, None)                |
| 2024-08-01T00:00:01Z | 2024-08-01T00:00:01          | Timestamp(Nanosecond, None)                |
| 2024-09-01T00:00:01Z | 2024-09-01T00:00:01          | Timestamp(Nanosecond, None)                |
| 2024-10-01T00:00:01Z | 2024-10-01T00:00:01          | Timestamp(Nanosecond, None)                |
| 2024-11-01T00:00:01Z | 2024-11-01T00:00:01          | Timestamp(Nanosecond, None)                |
| 2024-12-01T00:00:01Z | 2024-12-01T00:00:01          | Timestamp(Nanosecond, None)                |
+----------------------+------------------------------+--------------------------------------------+
12 row(s) fetched.
Elapsed 0.015 seconds.

> select column1, to_local_time(column1), arrow_typeof(to_local_time(column1)) from t_timezone;
+---------------------------+-----------------------------------+-------------------------------------------------+
| column1                   | to_local_time(t_timezone.column1) | arrow_typeof(to_local_time(t_timezone.column1)) |
+---------------------------+-----------------------------------+-------------------------------------------------+
| 2024-01-01T00:00:01+01:00 | 2024-01-01T00:00:01               | Timestamp(Nanosecond, None)                     |
| 2024-02-01T00:00:01+01:00 | 2024-02-01T00:00:01               | Timestamp(Nanosecond, None)                     |
| 2024-03-01T00:00:01+01:00 | 2024-03-01T00:00:01               | Timestamp(Nanosecond, None)                     |
| 2024-04-01T00:00:01+02:00 | 2024-04-01T00:00:01               | Timestamp(Nanosecond, None)                     |
| 2024-05-01T00:00:01+02:00 | 2024-05-01T00:00:01               | Timestamp(Nanosecond, None)                     |
| 2024-06-01T00:00:01+02:00 | 2024-06-01T00:00:01               | Timestamp(Nanosecond, None)                     |
| 2024-07-01T00:00:01+02:00 | 2024-07-01T00:00:01               | Timestamp(Nanosecond, None)                     |
| 2024-08-01T00:00:01+02:00 | 2024-08-01T00:00:01               | Timestamp(Nanosecond, None)                     |
| 2024-09-01T00:00:01+02:00 | 2024-09-01T00:00:01               | Timestamp(Nanosecond, None)                     |
| 2024-10-01T00:00:01+02:00 | 2024-10-01T00:00:01               | Timestamp(Nanosecond, None)                     |
| 2024-11-01T00:00:01+01:00 | 2024-11-01T00:00:01               | Timestamp(Nanosecond, None)                     |
| 2024-12-01T00:00:01+01:00 | 2024-12-01T00:00:01               | Timestamp(Nanosecond, None)                     |
+---------------------------+-----------------------------------+-------------------------------------------------+
12 row(s) fetched.
Elapsed 0.016 seconds.
4. Combine with date_bin()
> select date_bin(interval '1 day', to_local_time(column1)) AT TIME ZONE 'Europe/Brussels' as date_bin from t_utc;
+---------------------------+
| date_bin                  |
+---------------------------+
| 2024-01-01T00:00:00+01:00 |
| 2024-02-01T00:00:00+01:00 |
| 2024-03-01T00:00:00+01:00 |
| 2024-04-01T00:00:00+02:00 |
| 2024-05-01T00:00:00+02:00 |
| 2024-06-01T00:00:00+02:00 |
| 2024-07-01T00:00:00+02:00 |
| 2024-08-01T00:00:00+02:00 |
| 2024-09-01T00:00:00+02:00 |
| 2024-10-01T00:00:00+02:00 |
| 2024-11-01T00:00:00+01:00 |
| 2024-12-01T00:00:00+01:00 |
+---------------------------+
12 row(s) fetched.
Elapsed 0.023 seconds.

> select date_bin(interval '1 day', to_local_time(column1)) AT TIME ZONE 'Europe/Brussels' as date_bin from t_timezone;
+---------------------------+
| date_bin                  |
+---------------------------+
| 2024-01-01T00:00:00+01:00 |
| 2024-02-01T00:00:00+01:00 |
| 2024-03-01T00:00:00+01:00 |
| 2024-04-01T00:00:00+02:00 |
| 2024-05-01T00:00:00+02:00 |
| 2024-06-01T00:00:00+02:00 |
| 2024-07-01T00:00:00+02:00 |
| 2024-08-01T00:00:00+02:00 |
| 2024-09-01T00:00:00+02:00 |
| 2024-10-01T00:00:00+02:00 |
| 2024-11-01T00:00:00+01:00 |
| 2024-12-01T00:00:00+01:00 |
+---------------------------+
12 row(s) fetched.
Elapsed 0.011 seconds.
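In the array examples, the offset varies per row (+01:00 in winter, +02:00 in summer), so the shift has to be computed per value. A minimal Rust sketch follows. It assumes a hypothetical, hard-coded offset table for Europe/Brussels that is only valid for 2024 (a real implementation would use a time-zone database) and models a nullable timestamp column as `Option<i64>` nanoseconds. Both function names are hypothetical.

```rust
const NANOS_PER_SEC: i64 = 1_000_000_000;

/// Hypothetical stand-in for a time-zone database lookup: the UTC offset of
/// Europe/Brussels during 2024 only. Summer time (+02:00) runs from
/// 2024-03-31T01:00:00Z until 2024-10-27T01:00:00Z; otherwise +01:00.
fn brussels_offset_2024(utc_nanos: i64) -> i32 {
    const DST_START: i64 = 1_711_846_800; // 2024-03-31T01:00:00Z
    const DST_END: i64 = 1_729_990_800; // 2024-10-27T01:00:00Z
    let secs = utc_nanos.div_euclid(NANOS_PER_SEC);
    if (DST_START..DST_END).contains(&secs) {
        7200
    } else {
        3600
    }
}

/// Apply the to_local_time() shift element-wise. Each row uses the offset
/// in effect at its own instant, and nulls (`None`) pass through unchanged.
fn to_local_time_array(
    values: &[Option<i64>],
    offset_at: impl Fn(i64) -> i32,
) -> Vec<Option<i64>> {
    values
        .iter()
        .map(|v| v.map(|utc| utc + i64::from(offset_at(utc)) * NANOS_PER_SEC))
        .collect()
}
```

With this table, 2024-01-01T00:00:01+01:00 and 2024-07-01T00:00:01+02:00 both come out as 00:00:01 local time, matching the t_timezone rows above even though their offsets differ.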

What changes are included in this PR?

New ScalarUDF function to_local_time() with tests

Are these changes tested?

Yes

Are there any user-facing changes?

No API changes.

@appletreeisyellow force-pushed the chunchun/udf-to-localtime branch from 7427955 to a49eca9 on July 9, 2024 01:57
appletreeisyellow pushed a commit that referenced this pull request Aug 6, 2024
… `interval` (apache#11466)

* Unparser rule for datatime cast (#10)

* use timestamp as the identifier for date64

* rename

* implement CustomDialectBuilder

* fix

* dialect with interval style (#11)

---------

Co-authored-by: Phillip LeBlanc <[email protected]>

* fmt

* clippy

* doc

* Update datafusion/sql/src/unparser/expr.rs

Co-authored-by: Andrew Lamb <[email protected]>

* update the doc for CustomDialectBuilder

* fix doc test

---------

Co-authored-by: Phillip LeBlanc <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
@appletreeisyellow (Owner, Author) commented:

Superseded by apache#11347

@appletreeisyellow appletreeisyellow deleted the chunchun/udf-to-localtime branch August 6, 2024 18:48