Regression: write_csv result has incorrect formatting #4876
write_csv result has incorrect format for EXTRACT(YEAR FROM ...)
(arrow_dev) alamb@MacBook-Pro-8:~/Software/influxdb_iox2$ cat /tmp/data.csv
name,created_at,last_report
Sales,1825-08-29T07:29:01.256,2022-08-29
Marketing,2017-02-16T07:29:01.256,2022-02-16
IT,2019-04-04T07:29:01.256,2021-04-04
Finance,2016-09-14T07:29:01.256,2021-09-14
HR,2017-03-01T07:29:01.256,2022-03-01
You are correct that extract is now a …
(arrow_dev) alamb@MacBook-Pro-8:~/Software/influxdb_iox2$ datafusion-cli
DataFusion CLI v16.0.0
❯ select * from '/tmp/data.csv';
+-----------+-------------------------+-------------+
| name | created_at | last_report |
+-----------+-------------------------+-------------+
| Sales | 1825-08-29T07:29:01.256 | 2022-08-29 |
| Marketing | 2017-02-16T07:29:01.256 | 2022-02-16 |
| IT | 2019-04-04T07:29:01.256 | 2021-04-04 |
| Finance | 2016-09-14T07:29:01.256 | 2021-09-14 |
| HR | 2017-03-01T07:29:01.256 | 2022-03-01 |
+-----------+-------------------------+-------------+
5 rows in set. Query took 0.020 seconds.
❯ SELECT d.name, arrow_typeof(EXTRACT(YEAR FROM d.created_at)) as year, d.last_report + INTERVAL '12' MONTH as deadline FROM '/tmp/data.csv' d ORDER BY d.created_at;
+-----------+---------+------------+
| name | year | deadline |
+-----------+---------+------------+
| Sales | Float64 | 2023-08-29 |
| Finance | Float64 | 2022-09-14 |
| Marketing | Float64 | 2023-02-16 |
| HR | Float64 | 2023-03-01 |
| IT | Float64 | 2022-04-04 |
+-----------+---------+------------+
❯
Perhaps you can add a workaround by explicitly casting to numeric:
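The effect of such a cast can be illustrated outside DataFusion. Below is a minimal Rust sketch, assuming the years arrive as f64 values (as arrow_typeof reports above) and using a hypothetical helper, years_as_csv_lines. Converting each value to an integer before serializing drops the fractional part. It only mimics what a SQL cast to an integer type would do and does not call DataFusion.

```rust
/// Hypothetical helper: serializes years that arrive as f64 by converting each one
/// to i64 first, mimicking the effect of an explicit SQL cast to an integer type.
fn years_as_csv_lines(years: &[f64]) -> Vec<String> {
    years
        .iter()
        // `as i64` truncates toward zero (and saturates on overflow), so
        // whole-number years are converted exactly.
        .map(|&y| (y as i64).to_string())
        .collect()
}

fn main() {
    let years = [1825.0_f64, 2016.0, 2017.0];
    // Debug formatting stands in for a float-oriented writer: "1825.0,2016.0,2017.0".
    let raw: Vec<String> = years.iter().map(|y| format!("{:?}", y)).collect();
    println!("raw:  {}", raw.join(","));
    // After the integer conversion: "1825,2016,2017".
    println!("cast: {}", years_as_csv_lines(&years).join(","));
}
```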
Thanks for clarifying, @alamb! But I am concerned about the difference between … Why does it have different formatting rules? I just expected that … But in general, I guess we can close this issue.
The different formatting is due to differences in the upstream arrow crate; I have filed apache/arrow-rs#3513 to track it.
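As a plain-Rust analogy for how one value can print two ways (not arrow's actual formatting code), the standard library's Display and Debug formatters already disagree on whole-number floats. The helper names display_cell and debug_cell are hypothetical.

```rust
/// Display-style formatting, as a human-facing table printer might use.
fn display_cell(v: f64) -> String {
    format!("{}", v)
}

/// Debug-style formatting; for a whole-number float such as 2017.0 it keeps the trailing ".0".
fn debug_cell(v: f64) -> String {
    format!("{:?}", v)
}

fn main() {
    let year = 2017.0_f64;
    // Same value, two formatters: prints "2017 vs 2017.0".
    println!("{} vs {}", display_cell(year), debug_cell(year));
}
```

If two output paths pick different formatters, they produce different text for the same column even though the underlying value and type are identical.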
I will do so. Thank you very much for this report, DDtKey; keep them coming!
@alamb, by the way, I just wanted to mention that Postgres returns exactly … I just meant that it could be confusing why the same query previously returned the expected results and now returns a float for year, so I hope the issue with formatting in …
postgres=# SELECT pg_typeof(EXTRACT(YEAR FROM now()));
pg_typeof
-----------
numeric
(1 row)
Indeed, I think we switched …
PostgreSQL's documentation, quoted from https://www.postgresql.org/docs/15/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT: … In DataFusion, the return type of extract is still float, not decimal yet; this is tracked here: … It's pending because the Decimal type wasn't consistent yet (#3996 (comment)).
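With a decimal (numeric) return type, the scale travels with the value, so a year extracted with scale 0 has no fractional part to print. The sketch below assumes a hypothetical Decimal struct (an integer mantissa plus a scale), loosely modeled on how Arrow's Decimal128 pairs an i128 value with a scale in the data type; it is not DataFusion's implementation.

```rust
/// Hypothetical decimal value: `mantissa / 10^scale`.
struct Decimal {
    mantissa: i128,
    scale: u32, // number of fractional digits; assumed <= 38 so 10^scale fits in u128
}

impl Decimal {
    /// Renders the value with exactly `scale` fractional digits.
    fn to_plain_string(&self) -> String {
        if self.scale == 0 {
            // Scale 0 means an integral value: no decimal point at all.
            return self.mantissa.to_string();
        }
        // Handle the sign separately so values like -0.5 keep their minus sign.
        let sign = if self.mantissa < 0 { "-" } else { "" };
        let abs = self.mantissa.unsigned_abs();
        let divisor = 10_u128.pow(self.scale);
        format!(
            "{}{}.{:0width$}",
            sign,
            abs / divisor,
            abs % divisor,
            width = self.scale as usize
        )
    }
}

fn main() {
    let year = Decimal { mantissa: 2017, scale: 0 };
    let with_fraction = Decimal { mantissa: 201750, scale: 2 };
    println!("{}", year.to_plain_string()); // prints "2017"
    println!("{}", with_fraction.to_plain_string()); // prints "2017.50"
}
```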
I think it would be interesting to try out. I am not sure what else is left related to "experimental" support for Decimal; it seems pretty full-featured to me now, given the work from @liukun4515 and others.
Describe the bug
write_csv result contains an unexpected format for EXTRACT(YEAR FROM ...): it looks like a floating-point number for some reason.

To Reproduce
Example of file:
SQL:
It returns:
So the result of EXTRACT(YEAR FROM d.created_at) as year has a floating-point format for some reason, while data_frame.show() works as expected:

Expected behavior
The result should be consistent with show and with the previous version, datafusion 15.0.0 (it used to work).
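One way to get that consistency is to route both the display path and the CSV path through a single formatting function. A minimal sketch in plain Rust, assuming hypothetical helpers format_cell, render_table (a stand-in for show()) and render_csv (a stand-in for write_csv); this is not Arrow's or DataFusion's actual code.

```rust
use std::fmt::Write;

/// Hypothetical single formatting function shared by every output path,
/// so a table printer and a CSV writer cannot disagree.
fn format_cell(v: f64) -> String {
    // Display renders whole-number floats without a trailing ".0".
    format!("{}", v)
}

/// Renders rows as a crude table (stand-in for show()).
fn render_table(rows: &[(&str, f64)]) -> String {
    let mut out = String::new();
    for (name, year) in rows {
        writeln!(out, "| {} | {} |", name, format_cell(*year)).unwrap();
    }
    out
}

/// Renders the same rows as CSV (stand-in for write_csv), reusing format_cell.
fn render_csv(rows: &[(&str, f64)]) -> String {
    let mut out = String::from("name,year\n");
    for (name, year) in rows {
        writeln!(out, "{},{}", name, format_cell(*year)).unwrap();
    }
    out
}

fn main() {
    let rows = [("Sales", 1825.0_f64), ("Finance", 2016.0_f64)];
    print!("{}", render_table(&rows));
    print!("{}", render_csv(&rows));
}
```

The design point is the single shared formatter; whether it should use Display-style output or something else is a separate choice.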