when informing data types, use functions to parse type (e.g. DateTime) #899

hdcpereira · 2021-09-13T12:11:33Z

Lets say I have a dataset with a DateTime type in one of its columns. Some SQL requests return datetime types as YYYY-MM-DD HH:MM:SS e.g. 2019-03-01 14:04:31. When reading with CSV.read I'm not able to use

my_data = CSV.read(file, DataFrame, header = 1, copycols = true, types = Dict(:time => Dates.DateTime))

because Dates.DateTime doesn't accept YYYY-MM-DD HH:MM:SS only YYYY-MM-DDTHH:MM:SS. One should be able to inform what is the DateTime structure before reading it with Dates.DateTime function, which allows us to inform what is the structure of the DateTime. e.g.

my_data = CSV.read(file, DataFrame, header = 1, copycols = true,types = Dict(:time => x -> Dates.DateTime(x, "YYYY-MM-DD HH:MM:SS")))

The text was updated successfully, but these errors were encountered:

nickrobinson251 · 2021-09-13T13:51:41Z

Does passing dateformat = "YYYY-MM-DD HH:MM:SS" address the issue?
e.g.

my_data = CSV.read(file, DataFrame, header = 1, copycols = true, types = Dict(:time => Dates.DateTime), dateformat = "YYYY-MM-DD HH:MM:SS")

https://csv.juliadata.org/latest/reading.html#dateformat

hdcpereira · 2021-09-13T13:59:54Z

Does passing dateformat = "YYYY-MM-DD HH:MM:SS" address the issue?
e.g.
my_data = CSV.read(file, DataFrame, header = 1, copycols = true, types = Dict(:time => Dates.DateTime), dateformat = "YYYY-MM-DD HH:MM:SS") 
https://csv.juliadata.org/latest/reading.html#dateformat

Nope. It returns me this error for every row in my dataset.

Warning: thread = 1 warning: error parsing DateTime around row = 1, col = 1: "2019-10-03 14:04:22,", error=INVALID: OK | DELIMITED | INVALID_DELIMITER 
└ @ CSV /home/hpereira/.julia/packages/CSV/22psr/src/file.jl:626

And the dataframe

time|
DateTime(?)|
missing |
missing |
missing | 
missing | 
missing | 
...

nickrobinson251 · 2021-09-13T14:40:48Z

i wonder if that's just because the dateformat is incorrect. How about dateformat = "yyy-mm-dd HH:MM:SS"?

hdcpereira · 2021-09-13T14:43:09Z

wow, it worked just fine. But I remain with question. Wouldn't it be a nice feature to let the function operate in the dictionary?

quinnj · 2021-09-13T17:54:52Z

It would be really hard, nay impossible, to support arbitrary transform functions efficiently like this. For instance, in your first example, calling DateTime(x, "yyyy-mm-dd HH:MM:SS") on every value in a datetime column would create a new DateFormat object for each cell. That's a big reason why, for custom datetime formats, we require passing it in at the top-level as a dedicated keyword argument.

The other difficulty in allowing arbitrary functions would be the column typing; you can't really know the result type of applying a function like this, so it would be hard to have the column value setting be an efficient operation, which would really bloat parsing times.

As I've considered this in the past, there's really no advantage to having something like that builtin to CSV.read vs. just doing the transformation yourself post-parsing on the CSV.File or DataFrame object.

quinnj closed this as completed Sep 13, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

when informing data types, use functions to parse type (e.g. DateTime) #899

when informing data types, use functions to parse type (e.g. DateTime) #899

hdcpereira commented Sep 13, 2021 •

edited

Loading

nickrobinson251 commented Sep 13, 2021

hdcpereira commented Sep 13, 2021 •

edited

Loading

nickrobinson251 commented Sep 13, 2021 •

edited

Loading

hdcpereira commented Sep 13, 2021

quinnj commented Sep 13, 2021

when informing data types, use functions to parse type (e.g. DateTime) #899

when informing data types, use functions to parse type (e.g. DateTime) #899

Comments

hdcpereira commented Sep 13, 2021 • edited Loading

nickrobinson251 commented Sep 13, 2021

hdcpereira commented Sep 13, 2021 • edited Loading

nickrobinson251 commented Sep 13, 2021 • edited Loading

hdcpereira commented Sep 13, 2021

quinnj commented Sep 13, 2021

hdcpereira commented Sep 13, 2021 •

edited

Loading

hdcpereira commented Sep 13, 2021 •

edited

Loading

nickrobinson251 commented Sep 13, 2021 •

edited

Loading