Parsing error when setting column type to ZonedDateTime #807

Closed
bluesmoon opened this issue Feb 9, 2021 · 3 comments

bluesmoon commented Feb 9, 2021

I am using Parsers v1.0.15 and CSV v0.8.2 on Julia 1.5.3.

This issue may be related to #729

I have the following CSV:

"a","b","c","d"
1,a,2021-02-09T15:11:01.118,2021-02-09T15:11:01.144-05:00
2,b,2021-02-09T00:00:00.0,2021-02-09T15:11:01.144-05:00
3,c,2021-02-09T15:11:01.118,2021-02-09T15:11:01.144-05:00

The following code reads the file, but stores the d column as a String:

using CSV, Dates, TimeZones, DataFrames
df = CSV.File("/data/testdb/test-types.csv") |> DataFrame
3×4 DataFrame
 Row │ a      b       c                        d                             
     │ Int64  String  DateTime                 String                        
─────┼───────────────────────────────────────────────────────────────────────
   1 │     1  a       2021-02-09T15:11:01.118  2021-02-09T15:11:01.144-05:00
   2 │     2  b       2021-02-09T00:00:00      2021-02-09T15:11:01.144-05:00
   3 │     3  c       2021-02-09T15:11:01.118  2021-02-09T15:11:01.144-05:00
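
Since d arrives as a String here, one possible interim approach (not part of the original report) is to convert the strings after reading. The sketch below assumes that TimeZones can parse these values with the format returned by Dates.default_format(ZonedDateTime), the same format object passed to dateformats later in this report. The names raw, fmt and parsed are illustrative only.

```julia
using Dates, TimeZones

# Strings copied from column d of the sample CSV above.
raw = [
    "2021-02-09T15:11:01.144-05:00",
    "2021-02-09T15:11:01.144-05:00",
    "2021-02-09T15:11:01.144-05:00",
]

# Same format object the report passes to `dateformats`.
fmt = Dates.default_format(ZonedDateTime)

# Parse each string individually; with a DataFrame, the same comprehension
# could be applied to the String column (df.d) after CSV.File has read it.
parsed = [ZonedDateTime(s, fmt) for s in raw]
```

This keeps CSV.jl out of the ZonedDateTime parsing path entirely, at the cost of a second pass over the column.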

If I change the code to use types, it fails with a MethodError during parsing. This happens whether or not I pass dateformats:

df_tz = CSV.File("/data/testdb/test-types.csv", types=Dict(:d => ZonedDateTime), dateformats=Dict(:d => Dates.default_format(ZonedDateTime))) |> DataFrame

ERROR: MethodError: no method matching ZonedDateTime(::Int64)
Closest candidates are:
  ZonedDateTime(::Int64, ::Union{Int32, Int64}, ::Int64, ::Int64, ::Int64, ::Int64, ::Int64, ::AbstractString) at /home/ubuntu/.julia/packages/TimeZones/K98G0/src/types/zoneddatetime.jl:131
  ZonedDateTime(::Integer, ::Integer, ::Integer, ::Integer, ::Integer, ::Integer, ::Integer, ::VariableTimeZone, ::Integer) at /home/ubuntu/.julia/packages/TimeZones/K98G0/src/types/zoneddatetime.jl:116
  ZonedDateTime(::Integer, ::Integer, ::Integer, ::Integer, ::Integer, ::Integer, ::VariableTimeZone, ::Integer) at none:0
  ...
Stacktrace:
 [1] default(::Type{ZonedDateTime}) at /home/ubuntu/.julia/packages/Parsers/2MBHI/src/Parsers.jl:145
 [2] xparse at /home/ubuntu/.julia/packages/Parsers/2MBHI/src/Parsers.jl:180 [inlined]
 [3] parsevalue!(::Type{ZonedDateTime}, ::UInt8, ::SentinelArrays.SentinelArray{ZonedDateTime,1,UndefInitializer,Missing,Array{ZonedDateTime,1}}, ::Array{AbstractArray{T,1} where T,1}, ::Array{UInt8,1}, ::Int64, ::Int64, ::Parsers.Options{false,true,true,false,Missing,UInt8,DateFormat{Symbol("yyyy-mm-ddTHH:MM:SS.ssszzz"),Tuple{Dates.DatePart{'y'},Dates.Delim{Char,1},Dates.DatePart{'m'},Dates.Delim{Char,1},Dates.DatePart{'d'},Dates.Delim{Char,1},Dates.DatePart{'H'},Dates.Delim{Char,1},Dates.DatePart{'M'},Dates.Delim{Char,1},Dates.DatePart{'S'},Dates.Delim{Char,1},Dates.DatePart{'s'},Dates.DatePart{'z'}}}}, ::Int64, ::Int64, ::Int64, ::Array{Type,1}, ::Array{UInt8,1}) at /home/ubuntu/.julia/packages/CSV/la2cd/src/file.jl:907
 [4] macro expansion at /home/ubuntu/.julia/packages/CSV/la2cd/src/file.jl:620 [inlined]
 [5] parsecustom! at /home/ubuntu/.julia/packages/CSV/la2cd/src/file.jl:610 [inlined]
 [6] parserow(::Int64, ::Val{false}, ::Int64, ::Dict{Type,Type}, ::Array{AbstractArray{T,1} where T,1}, ::Int64, ::Array{UInt8,1}, ::Int64, ::Int64, ::Array{Int64,1}, ::Float64, ::Array{CSV.RefPool,1}, ::Int64, ::Int64, ::Array{Type,1}, ::Array{UInt8,1}, ::Bool, ::Parsers.Options{false,true,true,false,Missing,UInt8,Nothing}, ::Array{Parsers.Options,1}, ::Type{Tuple{Tuple{SentinelArrays.SentinelArray{ZonedDateTime,1,UndefInitializer,Missing,Array{ZonedDateTime,1}},ZonedDateTime}}}, ::Base.RefValue{Int64}, ::Int64) at /home/ubuntu/.julia/packages/CSV/la2cd/src/file.jl:669
 [7] parsefilechunk!(::Val{false}, ::Int64, ::Dict{Type,Type}, ::Array{AbstractArray{T,1} where T,1}, ::Array{UInt8,1}, ::Int64, ::Int64, ::Int64, ::Array{Int64,1}, ::Float64, ::Array{CSV.RefPool,1}, ::Int64, ::Int64, ::Array{Type,1}, ::Array{UInt8,1}, ::Bool, ::Parsers.Options{false,true,true,false,Missing,UInt8,Nothing}, ::Array{Parsers.Options,1}, ::Type{Tuple{Tuple{SentinelArrays.SentinelArray{ZonedDateTime,1,UndefInitializer,Missing,Array{ZonedDateTime,1}},ZonedDateTime}}}, ::Int64) at /home/ubuntu/.julia/packages/CSV/la2cd/src/file.jl:541
 [8] CSV.File(::CSV.Header{false,Parsers.Options{false,true,true,false,Missing,UInt8,Nothing},Array{UInt8,1}}; finalizebuffer::Bool, startingbyteposition::Nothing, endingbyteposition::Nothing, limit::Nothing, threaded::Nothing, typemap::Dict{Type,Type}, tasks::Int64, lines_to_check::Int64, maxwarnings::Int64, debug::Bool) at /home/ubuntu/.julia/packages/CSV/la2cd/src/file.jl:303
 [9] CSV.File(::String; header::Int64, normalizenames::Bool, datarow::Int64, skipto::Nothing, footerskip::Int64, transpose::Bool, comment::Nothing, use_mmap::Nothing, ignoreemptylines::Bool, select::Nothing, drop::Nothing, missingstrings::Array{String,1}, missingstring::String, delim::Nothing, ignorerepeated::Bool, quotechar::Char, openquotechar::Nothing, closequotechar::Nothing, escapechar::Char, dateformat::Nothing, dateformats::Dict{Symbol,DateFormat{Symbol("yyyy-mm-ddTHH:MM:SS.ssszzz"),Tuple{Dates.DatePart{'y'},Dates.Delim{Char,1},Dates.DatePart{'m'},Dates.Delim{Char,1},Dates.DatePart{'d'},Dates.Delim{Char,1},Dates.DatePart{'H'},Dates.Delim{Char,1},Dates.DatePart{'M'},Dates.Delim{Char,1},Dates.DatePart{'S'},Dates.Delim{Char,1},Dates.DatePart{'s'},Dates.DatePart{'z'}}}}, decimal::UInt8, truestrings::Array{String,1}, falsestrings::Array{String,1}, type::Nothing, types::Dict{Symbol,DataType}, typemap::Dict{Type,Type}, pool::Float64, lazystrings::Bool, strict::Bool, silencewarnings::Bool, debug::Bool, parsingdebug::Bool, kw::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/ubuntu/.julia/packages/CSV/la2cd/src/file.jl:218

If I change the types value to DateTime, then I get the same error as in #729.

@bluesmoon
Author

The cause of the exception seems to be this:

default(::Type{T}) where {T <: Dates.TimeType} = T(0)

The T(0) call fails for ZonedDateTime because its constructor also requires a TimeZone argument, so there is no ZonedDateTime(::Int64) method.

Still not sure why it's actually getting there.
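
To illustrate the mechanism without depending on Parsers or TimeZones, here is a minimal sketch using only the Dates standard library. StampWithZone is a hypothetical stand-in for ZonedDateTime, and fallback_default is a hypothetical function that only mirrors the shape of the line quoted above; neither is real library API.

```julia
using Dates

# Hypothetical stand-in for ZonedDateTime: a TimeType that also needs a zone.
struct StampWithZone <: Dates.TimeType
    utc::DateTime
    zone::String
end

# Mirrors the shape of the generic fallback quoted above (illustrative name).
fallback_default(::Type{T}) where {T <: Dates.TimeType} = T(0)

# Fine: DateTime(0) is a valid one-argument constructor call.
dt_default = fallback_default(DateTime)

# The stand-in has no one-argument constructor, so the generic fallback throws.
err = try
    fallback_default(StampWithZone)
catch e
    e
end

# A more specific method supplies the missing second argument, which is the
# same idea as adding a ZonedDateTime-specific method for the real function.
fallback_default(::Type{StampWithZone}) = StampWithZone(DateTime(0), "UTC")
fixed = fallback_default(StampWithZone)
```

In this sketch err holds a MethodError, analogous to the one reported for ZonedDateTime(::Int64), and the specialised method avoids it.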

@bluesmoon
Author

I can get around the initial issue by extending Parsers.default in my own code:

using CSV, Dates, TimeZones, DataFrames, Parsers
import Parsers.default

default(::Type{ZonedDateTime}) = ZonedDateTime(DateTime(0), TimeZones.utc_tz)

This then gives me the following error when I try to parse:

df_tz = CSV.File("/data/testdb/test-types.csv", types=Dict(:d => ZonedDateTime)) |> DataFrame
ERROR: TypeError: in typeassert, expected Tuple{Char,Int64}, got a value of type Tuple{UInt8,Int64}
Stacktrace:
 [1] tryparsenext_fixedtz(::Array{UInt8,1}, ::Int64, ::Int64, ::Int64, ::Int64) at /home/ubuntu/.julia/packages/TimeZones/K98G0/src/parse.jl:12
 [2] tryparsenext at /home/ubuntu/.julia/packages/TimeZones/K98G0/src/parse.jl:69 [inlined]
 [3] tryparsenext at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Dates/src/io.jl:41 [inlined]
 [4] macro expansion at /home/ubuntu/.julia/packages/Parsers/2MBHI/src/dates.jl:84 [inlined]
 [5] mytryparsenext_core at /home/ubuntu/.julia/packages/Parsers/2MBHI/src/dates.jl:61 [inlined]
 [6] macro expansion at /home/ubuntu/.julia/packages/Parsers/2MBHI/src/dates.jl:52 [inlined]
 [7] mytryparsenext_internal at /home/ubuntu/.julia/packages/Parsers/2MBHI/src/dates.jl:32 [inlined]
 [8] typeparser at /home/ubuntu/.julia/packages/Parsers/2MBHI/src/dates.jl:3 [inlined]
 [9] xparse at /home/ubuntu/.julia/packages/Parsers/2MBHI/src/Parsers.jl:254 [inlined]
 [10] parsevalue!(::Type{ZonedDateTime}, ::UInt8, ::SentinelArrays.SentinelArray{ZonedDateTime,1,UndefInitializer,Missing,Array{ZonedDateTime,1}}, ::Array{AbstractArray{T,1} where T,1}, ::Array{UInt8,1}, ::Int64, ::Int64, ::Parsers.Options{false,true,true,false,Missing,UInt8,DateFormat{Symbol("yyyy-mm-ddTHH:MM:SS.ssszzz"),Tuple{Dates.DatePart{'y'},Dates.Delim{Char,1},Dates.DatePart{'m'},Dates.Delim{Char,1},Dates.DatePart{'d'},Dates.Delim{Char,1},Dates.DatePart{'H'},Dates.Delim{Char,1},Dates.DatePart{'M'},Dates.Delim{Char,1},Dates.DatePart{'S'},Dates.Delim{Char,1},Dates.DatePart{'s'},Dates.DatePart{'z'}}}}, ::Int64, ::Int64, ::Int64, ::Array{Type,1}, ::Array{UInt8,1}) at /home/ubuntu/.julia/packages/CSV/la2cd/src/file.jl:907
 [11] macro expansion at /home/ubuntu/.julia/packages/CSV/la2cd/src/file.jl:620 [inlined]
 [12] parsecustom! at /home/ubuntu/.julia/packages/CSV/la2cd/src/file.jl:610 [inlined]
 [13] parserow(::Int64, ::Val{false}, ::Int64, ::Dict{Type,Type}, ::Array{AbstractArray{T,1} where T,1}, ::Int64, ::Array{UInt8,1}, ::Int64, ::Int64, ::Array{Int64,1}, ::Float64, ::Array{CSV.RefPool,1}, ::Int64, ::Int64, ::Array{Type,1}, ::Array{UInt8,1}, ::Bool, ::Parsers.Options{false,true,true,false,Missing,UInt8,Nothing}, ::Array{Parsers.Options,1}, ::Type{Tuple{Tuple{SentinelArrays.SentinelArray{ZonedDateTime,1,UndefInitializer,Missing,Array{ZonedDateTime,1}},ZonedDateTime}}}, ::Base.RefValue{Int64}, ::Int64) at /home/ubuntu/.julia/packages/CSV/la2cd/src/file.jl:669
 [14] parsefilechunk!(::Val{false}, ::Int64, ::Dict{Type,Type}, ::Array{AbstractArray{T,1} where T,1}, ::Array{UInt8,1}, ::Int64, ::Int64, ::Int64, ::Array{Int64,1}, ::Float64, ::Array{CSV.RefPool,1}, ::Int64, ::Int64, ::Array{Type,1}, ::Array{UInt8,1}, ::Bool, ::Parsers.Options{false,true,true,false,Missing,UInt8,Nothing}, ::Array{Parsers.Options,1}, ::Type{Tuple{Tuple{SentinelArrays.SentinelArray{ZonedDateTime,1,UndefInitializer,Missing,Array{ZonedDateTime,1}},ZonedDateTime}}}, ::Int64) at /home/ubuntu/.julia/packages/CSV/la2cd/src/file.jl:541
 [15] CSV.File(::CSV.Header{false,Parsers.Options{false,true,true,false,Missing,UInt8,Nothing},Array{UInt8,1}}; finalizebuffer::Bool, startingbyteposition::Nothing, endingbyteposition::Nothing, limit::Nothing, threaded::Nothing, typemap::Dict{Type,Type}, tasks::Int64, lines_to_check::Int64, maxwarnings::Int64, debug::Bool) at /home/ubuntu/.julia/packages/CSV/la2cd/src/file.jl:303
 [16] CSV.File(::String; header::Int64, normalizenames::Bool, datarow::Int64, skipto::Nothing, footerskip::Int64, transpose::Bool, comment::Nothing, use_mmap::Nothing, ignoreemptylines::Bool, select::Nothing, drop::Nothing, missingstrings::Array{String,1}, missingstring::String, delim::Nothing, ignorerepeated::Bool, quotechar::Char, openquotechar::Nothing, closequotechar::Nothing, escapechar::Char, dateformat::Nothing, dateformats::Dict{Symbol,DateFormat{Symbol("yyyy-mm-ddTHH:MM:SS.ssszzz"),Tuple{Dates.DatePart{'y'},Dates.Delim{Char,1},Dates.DatePart{'m'},Dates.Delim{Char,1},Dates.DatePart{'d'},Dates.Delim{Char,1},Dates.DatePart{'H'},Dates.Delim{Char,1},Dates.DatePart{'M'},Dates.Delim{Char,1},Dates.DatePart{'S'},Dates.Delim{Char,1},Dates.DatePart{'s'},Dates.DatePart{'z'}}}}, decimal::UInt8, truestrings::Array{String,1}, falsestrings::Array{String,1}, type::Nothing, types::Dict{Symbol,DataType}, typemap::Dict{Type,Type}, pool::Float64, lazystrings::Bool, strict::Bool, silencewarnings::Bool, debug::Bool, parsingdebug::Bool, kw::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/ubuntu/.julia/packages/CSV/la2cd/src/file.jl:218

@quinnj
Member

quinnj commented Aug 20, 2021

OK, after a recent big upgrade to the time-type parsing functionality in Parsers.jl, doing df_tz = CSV.File("/data/testdb/test-types.csv", types=Dict(:d => ZonedDateTime), dateformats=Dict(:d => Dates.default_format(ZonedDateTime))) |> DataFrame now works on the current CSV.jl main branch.

quinnj closed this as completed Aug 20, 2021