Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tables containing Set values are serializable but corresponding deserialized Arrow.Tables are inaccessible #75

Closed
jrevels opened this issue Nov 28, 2020 · 3 comments

Comments

@jrevels
Copy link
Contributor

jrevels commented Nov 28, 2020

Running on 32560d3:

julia> using Arrow

julia> construct_table(args...) = (io = IOBuffer(); Arrow.write(io, args...); seekstart(io); Arrow.Table(io))
construct_table (generic function with 1 method)

julia> t = construct_table((sets = [Set([1,2,3]), Set([1,2,3])],));

julia> collect(t.sets)
ERROR: MethodError: Cannot `convert` an object of type Pair{Int64,Nothing} to an object of type Int64
Closest candidates are:
  convert(::Type{T}, ::T) where T<:Number at number.jl:6
  convert(::Type{T}, ::Number) where T<:Number at number.jl:7
  convert(::Type{T}, ::Ptr) where T<:Integer at pointer.jl:23
  ...
Stacktrace:
 [1] setindex!(::Dict{Int64,Nothing}, ::Nothing, ::Pair{Int64,Nothing}) at ./dict.jl:372
 [2] push!(::Set{Int64}, ::Pair{Int64,Nothing}) at ./set.jl:57
 [3] union!(::Set{Int64}, ::Dict{Int64,Nothing}) at ./abstractset.jl:91
 [4] Set{Int64}(::Dict{Int64,Nothing}) at ./set.jl:10
 [5] getindex at /Users/jarrettrevels/.julia/dev/Arrow/src/arraytypes/struct.jl:44 [inlined]
 [6] copyto_unaliased!(::IndexLinear, ::Array{Set{Int64},1}, ::IndexLinear, ::Arrow.Struct{Set{Int64},Tuple{Arrow.Map{Dict{Int64,Nothing},Int32,Arrow.Struct{Arrow.KeyValue{Int64,Nothing},Tuple{Arrow.Primitive{Int64,Array{Int64,1}},Arrow.Struct{Nothing,Tuple{}}}}}}}) at ./abstractarray.jl:860
 [7] copyto! at ./abstractarray.jl:840 [inlined]
 [8] _collect_indices at ./array.jl:642 [inlined]
 [9] collect(::Arrow.Struct{Set{Int64},Tuple{Arrow.Map{Dict{Int64,Nothing},Int32,Arrow.Struct{Arrow.KeyValue{Int64,Nothing},Tuple{Arrow.Primitive{Int64,Array{Int64,1}},Arrow.Struct{Nothing,Tuple{}}}}}}}) at ./array.jl:626
 [10] top-level scope at REPL[36]:1
@quinnj
Copy link
Member

quinnj commented Dec 1, 2020

Yes, as you've discovered, the Arrow.jl implementation isn't currently very useful for "serialization" of generic objects, but focused on the limited set of types it was originally designed for. I've looked into this and the DataType cases and they're non-trivial with how the code is currently structured, but it should be much nicer if people want to "hook in" to using Arrow.jl for generic serialization. The biggest thing is making the ArrowTypes module much cleaner in terms of the contracts for the various types: what's required and then making sure hte implementation only relies on those contracts. Then people can overload the necessary methods and everything will just work, similar to the StructTypes.jl system. let me look at the NamedTuple issue and see if it's lower hanging fruit

@jrevels
Copy link
Contributor Author

jrevels commented Dec 1, 2020

Ha, yeah - I was actually surprised that Sets were serializable in the first place 😁 Thanks for checking it out.

I'd consider it a win if the error could be moved to serialization time (and try to prevent non-deserializable content from being serialized in the first place as a general rule)

@quinnj
Copy link
Member

quinnj commented Mar 24, 2021

I'm going to add support for this in #156

@quinnj quinnj closed this as completed Mar 24, 2021
quinnj added a commit that referenced this issue Mar 29, 2021
* Start work on overhauling type serialization architecture

* More work; serialization is pretty much done but not tested

* fix timetype ArrowTypes definitions

* more work to get tests passing

* get tests passing?

* fix

* Fix #75 by supporting Set serialization/deserialization

* Fix #85 by supporting tuple serialization/deserialization

* Lots of cleanup

* few more fixes

* Update src/arrowtypes.jl

Co-authored-by: Jarrett Revels <[email protected]>

* Update src/arrowtypes.jl

Co-authored-by: Jarrett Revels <[email protected]>

* fix NullKind reading

* Fix #134 by requiring concrete or union of concrete element types for
all columns when serializing

* Add new ArrowTypes.arrowmetadata method for providing additional extension type metadata htat can be used in JuliaType

* Update manual

* tests

Co-authored-by: Jarrett Revels <[email protected]>
tanmaykm pushed a commit to tanmaykm/arrow-julia that referenced this issue Apr 7, 2021
* Start work on overhauling type serialization architecture

* More work; serialization is pretty much done but not tested

* fix timetype ArrowTypes definitions

* more work to get tests passing

* get tests passing?

* fix

* Fix apache#75 by supporting Set serialization/deserialization

* Fix apache#85 by supporting tuple serialization/deserialization

* Lots of cleanup

* few more fixes

* Update src/arrowtypes.jl

Co-authored-by: Jarrett Revels <[email protected]>

* Update src/arrowtypes.jl

Co-authored-by: Jarrett Revels <[email protected]>

* fix NullKind reading

* Fix apache#134 by requiring concrete or union of concrete element types for
all columns when serializing

* Add new ArrowTypes.arrowmetadata method for providing additional extension type metadata htat can be used in JuliaType

* Update manual

* tests

Co-authored-by: Jarrett Revels <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants