Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tuple comprehension does not work with map #101

Closed
piever opened this issue Nov 30, 2017 · 7 comments
Closed

Tuple comprehension does not work with map #101

piever opened this issue Nov 30, 2017 · 7 comments
Labels

Comments

@piever
Copy link
Collaborator

piever commented Nov 30, 2017

I found a mysterious bug: in map, if I use a function outputting a tuple that I write explicitly, everything works. If instead the tuple is a result of a tuple comprehension, it only stores the first column.

julia> t = table(rand(10), rand(10), names = [:x, :y])
Table with 10 rows, 2 columns:
x         y
───────────────────
0.6323    0.0750121
0.315025  0.884202
0.419645  0.55112
0.493855  0.822601
0.276584  0.71265
0.377394  0.399386
0.134123  0.574975
0.623424  0.399668
0.69115   0.0666205
0.310245  0.254196

julia> map(i -> (i.x, i.y), t)
Table with 10 rows, 2 columns:
1         2
───────────────────
0.6323    0.0750121
0.315025  0.884202
0.419645  0.55112
0.493855  0.822601
0.276584  0.71265
0.377394  0.399386
0.134123  0.574975
0.623424  0.399668
0.69115   0.0666205
0.310245  0.254196

julia> s = (:x, :y)
(:x, :y)

julia> map(i -> Tuple(getfield(i, j) for j in s), t)
Table with 10 rows, 1 columns:
1
────────
0.6323
0.315025
0.419645
0.493855
0.276584
0.377394
0.134123
0.623424
0.69115
0.310245
@shashi shashi added the bug label Dec 3, 2017
@shashi
Copy link
Collaborator

shashi commented Dec 3, 2017

It's probably because:

julia> IndexedTables._promote_op(i -> Tuple(getfield(i, j) for j in s), typeof(t[1]))
Tuple{Any}

I think in this case we can't infer the output type of the lambda. The right thing to do here is return an Array{Any}.

Use astuple to convert a named tuple to a tuple (it's a generated function, hence can be inferred)

julia> map(IndexedTables.astuple, t)
Table with 10 rows, 2 columns:
1          2
────────────────────
0.0824641  0.69732
0.953091   0.722422
0.335519   0.0399266
0.368203   0.319559
0.854868   0.783442
0.208625   0.961055
0.498797   0.347901
0.948223   0.433926
0.370459   0.808366
0.297149   0.0867216

@shashi shashi closed this as completed Dec 3, 2017
@piever
Copy link
Collaborator Author

piever commented Dec 3, 2017

Thanks for looking into this and for the tip! My concern came from the following inconsistence:

julia> map(i -> Tuple(getfield(i, j) for j in s), t)
10-element Array{Any,1}:
 (0.205925, 0.290762) 
 (0.720414, 0.145541) 
 (0.992084, 0.709543) 
 (0.116492, 0.285352) 
 (0.734721, 0.146378) 
 (0.958972, 0.180972) 
 (0.539473, 0.178098) 
 (0.748478, 0.402817) 
 (0.449255, 0.0980809)
 (0.873253, 0.473562) 

julia> map(i -> Tuple(getfield(i, j) for j in s), (i for i in t))
10-element Array{Tuple{Float64,Float64},1}:
 (0.205925, 0.290762) 
 (0.720414, 0.145541) 
 (0.992084, 0.709543) 
 (0.116492, 0.285352) 
 (0.734721, 0.146378) 
 (0.958972, 0.180972) 
 (0.539473, 0.178098) 
 (0.748478, 0.402817) 
 (0.449255, 0.0980809)
 (0.873253, 0.473562) 

where the "normal" map from Base works well on an iterator, but the overloaded map from IndexedTables doesn't work, but your rational makes sense: to build an IndexedTables you need to infer what the anonymous function is returning, whereas map simply starts creating the new Array and then copies it if the type it has chosen for the first elements is too strict (or at least that's my understanding).

@shashi
Copy link
Collaborator

shashi commented Dec 3, 2017

whereas map simply starts creating the new Array and then copies it if the type it has chosen for the first elements is too strict (or at least that's my understanding).

I'm using _promote_op quite a bit to implement things like this. I suppose it would be better to switch to this process if the inferred type (or any of the fields of the inferred tuple type) is Any.

@piever
Copy link
Collaborator Author

piever commented Dec 3, 2017

A related issue happens with Query (one can't select columns programmatically with a @select statement) and I think the root of the problem was discussed here, where it was suggested to do like map does in Base. Maybe if the current implementation is more performant, the best way would be to only fall back on the "_promote_op free" implementation where inference fails.

@davidanthoff
Copy link
Contributor

Yeah, looks like the same issue as in Query.jl. I have a plan to move to an implementation like in base map if type inference doesn't return a concrete type (or it returns Any). Will complicated things a lot, but I think it is the only solution...

@piever
Copy link
Collaborator Author

piever commented Dec 17, 2017

I think that's a great idea! On the plus side, that should:

  1. Allow select statements with non-inferrable functions (which I think is a very nice improvement: it took me a while to figure out that my queries were not working because the return type couldn't be inferred correctly and I imagine most new users will have similar issues)
  2. Make it easier to transition to Missings

@davidanthoff
Copy link
Contributor

I agree on 1, but disagree on 2 :) I think not relying on type inference is actually easier with DataValue than with Missing...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants