Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make pick more useful #44

Merged
merged 4 commits into from
May 31, 2017
Merged

make pick more useful #44

merged 4 commits into from
May 31, 2017

Conversation

shashi
Copy link
Collaborator

@shashi shashi commented May 12, 2017

julia> c = Columns(x=[1,2],y=[2.,4])
2-element IndexedTables.Columns{NamedTuples._NT_x_y{Int64,Float64},NamedTuples._NT_x_y{Array{Int64,1},Array{Float64,1}}}:
 (x = 1, y = 2.0)
 (x = 2, y = 4.0)
julia> pick(:y)(c)
2-element Array{Float64,1}:
 2.0
 4.0

julia> pick(:y)(c)
2-element Array{Float64,1}:
 2.0
 4.0

julia> pick((:y,))(c)
2-element IndexedTables.Columns{NamedTuples._NT_y{Float64},NamedTuples._NT_y{Array{Float64,1}}}:
 (y = 2.0)
 (y = 4.0)

julia> pick((:y,:x))(c)
2-element IndexedTables.Columns{NamedTuples._NT_y_x{Float64,Int64},NamedTuples._NT_y_x{Array{Float64,1},Array{Int64,1}}}:
 (y = 2.0, x = 1)
 (y = 4.0, x = 2)

julia> pick((2,1))(c)
2-element IndexedTables.Columns{NamedTuples._NT_y_x{Float64,Int64},NamedTuples._NT_y_x{Array{Float64,1},Array{Int64,1}}}:
 (y = 2.0, x = 1)
 (y = 4.0, x = 2)

The multiple column version is not inferable in the case the columns are of different types. That can be fixed by using Base.@pure if required.

@JeffBezanson
Copy link
Contributor

If the case of manually writing out the desired fields is important, we could do it very efficiently with a macro @pick(a,b) that expands to

x -> @NT(a=x.a, b=x.b)

To extend that to Columns, we could instead expand to

x -> wrap(x, @NT(a=x.a, b=x.b))

where

wrap(x, y) = y
wrap(x::Columns, y) = Columns(y)

or something along those lines.

@shashi
Copy link
Collaborator Author

shashi commented May 12, 2017

@pick

That's a neat idea! Gives us a chance to generate the required output type during macro expansion time (esp. since it's not possible with @generated functions).

We specialize map(p::Proj, c::Columns) to call p only on c.columns, this is pretty fast at the moment for this reason.

@shashi shashi force-pushed the s/pick branch 2 times, most recently from c72ef63 to 4276963 Compare May 13, 2017 07:48
@shashi
Copy link
Collaborator Author

shashi commented May 13, 2017

I've updated the PR. Added @pick with exactly the syntax you suggested. I've retained the old pick which has a slightly different behavior. Specifically, pick(:x)(@NT(x=1, y=2)) == 1, but @pick(x)(@NT(x=1, y=2)) == @NT(x=1).

@shashi
Copy link
Collaborator Author

shashi commented May 13, 2017

Tests now pass except on OSX nightly (same problem of missing native library or something).

src/utils.jl Outdated
tup = if all([isa(x, Symbol) for x in ex])
# Named tuple
T = NamedTuples.create_tuple([x for x in ex])
:(NamedTuples.$T($([:(getfield(x, $(Expr(:quote, f)))) for f in ex]...)))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line fails in a multi-process situation because the symbol $T only gets defined on the master process... This can be fixed by adding @everywhere to the previous line, but do we want to do such things during macro expansion??

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the new type gets eval'd in Main then it will be sent to other processes, which would be good for now. Related to JuliaData/NamedTuples.jl#27 (comment) --- that change to NamedTuples might help fix things like this more comprehensively.

Copy link
Collaborator Author

@shashi shashi May 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will be sent to other processes

I'm trying to make this change to NamedTuples but it doesn't look like we can send new types on release-0.6 branch of Julia:

> jl -p 1

julia> struct Foo
           x
       end

julia> remotecall_fetch(identity, 2, Foo("x"))
ERROR: On worker 2:
UndefVarError: Foo not defined
deserialize_datatype at ./serialize.jl:861
handle_deserialize at ./serialize.jl:601
deserialize at ./serialize.jl:571
ntuple at ./tuple.jl:108
handle_deserialize at ./serialize.jl:592
deserialize_msg at ./distributed/messages.jl:100
deserialize_msg at ./distributed/messages.jl:110
message_handler_loop at ./distributed/process_messages.jl:161
process_tcp_streams at ./distributed/process_messages.jl:118

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't https://github.com/JuliaLang/julia/blob/50c2a378ba2171250b9bee04d4ea63eb1f348932/base/serialize.jl#L487-L504 only work for functions? I don't think we ever sent type definitions automatically.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, you're right. We should actually try to rely on the serialize method we've added for NamedTuple types (which is in JuliaDB, but should be moved to NamedTuples).

It would also be nice if this macro could do less work. Ideally it should expand to a @NT(...) expression, i.e. expand to exactly what you'd write if the macro didn't exist, instead of using NamedTuple internals.

To make all that work, we could have NamedTuples put the type it creates directly into its macro output, instead of generating a GlobalRef.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was using @NT but running into some problem I can't remember now, but it was about creating the correct expression. I guess I needed to use Expr(:macrocall,...) instead of :(@NT(...)), will try again, thanks.

type it creates directly into its macro output

Can we use @NT inside a function then?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should fix that. I'll work on a NamedTuples PR.

src/utils.jl Outdated
@@ -56,11 +58,51 @@ end

# family of projection functions

immutable Proj{field}; end
immutable Proj{F}
f::F
Copy link
Contributor

@JeffBezanson JeffBezanson May 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this definition very confusing. A Proj{f}() is supposed to be a function that selects field f, period. Anything else should be a different type.

Also I realize the definition for (p::Proj)(c::Columns) isn't really right anymore, and should be a method of broadcast now.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I can separate this out into a ProjFunction type. Right now I use Val{f} for the case you mentioned above.

(p::Proj)(c::Columns) can be removed.

should be a method of broadcast now.

And map too right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

@shashi
Copy link
Collaborator Author

shashi commented May 18, 2017

Updated. Works in the multi-process case with JuliaData/NamedTuples.jl#34 , but safe to merge here as-is.

@shashi shashi closed this May 25, 2017
@shashi shashi reopened this May 25, 2017
@codecov-io
Copy link

codecov-io commented May 31, 2017

Codecov Report

Merging #44 into master will increase coverage by 0.78%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #44      +/-   ##
==========================================
+ Coverage   91.98%   92.77%   +0.78%     
==========================================
  Files           6        6              
  Lines         836      844       +8     
==========================================
+ Hits          769      783      +14     
+ Misses         67       61       -6
Impacted Files Coverage Δ
src/utils.jl 98.19% <100%> (+4.02%) ⬆️
src/columns.jl 94.57% <100%> (+1.55%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 063d41f...25c1b62. Read the comment docs.

@shashi shashi merged commit c062d37 into master May 31, 2017
@shashi shashi deleted the s/pick branch May 31, 2017 10:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants