Add conditional and passmissing #89

Merged
merged 7 commits into from
Jan 17, 2019
Changes from 4 commits
28 changes: 27 additions & 1 deletion src/Missings.jl
@@ -1,7 +1,7 @@
module Missings

export allowmissing, disallowmissing, ismissing, missing, missings,
Missing, MissingException, levels, coalesce
Missing, MissingException, levels, coalesce, passmissing
Member

I personally prefer Missings.propagate if we're still bikeshedding the name here.

Member

I like propagate too, the only gripe is that it's quite long. But well...

Member Author

But do we want to export it or not? If yes then propagate is ok, but Missings.propagate is a bit long.

Member

Unfortunately, propagate is too general a name to claim for this feature...

Member Author

I know, but that is why I thought the idea was propagatemissing which tab-completes.


using Base: ismissing, missing, Missing, MissingException

@@ -165,4 +165,30 @@ function levels(x)
levs
end

struct PassMissing{F} <: Function end
Member

I think we probably want:

struct PassMissing{F} <: Function
    f::F
end


@generated (::PassMissing{F})(xs...;kw...) where {F} =
Member

I don't think the @generated is doing anything here except ensuring a new function gets compiled w/ every call.

:(any(ismissing, xs) || any(ismissing, values(values(kw))) ? missing : F(xs...; kw...))
Member

Hmmm, the inclusion of keyword arguments here seems off to me; what if missing is a totally valid thing to pass as a keyword argument, but now it's interacting weird w/ PassMissing because it always makes my result missing?

Member

I think my stab at this definition would be more along the lines of:

struct PassMissing{F} <: Function
    f::F
end

function (f::PassMissing{F})(x) where {F}
    if @generated
        return x === Missing ? missing : :(f.f(x))
    else
        return x === missing ? missing : f.f(x)
    end
end

function (f::PassMissing{F})(xs...; kw...) where {F}
    if @generated
        for T in xs
            T === Missing && return missing
        end
        return :(f.f(xs...; kw...))
    else
        return any(ismissing, xs) ? missing : f.f(xs...; kw...)
    end
end

with specialized methods for 1 argument, 2 arguments, maybe 3.

Member

Interesting. Though with generated functions you don't need to specialize on the number of arguments.

Also why do you think keyword arguments should be excluded from the missingness check?

Member Author

Thank you for the comments. I have done additional benchmarking and it looks like you are right about the optimizations 👍 (some tests I ran earlier did allocate).

I do not understand how and why this if @generated part works.
In particular e.g. why is it different from just writing:

@generated function (f::PassMissing{F})(x) where {F}
    return x === Missing ? missing : :(f.f(x))
end

Could you please explain?

Regarding other issues:

  1. function name: I am open to anything (I am not a native speaker)
  2. I was also unsure whether we want to include keyword arguments, since in general it is somewhat arbitrary what the user passes as a positional argument and what as a keyword argument (especially as keyword arguments are not required to have default values).

Member

The if @generated part is just future-proofing for a time when you're able to statically compile a Julia program into an executable that can run without LLVM (i.e. w/o runtime compilation). The else block provides an ordinary, non-generated implementation of the same method that can be used when generating code for the given arguments at runtime is not possible.

For keyword arguments, it just feels more arbitrary. Like, it's hard to imagine a case where I'd be mapping over some Vector{Union{Float64, Missing}} and Vector{Union{Int, Missing}} and be doing something like map((x, y)->round(x; digits=y), A, B). i.e. when would I maybe pass a value vs. maybe pass a missing as a keyword argument? Keyword arguments also tend to use various "sentinel" values as signals or special values for the function to use, including missing; the danger there being that someone writes a function foo(x...; sentinel=nothing), but when the user tries to call Missings.propagate(foo)(x; sentinel=missing), the entire result comes back missing instead of passing missing on to foo as a valid sentinel value.

If anything, I'd say we leave keyword arguments out of the mix for now since they can always be added later.

My main issue with passmissing is that pass usually has a connotation of "ignore" or "do nothing" (see python's pass), whereas we're not really ignoring missing values, we're propagating them.
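
The sentinel concern above can be made concrete with a small sketch. Everything here is hypothetical and not part of the PR: `KwCheckingPassMissing`, `PositionalOnlyPassMissing`, and `tag_value` are illustrative names, and the wrappers use plain (non-generated) methods for brevity.

```julia
# Hypothetical wrapper that, like the version under review, also checks keyword arguments.
struct KwCheckingPassMissing{F} <: Function
    f::F
end

function (w::KwCheckingPassMissing)(xs...; kw...)
    # values(kw) is a NamedTuple; values of that is a plain Tuple of the keyword values.
    if any(ismissing, xs) || any(ismissing, values(values(kw)))
        return missing
    end
    return w.f(xs...; kw...)
end

# Hypothetical wrapper that only inspects positional arguments.
struct PositionalOnlyPassMissing{F} <: Function
    f::F
end

(w::PositionalOnlyPassMissing)(xs...; kw...) =
    any(ismissing, xs) ? missing : w.f(xs...; kw...)

# Hypothetical function for which `missing` is a legitimate keyword value.
tag_value(x; sentinel=nothing) = (x, sentinel)

# The keyword-checking wrapper never calls tag_value: the result is `missing`.
KwCheckingPassMissing(tag_value)(1; sentinel=missing)

# The positional-only wrapper forwards the keyword and returns (1, missing).
PositionalOnlyPassMissing(tag_value)(1; sentinel=missing)
```

Which of the two behaviors is preferable is exactly what the thread is debating; the sketch only shows that they differ.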

Member

> For keyword arguments, it just feels more arbitrary. Like, it's hard to imagine a case where I'd be mapping over some Vector{Union{Float64, Missing}} and Vector{Union{Int, Missing}} and be doing something like map((x, y)->round(x; digits=y), A, B). i.e. when would I maybe pass a value vs. maybe pass a missing as a keyword argument? Keyword arguments also tend to use various "sentinel" values as signals or special values for the function to use, including missing; the danger there being that someone writes a function foo(x...; sentinel=nothing), but when the user tries to call Missings.propagate(foo)(x; sentinel=missing), the entire result comes back missing instead of passing missing on to foo as a valid sentinel value.

I don't think there are any examples of functions using missing as a sentinel currently, right? We rather want people to use nothing for that.

I see your point about keyword arguments generally being about options, while the data is passed as positional arguments, but I'd find it problematic to make this a rule. Overall it doesn't seem we have strong use cases for either behavior, and in such situations I tend to favor the simplest rule (i.e. "return missing if one of the arguments is missing").

> If anything, I'd say we leave keyword arguments out of the mix for now since they can always be added later.

The problem is, changing that would be breaking. The only way to be able to choose any behavior later is to throw an error if a keyword argument is missing.

Member Author

> The only way to be able to choose any behavior later is to throw an error if a keyword argument is missing.

We could define lift for functions that do not take keyword arguments for the time being. It is easy enough to wrap a function requiring kwargs in an anonymous function.
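
A minimal sketch of that workaround, assuming a wrapper that accepts positional arguments only. `PositionalPassMissing` and `round2` are hypothetical names used for illustration, not the PR's API.

```julia
# Hypothetical positional-only wrapper: no keyword arguments are accepted at all.
struct PositionalPassMissing{F} <: Function
    f::F
end

(w::PositionalPassMissing)(xs...) = any(ismissing, xs) ? missing : w.f(xs...)

# The keyword argument is fixed inside an anonymous function, so the wrapper
# only ever sees the positional argument.
round2 = PositionalPassMissing(x -> round(x; digits=2))

round2(3.14159)           # 3.14
round2(missing)           # missing
round2.([1.234, missing]) # broadcasts like any other callable
```

With this design, a `missing` keyword value can never silently turn the result into `missing`, which leaves the choice of keyword semantics open for later.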


"""
passmissing(f)

Return a function that returns `missing` if any of its positional or keyword arguments
are `missing` and otherwise applies `f` to those arguments.

# Examples
```jldoctest
julia> passmissing(sqrt)(4)
2.0

julia> passmissing(sqrt)(missing)
missing

julia> passmissing(sqrt).([missing, 4])
Member

It would be useful to start with a simpler example with a single scalar (first non-missing, and then missing).

Member Author

added

2-element Array{Union{Missing, Float64},1}:
missing
2.0
"""
passmissing(f::Base.Callable) = PassMissing{f}()
Member

Shouldn't this be typeof(f)?

BTW, have you done some benchmarking to check there are no allocations?

Member Author

AFAIK if I wanted to use typeof(f), I would have to store f in a struct, which would be slower and would allocate more.

I will post benchmarks for the current implementation in the main thread so that they do not disappear.

Member

I think in most practical uses, the struct would never allocate. It's the same idea w/ the new iteration protocol: every call to iterate potentially returns a Tuple{T...}, but most of the time those tuples, and even other intermediate objects, never actually get allocated.
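
One way to see why storing `f` is cheap for ordinary named functions is the sketch below. It assumes a hypothetical `StoredPassMissing` wrapper parameterized on `typeof(f)`, and it only inspects the wrapper's memory layout; it is not a substitute for benchmarking.

```julia
# Hypothetical wrapper that stores the function and is parameterized on typeof(f).
struct StoredPassMissing{F} <: Function
    f::F
end

(w::StoredPassMissing)(xs...) = any(ismissing, xs) ? missing : w.f(xs...)

w = StoredPassMissing(sqrt)

# typeof(sqrt) is a singleton type with no fields, so the wrapper carries no data:
isbits(w)   # true: an immutable, pointer-free value
sizeof(w)   # 0 bytes

w(4)        # 2.0
w(missing)  # missing
```

A closure that captures variables would make the wrapper larger, so whether a particular call allocates still needs to be measured for the case at hand.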


end # module
4 changes: 4 additions & 0 deletions test/runtests.jl
@@ -355,4 +355,8 @@ using Test, Dates, InteractiveUtils, SparseArrays, Missings

# MissingException
@test sprint(showerror, MissingException("test")) == "MissingException: test"

# Lifting
@test isequal(passmissing(sqrt).([missing, 4]), [missing, 2.0])
@test isequal(passmissing(parse)(Int, "a", base=missing), missing)
end