This repository has been archived by the owner on May 4, 2019. It is now read-only.

Behavior of map() and broadcast() #144

Open
Tracked by #148
nalimilan opened this issue Aug 29, 2016 · 5 comments

Comments

@nalimilan
Member

Currently, map/broadcast on NullableArray returns a NullableArray. This can be annoying when you don't want a Nullable result, e.g.:

julia> map(isnull, NullableArray(1:3))
3-element NullableArrays.NullableArray{Any,1}:
 false
 false
 false

I think it would make sense to return a NullableArray only when the function returns Nullable. This is similar to operations on BitArray: JuliaLang/julia#18198

nalimilan added five commits to JuliaData/DataFrames.jl that referenced this issue (Aug 29, Aug 30, Aug 31 twice, and Sep 1, 2016)
Shorter written that way for now. Filed as JuliaStats/NullableArrays.jl#144.
@davidagold
Contributor

Another way to put the issue is that the semantics for functions that act directly on the Nullable container, e.g. isnull, are different from the standard lifting semantics (as we've come to define them) for mapping arbitrary methods f(x::T) over X::NullableArray{T}. How would you feel if we defined map(f::typeof(Base.isnull), X::NullableArray) to give Array results?

Note also that comprehensions always give Array results, so one could use them instead:

julia> X = NullableArray(collect(1:3))
3-element NullableArrays.NullableArray{Int64,1}:
 1
 2
 3

julia> [ isnull(x) for x in X ]
3-element Array{Bool,1}:
 false
 false
 false

nalimilan added two commits to JuliaData/DataFrames.jl that referenced this issue on Sep 1, 2016
Shorter written that way for now. Filed as JuliaStats/NullableArrays.jl#144.
@nalimilan
Member Author

I don't really like the idea of specializing on specific functions, as it will make it very hard to understand the general behavior of map. As I noted in the description, I think the best behavior would be to preserve the function's return type. For cases where automatic lifting of a function is needed, people could use @lift. That would be much more consistent with the rest of the language.

@johnmyleswhite
Member

johnmyleswhite commented Sep 2, 2016

To add to David's comment, the main concern I have with this proposal is that it would make it too easy for the semantics of map on Array and NullableArray to diverge over time.

Consider the following example:

f(x::Int) = 1
f(x::Nullable{Int}) = 2

map(f, 1:3)
map(f, NullableArray(1:3))

What do these two calls to map evaluate to? I would prefer that they be linked, because I think we should generally discourage redefining functions on nullable objects and instead derive their behaviors from the non-nullable definitions.
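The two possible answers can be sketched without the package. Nullable is no longer part of Base in Julia 1.0 and later, so the snippet below uses a hypothetical stand-in type, Maybe, and a plain Vector instead of NullableArray; it only illustrates the dispatch question and is not the package's implementation.

```julia
# Hypothetical stand-in for Nullable (an assumption, not the real type).
struct Maybe{T}
    hasvalue::Bool
    value::Union{T,Nothing}
end
Maybe(v::T) where {T} = Maybe{T}(true, v)        # wrap a value
Maybe{T}() where {T} = Maybe{T}(false, nothing)  # null of element type T

f(x::Int) = 1
f(x::Maybe{Int}) = 2

xs = [Maybe(1), Maybe(2), Maybe(3)]

# Plain values: dispatches to f(::Int).
plain = map(f, 1:3)

# No lifting: each element is passed as-is, so f(::Maybe{Int}) is called.
unlifted = map(f, xs)

# Lifting: unwrap, call f(::Int), and re-wrap; null stays null.
lifted = map(x -> x.hasvalue ? Maybe(f(x.value)) : Maybe{Int}(), xs)
```

With lifting, the wrapped result mirrors the plain one (every element comes from f(::Int)); without lifting, the method defined on the container type takes over and the two calls diverge.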

This concern is closely linked to the whitelist vs blacklist distinction I've been discussing with David for a while now:

Blacklist approach: We automatically lift all functions for you in contexts like map calls, so that you only have to write custom code when you want to do something that violates the natural lifting semantics that maps NULL to NULL and otherwise acts the same as the unlifted function. isnull is one of the functions that needs to go on the blacklist so that we don't use the default semantics for it.

Whitelist approach: We never lift any functions automatically. You always have to define their behavior on nullable objects manually. We imagine you'll usually obey our default semantics, but we want you to always opt-in.

Which of these approaches is best is admittedly unclear. But I'm currently inclined to think that the blacklist approach will involve less work in the long run and will prevent paradoxes in which the action of functions on nullables and non-nullables is different for unclear reasons.
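A minimal sketch of the two approaches, under stated assumptions: Maybe is a hypothetical stand-in for Nullable (which is no longer in Base in Julia 1.0 and later), and lift, UNLIFTED, map_blacklist and map_whitelist are illustrative names that do not exist in NullableArrays. The result element type is passed explicitly to keep the example short.

```julia
# Hypothetical stand-in for Nullable (an assumption, not the real type).
struct Maybe{T}
    hasvalue::Bool
    value::Union{T,Nothing}
end
Maybe(v::T) where {T} = Maybe{T}(true, v)        # wrap a value
Maybe{T}() where {T} = Maybe{T}(false, nothing)  # null of element type T

isnull(x::Maybe) = !x.hasvalue

# Natural lifting semantics: null maps to null, otherwise apply f to the value.
lift(f, ::Type{R}, x::Maybe) where {R} =
    isnull(x) ? Maybe{R}() : Maybe{R}(true, f(x.value))

# Blacklist approach: lift every function except those that opt out.
const UNLIFTED = Function[isnull]  # hypothetical blacklist
map_blacklist(f, ::Type{R}, xs) where {R} =
    f in UNLIFTED ? map(f, xs) : map(x -> lift(f, R, x), xs)

# Whitelist approach: never lift; f must have its own method for Maybe,
# otherwise the call throws a MethodError.
map_whitelist(f, xs) = map(f, xs)

xs = [Maybe(1), Maybe{Int}(), Maybe(3)]
ys = map_blacklist(x -> x + 1, Int, xs)   # lifted: the null element stays null
zs = map_blacklist(isnull, Bool, xs)      # blacklisted: plain Vector{Bool}
```

Under the whitelist approach the same `x -> x + 1` call fails until someone defines addition on Maybe, which is the opt-in cost described above.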

@nalimilan
Member Author

OK. I was rather trying to ensure consistency with Array{Nullable}, so that

map(f, [Nullable(1), Nullable(2), Nullable(3)]) == map(f, NullableArray(1:3))

No divergence would be allowed between NullableArray and Array{Nullable}, except for the fact that the former is optimized and operations on it return a NullableArray to propagate this optimization. Obviously, we can't be consistent both with Array{T} and Array{Nullable{T}}.

Whatever the choice, I don't like the idea of a list, be it black or white: it makes things inconsistent and harder to understand/predict. Let's lift everything or nothing by default.

@johnmyleswhite
Member

To clarify, lifting nothing by default is what I mean when I talk about using a whitelist strategy. In that case, the whitelist is literally Julia's method table since the only methods that apply to nullable objects are then methods that explicitly are defined over nullables using normal method definitions.

I think it's reasonable to lift everything with map and require that people use a special isnull function to get an array. Is that ok with you?

I may regret this view later, but I would prefer that we ignore the Array{Nullable{T}} case since I think Julia's generic programming over arrays already makes lots of assumptions about the absence of any null values. In particular, Array{Any} can have nullable objects in it, but there's little support for operations over such an array. For example, sum([1, Nullable(1)]) will only work if we start defining mixed operations like 1 + Nullable(1), which I would prefer we not try to define unless we know we have to.
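The mixed-operation point can be illustrated with a small sketch. Maybe is again a hypothetical stand-in for Nullable (no longer in Base in Julia 1.0 and later), and the + method defined here is one possible null-propagating definition chosen for illustration, not something the package provides.

```julia
# Hypothetical stand-in for Nullable (an assumption, not the real type).
struct Maybe{T}
    hasvalue::Bool
    value::Union{T,Nothing}
end
Maybe(v::T) where {T} = Maybe{T}(true, v)        # wrap a value
Maybe{T}() where {T} = Maybe{T}(false, nothing)  # null of element type T

xs = Any[1, Maybe(1)]

# Without a mixed method, the reduction fails on 1 + Maybe(1).
failed = try
    sum(xs)
    false
catch err
    err isa MethodError
end

# Hypothetical mixed operation: add to the wrapped value, propagate null.
Base.:+(a::Int, b::Maybe{Int}) = b.hasvalue ? Maybe(a + b.value) : Maybe{Int}()

total = sum(xs)  # now reduces to a wrapped result
```

Every such pairing (Int with Maybe, Maybe with Int, Float64 with Maybe, and so on) would need its own definition, which is the cost of supporting mixed arrays.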

@davidagold davidagold mentioned this issue Sep 18, 2016
14 tasks
nalimilan added a commit to JuliaData/DataFrames.jl that referenced this issue Sep 22, 2016
Shorter written that way for now. Filed as JuliaStats/NullableArrays.jl#144.
maximerischard pushed a commit to maximerischard/DataFrames.jl that referenced this issue Sep 28, 2016
Shorter written that way for now. Filed as JuliaStats/NullableArrays.jl#144.
@nalimilan nalimilan mentioned this issue Feb 9, 2017
nalimilan added a commit to JuliaData/DataFrames.jl that referenced this issue Jul 8, 2017
Shorter written that way for now. Filed as JuliaStats/NullableArrays.jl#144.
3 participants