add lift function for working with missing values #26661
I find the term
How about calling it
I'm a little surprised that there's a perf difference there; typically those tuple functions are highly optimized and inline like crazy. You could work around the perf difference between
We could then always use the Just a spitball.
@mbauman Nice, this is almost as fast as Additionally, after some analysis, I have second thoughts about So the approach could be something like (if I understand your idea correctly, I am extending the
which would give maximum flexibility.
In the same vein, it would be weird that Regarding the extension to other sentinels than
Using As for In general in the design of
I don't think there's a tradeoff between flexibility and speed, as it would be handled at compile time anyway (that's the case for
OK, let us try this path and start with the specification:
Do we have examples of why we need lifting? Is it really that useful? Does anybody have a real use case where they needed lifting? (Honestly curious here, not just trying to bat this PR down.)
This was the original reason:
and the last line fails, because column
Yeah, it's needed for all cases where we don't provide lifted methods (see also #26631). Actually
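A small illustration of that gap, assuming Julia 1.0 or later: arithmetic operators in Base already propagate `missing`, while a function such as `uppercase` has no `Missing` method, so broadcasting it over a vector that contains `missing` fails. The vector `v` is a stand-in for a column like `df[:A]`.

```julia
# `+` is already lifted in Base: a `missing` operand yields `missing`.
println(1 + missing)  # prints: missing

# `uppercase` is not lifted: there is no method `uppercase(::Missing)`,
# so broadcasting over a vector that contains `missing` throws.
v = ["a", missing, "c"]
failed = try
    uppercase.(v)
    false
catch err
    err isa MethodError
end
println(failed)  # prints: true
```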
@nalimilan This is an implementation working like
I kind of lean towards thinking we don't really need this. I think it's an easy thing to think we need, but if you really break down the use cases, I really wonder how much convenience this would actually provide. I mean, in @bkamins's example, you could do `uppercase.(collect(skipmissing(df[:A])))`, or the `map` equivalent `map(x->ismissing(x) ? missing : uppercase(x), df[:A])`, if you ultimately can just drop the missing entries anyway. Compare that to the suggested lifting functionality here of `lift(uppercase).(df[:A])` (assuming that would work correctly with broadcasting and everything). It just seems like the tone of some of the comments around this is like: "oh no, Not saying we shouldn't keep proving this out and brainstorming, but I just wanted to throw out some of my thoughts.
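The three options in this comment can be compared on a plain vector standing in for `df[:A]`. The `lift` below is a hypothetical one-argument helper written only for illustration, not a function in Base. Note that the first option drops the missing entries, while the other two keep them in place.

```julia
v = ["a", missing, "c"]  # stand-in for a column such as df[:A]

# 1. Drop the missing entries first; the result is shorter than `v`.
dropped = uppercase.(collect(skipmissing(v)))

# 2. Keep the missing entries by testing for them explicitly.
kept = map(x -> ismissing(x) ? missing : uppercase(x), v)

# 3. Hypothetical one-argument `lift` (not in Base), used with broadcasting.
lift(f) = x -> ismissing(x) ? missing : f(x)
lifted = lift(uppercase).(v)

# `isequal` treats `missing` as equal to `missing`, unlike `==`.
println(isequal(kept, lifted))  # prints: true
```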
@bkamins In principle that looks good, though I'm not sure about performance. I've just realized that @quinnj I see your point. I guess the advantage would be more compelling if we supported the
Using
These two are not equivalent, since the first one gets rid of missing values. One thing that would be nice in Julia is a
Now you can work with `t` as though it is an array while
The problem is that this
I have been thinking about going back to defining
The idea is that it is fully flexible
Maybe function names
We now have both I'm not sure whether "lift" is the best term, even if it's used in C#, Scala and Haskell. "Propagate" would be more evocative, but it's a bit long. "Carry" could be nice (
I am also not fully convinced by Regarding
is much more general and can have different applications, as it transforms the ternary operator into a higher-order function. Then handling Note, e.g., that the default implementation for And of course it is easy enough to add default So in summary, there are the following questions:
Finally, I would leave out discussion of
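One possible reading of the "ternary operator as a higher-order function" idea is sketched below. `guard`, `liftmissing`, `liftnothing` and `safelog` are hypothetical names chosen for illustration; none of them is an existing Base or Missings.jl function. Lifting over `missing` becomes one special case, and the extension to other sentinels discussed earlier in the thread is illustrated here with `nothing` (an assumption about which sentinel is meant). `isnothing` requires Julia 1.1 or later.

```julia
# Hypothetical helper: packages the pattern `test(x) ? fallback : f(x)`
# as a reusable higher-order function.
guard(f, test, fallback) = x -> test(x) ? fallback : f(x)

# Lifting over `missing` is one special case ...
liftmissing(f) = guard(f, ismissing, missing)

# ... and another sentinel, here `nothing`, is handled the same way.
liftnothing(f) = guard(f, isnothing, nothing)

# The same helper also covers non-sentinel guards, e.g. a domain check.
safelog = guard(log, x -> x <= 0, -Inf)

println(liftmissing(sqrt)(4.0))      # prints: 2.0
println(liftmissing(sqrt)(missing))  # prints: missing
println(liftnothing(sqrt)(nothing))  # prints: nothing
println(safelog(0.0))                # prints: -Inf
```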
Could this start out in Missings.jl for a while to see how useful it is, and then be moved into Base/stdlib if it is a success?
OK. I have proposed it here: JuliaData/Missings.jl#89. Let us see how it works out.
We now have
OK, closing this as having
Follows the discussion in #26631 to add a generic lifting method for any function.

The implementation lifts only over positional arguments. The three implemented methods follow the results of benchmarking. In particular, for a single positional argument it is faster not to call the `any` function (at least currently). Also, in `lift(f, x; kw...)` it is faster not to call `lift(f)` but to perform the test directly. One limitation is that with the `lift(f)` approach we are not able to differentiate between the single and multiple positional argument cases. Therefore the `lift(f, x; kw...)` style will be faster for the single positional argument case.

CC @nalimilan
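A minimal sketch of how the three methods described above could look. The signatures follow the description; the bodies are assumptions reconstructed from it, not the code in this PR. Only positional arguments are checked for `missing`; keyword arguments are passed through unchanged.

```julia
# 1. Curried form. It cannot tell one positional argument from several,
#    so it always uses `any` over the argument tuple.
lift(f) = (args...; kw...) -> any(ismissing, args) ? missing : f(args...; kw...)

# 2. Exactly one positional argument: test it directly, without `any`
#    and without going through `lift(f)`.
lift(f, x; kw...) = ismissing(x) ? missing : f(x; kw...)

# 3. Two or more positional arguments: check all of them with `any`.
lift(f, x, y, rest...; kw...) =
    any(ismissing, (x, y, rest...)) ? missing : f(x, y, rest...; kw...)

println(lift(+, 1, 2))              # prints: 3
println(lift(+, 1, missing))        # prints: missing
println(lift(string, 5; base = 2))  # prints: 101
println(lift(uppercase)(missing))   # prints: missing
```

Because method 1 always goes through the tuple check, a single-argument call such as `lift(f)(x)` pays for `any`, which is the overhead the description attributes to the `lift(f)` style.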