RFC: Implement operators on Nullable with lifting semantics #16988
Error was triggered e.g. by false >> 0x01.
@JeffBezanson Care to discuss the milestone? That's a relatively limited change, but essential to make |
We have been doing weekly triage, and are trying very hard to work through the milestone issues. I think it's too late to add new features. Why will it make a big difference to have this in Base now? |
I know it's late, but there's been a push recently to start using I guess we could keep these operators in |
+1 to including in 0.5 (or at least 0.5.x). We really are trying to make a big push to get rid of DataArrays and make DataFrames a stronger standard of type-niceness. |
Might a middle road be to host them in a NullableMath package for now? |
How about a |
That could work. I would say it depends on the strength of the consensus around these operators: if we all agree on the semantics, I'd say they could go into 0.5; if there's still some debate, better keep them in a package. |
I do have some objections to the content here.
|
Is the part of this change that affects |
@JeffBezanson OK, if we still have to discuss fundamental issues like this, let's go with a package. That said, the discussion is interesting, so:
Well, yeah, but the day such a feature is introduced, the instruction set will probably provide a way of combining it with vectorization (i.e. ignore errors). Anyway, that's quite theoretical, and that optimization is an implementation detail which can be abandoned.
Unfortunately, SIMD doesn't work at this point in my tests. Cf. JuliaStats/NullableArrays.jl#111 (comment). But enabling SIMD in the long-term is essential for data analysis with NullableArrays. @davidagold and @johnmyleswhite have done a lot of work to ensure the efficiency of that structure.
The apparent consensus among JuliaStats people seems to be that we should follow C# in implementing lifting for all standard operators, and use an explicit syntax for functions (a macro? the
@StefanKarpinski The first commit is a quick hack to make tests pass in corner cases without moving |
Use fast path without a branch for types with unchecked arithmetic (for which the operation can be computed even when value is missing) and a slow path for other types. The new null_safe_op() function allows custom types to opt-in to the fast path when possible. Also use this strategy in isequal(), which keeps its current (non-lifting) behavior.
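The fast-path/slow-path split described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `Maybe` is a hypothetical stand-in for `Nullable` (which no longer exists in current Julia), `lifted` and `SafeNumber` are made-up names, the body given to `null_safe_op` here is an assumption about how an opt-in trait could look, and `Base.promote_op` is an internal helper used only to pick the result type.

```julia
# Hypothetical stand-in for Nullable; the field layout is an assumption.
struct Maybe{T}
    hasvalue::Bool
    value::T
    Maybe{T}() where {T} = new{T}(false)            # null: `value` is left uninitialized
    Maybe{T}(v) where {T} = new{T}(true, v)         # non-null
    Maybe{T}(hasvalue::Bool, v) where {T} = new{T}(hasvalue, v)
end
Maybe(v::T) where {T} = Maybe{T}(v)

# Types whose arithmetic never throws, so operating on garbage bits is harmless.
const SafeNumber = Union{Int32,Int64,UInt32,UInt64,Float32,Float64}

# Opt-in trait: false by default; custom types could add their own methods.
null_safe_op(f, ::Type...) = false
null_safe_op(::Union{typeof(+),typeof(-),typeof(*)},
             ::Type{S}, ::Type{T}) where {S<:SafeNumber,T<:SafeNumber} = true

function lifted(f, x::Maybe{S}, y::Maybe{T}) where {S,T}
    R = Base.promote_op(f, S, T)                    # result element type
    if null_safe_op(f, S, T)
        # Fast path: always compute the value; only the flag records nullness.
        return Maybe{R}(x.hasvalue & y.hasvalue, f(x.value, y.value))
    elseif x.hasvalue & y.hasvalue
        # Slow path: call f only when both values are present.
        return Maybe{R}(f(x.value, y.value))
    else
        return Maybe{R}()
    end
end

a = lifted(+, Maybe(1), Maybe(2))                   # non-null result
b = lifted(+, Maybe(1), Maybe{Int}())               # null result, fast path
c = lifted(*, Maybe("a"), Maybe{String}())          # null result, slow path
```

The condition `null_safe_op(f, S, T)` depends only on types, so the compiler can resolve it statically and the fast path contains no data-dependent branch, which is what the commit message relies on. In the fast path the `value` field of a null result holds arbitrary bits and must not be read without checking `hasvalue` first.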
I would personally prefer not trying to land these things for 0.5 to give us more time to consider them. |
I'm fine with that, but it looks like we still need a few changes in Base:
|
Those changes all seem good except the last one, which seems like a pure performance improvement, which means it's not essential to get it into the release. |
If you remove |
Yes, that's what I just realized. That sounds dangerous, since loading the package would change the behaviour of that function in a subtle way, possibly breaking code that relies on it. Maybe better keep it so that it always fails, and overwrite it from the package. That's not very nice because a warning is going to be printed all the time, but... |
Should I make a PR against NullableArrays or create a new package? Maybe better keep everything in one place until these are moved to Base. Else we need to handle deprecations in NullableArrays, which is a pain. |
I've moved this to JuliaStats/NullableArrays.jl#119 for now. Closing, but please continue discussing any point you think needs more consideration. Better to do that sooner rather than later. |
This moves into Base the operators which were defined in NullableArrays. It keeps the lifting semantics: `Nullable(x) $op Nullable(y) = Nullable(x $op y)` when not null, and `Nullable{promote_op(...)}()` when null. It generalizes these definitions in a safe yet efficient way by introducing a new `null_safe_op(f, ::Type...)` function for a fast path with unchecked types (see commit message).

See JuliaStats/NullableArrays.jl#111 for discussion of the design. I've marked it as 0.5.0 since this really sounds like a blocker for progress on the data management front, and it shouldn't break anything.
The first commit is a quick hack to get the new tests to pass. Not sure that's the best strategy.
Cc: @johnmyleswhite @davidagold @quinnj @tshort