-
Notifications
You must be signed in to change notification settings - Fork 370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Broadcasting in DataFrames #1804
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few thoughts after a quick read-through.
@nalimilan - this should be good to look at although it fails CI. There is some problem with |
@mbauman thank you for your time and having a look at these issues. The crucial part for me is if we need to |
With respect to The key thing to remember about broadcast is that it's entirely based upon indexing and axes. The typical way to use a |
Thank you for the tips on use of Regarding aliasing - the additional difficulty in DataFrames.jl as |
@nalimilan In Base broadcasting does not provide a cleanup on error, e.g.:
Do you think it is acceptable we do the same, or we should make sure that we do not leave a data frame half-modified after an error? |
@mbauman I have two design questions to you:
Thank you. |
@nalimilan I am waiting with doing the changes in this PR as I would welcome a comment, given the discussed options, what you think we should do with column creation when we do broadcasting:
Thank you! |
I'd say 2 is fine if implementation isn't too complex. We can always implement |
OK. I will go for 2 then (it should be easier as we only have to handle a single special case so the general mechanics in all other cases will not be affected). |
Just one more question. What should happen in this case:
currently I create I understand that in this case:
we should copy the vector and create 3 rows. |
OK so we would special-case data frames with no columns and automatically resize them as needed? Then indeed it makes sense for |
We could also disallow broadcasting into a |
As you prefer. I guess it's safer from an API standpoint to throw an error when there are no columns, that way we can make a decision later based on the collected experience with |
@mbauman + @nalimilan I have cleaned up the code following all the comments. Given the simplifications we allow now it is much cleaner and uses only I have three comments:
|
Co-Authored-By: Milan Bouchet-Valat <[email protected]>
OK.
Why not use
If possible I think the signatures should be stricter to ensure they are only used for broadcasting. |
Co-Authored-By: Milan Bouchet-Valat <[email protected]>
Because only |
Thank you for the comments. I have rewritten the tests to be more robust (it would not hurt to have second eyes look at the - although now they are really cumbersome to check unfortunately). I have left four your comments unresolved as you might have some further comments on these issues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine if you confirm the compiler optimizes the eltype computation and you apply the suggestion. Great work!
Co-Authored-By: Milan Bouchet-Valat <[email protected]>
I have added tests, fixed one type instability in Here are some benchmarks:
The conclusion is that in "fast common patch" we are on par to what we had. If there is a need of |
@nalimilan I know it is approved, but the change is significant to please confirm to me that you are OK with this to be merged 😄. |
I am going to merge it and move forward to further broadcasting/setindex! issues if there are no more comments on this. |
CI will fail now. I will retrigger it after JuliaRegistries/General#1254 is merged. |
As noted on Slack it works by using |
Yes that's kind of strange, but OTOH you're not supposed to store |
Co-Authored-By: Milan Bouchet-Valat <[email protected]>
I have merged this PR with master to make sure all is in sync. |
This PR is a result of a discussion to take a staged approach at
setindex!
/broadcasting in DataFrames.jl.We start with broadcasting.
This PR invalidates #1643 and #1646 but I will keep them open for a reference until nothing that was discussed there becomes irrelevant.
In general we treat
AbstractDataFrame
as a matrix in broadcasting andDataFrameRow
as a vector.The tricky part is that we want to allow adding columns using broadcasting. What I proposed works as expected, but is not entirely clean and uses Base internals a lot. Also if you review this PR please have a look at the tests as they expose corner cases (which are consistent with Base, although sometimes surprising).
@mbauman - we have discussed with @nalimilan that it would be great if you commented on this PR as the design of broadcasting in Base is complex and I am not sure if I have taken the best path. Thank you!