-
Notifications
You must be signed in to change notification settings - Fork 370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consider making setproperty
copy Vector
#1846
Comments
Actually we could have But let us wait a bit to hear what other people think that |
We should focus on I think it makes sense to make it take a Pros of Defaulting
|
Not pressing 😄, but consistency in the design is very important, as then users, even if they do not know the documentation in detail, tend to do the right things (but yes - we should have separate issues to discuss it, I just wanted to put here the whole possible target design in one place). So my intuition is that we should go either for copy by default here and in #1844 or no copy by default (and in the no copy case |
To be clear, I agree. Note that I referenced consistency in both my dot points (for the constructors and for normal setfield respectively). I don't think it would be too unnatural to copy on the way in for purposes of matching how the constructor, and And then having some other behavior for Another scattered thought: I do not like the idea that |
Having As a data point, do you know how easy/clean it would be to implement |
There are many ways to do this (TensorFlow.jl does a similar tranform using traits to indicate what ops should change, but for simplicty here is one that just lists them all The simple version is simple, makng it recursive and making it handle various different ways to express the same call is a bit more fiddly
|
If we decided to define |
In general the question is if we need this |
Here's a slightly more radical idea for the more general issue of preventing non-DataFrames column mutations (which can lead to undesirable column states): the Now, this isn't quite a silver bullet because we still have the myriad custom array types, but perhaps this could become a formal API of |
There's no end to the set of things that users could do to make your application state inconsistent. At best you can just make it more tedious. We track that bit to avoid memory corruption (so we can at least given decent errors as it falls apart). In the future, I hope it'll get subsumed by the Buffer type. But I suspect that requiring magic code sequences like unset->mutate->set or freeze/melt will just lead to more awkward source code on the whole for little or no benefit. It seems a bit like the |
I guess I was called out by accident. No worries. Looks like you have a good convo rolling and know your shit. Respect my friends. |
So what is the conclusion here?
|
Yes.
Yes, which takes a
Maybe, but the way it should work if we do is to convert calls to |
I will soon start implementing the
The latter WILL NOT produce a deprecation warning as it would lead to massive deprecation warnings in actual codes. If there any objections to this plan please comment. |
@nalimilan - are you OK with this? (given #1840 is essentially finished this is the next thing I will work on) |
#1840 is merged. Anyone interested please comment on this issue as this is going to be a part of the last big PR before the next release. |
I'm fine with that. |
I understand we want For PROBLEM 1 The only consequence is that we should also then remove PROBLEM 2 The general syntax
Again broadcasting will handle all these via
|
I was in a bit of rush, when writing last comment. So to make sure all is clear let me expand on it using an example. PROBLEM 1
PROBLEM 2
So in summary the problems are:
|
That is a feature.
Again seems like a feature, |
I have EDITS below to reflect recent discussions on the whole design of indexing. @oxinabox I agree (I think 😄). I was writing using current not target specification. Also note that we have to design a consistent system that supports all cases i.e. single or multiple rows and columns that is why I have included some things that are out of scope of this specific issue, but are related. In target parlance we could have the following approach: Assumption. We want to get rid of:
Single column setting:
The key issue is that we start making a difference that Now the rules extend to the following cases for multiple columns:
|
I'm fine with We should also make it easier to create a categorical array filled with a value, e.g. with something like
That's the most problematic case. Maybe we can try without any special support for that, assuming that in most cases
It sounds OK not to support this. |
I have updated the proposed rules in the post above given the discussion related also to |
After discussing this again, it appears that it's difficult to get all the behaviors we want with only
So it doesn't look like An interesting rule to follow could be that |
Just to clarify one thing. Do we want |
In order to wrap around my thinking about this problem I have written down what can be done with
(*) could also be an error as another syntax exists for this operation and may not completely consistent with other uses. With "@bkamins 2" option we can avoid having to add |
As the person who has been making a big fuss about the deprecation of |
OK - it seems we are converging with @nalimilan with our proposals. Simplifying: The only decisions are:
|
After discussions #1866 takes an approach that The general rule (that unfortunately has to be learned by users) is that single column operations do not copy as this is typically what is expected (although admittedly this is slightly unsafe). |
I am closing this. Please reopen if you feel this needs to be discussed (but I feel we are settled that we do not implement this request). |
This is infollow up to other threads trying to remove all the ways users can resize columns out of sync and thus break the DataFrame,
#1844 (comment)
One option, and I am not completely sold on it, but that I feel is worth considering.
Is that we could make
setproperty
and single argumentsetindex
,copy the input.
This would be consistent with the behavior of the constructor as copying the
Vector
s,But unlike that there is no easy way to pass in a
copycols=false
kwarg or aDataframe!
constructor.So if we were to do that, I suggest we would have a
@nocopy
macro to bypass this.,so that
would not copy
vector
but would dispatch to some alternativenocopy_setcol
where as
would dispatch to
setcol
(which would intern dispatch tonocopy_setcol(df, col, copy(vector)
)The text was updated successfully, but these errors were encountered: