Supertype of AbstractDataFrame and DataFrameRow #1337
Comments
I don't know the original motivations, but I can think of a few reasons why
As for why |
As for I do not feel strongly about this change, but I want to put it up for discussion, as this is something that e.g. R users would find convenient (of course that is not a definitive reason to allow it - we do not have to replicate R in DataFrames). In general it boils down to what you have pointed out - do we treat As for |
I think this has been discussed a lot before and the decision to make Julia data frames completely distinct from matrices has been a very conscious one (you should be able to find old issues about it). It's indeed surprising for R users, but in general R data frames are quite awkward to work with, and redesigns like dplyr adopt a completely different approach closer to databases. Anyway it's not a good idea to provide people with an apparently convenient syntax which would be very slow due to type instabilities: that would just be a trap. Regarding |
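To illustrate the type-instability concern, here is a minimal sketch in plain Julia. It does not use DataFrames.jl; two column vectors stand in for a data frame, and the names `ids`, `labels`, and `as_matrix` are hypothetical. Viewing heterogeneous columns as one matrix forces the element type to `Any`, so the compiler can no longer specialize code on the cell type.

```julia
# Two typed columns standing in for a data frame (an assumption for this sketch).
ids = [1, 2, 3]
labels = ["a", "b", "c"]

# Treating the columns as a single matrix promotes the element type to Any.
as_matrix = hcat(ids, labels)

println(eltype(ids))        # concrete element type of the column
println(eltype(as_matrix))  # element type of the matrix view

# Both calls give the same total, but the second operates on a Vector{Any},
# so every element has to be dispatched on at run time.
total_typed = sum(ids)
total_untyped = sum(as_matrix[:, 1])
```

The results agree; the difference is only in what the compiler knows about the element types, which is where the slowdown would come from.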
Thanks for the explanation. Regarding
and now |
That's a very good point. You're correct, if |
Ah, right. Maybe we need JuliaLang/julia#21912 is related, since it would provide a convenient syntax to create a new

```julia
byrow!(df) do row
    row@a += row[:b]
    row
end
```

But clearly that's not ideal. We could also add this |
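For comparison, an immutable value can already be rebuilt with one field changed in current Julia. This is a minimal sketch that assumes (the comment above does not say so) that a row is represented as a `NamedTuple`; `row` and `updated` are illustrative names only.

```julia
# A row represented as an immutable NamedTuple (assumption for this sketch).
row = (a = 1, b = 2)

# merge returns a new NamedTuple; fields from the second argument override
# fields of the same name in the first.
updated = merge(row, (a = row.a + row.b,))

println(updated)
```

The original `row` is left untouched, so a row-wise function written this way has to return the new value, as the `do` block above does.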
I am OK to change The reason I am mostly indifferent is that in general I do not think that The difference is between:
and
Additionally working on |
Yes, it looks like we should provide functions taking an anonymous function which would be applied to each row, as that's the only way to get type stability. That's why I recently added
|
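The idea of passing an anonymous function that is applied to each row can be sketched without DataFrames.jl. In the sketch below a `NamedTuple` of column vectors stands in for a data frame, and `maprows` is a hypothetical helper, not an existing API. Because the column types are part of the argument's type, the compiler can specialize the loop inside the function.

```julia
# Hypothetical helper: apply f to each row, where a row is a NamedTuple
# built from the i-th element of every column.
function maprows(f, cols::NamedTuple)
    n = length(first(cols))
    # map over a NamedTuple keeps the field names, so f can use row.a, row.b.
    return [f(map(c -> c[i], cols)) for i in 1:n]
end

# Columns with different, but concrete, element types.
cols = (a = [1, 2, 3], b = [10.0, 20.0, 30.0])

result = maprows(row -> row.a + row.b, cols)
println(result)
```

The function call acts as a barrier: outside it the column types may be unknown, but inside `maprows` they are fixed for the whole loop.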
I understand that |
I suspect it would, but I haven't looked at it in detail yet. It would be interesting to see how it's implemented. |
One could also think about a type

```julia
struct RowView{T<:TypedDataFrame}
    df::T
    row::Int
end
```

that overrides

```julia
for r in rowviews(df)
    println(r.colB)
    r.colA = 34
end
```

|
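A runnable variant of this idea, as a sketch under stated assumptions: since `TypedDataFrame` is only a proposal in this thread, a `NamedTuple` of column vectors is used as a stand-in, and the field name `cols` and the generator-based `rowviews` are hypothetical. Property access is routed to the underlying columns by overloading `Base.getproperty` and `Base.setproperty!`.

```julia
# A view of one row of a column-oriented table.
# The NamedTuple of vectors is a stand-in for a typed data frame.
struct RowView{T<:NamedTuple}
    cols::T
    row::Int
end

# Use getfield internally so that the overloads below do not recurse.
function Base.getproperty(r::RowView, name::Symbol)
    return getfield(r, :cols)[name][getfield(r, :row)]
end

function Base.setproperty!(r::RowView, name::Symbol, value)
    getfield(r, :cols)[name][getfield(r, :row)] = value
    return value
end

# Lazily produce one RowView per row.
rowviews(cols::NamedTuple) = (RowView(cols, i) for i in 1:length(first(cols)))

cols = (colA = [1, 2, 3], colB = ["x", "y", "z"])

for r in rowviews(cols)
    println(r.colB)
    r.colA = 34   # writes through to the underlying column
end
```

Because the view holds a reference to the columns rather than a copy, assignments made inside the loop are visible in `cols` afterwards.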
Yes, that's essentially |
Yes, the only issue is that with the current proposal in #1335 |
I am closing this, as it is now more or less settled that we keep |
I also have a question (maybe it was discussed somewhere, but I think the decision could influence the design we are discussing in #1335, so I raise it now):

- Why is `AbstractDataFrame` not a subtype of `AbstractMatrix{Any}`?
- Why is `DataFrameRow` not a subtype of `AbstractDataFrame`?

In this way many functions that work on `AbstractMatrix` would work on `DataFrame`s for free.

Additionally, if some `broadcast`-related features were implemented like in `NamedArrays`, the result of such operations could remain a `DataFrame` if that were sensible.

I do not see any negative side effects (but I might be missing something).

The benefit would be avoiding a conversion: currently, when the user knows that some columns in a `DataFrame` are homogeneous, an `Array(df[rows, cols])` conversion has to be run to be able to perform the desired operations. Of course, in performance-sensitive code it will still be required to do so, because the conversion will infer the type of the `[rows, cols]` section of the `DataFrame`, but in many cases the user wants a simple transformation and performance is not an issue (see e.g. https://stackoverflow.com/questions/48037732/how-to-save-julia-for-loop-returns-in-an-array-or-dataframe).
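The conversion step described above can be sketched in plain Julia. This does not call DataFrames.jl; a `NamedTuple` of column vectors stands in for a data frame, and the column names are hypothetical. Selecting only the homogeneous columns yields a matrix with a concrete element type, on which ordinary matrix functions work.

```julia
# Stand-in for a data frame (assumption): a NamedTuple of column vectors.
cols = (x = [1.0, 2.0, 3.0], y = [4.0, 5.0, 6.0], label = ["a", "b", "c"])

# Pick the homogeneous numeric columns and build a concrete matrix from them.
m = hcat(cols.x, cols.y)
println(typeof(m))

# Functions written for AbstractMatrix now apply directly.
colsums = sum(m, dims = 1)
println(colsums)
```

The explicit conversion is the point at which the element type becomes known, which is why it would remain necessary in performance-sensitive code.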