add metadata
#48
Codecov Report
Base: 95.12% // Head: 96.55% // Increases project coverage by +1.42%
Additional details and impacted files
@@           Coverage Diff           @@
##             main      #48   +/-   ##
=======================================
+ Coverage   95.12%   96.55%   +1.42%
=======================================
  Files           1        1
  Lines          41       58     +17
=======================================
+ Hits           39       56     +17
  Misses          2        2
Sorry for the slow review here (getting ramped up in the new job and trying to get caught up on a lot of Julia stuff + prioritize for the future). I like the name change to just `metadata` and getting this merged. I can make the required changes in Arrow.jl.
Co-authored-by: Milan Bouchet-Valat <[email protected]>
Looks good! Though we may want to hold this until we have an implementation ready in DataFrames.jl and/or in Tables.jl just in case we discover unanticipated needs.
I will do the implementation in DataFrames.jl then, and I understand that @quinnj will do the implementation in Arrow.jl - right? @nalimilan - what would you want implemented in Tables.jl?
I have started the implementation and see one problem. I propose to discuss it here (although it is DataFrames.jl specific). Assume that I want to add metadata to some data frame that does not have metadata yet. I run @nalimilan - is this OK for you?
Yeah it sounds fine to create the dict on the first call to avoid returning BTW, instead of recommending users mutate the
I started implementing it and it seems that we need The
where
Side note: to be able to add metadata to @nalimilan - do you have any additional thoughts on these two points?
In JuliaData/DataFrames.jl#3055 I have drafted the implementation so you can see the important aspects of the design.
Indeed I also realized that. :-/ Now that you list the requirements, One alternative would be to have an Cc: @Tokazama
Thank you for the feedback. I think we have settled the design. If there are no additional comments by tomorrow I will start updating JuliaData/DataFrames.jl#3055.
OK - I am starting to implement the API in DataFrames.jl 😄.
I have started working on DataFrames.jl and I already have a decision to be made. In this PR we have
@nalimilan - I have added deletion to the API so that we can evaluate if we like it.
Adding this to DataAPI sounds better than having a separate package, anyway empty definitions are cheap. The
I am OK with But then for columns we will have:
?
src/DataAPI.jl
Outdated
One of the uses of the metadata `style` is deciding
how the metadata should be propagated when `x` is transformed. This interface
defines the `:none` style that indicates that metadata should not be propagated
under transformations. All types supporting metadata allow at least this style.
@nalimilan - maybe we should make this description more precise? Currently in DataFrames.jl I needed to make a decision when `:none` metadata is kept, and it is kept only in two cases:
- `DataFrame` constructor;
- `copy`;
all other operations drop all `:none` metadata. So, essentially both table-level and column-level `:none` metadata are attached to a concrete instance of a table or its copies (this is the safest approach, i.e. making sure that when metadata could be invalidated it is dropped). Are we in agreement here?
Given no response I will make the definition more precise.
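To make the rule above concrete, here is a minimal sketch on a hypothetical `ToyTable` type. It is not the DataFrames.jl implementation: all functions are defined locally and only borrow the names discussed in this PR, and it assumes that `:note`-style entries are the ones that survive a transformation.

```julia
# Hypothetical toy container, used only for illustration.
struct ToyTable
    data::Vector{Int}
    meta::Dict{String,Tuple{Any,Symbol}}   # key => (value, style)
end

ToyTable(data) = ToyTable(data, Dict{String,Tuple{Any,Symbol}}())

# Local stand-ins for the functions discussed in this PR.
function metadata!(t::ToyTable, key::AbstractString, value; style::Symbol)
    t.meta[key] = (value, style)
    return t
end

metadata(t::ToyTable, key::AbstractString) = t.meta[key][1]

# `copy` keeps every entry, including `:none`-style ones.
Base.copy(t::ToyTable) = ToyTable(copy(t.data), copy(t.meta))

# Any other operation drops `:none`-style entries
# (assumption: `:note`-style entries are propagated).
function keep_positive(t::ToyTable)
    kept = filter(p -> p.second[2] === :note, t.meta)
    return ToyTable(filter(>(0), t.data), kept)
end

t = ToyTable([-1, 2, 3])
metadata!(t, "source", "sensor A"; style=:note)
metadata!(t, "row_count", 3; style=:none)

t2 = keep_positive(t)   # "row_count" could now be stale, so it is dropped
t3 = copy(t)            # same contents, so all metadata is kept
```

In this sketch `t2` carries only `"source"`, while `t3` carries both keys.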
metadatakeys(::Any) = ()

"""
    metadata!(x, key::AbstractString, value; style)
Thinking about it, maybe the syntax would be more natural as `metadata!(x, key => value)`? That would allow extending this in the future to pass multiple pairs if it appears to be convenient.
A counter-argument is that `setindex!` doesn't use that syntax, but it's almost never called that way since `x[key] = value` is nicer. Of course both syntaxes could be allowed as they are not ambiguous (we will probably never allow keys to be pairs).
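As a rough illustration of why the two syntaxes would not clash, the sketch below defines both on a hypothetical `Annotated` type. The functions are local stand-ins, not DataAPI.jl methods, and the multi-pair form is only one possible reading of "pass multiple pairs".

```julia
# Hypothetical toy type, used only for illustration.
struct Annotated
    meta::Dict{String,Tuple{Any,Symbol}}   # key => (value, style)
end
Annotated() = Annotated(Dict{String,Tuple{Any,Symbol}}())

# Positional form, mirroring the signature under review.
function metadata!(x::Annotated, key::AbstractString, value; style::Symbol)
    x.meta[key] = (value, style)
    return x
end

# Pair form: one or more `key => value` pairs, forwarded to the positional
# method. Dispatch is unambiguous because a Pair is never an AbstractString.
function metadata!(x::Annotated, kv::Pair{<:AbstractString},
                   rest::Pair{<:AbstractString}...; style::Symbol)
    for (key, value) in (kv, rest...)
        metadata!(x, key, value; style=style)
    end
    return x
end

x = Annotated()
metadata!(x, "caption", "demo"; style=:note)
metadata!(x, "unit" => "kg", "source" => "lab"; style=:note)
```

Both calls end up in the positional method, so the pair form adds no storage logic of its own.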
As a first reaction it makes sense.
My only reservation was that in DataFrames.jl `=>` is used for operation specification language, so we would have yet a third way to interpret `=>` there.
The question is how would it look for `colmetadata!`? `colmetadata!(x, col, key => value; style=style)` (which does not look that nice).
Also note that `metadata!(x, key => value)` would not be allowed, you would need to write `metadata!(x, key => value; style=style)`.
In summary - I would keep the things as they are here and consider what we design here a low-level API.
I assume that the extra new package planned (tentatively named TableMetadataTools.jl) will provide convenient high-level functions. In practice I even expect that if we define there:
caption!(table, str) = metadata!(table, "caption", str, style=:note)
caption(table) = metadata(table, "caption")
label!(table, col, str) = colmetadata!(table, col, "label", str, style=:note)
label(table, col) = colmetadata(table, col, "label")
this will cover 95% of use cases of metadata in practice.
In summary - I propose to discuss a convenience high-level API in TableMetadataTools.jl, as I expect that in that package we will drop the requirement to specify `style`, which we have in the low-level API, as in the high-level API all styles will be `:note`.
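A self-contained sketch of that layering might look as follows. `MiniTable` and its storage are hypothetical, and the low-level functions are local stand-ins with the signatures discussed in this PR. The point is only that the high-level helpers fix the style so the caller never passes it.

```julia
# Hypothetical toy table, used only for illustration.
const MetaDict = Dict{String,Tuple{Any,Symbol}}   # key => (value, style)

struct MiniTable
    meta::MetaDict
    colmeta::Dict{Symbol,MetaDict}
end
MiniTable() = MiniTable(MetaDict(), Dict{Symbol,MetaDict}())

# Low-level API: style is a required keyword argument.
function metadata!(t::MiniTable, key::AbstractString, value; style::Symbol)
    t.meta[key] = (value, style)
    return t
end
metadata(t::MiniTable, key::AbstractString) = t.meta[key][1]

function colmetadata!(t::MiniTable, col::Symbol, key::AbstractString, value;
                      style::Symbol)
    # Create the per-column dictionary on first use.
    get!(() -> MetaDict(), t.colmeta, col)[key] = (value, style)
    return t
end
colmetadata(t::MiniTable, col::Symbol, key::AbstractString) = t.colmeta[col][key][1]

# High-level helpers: style is always :note.
caption!(t, str) = metadata!(t, "caption", str; style=:note)
caption(t) = metadata(t, "caption")
label!(t, col, str) = colmetadata!(t, col, "label", str; style=:note)
label(t, col) = colmetadata(t, col, "label")

t = MiniTable()
caption!(t, "Monthly sales")
label!(t, :price, "Unit price")
```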
I changed @nalimilan - can you please recheck and comment if it can be merged (I guess in JuliaData/DataFrames.jl#3055 we have converged with the implementation). Thank you!
Thank you! The ball starts rolling (the metadata discussion is the most complex addition we have ever made in the ecosystem)
Following the discussion in JuliaData/DataFrames.jl#2961 I propose to have `getmetadata` be a function defined on DataAPI.jl level. In this way in particular:
- `getmetadata` can be used to write appropriate metadata to an Arrow file;
- `DataFrame` constructor taking a table can check if it supports metadata and, if it does, automatically attach this metadata to a `DataFrame`;
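The second point can be sketched with a generic fallback plus a consumer that copies whatever metadata the source exposes. Everything here is hypothetical (`SourceTable`, `TargetTable`, `getrows`, `from_table`), and the sketch uses the `metadata`/`metadatakeys` names that the review later settled on rather than `getmetadata`.

```julia
# Generic fallback: objects that know nothing about metadata expose no keys.
metadatakeys(::Any) = ()

# Hypothetical source type that does support metadata.
struct SourceTable
    rows::Vector{Int}
    meta::Dict{String,Any}
end
metadatakeys(s::SourceTable) = keys(s.meta)
metadata(s::SourceTable, key::AbstractString) = s.meta[key]

# Hypothetical row accessors for the two kinds of input used below.
getrows(s::SourceTable) = s.rows
getrows(v::Vector{Int}) = v

# Hypothetical target type, standing in for a data frame.
struct TargetTable
    rows::Vector{Int}
    meta::Dict{String,Any}
end

# The consumer never needs to know the concrete source type: it copies
# whatever keys the source reports, which is nothing for the fallback.
function from_table(src)
    meta = Dict{String,Any}()
    for key in metadatakeys(src)
        meta[key] = metadata(src, key)
    end
    return TargetTable(copy(getrows(src)), meta)
end

with_meta = from_table(SourceTable([1, 2], Dict{String,Any}("caption" => "demo")))
without_meta = from_table([1, 2, 3])
```

A writer such as an Arrow exporter could follow the same loop, serializing each key and value instead of storing them in a dictionary.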