Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make it easier to work with metadata by providing more default behaviors #56

Merged
merged 10 commits into from
Nov 9, 2022

Conversation

bkamins
Copy link
Member

@bkamins bkamins commented Nov 3, 2022

Fixes #55
Fixes #51

@Tokazama - in relation to #51 I ended up recommending to return a dictionary (not just an iterator). Do you think it would be acceptable, or you would prefer to require only an iterator of pairs?

The benefit of dictionary would be that later it is easier for users to work with it. I assume that if a given value does not have already an underlying dictionary to store metadata (most have) then just Dict will be used since this is a non-lazy interface.

If someone wants a lazy iterator then using metadatakeys and then getting the required metadata is an option that is available.

@codecov
Copy link

codecov bot commented Nov 3, 2022

Codecov Report

Base: 95.34% // Head: 96.36% // Increases project coverage by +1.01% 🎉

Coverage data is based on head (dfab13a) compared to base (ca91305).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #56      +/-   ##
==========================================
+ Coverage   95.34%   96.36%   +1.01%     
==========================================
  Files           1        1              
  Lines          43       55      +12     
==========================================
+ Hits           41       53      +12     
  Misses          2        2              
Impacted Files Coverage Δ
src/DataAPI.jl 96.36% <100.00%> (+1.01%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@Tokazama
Copy link

Tokazama commented Nov 3, 2022

Do you think it would be acceptable, or you would prefer to require only an iterator of pairs?

I think the tricky part is where that style argument comes in. Are we explicitly requiring valtype(metadata(x)) <: Tuple{Any,Symbol}?

@bkamins
Copy link
Member Author

bkamins commented Nov 3, 2022

Are we explicitly requiring valtype(metadata(x)) <: Tuple{Any,Symbol}?

Yes, that is why I assume that this is not lazy. In general this dictionary would have to be freshly allocated if it were mutable - some packages, e.g. Parquet2.jl expose non-mutable dictionaries, in which case they do not need to allocate (actually both if style=false and if style=true just with different values). For this reason, if performance matters metadatakeys should be used as it is non-allocating.

So this would be a convenience functionality (per @nathanrboyer request).

@Tokazama
Copy link

Tokazama commented Nov 4, 2022

That makes a lot of sense. Should we have some sort of documentation that makes it clear that setindex!(metadata(x),...) isn't the same as metadata!(x...)?

@nalimilan
Copy link
Member

Makes sense. I also think the docs should mention that the resulting object should not be modified as it is owned by x, and that it may have to be freshly allocated.

src/DataAPI.jl Outdated Show resolved Hide resolved
@nalimilan
Copy link
Member

Also, maybe we should define a fallback Dict(key => metadata(df, key) for key in metadatakeys(df))?

@bkamins
Copy link
Member Author

bkamins commented Nov 4, 2022

Following the comments I have:

  1. implemented fallback methods for metadata and colmetadata
  2. split docstings
  3. added a comment both for metadata and colmetadata that the returned dictionary should be only used for reading data and that it may be freshly allocated.

src/DataAPI.jl Outdated Show resolved Hide resolved
src/DataAPI.jl Outdated Show resolved Hide resolved
src/DataAPI.jl Outdated Show resolved Hide resolved
src/DataAPI.jl Outdated Show resolved Hide resolved
@bkamins
Copy link
Member Author

bkamins commented Nov 5, 2022

@ExpandingMan - can you please have a look at this PR before it is merged and released (it will have a minor in fluence on Parquet2.jl implementation, but actually it will go along what you partially have already). Thank you!

src/DataAPI.jl Outdated Show resolved Hide resolved
Co-authored-by: Milan Bouchet-Valat <[email protected]>
@ExpandingMan
Copy link

Looks good to me.

I haven't actually tested anything against this but it looks non-breaking.

@bkamins bkamins merged commit 69313ee into main Nov 9, 2022
@bkamins bkamins deleted the bk/metadata branch November 9, 2022 14:40
@bkamins
Copy link
Member Author

bkamins commented Nov 9, 2022

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

default to metadata! style=:default Add method for iterating metadata
4 participants