Update metadata API #53
Codecov Report
Base: 96.55% // Head: 95.34% // Decreases project coverage by 1.21%.
Coverage diff (main vs. #53):
            main      #53       +/-
Coverage    96.55%    95.34%    -1.21%
Files       1         1
Lines       58        43        -15
Hits        56        41        -15
Misses      2         2
I also changed the docstrings so that they require only that an error is thrown, rather than a specific exception type.
I propose that we keep the momentum in this discussion (while of course respecting your other duties 😄). I think it is pretty clear what we need to decide:
I think the issue of a column consisting of several chunks should be resolved in Parquet2.jl (as I commented in #52). Thank you!
I also commented in #52. In response to your questions above (respectively):
So, in summary, yes, this PR looks good to me.
Parquet2.jl can have more definitions than DataAPI.jl - this is OK. Let us just decide if
Then I would for now define
@nalimilan - what do you think? My original thinking was that every object supports metadata, but for some objects there is simply no way to set metadata with e.g. @Tokazama - can you please also comment on this point, as you are creating a more general metadata mechanism?
It would be preferable to have
This is at the heart of why I'm hesitant to just return the underlying metadata storage type. There are going to be different approaches for how
@Tokazama - so what would be your recommendation? I understand that:
@Tokazama - and what do you think about the default behavior for
I can make
I can see how there's a bit of awkwardness around defining default keys, especially when we don't have default metadata elsewhere. It does leave us without any generic way of checking if an object has metadata though.
Having something like I think it would be worth having
I agree it's useful to have fallbacks returning BTW I realize we don't provide a way to check whether an object supports I also agree that we'd better add the
This was intentional. I did not see a use case for querying it. I.e. in what situation would you want to add metadata without knowing whether the object supports it? I will add
Co-authored-by: Milan Bouchet-Valat <[email protected]>
I was thinking of a package that would accept any Tables.jl table and attach metadata to it if possible.
But what would be the use of it if, after the operation, you could not be sure whether metadata was added? But maybe it makes sense. What would you call these functions? PS. I have added
I mean that the object would be returned to the caller, who would presumably choose a type that supports metadata if they care about it. For example, the package could set column labels or units, which are nice but not strictly necessary to use the data.
Yeah, that's a reasonable choice.
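To make the "attach metadata if possible" use case above concrete, here is a minimal sketch. Every name in it (`supports_metadata_write`, `AnnotatedTable`, `set_meta!`, `label_units`) is a hypothetical stand-in defined locally for illustration; none of them is the DataAPI.jl API or the function names chosen in this PR.

```julia
# Hypothetical trait (an assumption, not DataAPI.jl): by default a type
# is assumed not to accept metadata writes.
supports_metadata_write(::Type) = false

# Hypothetical table type that carries a metadata dictionary.
struct AnnotatedTable
    columns::Dict{Symbol,Vector{Float64}}
    meta::Dict{String,String}
end

# The table type opts in to the trait and provides a setter.
supports_metadata_write(::Type{AnnotatedTable}) = true
set_meta!(t::AnnotatedTable, key, value) = (t.meta[key] = value; t)

# Attach a unit label when the table type allows it; otherwise return
# the table unchanged. The caller decides which table type to pass in.
function label_units(table, unit)
    if supports_metadata_write(typeof(table))
        set_meta!(table, "unit", unit)
    end
    return table
end

plain = Dict(:x => [1.0, 2.0])
annotated = AnnotatedTable(Dict(:x => [1.0, 2.0]), Dict{String,String}())

label_units(plain, "cm")      # no metadata support: returned unchanged
label_units(annotated, "cm")  # metadata support: "unit" => "cm" is stored
```

The point of querying on the type is that the generic function never has to catch an error to find out whether writing is possible.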
What I did:
I think that with this change (and the comments) everything that was requested is now done (but please comment, of course; as usual, function naming may need discussion). The implemented changes are slightly breaking (as we stop returning
Yeah, I think that should suffice. I've looked through the changes and they seem okay, but I haven't had time to meditate on them much. It's difficult to predict the implications of each decision, so the work put in on your end is definitely appreciated.
Agreed. That is why @nalimilan and I initially wanted to define a minimal set of functions, to give us the most flexibility later.
@Tokazama + @ExpandingMan - is it OK to merge this PR as-is?
This looks good to me; it seems to improve on most of what I commented on in my original issue. In general, the only thing I'd ever get worried about are changes that for whatever reason lock us out of adding methods in the future, and I don't think any of this does that. Note also that I have not tagged Parquet2.jl yet, so that'll give us a chance to run tests on that again.
Is the default
The method error is not thrown explicitly; it is thrown only if you call a function on an object for which it has no method. Therefore you won't get this with
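The difference between the two error situations discussed here can be sketched as follows. `WithMeta` and `getmeta` are hypothetical names defined locally, not DataAPI.jl functions, and the `KeyError` in the second case is simply what this stand-in's `Dict` lookup raises, not an error type prescribed by the PR.

```julia
# Hypothetical type with a metadata dictionary.
struct WithMeta
    meta::Dict{String,String}
end

# Hypothetical accessor with a method only for WithMeta; there is no fallback.
getmeta(x::WithMeta, key) = x.meta[key]

# Case 1: no method exists for Int, so Julia itself raises a MethodError.
no_method = try
    getmeta(42, "unit")
catch err
    err
end
no_method isa MethodError  # true

# Case 2: a method exists but the key is absent, so the error comes from
# the implementation (here a KeyError from the Dict lookup).
missing_key = try
    getmeta(WithMeta(Dict{String,String}()), "unit")
catch err
    err
end
missing_key isa KeyError  # true
```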
I understand functionally what is happening here. I'm just wondering if there's a correct error to produce on my end. If there is no metadata collection attached (but it could have metadata attached), should it still be a
The current state of this PR says that we should throw an error in this case, without specifying the error type. However, for example, in DataFrames.jl @nalimilan and I decided to throw In general, AFAICT, in Julia we usually follow the https://www.tensorflow.org/guide/versions rules for exceptions, which in particular state that changing the type of exception thrown is not breaking unless you documented the type of exception thrown (this is exactly why the docs now do not specify the exception type, but just state that an exception is thrown).
Thank you. I think the only thing that needs adding is
@ExpandingMan - I have created an account 😄 and commented in https://gitlab.com/ExpandingMan/Parquet2.jl/-/issues/17 to keep track of to-do items. If there are no additional comments on this PR, I will merge it on Monday, Oct 3, 2022 and make a 1.12.0 release. OK?
I should comment on that, as it may be the one thing that I don't feel is addressed in this PR. In the case of Parquet2.jl, exposing those methods is the only way to get the full metadata dict without copying it. I'm not sure I followed all of the above, but I still think there are likely to be a large number of cases like Parquet2.jl in which the metadata dict object we want is already "just sitting around", and not exposing a method to get it forces it to be copied. I don't think this needs to be addressed now as long as the door is open, but thought I'd mention it.
Yeah, I don't think it's something set in stone. We're just trying to move forward conservatively so that we get work done now without requiring huge breaking changes down the road.
Exactly - I think these methods can be added to the main DataAPI.jl specification in the future, but since this is only a performance optimization it can be left for later. I would just recommend documenting these two extra methods in Parquet2.jl.
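The copying concern above can be illustrated with a small sketch. All names here (`FileMeta`, `metakeys`, `meta`, `collect_meta`, `rawmeta`) are hypothetical and defined locally; they are not the Parquet2.jl or DataAPI.jl methods under discussion.

```julia
# Hypothetical object that already stores its metadata in a Dict.
struct FileMeta
    meta::Dict{String,String}
end

# Key-based access, in the spirit of a minimal generic interface.
metakeys(f::FileMeta) = keys(f.meta)
meta(f::FileMeta, key) = f.meta[key]

# Generic code that only knows the key-based interface has to build a
# new dictionary to obtain all metadata at once.
collect_meta(f) = Dict(k => meta(f, k) for k in metakeys(f))

# A type-specific accessor can hand back the stored Dict without copying.
rawmeta(f::FileMeta) = f.meta

f = FileMeta(Dict("created_by" => "writer"))
copied = collect_meta(f)

copied == f.meta        # true: same contents
copied === f.meta       # false: a freshly allocated Dict
rawmeta(f) === f.meta   # true: the stored object itself
```

As noted in the thread, the direct accessor is purely a performance optimization, so it can live in the package until a generic version is agreed on.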
Thank you! I will release DataAPI.jl 1.12.0 soon.
Fixes #52
In this PR I implement the conclusions from the discussion in #52.
Currently I have implemented dropping of the default methods. The question is: do we also want to drop metadatakeys and colmetadatakeys? I kept them returning () by default for now. This makes it easier to write generic functions (i.e. I check whether there are any metadata keys, and if a type does not support metadata I get an empty tuple, so I know there is no metadata without having to catch an error).
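A minimal sketch of the generic pattern described in this description, assuming a fallback that returns `()`. The `metadatakeys` below is a local stand-in defined in this snippet, not the DataAPI.jl function itself, and `LabeledTable` and `has_metadata` are hypothetical names.

```julia
# Local stand-in (an assumption for illustration): objects that know
# nothing about metadata report an empty tuple of keys.
metadatakeys(::Any) = ()

# Hypothetical type that does carry metadata.
struct LabeledTable
    meta::Dict{String,String}
end
metadatakeys(t::LabeledTable) = keys(t.meta)

# Generic check that works for any object, with no try/catch needed.
has_metadata(x) = !isempty(metadatakeys(x))

has_metadata([1, 2, 3])                                  # false
has_metadata(LabeledTable(Dict("source" => "survey")))   # true
```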