Allow changing globally the default list field name #6881
Comments
Did you have a concrete proposal on what this might look like?
I can't help feeling this makes your proposal impractical: external systems are free to name this whatever they feel like.
When unifying heterogeneous schemas, you will need to cast/coerce to a common type.
I wonder if the core problem is that the logic that compares schemas treats Lists with different field names as semantically different, even when a different field name doesn't seem to make the type semantically different. Maybe we could look into changing the comparison logic to ignore the list field name 🤔 (same for LargeList, FixedSizeList, Union, etc)?
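To make the idea of a comparison that ignores the list field name concrete, here is a minimal sketch. It uses a toy type model invented for illustration (`ToyType`, `ToyField`, `list_of` and `logically_equal` are all hypothetical names, not the arrow-rs API), and it assumes that only the name of a list's child field should be ignored, while nullability and struct member names still count.

```rust
// Toy model of a nested type system. This is NOT the arrow-rs `DataType`;
// every name here is hypothetical and exists only for illustration.
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    Int32,
    Utf8,
    // A list carries a named child field, as Arrow lists do.
    List(Box<ToyField>),
    Struct(Vec<ToyField>),
}

#[derive(Debug, Clone, PartialEq)]
struct ToyField {
    name: String,
    data_type: ToyType,
    nullable: bool,
}

// Helper: build a list type whose child field has the given name.
fn list_of(child_name: &str, item: ToyType) -> ToyType {
    ToyType::List(Box::new(ToyField {
        name: child_name.to_string(),
        data_type: item,
        nullable: true,
    }))
}

// Compare two types while ignoring the name of a list's child field.
// Struct member names and nullability are still compared.
fn logically_equal(a: &ToyType, b: &ToyType) -> bool {
    match (a, b) {
        (ToyType::List(x), ToyType::List(y)) => {
            x.nullable == y.nullable && logically_equal(&x.data_type, &y.data_type)
        }
        (ToyType::Struct(xs), ToyType::Struct(ys)) => {
            xs.len() == ys.len()
                && xs.iter().zip(ys).all(|(x, y)| {
                    x.name == y.name
                        && x.nullable == y.nullable
                        && logically_equal(&x.data_type, &y.data_type)
                })
        }
        // Leaf types (and mismatched variants) fall back to strict equality.
        _ => a == b,
    }
}
```

With this sketch, `list_of("$data$", ToyType::Int32)` and `list_of("item", ToyType::Int32)` differ under the derived `==` but are equal under `logically_equal`. Keeping the strict `PartialEq` and adding the relaxed comparison as a separate function leaves the choice of equality to the caller.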
I think there is a broader and somewhat separate question around logical schema equality, that concerns not only this but metadata and possibly even logical type equality (is StringView the same as StringArray). IMO we shouldn't be opinionated here and instead leave this to systems integrating arrow.
Making judgements on the semantic meaning of metadata is tricky. I do agree that I struggle to conceive of what semantics one might attach to a field name, but there must be some reason arrow encodes it.
That seems like a reasonable approach. However, since If we had to do it again, maybe we would support different types of Eq for |
I think providing |
Well, but then it seems like the library is making assertions about what But yes, I do see your point that practically speaking removing Making it easier (and clearly documented) to compare schemas with different, common ideas of equality (e.g. ignore list field names) I think would add significant value by avoiding seemingly unnecessary schema mismatch issues |
I'll agree to disagree on whether providing equality on a data structure is making a judgement on how engines should choose to enforce or not enforce logical schema equivalence.
Perhaps we can roll this issue into #6735?
FYI, I've opened a PR in Java Arrow to allow changing the default field name, as it's not possible to create a list with a specific field name.
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The Java implementation of Arrow uses $data$ rather than item (the name used here) as the list field name, which causes a schema mismatch unless care is taken to make the field names match. This is problematic because the list is sometimes created elsewhere (like DataFusion).
Describe the solution you'd like
I want to be able to globally configure the name of the list field
Describe alternatives you've considered
Passing the desired field name everywhere, although this is not always possible when the list is created somewhere else (like DataFusion).
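When the list is created by code you do not control, one workaround is to rename the child field after the fact, at the boundary where the schemas meet. The sketch below shows that idea on a toy type model; `ToyType`, `ToyField` and `with_list_field_name` are hypothetical names made up for this example and are not part of arrow-rs, Java Arrow or DataFusion.

```rust
// Toy model of a nested type system. This is NOT the arrow-rs `DataType`;
// every name here is hypothetical and exists only for illustration.
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    Int32,
    Utf8,
    // A list carries a named child field, as Arrow lists do.
    List(Box<ToyField>),
    Struct(Vec<ToyField>),
}

#[derive(Debug, Clone, PartialEq)]
struct ToyField {
    name: String,
    data_type: ToyType,
    nullable: bool,
}

// Return a copy of `ty` in which every list child field, at any nesting
// depth, is renamed to `name`. Struct member names are left untouched.
fn with_list_field_name(ty: &ToyType, name: &str) -> ToyType {
    match ty {
        ToyType::List(child) => ToyType::List(Box::new(ToyField {
            name: name.to_string(),
            data_type: with_list_field_name(&child.data_type, name),
            nullable: child.nullable,
        })),
        ToyType::Struct(fields) => ToyType::Struct(
            fields
                .iter()
                .map(|f| ToyField {
                    name: f.name.clone(),
                    data_type: with_list_field_name(&f.data_type, name),
                    nullable: f.nullable,
                })
                .collect(),
        ),
        // Leaf types contain no list fields, so they are copied as-is.
        other => other.clone(),
    }
}
```

Normalizing both sides to one agreed name (for example `item`) before comparing lets strict equality succeed, at the cost of an extra pass over every schema that crosses the boundary. A global default, as requested above, would avoid that pass.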
Additional context
I'm able to submit a PR to add this if wanted