Allow changing globally the default list field name #6881

Closed
rluvaton opened this issue Dec 15, 2024 · 8 comments
Labels
question Further information is requested

Comments

@rluvaton
Contributor

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The Java implementation of Arrow uses $data$ as the default list field name rather than item, as used here, which causes a schema mismatch unless the field names are made to match.

This is problematic because the list is sometimes created elsewhere (for example, by DataFusion).

Describe the solution you'd like
I want to be able to globally configure the name of the list field

Describe alternatives you've considered
Passing the desired field name everywhere, although it's not always possible when the list is created somewhere else (like DataFusion)

Additional context
I'm able to submit a PR to add this if wanted

@rluvaton rluvaton added the enhancement Any new improvement worthy of a entry in the changelog label Dec 15, 2024
@tustvold
Contributor

tustvold commented Dec 15, 2024

Did you have a concrete proposal on what this might look like?

although it's not always possible when the list is created somewhere else

I can't help feeling this makes your proposal impractical, external systems are free to name this whatever they feel like.

which causes a schema mismatch unless the field names are made to match

When unifying heterogeneous schemas, you will need to cast/coerce to a common type.
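To illustrate the coercion idea, here is a minimal sketch that rewrites every list child field name to one agreed name before comparing types. `ToyType`, `list_of`, and `coerce_list_field_names` are hypothetical stand-ins invented for this example; they are not part of arrow-rs, and a real integration would have to rebuild both the schema and the arrays.

```rust
/// Toy stand-in for an Arrow data type. This is NOT the arrow-rs `DataType`.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ToyType {
    Int32,
    /// A list whose child field carries a name, e.g. "item" or "$data$".
    List { field_name: String, element: Box<ToyType> },
}

/// Hypothetical helper that builds a list type with the given child field name.
fn list_of(field_name: &str, element: ToyType) -> ToyType {
    ToyType::List {
        field_name: field_name.to_string(),
        element: Box::new(element),
    }
}

/// Rewrite every list child field name to `target`, recursing into nested lists.
/// Non-list types are returned unchanged.
fn coerce_list_field_names(ty: &ToyType, target: &str) -> ToyType {
    match ty {
        ToyType::List { element, .. } => ToyType::List {
            field_name: target.to_string(),
            element: Box::new(coerce_list_field_names(element, target)),
        },
        other => other.clone(),
    }
}
```

With this model, `list_of("item", ToyType::Int32)` and `list_of("$data$", ToyType::Int32)` compare unequal under the derived `PartialEq`, and become equal once both sides are coerced to the same field name.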

@alamb
Contributor

alamb commented Dec 15, 2024

I wonder if the core problem is that the logic that compares schemas treats Lists with different field names as semantically different, even when it seems as if a different field name doesn't actually make the type semantically different.

Maybe we could look into changing the comparison logic to ignore the list field name 🤔 (same for LargeList, FixedSizeList, Union, etc)?
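As a rough sketch of what such a comparison could look like, the following models a list type whose child field has a name and a nullability flag, and compares two types while skipping the child field name. `ModelType` and `logically_equal` are hypothetical names made up for this illustration, not arrow-rs types or functions, and whether nullability should take part in the comparison is an assumption here.

```rust
/// Toy model of a data type. This is NOT the arrow-rs `DataType`.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ModelType {
    Int32,
    Int64,
    /// A list with a named child field.
    List {
        field_name: String,
        nullable: bool,
        element: Box<ModelType>,
    },
}

/// Compare two types, ignoring the list child field name at every nesting level.
/// Nullability and the element type must still match (an assumption of this sketch).
fn logically_equal(a: &ModelType, b: &ModelType) -> bool {
    match (a, b) {
        (
            ModelType::List { nullable: a_null, element: a_elem, .. },
            ModelType::List { nullable: b_null, element: b_elem, .. },
        ) => a_null == b_null && logically_equal(a_elem, b_elem),
        // Anything that is not a pair of lists falls back to the exact, derived equality.
        _ => a == b,
    }
}
```

The derived `PartialEq` stays an exact match, so two lists that differ only in `field_name` are unequal under `==` yet equal under `logically_equal`.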

@tustvold
Contributor

tustvold commented Dec 15, 2024

Maybe we could look into changing the comparison logic to ignore the list field name 🤔 (same for LargeList, FixedSizeList, Union, etc)?

I think there is a broader and somewhat separate question around logical schema equality, that concerns not only this but metadata and possibly even logical type equality (is StringView the same as StringArray).

IMO we shouldn't be opinionated here and instead leave this to systems integrating arrow.

different field name doesn't actually make the type semantically different

Making judgements on the semantic meaning of metadata is tricky. I do agree that I struggle to conceive of what semantics one might attach to a field name, but there must be some reason Arrow encodes it.

@alamb
Contributor

alamb commented Dec 15, 2024

I think there is a broader and somewhat separate question around logical schema equality, that concerns not only this but metadata and possibly even logical type equality (is StringView the same as StringArray).

IMO we shouldn't be opinionated here and instead leave this to systems integrating arrow.

That seems like a reasonable approach. However, since DataType implements PartialEq, I think that by necessity enforces some idea of what schema equality means.

If we had to do it again, maybe we would support different types of Eq for DataType somehow (or not support the Eq trait and force downstream users to decide how to compare different Fields)

@tustvold
Contributor

I think providing Eq that implements an exact match is not only consistent with the trait, but also important for things like tests.

@alamb
Contributor

alamb commented Dec 16, 2024

I think providing Eq that implements an exact match is not only consistent with the trait, but also important for things like tests.

Well, but then it seems like the library is making assertions about what Schema equality means.

But yes, I do see your point that practically speaking removing PartialEq / Eq implementations would likely cause more pain than it was worth

Making it easier (and clearly documented) to compare schemas with different, common ideas of equality (e.g. ignore list field names) I think would add significant value by avoiding seemingly unnecessary schema mismatch issues
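One way to picture "different, common ideas of equality" is a comparison that takes explicit options, so each caller states which differences it tolerates. This is only a sketch under assumptions: `SketchField`, `SketchType`, `EqOptions`, `fields_equal`, and `types_equal` are hypothetical names for this example and do not exist in arrow-rs.

```rust
use std::collections::BTreeMap;

/// Toy field: a name, a type, and free-form metadata. NOT the arrow-rs `Field`.
#[derive(Debug, Clone, PartialEq, Eq)]
struct SketchField {
    name: String,
    data_type: SketchType,
    metadata: BTreeMap<String, String>,
}

/// Toy data type. NOT the arrow-rs `DataType`.
#[derive(Debug, Clone, PartialEq, Eq)]
enum SketchType {
    Int32,
    List(Box<SketchField>),
}

/// Which differences the caller is willing to ignore. The default ignores nothing.
#[derive(Debug, Clone, Copy, Default)]
struct EqOptions {
    ignore_list_field_names: bool,
    ignore_metadata: bool,
}

/// Compare two fields under `opts`. `is_list_child` marks the child field of a list,
/// the only place where the name may be ignored in this sketch.
fn fields_equal(a: &SketchField, b: &SketchField, opts: EqOptions, is_list_child: bool) -> bool {
    let names_match = a.name == b.name || (is_list_child && opts.ignore_list_field_names);
    let metadata_match = opts.ignore_metadata || a.metadata == b.metadata;
    names_match && metadata_match && types_equal(&a.data_type, &b.data_type, opts)
}

/// Compare two types under `opts`, descending into list child fields.
fn types_equal(a: &SketchType, b: &SketchType, opts: EqOptions) -> bool {
    match (a, b) {
        (SketchType::List(fa), SketchType::List(fb)) => fields_equal(fa, fb, opts, true),
        _ => a == b,
    }
}
```

With `EqOptions::default()` the result matches the derived exact equality, which keeps strict comparisons available for tests while letting an engine opt in to a looser definition.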

@tustvold
Contributor

Well, but then it seems like the library is making assertions about what Schema equality means.

I'll agree to disagree on whether providing equality on a data structure is making a judgement on how engines should choose to enforce or not enforce logical schema equivalence.

Making it easier (and clearly documented) to compare schemas with different, common ideas of equality

Perhaps we can roll this issue into #6735?

@tustvold tustvold closed this as not planned Dec 17, 2024
@tustvold tustvold added question Further information is requested and removed enhancement Any new improvement worthy of a entry in the changelog labels Dec 17, 2024
@rluvaton
Contributor Author

rluvaton commented Jan 7, 2025

FYI, I've opened a PR in Java Arrow to allow changing the default field name, as it's not possible to create a list with a specific field name:

apache/arrow-java#488


3 participants