Allow adding new levels to ordered arrays #342

Merged
merged 4 commits into from
Apr 23, 2021

nalimilan
Member

Instead of throwing an error when new levels are added to an ordered array, mark the array as unordered if one of the pools is unordered, if the pools have incompatible orderings, or if the relative order of some pair of levels cannot be determined.
This ensures that operations supported by any `AbstractArray` never fail for `CategoricalArray`. An additional `protected` setting could be added to let users manually lock the levels of ordered or unordered `CategoricalArray`s.

Fixes JuliaData/DataFrames.jl#2672.
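The three conditions above can be illustrated on plain level vectors. This is a minimal sketch under stated assumptions: `merge_levels_sketch` is a hypothetical name, not CategoricalArrays' internal `mergelevels`, and the order it returns for unordered results (levels of `a`, then new levels of `b`) is an assumption. It treats each ordered level vector as a chain of "comes before" constraints and runs Kahn's topological sort: a cycle means the orderings are incompatible, and more than one candidate level at any step means the order of some pair is undetermined. It models only the rule in the description, not the `b ⊆ a` refinement discussed in the review.

```julia
# Hypothetical sketch; NOT CategoricalArrays' `mergelevels`.
# Levels are plain vectors of unique values; `*_ordered` flags stand in for pools.
function merge_levels_sketch(a::Vector{T}, a_ordered::Bool,
                             b::Vector{T}, b_ordered::Bool) where {T}
    # Assumed order for unordered results: levels of `a`, then new levels of `b`.
    fallback = vcat(a, filter(x -> !(x in a), b))
    # Condition 1: if either input is unordered, the result is unordered.
    (a_ordered && b_ordered) || return (fallback, false)

    # Build a precedence graph from consecutive levels of each input.
    succ = Dict{T,Set{T}}(x => Set{T}() for x in fallback)
    indeg = Dict{T,Int}(x => 0 for x in fallback)
    for levs in (a, b), i in 1:length(levs)-1
        u, v = levs[i], levs[i+1]
        if !(v in succ[u])
            push!(succ[u], v)
            indeg[v] += 1
        end
    end

    # Kahn's algorithm: the merged order is fully determined only if exactly
    # one level has no remaining predecessors at every step.
    result = T[]
    ready = [x for x in fallback if indeg[x] == 0]
    while !isempty(ready)
        # Condition 3: several candidates means some pair's order is unknown.
        length(ready) == 1 || return (fallback, false)
        u = pop!(ready)
        push!(result, u)
        for v in succ[u]
            indeg[v] -= 1
            indeg[v] == 0 && push!(ready, v)
        end
    end
    # Condition 2: unprocessed levels mean a cycle, i.e. incompatible orderings.
    length(result) == length(fallback) || return (fallback, false)
    return (result, true)
end

merge_levels_sketch(["x", "y"], true, ["y", "z"], true)  # (["x", "y", "z"], true)
merge_levels_sketch(["x", "z"], true, ["y", "z"], true)  # (["x", "z", "y"], false): x vs y unknown
merge_levels_sketch(["x", "y"], true, ["y", "x"], true)  # (["x", "y"], false): incompatible orders
```

Using a topological sort makes "compatible" and "fully determined" two checks on the same graph: transitivity is handled automatically, so `["x", "y"]` and `["y", "z"]` still yield a total order even though `x` and `z` never appear together.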

nalimilan requested a review from bkamins on April 7, 2021 20:40
```diff
@@ -189,7 +180,8 @@ function merge_pools(a::CategoricalPool{T}, b::CategoricalPool) where {T}
 newlevs = copy(levels(a))
 ordered = isordered(a)
```
bkamins (Member)

This line, as well as lines 175 and 178, is inconsistent with the description you gave. Namely, you say that the pool is marked unordered if any of the merged pools is unordered, but those two lines follow a different rule when one of the pools is empty (the result then inherits its orderedness from `a` or `b`).

```diff
@@ -189,7 +180,8 @@ function merge_pools(a::CategoricalPool{T}, b::CategoricalPool) where {T}
 newlevs = copy(levels(a))
 ordered = isordered(a)
 else
-nl, ordered = mergelevels(isordered(a), a.levels, b.levels)
+ordered = isordered(a) && (isordered(b) || b ⊆ a)
+nl, ordered = mergelevels(ordered, a.levels, b.levels)
```
bkamins (Member)

Can you please remind me of the rule for when `b ⊆ a` is true?

nalimilan (Member, Author)

What do you mean? The idea is that if `a` is ordered and `b` is a subset of `a`, there's no reason to care about whether `b` is ordered, since `a` contains more information anyway.
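The precondition from the diff, `ordered = isordered(a) && (isordered(b) || b ⊆ a)`, can be sketched with plain vectors. `PoolSketch` and `may_stay_ordered` are hypothetical stand-ins for `CategoricalPool` and the inline expression; per the diff, the real code then passes this flag to `mergelevels`, which still requires the two orders to be compatible.

```julia
# Hypothetical stand-in for a pool: a level vector plus an `ordered` flag.
struct PoolSketch
    levels::Vector{String}
    ordered::Bool
end

# Mirrors `isordered(a) && (isordered(b) || b ⊆ a)`: when every level of `b`
# is already in `a`, `a` carries all the ordering information, so whether
# `b` is ordered does not matter.
may_stay_ordered(a::PoolSketch, b::PoolSketch) =
    a.ordered && (b.ordered || b.levels ⊆ a.levels)

a = PoolSketch(["low", "mid", "high"], true)
may_stay_ordered(a, PoolSketch(["high", "low"], false))  # true: b ⊆ a
may_stay_ordered(a, PoolSketch(["extra"], false))        # false: b is unordered and adds a level
may_stay_ordered(PoolSketch(["low"], false), a)          # false: a is unordered
```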

bkamins (Member)

I wanted to make sure how `b ⊆ a` is defined. Is it true when you can get `b` by removing some (or no) values of `a` without reordering the remaining values?

nalimilan (Member, Author)

No, it's defined exactly like `issubset`, without taking order into account. But `mergelevels` only returns `ordered=true` if the orders are compatible.
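The distinction between the subset test and the compatibility test can be shown with Base Julia. `issubset` (`⊆`) on vectors ignores order; `orders_compatible` is a hypothetical helper, not part of CategoricalArrays, showing one way to express "compatible orders": the levels two vectors share must appear in the same relative order in both.

```julia
# Base `issubset` ignores order: this holds even though "high" comes first.
issubset(["high", "low"], ["low", "mid", "high"])  # true

# Hypothetical helper: orders are compatible if the shared levels appear
# in the same relative order in both vectors.
function orders_compatible(a::AbstractVector, b::AbstractVector)
    sa, sb = Set(a), Set(b)
    shared_in_a = filter(x -> x in sb, a)
    shared_in_b = filter(x -> x in sa, b)
    return shared_in_a == shared_in_b
end

orders_compatible(["low", "mid", "high"], ["low", "high"])  # true
orders_compatible(["low", "mid", "high"], ["high", "low"])  # false
```

So `["high", "low"] ⊆ ["low", "mid", "high"]` holds, yet the two orders conflict; that second check is what keeps the merged result from being marked ordered.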

bkamins (Member)

Fair enough. Is this documented somewhere?

nalimilan (Member, Author)

Not really. I've added more docs (no docstring for `issubset`, though, since `CategoricalPool` isn't exported and I don't want to pollute the documentation).

bkamins (Member) left a comment

(assuming my understanding of how `b ⊆ a` is defined is correct)

nalimilan merged commit baabf2e into master on Apr 23, 2021
nalimilan deleted the nl/orderedlevs branch on April 23, 2021 21:26

Successfully merging this pull request may close these issues.

describe fails on a categorical array column from RDatasets dataframe
2 participants