-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow adding new levels to ordered arrays #342
Conversation
@@ -189,7 +180,8 @@ function merge_pools(a::CategoricalPool{T}, b::CategoricalPool) where {T} | |||
newlevs = copy(levels(a)) | |||
ordered = isordered(a) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line and also line 178 and 175 are inconsistent with the description you gave. Namely you say that pool is marked unordered if any of merged pools is unordered. But these two lines follow a different rule if one of them is empty (as then it inherits orderedness from a
or b
)
@@ -189,7 +180,8 @@ function merge_pools(a::CategoricalPool{T}, b::CategoricalPool) where {T} | |||
newlevs = copy(levels(a)) | |||
ordered = isordered(a) | |||
else | |||
nl, ordered = mergelevels(isordered(a), a.levels, b.levels) | |||
ordered = isordered(a) && (isordered(b) || b ⊆ a) | |||
nl, ordered = mergelevels(ordered, a.levels, b.levels) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please remind me the rule when b ⊆ a
is true
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean? The idea is that if a
is ordered and b
a subset of a
, there's no reason to care about whether b
is ordered since a
contains more information anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to make sure how b ⊆ a
is defined. Is it:
by removing some (or no) values of
a
without reordering the remaining values you can getb
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No it's defined exactly like issubset
, without taking order into account. But mergelevels
only returns ordered=true
if orders are compatible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. Is this documented somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really. I've added more docs (no docstring for issubset
though since CategoricalPool
isn't exported and I don't want to pollute the documentation).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(assuming my understanding of how b
is a subset of a
is defined is correct)
Instead, mark arrays as unordered if one of the pools is unordered, if pools have incompatible orderings or if orders of all pairs of levels cannot be determined. This ensures that operations supported by any `AbstractArray` never fail for `CategoricalArray`. An additional `protected` setting could be added to manually decide to lock levels of ordered or unordered `CategoricalArray`s.
dd483d4
to
f2b9f73
Compare
Instead, mark arrays as unordered if one of the pools is unordered, if pools have incompatible orderings or if orders of all pairs of levels cannot be determined.
This ensures that operations supported by any
AbstractArray
never fail forCategoricalArray
. An additionalprotected
setting could be added to manually decide to lock levels of ordered or unorderedCategoricalArray
s.Fixes JuliaData/DataFrames.jl#2672.