add Base.isunique
#15803
Comments
I agree that this is a nice thing to have, but maybe we can find a
better name? It's not really about whether the argument is unique, but
whether it has unique elements. I feel that this could lead to some
confusion.
I guess that you would also want to add a method that always returns true
for sets, and maybe for key iterators for dicts.
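A rough sketch of what such specialized methods might look like. The name `elements_distinct` is a placeholder (no name had been agreed on at this point), and the generic fallback is deliberately naive; only the dispatch on sets is the point here.

```julia
# Hypothetical name, for illustration only; not an existing Base function.
# Naive generic fallback: compare the number of distinct elements to the length.
elements_distinct(itr) = length(Set(itr)) == length(itr)

# Sets hold distinct elements by construction, so no work is needed.
elements_distinct(::AbstractSet) = true

elements_distinct([1, 2, 3])        # generic method
elements_distinct(Set([1, 2, 2]))   # set method, returns true without iterating
```

In current Julia, `keys` of a `Dict` returns a `Base.KeySet`, which is a subtype of `AbstractSet`, so the set method would also cover the key-iterator case.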
|
Another name I was thinking of is And of course there should be specialized methods when they make sense (eg also for subtypes of If you agree on a name, I would be happy to submit a PR. |
|
+1 for To bolster @tpapp's point, I did a quick search of registered Julia packages, and found at least 10 packages which use this idiom. A couple of additional suggestions:
|
+1 This introduces simplification wlog and improves 'at a glance' understanding. |
Is this going to cause confusion with functions (that don't currently exist, but might one day) to check if elements of an array alias each other? |
@stevengj: For the second use case (which actually uses the value of However, in the PR for this issue I don't want to complicate things, so I will just do a PR for Regarding the first use case: I really like the idiom with |
+1 for |
@tpapp, for the second use case, the point is you don't need to know the value of |
Sorry for restarting the bikeshedding so late, but what's the reason to prefer |
Scott proposed another possible name: |
@nalimilan: I would be fine with either as I see no a priori reason for picking one over the other (I guess this happens with names frequently); however, |
Choice of terms is indeed in large part arbitrary. But once a term is chosen, it's important IMHO to be consistent across the API. In the present case, I thought @StefanKarpinski @stevengj Any reason why you preferred |
FWIW, I like this argument for consistency. On Sunday, April 24, 2016, Milan Bouchet-Valat [email protected]
|
I don't think that saying the elements are "all unique" is correct. Each element is in itself unique, but the fact that they are all individually unique doesn't mean anything – that's true of any collection. When you say |
To argue against myself, the term "unique" means "unlike any other" which in this case could implicitly be taken to mean "within this collection" – and if each value is unlike any other in the collection then that is what we want. I guess I just had a preference for "distinct" since it focuses on the relationship between values and not a property of each value. Maybe either would be ok. |
Indeed no element is "unique" individually, but that also applies to "distinct". I really don't see the difference. :-) That's why I think we should keep the "unique" terminology. |
Yeah, I'm ok with |
I see the difference. A is distinct from B iff there is a distinction that distinguishes A from B (and this is the sense that alldistinct carries). A is unique iff there is exactly one occurrence of A (singleton types are allunique). |
It seems that the only thing that's preventing a merge is a decision on naming. I have a weak preference for While I understand that various arguments can be made for various names, IMO the benefit of having a function for this common idiom is the most important thing. |
I was under the impression that I won the fight for |
+1 on |
|
Thanks for the quick replies, made the change in the PR. |
Allows testing if elements in a collection are distinct (when compared with isequal). Terminates early in case of repeated elements, has methods for collections which are by construction distinct. Also added tests. See discussion at issue #15803.
Closed by 2cc803b |
I missed the change in opinion here, but |
Given that |
Not if x is a collection of collections. Elsewhere when we concatenate names like |
I actually liked the |
On reflection, I think I may have been a little hasty with my +1... I think that @StefanKarpinski's concession...
...was a mistake. Distinct-ness has a clear linguistic meaning, whereas unique is a little ambiguous in this case; "[They are] readily distinguishable" vs "the only one of its kind". I do see the merit in @tkelman's thought, though. I doubt there would be any confusion with that one. Based on which, It is worth noting however, that if we were to call this function |
If |
Would anyone object to renaming to |
Late to the party here but |
I don't see the problem with |
It's unrelated to |
The term |
I was actually quite surprised that the convention was so uniform, we have nothing else with an What is |
Perhaps the names at https://stat.ethz.ch/R-manual/R-devel/library/base/html/duplicated.html are better: |
Judging by
I'm guessing you mean "better" for DataFrames' usage rather than the function being discussed in this issue? |
I was actually suggesting |
That would indeed make sense if Base had a |
Wouldn't the more general version of this be a function which given a vector returns an Int vector of the same length where each value is the number of times that element has been seen at that point? I'm not entirely convinced that such a specific thing belongs in the standard library, however – it's pretty easy to write this function yourself, after all. |
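For concreteness, a minimal sketch of the more general function described in this comment. The name `seen_counts` is made up, and it assumes the count includes the current occurrence (so a first occurrence maps to 1) and that elements are compared with `isequal`, as `Dict` keys are.

```julia
# Hypothetical helper: for each position, how many times that element
# has appeared so far (counting the current occurrence).
function seen_counts(v::AbstractVector)
    counts = Dict{eltype(v),Int}()
    out = Vector{Int}(undef, length(v))
    for (i, x) in enumerate(v)
        c = get(counts, x, 0) + 1   # occurrences of x up to and including position i
        counts[x] = c
        out[i] = c
    end
    return out
end

seen_counts([1, 2, 1, 1, 3])   # [1, 1, 2, 3, 1]
```

Under these assumptions, the elements are pairwise distinct exactly when every entry of the result is 1, and the duplicated positions are those with an entry greater than 1.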
Ah. I didn't really follow what "violate a unique key constraint ..." meant. The R precedent is good enough for me, final up/down on |
Honestly, I think I'm going to think |
I'm also just not concerned about ambiguity between |
What does the "all" mean in the name then? |
Given that we have |
(If you google |
How about allsame(). Doesn't it have the same argument for inclusion? |
@ymer That's easy to write efficiently as |
function allsame(X)
    isempty(X) && return true
    X1 = first(X)
    return all(x -> isequal(x, X1), X)
end
would be more correct (doesn't assume On the other hand Python doesn't seem to have an all_same function (though of the Python posted solutions on stackoverflow are buggy for the same reasons as above), nor does Ruby, nor does Haskell. If other languages don't include an |
But those languages don't have an |
I also prefer |
Maybe there is a common pattern here:
function allsatisfy(pred, itr) # is there a nicer way to code this?
    state = start(itr)
    done(itr, state) && return true
    (prev_elt, state) = next(itr, state)
    while !done(itr, state)
        (elt, state) = next(itr, state)
        if pred(prev_elt, elt)
            prev_elt = elt
        else
            return false
        end
    end
    true
end
allsatisfy(<, 1:10) # true
allsatisfy(>, 1:10) # false
allsatisfy(==, 1:10) # false
allsatisfy(<, []) # true
allsatisfy(<, [1]) # true
allsatisfy(==, fill(1,10)) # true
Caveats:
|
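For readers on Julia 1.0 or later, where `start`/`next`/`done` were replaced by `iterate`, the adjacent-pair check above could be sketched roughly as follows. The name `all_adjacent` is made up for this illustration and is not an existing Base function.

```julia
# Hypothetical helper: true if pred(a, b) holds for every pair of
# consecutive elements of itr; empty and one-element inputs return true.
function all_adjacent(pred, itr)
    y = iterate(itr)
    y === nothing && return true        # empty iterator
    prev, state = y
    while true
        y = iterate(itr, state)
        y === nothing && return true    # reached the end without a failing pair
        elt, state = y
        pred(prev, elt) || return false # stop at the first failing pair
        prev = elt
    end
end

all_adjacent(<, 1:10)             # true
all_adjacent(==, 1:10)            # false
all_adjacent(==, fill(1, 10))     # true
```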
I am not attending these:
📆
Conceptually, anyunique is more crisp and clearly stated than anydistinct. |
A common idiom for testing if values in a collection C are unique is
or variations, e.g. when the length n is known,
These show up in various tests, assertions, and inner constructors. However, if the goal is only to test for uniqueness, not to obtain a list of unique elements, then something like
is much more efficient: it can terminate early and avoids the construction of the intermediate array. Also, the programmer's intent is clearer.
I am of course reluctant to suggest adding functions to Base, but it would be good to have unique and isunique in the same module.
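A minimal sketch of the contrast described in this post, assuming the length-comparison idiom takes a form like `length(unique(C)) == length(C)`. The name `has_no_repeats` is hypothetical, and this is an illustration of the early-terminating approach rather than the implementation that was eventually merged.

```julia
# Assumed form of the idiom: builds the full array of unique elements first.
idiom_check(C) = length(unique(C)) == length(C)

# Hypothetical early-terminating check: stops at the first repeated element
# (elements are compared with isequal, as in a Set).
function has_no_repeats(itr)
    seen = Set{eltype(itr)}()
    for x in itr
        x in seen && return false   # repeat found, no need to look further
        push!(seen, x)
    end
    return true
end

C = [3, 1, 4, 1, 5]
idiom_check(C)      # false
has_no_repeats(C)   # false, decided at the second 1

# Early termination also makes the check usable on an endless iterator
# that repeats, where collecting the unique elements would never finish.
has_no_repeats(Iterators.cycle(1:3))   # false
```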