Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

argmax should support generators #27612

Closed
drewrobson opened this issue Jun 16, 2018 · 16 comments
Closed

argmax should support generators #27612

drewrobson opened this issue Jun 16, 2018 · 16 comments

Comments

@drewrobson
Copy link

drewrobson commented Jun 16, 2018

I noticed that maximum(i for i = 1:10) works but argmax(i for i=1:10) does not. This seems inconsistent.

Edit: I am proposing that argmax(f(x) for x ∈ itr) should work and return an x in itr that maximizes f(x), which is quite literally the mathematical definition of argmax. I have proposed a simple way to implement this here.

(originally proposed here)

@stevengj
Copy link
Member

What does argmax even mean for an unordered iterable collection like Set?

@drewrobson
Copy link
Author

That's a good point. When this function was called indmax, it was perhaps clearer that it should be used on collections with well defined linear indexing.

Actually, I just tried argmax(Dict("a" => 1, "b" => 2, "c" => 3)) and it returns "c" on 0.7-alpha. Cool! On 0.6, indmax returned 1, which makes very little sense and is I think related to your point.

@drewrobson
Copy link
Author

drewrobson commented Jun 16, 2018

Looks like argmax is now designed to work with any collection where keys is defined.
So argmax is not defined for an unordered iterable collection like Set. I suppose argmax could still be defined for a generator g if one were to take keys(g) to be 1:length(g), if that makes sense? I think this is consistent with the original spirit of indmax, where keys(v) for some vector v is 1:length(v). Edit: the goal is not to be consistent with the original spirit of indmax's linear indexing. I think this is better.

@drewrobson
Copy link
Author

drewrobson commented Jun 16, 2018

The following 1-liner is sufficient to implement this proposal: (edit: nope! Please read here instead)
Base.keys(g::Base.Generator) = 1:length(g)
After running that in the REPL, argmax(i for i=1:10) works and returns 10 as expected. The remaining question is whether this is desirable. I await further instructions!

@drewrobson
Copy link
Author

Maybe that's a bad idea, since g[10] doesn't actually work so keys(g) = 1:length(g) would be false advertising.

Alternatively, one could have a specialized definition for argmax for generators. The implementation would be nearly identical, except replace keys(g) with 1:length(g). This way, there's no need to define keys(g) in general if that is a concern.

@drewrobson
Copy link
Author

drewrobson commented Jun 16, 2018

I think I've found the right way to address @stevengj's point above. If we want to extend argmax to generators, I think we want to do it specifically for those generators g where keys(g.iter) is defined. So just as argmax(c) currently works for a collection c if and only if keys(c) is defined, argmax(g) should work for a generator g if and only if keys(g.iter) is defined.

Therefore I am tempted once again to introduce the following:

Base.keys(g::Base.Generator) = Base.keys(g.iter)

Then argmax(i for i = 1:10) works and returns 10. Even cooler,

argmax(i for i = reshape(1:9, 3, 3))

yields CartesianIndex(3, 3). On the other hand, argmax(i for i = Set(1:10)) errors just like argmax(Set(1:10)). LGTM, but please let me know if I've gone down the path of madness. =)

@drewrobson
Copy link
Author

Ok that was definitely madness. Sorry for the stream of consciousness (like watching a car crash in slow motion?).

After sleeping on it, I think what is needed is this:

Base.keys(g::Base.Generator) = g.iter

With this, we get a perfectly reasonable definition for pairs(g) and argmax(g).

Unlike before, this actually works on a slightly less trivial example such as argmax(-x^2 for x = -5:5), which yields 0.

This works because

for p in pairs(g)
    display(p)
end

yields

-5 => -25
-4 => -16
-3 => -9
-2 => -4
-1 => -1
0 => 0
1 => -1
2 => -4
3 => -9
4 => -16
5 => -25

Returning to the question of "what does argmax even mean for an unordered iterable collection like Set?", I think ordering doesn't matter. Mathematically, argmax is defined on a set without any need for ordering, so I think my new proposed solution actually aligns quite well to the mathematical sense.

@stevengj
Copy link
Member

stevengj commented Jun 17, 2018

Argmax isn’t normally defined on a set by itself, I thought?

@drewrobson
Copy link
Author

drewrobson commented Jun 17, 2018

Sorry for the confusion. Assuming g is a generator, then the "set" I was referring to would be g.iter and the function being applied to that set would be g.f. Therefore I think a generator is in fact a very natural thing to define argmax on.

Edit: I have updated my first post above to clarify what I am proposing. I'm not trying to define argmax on a set, I'm trying to define argmax on a generator, which consists of a set-like object (g.iter) and a function (g.f).

@drewrobson
Copy link
Author

This would also set things up perfectly for #27613, which is what originally led me to this.

@drewrobson drewrobson changed the title Since maximum(itr) supports generators, argmax(itr) should too argmax should support generators Jun 17, 2018
@ararslan
Copy link
Member

This is largely orthogonal #27613.

@stevengj
Copy link
Member

I agree that defining argmax for generators makes sense, whereas defining it for arbitrary iterators does not make sense to me.

@Nosferican
Copy link
Contributor

As long as there is a key that corresponds to the element I think argmax should work for it.

@drewrobson
Copy link
Author

Agreed! Currently, the docs say argmax(itr) -> Integer, which is less general than argmax really is on 0.7. If/when this issue turns into a PR, that's probably worth updating.

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Jun 21, 2018

The conclusion on the triage call is that argmax(f(x) for x in itr) is a syntax illusion that looks like it should return the x in itr which maximizes f(x) but it can't do that without violating abstractions because the generator object (f(x) for x in itr) doesn't expose x values or itr for that matter. What should be done for argmax at some point (it's not a breaking change, so not urgent) is adding a two-argument argmax(f, itr) which returns the first x in itr which maximizes f(x). In that context, argmax(v) can be seen as interpreting v as a function mapping its indices to its values.

@drewrobson
Copy link
Author

syntax illusion

Nicely put! It's clear now why maximum(g::Generator) works and argmax(g::Generator) does not. Case closed.

cmcaine added a commit to cmcaine/julia that referenced this issue Mar 30, 2020

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
cmcaine added a commit to cmcaine/julia that referenced this issue Mar 30, 2020
cmcaine added a commit to cmcaine/julia that referenced this issue Mar 30, 2020
cmcaine added a commit to cmcaine/julia that referenced this issue Mar 30, 2020
cmcaine added a commit to cmcaine/julia that referenced this issue Nov 21, 2020
cmcaine added a commit to cmcaine/julia that referenced this issue Dec 17, 2020
StefanKarpinski pushed a commit that referenced this issue Dec 18, 2020
Defines a descending total order, `isgreater` (not exported), where unordered values like NaNs and missing are last. This makes defining min, argmin, etc, simpler and more consistent. Also adds 2-arg versions of findmax/min, argmax/min. Defines and exports the `isunordered` predicate for testing whether a value is unordered like NaN and missing. Fixes #27613. Related: #27639, #27612, #34674.

Thanks to @tkf, @StefanKarpinski and @drewrobson for their assistance with this PR.

Co-authored-by: Jameson Nash <[email protected]>
Co-authored-by: Takafumi Arakaki <[email protected]>
ElOceanografo pushed a commit to ElOceanografo/julia that referenced this issue May 4, 2021
Defines a descending total order, `isgreater` (not exported), where unordered values like NaNs and missing are last. This makes defining min, argmin, etc, simpler and more consistent. Also adds 2-arg versions of findmax/min, argmax/min. Defines and exports the `isunordered` predicate for testing whether a value is unordered like NaN and missing. Fixes JuliaLang#27613. Related: JuliaLang#27639, JuliaLang#27612, JuliaLang#34674.

Thanks to @tkf, @StefanKarpinski and @drewrobson for their assistance with this PR.

Co-authored-by: Jameson Nash <[email protected]>
Co-authored-by: Takafumi Arakaki <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants