Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: searchsorted new keyword argument arrayby #35418

Closed
wants to merge 3 commits into from
Closed

RFC: searchsorted new keyword argument arrayby #35418

wants to merge 3 commits into from

Conversation

KlausC
Copy link
Contributor

@KlausC KlausC commented Apr 9, 2020

Fixes #9429 and fixes #35381

The new keyword argument arrayby=identity is added to methods of searchsorted, searchsortedfirst, and searchsortedlast as a complement to by. Only one of the two
transformations must be specified (different from identity).
In contrast to by the values of arrayby is only applied to the elements of vector a, not to the comparison value, which is searched for.
It is assumed, that arrayby.(a) is sorted according to the order defined by other arguments.

@KlausC KlausC changed the title searchsorted new keyword argument arrayby RFC: searchsorted new keyword argument arrayby Apr 11, 2020
@KlausC
Copy link
Contributor Author

KlausC commented Apr 26, 2020

bump

@StefanKarpinski
Copy link
Member

StefanKarpinski commented May 5, 2020

I just wanted to post here that I'm paying attention to this and have discussed it with people on the Slack #triage channel and last week's triage call. The general feeling was that adding this new keyword argument here feels unfortunate and a bit ad hoc. It would be better to accomplish this by searchsorted on mapped arrays. Perhaps @mbauman can say a bit more on that.

@mbauman
Copy link
Member

mbauman commented May 5, 2020

Yeah, my thoughts are that this is a real pain-point and a great pull request but the solution is unsatisfactory to me — largely due to the arrayby name, I think.

It's worth noting the prior art here: @haampie did some experiments here in SortingSortingOut.jl, but that too seems fiddly. I favor composition of separate functions over embedded complexity, so @tkf's vision is more in-line with where I'd like to see this go. It can be reality today with MappedArrays and the like.

So is there space for a stopgap keyword argument like this? Maybe, I suppose, but I am having a really hard time coming up with a name that feels right. My first thought on arrayby was that it'd apply a function to the entire array, especially in contrast to by. Maybe elementsby would be slightly better, but it's kinda the same thing. We could use another keyword like transformation as you initially suggested, but that feels fiddly too. I'm imagining telling folks "you don't want to use by, you want the transformation that ignores your needle" and having a hard time justifying why there are two keyword arguments that are nearly identical except one works the way they want and the other doesn't. Telling them to lazily transform their array first is a much more attractive option.

@KlausC
Copy link
Contributor Author

KlausC commented May 6, 2020

Before I close this request, I want to explain my concerns. I am in no way in favour of any new keyword argument; I am not the first one, who complains a missing feature.
There also seems to be an agreement, that the current behaviour of by is unexpected, undocumented, and unnecessary. The fact that this issue surfaced al least 4 times during the last 6 years leads me to fear, that it will generate frustration again in the future. We should at least warn the users in the documentation. #35381 (comment)

So the natural way of fixing that would be to give by=f the meaning "the array is sorted by f" and nothing else. The fact, that the current implementation applies f to the search subject is, as I understand, a consequence of a clever programmer, who tried to leverage the comparison method of Ordering, not the result of a sophisticated design.

Unfortunately Julia is now in a phase of maturity, that does not seem to allow for "breaking changes". Somehow the argument reminds me of IBM's "that's not a bug - it's a feature" answer, you received from their support on many bug reports in the 70th and 80th: for approximately every misbehaviour (except program abort) you can construct a potential user who relies on exactly that misbehaviour.

composition of separate functions over embedded complexity

I appreciate @mbauman 's thougths. Actually after I detected that by does not work for me, I was looking for a kind of lazy evaluating mapped array, failed with Generator, and did not find the MappedArrays package. So I ended with a minimal implementation of a mapped array of my own.

@KlausC KlausC closed this May 7, 2020
@haampie
Copy link
Contributor

haampie commented May 7, 2020

Truth is that the sorting and ordering API has not changed much since 2013. An adequate API for Julia 2.0 is very much possible (I would think...)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants