Api rework #34

KronosTheLate · 2023-09-22T14:19:49Z

This draft PR represents an effort to rework the API of Peaks.jl, based on the disuccion in #24. As of the time of writing, the PR is mainly a proof of concept and test of the user facing API. A lot of changes are still missing, which have been listed below:

ToDo

halleysfifthinc

Thanks for working on this; I like the new piping workflow this will enable.

A couple overall comments before addressing the bullet points you raised:

Where possible, please use/replicate the existing language/terminology I've chosen in the rest of the package, e.g peaks for referring to the indices of peaks, minprom/maxprom (instead of just min/max, which is less descriptive and shadows the min/max functions), etc.
My goals for this package/API are to be maximally flexible in how it can be and is used; this directs my responses to your questions.

Which functions should be exported, and what should they be named? Is possible to maintain compatability with old API for a transition phase?

The existing API should continue to be exported, for backwards compatibility and flexibility. Not everyone likes piping syntax, and the piping API depends on the existing functions, so there is no point to not exporting them. Additionally, some of the exported functions are not being "replaced" by the new piping functions (e.g. findnextmaxima, ismaxima, etc). The new functions can be added as methods to the existing functions.

Do we need a non-mutating API?

In the context of piping, I agree it doesn't seem useful. But in keeping with the existing API, I think it makes sense to have copying/non-mutating options.

Should Peaks.jl provide minima-finding functionality?

Yes. Although reversing the sign is easy and fairly obvious, that adds a copy/allocation that can be avoided with minima focused functions.

Create great docstrings for all exported functions

Signatures, explanations and examples should be added to the existing docstrings (since they will be methods of the main function names)

For tests, I suggest chaining together existing tests/values.

src/api_rework.jl

KronosTheLate · 2023-09-24T06:50:15Z

Thanks for the thorough feedback. It is nice to have my suggestions taken seriously ^_^

I have (quite) a few responses to your initial comment 😅. I don't know if is the best format, but I will reply to them one by one. My suspicion is that for some points we will simply disagree. I am in more of a "tear it down and build a new one" mindset when it comes to the API, whereas you want to avoid breakage. Your approach is more sensible, but I would also find it a shame if the package will permanently make some suboptimal API choice(s) to avoid breakage. Particularly now that we will make significant changes anyway.

But let's discuss particulars, and see if we can reach a conclusion on the way. For the points raised so far, I believe we can have our cake and eat it too, reaching both our goals.

Responses to concrete points

Where possible, please use/replicate the existing language/terminology I've chosen in the rest of the package, e.g peaks for referring to the indices of peaks, minprom/maxprom (instead of just min/max, which is less descriptive and shadows the min/max functions), etc.

I changed the terminology intentionally, as I feel like I made improvement. I thought this would have to be significantly breaking, but see now that a non-breaking rework implementing my changes may actually be possible. I will now argue for why the changes you exemplified with are valuable improvements IMO:

I feel like a peak is characterized by more than its location (mainly height, but also prominence, width etc). It is therefore not intuitive to me that "peaks" would mean the indices of the peak location, particularly now that we will have it as a field of a named tuple. Conceptually the named tuple represents the peaks, so to me, pks.peaks would read like "the peaks of the peaks". I feel like a field inds, belonging to an object containing peaks, is extremely explicit and therefore better. This is mainly a matter of internal variable names and documentation, and will only reach the public API as the names of the fields of the returned named tuple. It will therefore be non-breaking to change peaks to inds internally in the same release that adds named tuple methods.
minprom/maxprom is more verbose than min/max, and when we are talking about the keyword arguments for a function findproms!, it is also clear that it is redundant information. When it comes to the functions by the same name, I do not see the issue. Programmatically keywords do not shadow function names, and mentally the context (kwarg vs function name) distinguishes them sufficiently. Finally, keyword arguments min/max can be the same for all the functions that calculate and filter peak features, meaning that the API is more consistent and streamlined this way. This can be made non-breaking by making a signature peaksproms!(pks; minprom=nothing, maxprom=nothing, min=minprom, max=maxprom), and using min and max internally. This would allow a smooth transition phase, and heck, it would even be free to keep both versions around. The only downside is that it may be confusing to the user to have multiple kwarg for the same thing, but I feel like the gain is greater than that cost.

My goals for this package/API are to be maximally flexible in how it can be and is used; this directs my responses to your questions.

My goals with the rework is to make it more intuitive and user friendly. If we both manage to reach our goals, this will be the best peak finding package the world has seen 😄

The existing API should continue to be exported, for backwards compatibility and flexibility.

I feel like the new function names will need to be the same as the old, as those are often the best names. But perhaps you are saying that we can add methods that take a named tuple as the first argument, and "unpack" the named tuple so into the old method of the same function? So that what is added here is mainly new methods, which would make it non-breaking? That sounds great for compatibility, so perfect for a transition phase.

I do feel like documenting two very different API's extra work, and likely to be confusing to new users. It will also complicate/duplicate the internals slightly, increasing maintenance burden. Do we agree that whatever API we agree on, the old one will be deprecated in the long term?

Not everyone likes piping syntax,

My proposal is not about the piping syntax, that was more an afterthought. In the example code I posted in the original issue, the first examples show a non-piping syntax, which was the original idea. The main point of the API rework is working with named tuples instead of individual vectors.

Yes. Although reversing the sign is easy and fairly obvious, that adds a copy/allocation that can be avoided with minima focused functions.

I would easily pay the price (on behalf of the user) of an extra allocation, if the alternative is the all functionality needs to be duplicated for the minima focus functions. I just do not feel motivated to write that much internal code and bloat the codebase to avoid a single initial allocation for a small fraction of uses. Furthermore, I can hardly see the application where it is critical to avoid that allocation. With that said, if you are motivated to put in that work, the code bloat is likely not a large problem, so I would not have an issue with having the functionality.

halleysfifthinc · 2023-09-25T16:23:13Z

I feel like a peak is characterized by more than its location (mainly height, but also prominence, width etc) ... Conceptually the named tuple represents the peaks, so to me, pks.peaks would read like "the peaks of the peaks".

I agree, that was always a oversimplification that I compromised on; a location specific word is more correct. However, I do prefer indices over inds. (An extra couple characters is not the end of the world)

Programmatically keywords do not shadow function names

That is incorrect.

julia> f(x; min) = min(x, min)
f (generic function with 1 method)

julia> f(2; min=1)
ERROR: MethodError: objects of type Int64 are not callable
Maybe you forgot to use an operator such as *, ^, %, / etc. ?
Stacktrace:
 [1] f(x::Int64; min::Int64)
   @ Main .\REPL[1]:1
 [2] top-level scope
   @ REPL[4]:1

However, peakproms is the only function where the shadowing would be a problem, so we can prefix with Base and fix the issue.

This can be made non-breaking by making a signature peaksproms!(pks; minprom=nothing, maxprom=nothing, min=minprom, max=maxprom)

Yes please. Way back in the git history, you can find an example of how I deprecated a kwarg (strict used to have a different, more verbose, name iirc).

But perhaps you are saying that we can add methods that take a named tuple as the first argument, and "unpack" the named tuple so into the old method of the same function?

Yes exactly.

Furthermore, I can hardly see the application where it is critical to avoid that allocation.

Any time peak-finding is done in a (relatively) hot loop, an extra allocation is an avoidable performance cost. As the maintainer, I accept the cost of dealing with the additional code. 🤪

Do we agree that whatever API we agree on, the old one will be deprecated in the long term?

No. While they are redundant in some sense, there is no runtime cost to the additional methods, and I don't think that the additional code will be too much work to maintain. I also don't think users will be overly confused by the documentation of the NamedTuple methods, because it is common for Julia to have functions with multiple methods where each method has slightly different argument types.

KronosTheLate · 2023-09-25T17:11:09Z

Conclusions:

The field that contains the indices will be named indices
We will make the keyword arguments min and max
- We need to remember to prefix the base function with Base. to avoid internal shadowing
- We will use a signature like minprom=nothing, maxprom=nothing, min=minprom, max=maxprom to allow the old keyword arguments to work seamlessly
The old functions should not be changed. The new NamedTuple based API should only add methods to the original functions (both mutating and non-mutating).
The old API will stay around indefinitly. Extra care will have to be taken to properly document both API's in an understandable manner for users, in particular the docstrings.
- Do we agree that the new API is an improvement, and should be "fronted"? I.e. it should appear first in docstrings and the documentation.
- Perhaps the best is to define the docstrings seperately for the different methods, and advice users to give argument type information to find the correct docstring (? peakproms!(::NamedTuple<tab>).
This realease should all-in-all contain major changes, but be non breaking

halleysfifthinc · 2023-09-25T23:17:39Z

I'm ambivalent about method docstring order; I'm not sure how configurable that is. I guess it is order dependent? You are welcome to convert the examples in the readme to the new API.

The next release will need to be breaking, as the return type of the old/current API functions will need to be changed to NamedTuples. In practice, this should be unnoticed for most cases, since I expect users are already using syntax that will destructure the return values. But, it is technically breaking.

KronosTheLate · 2023-09-26T07:11:32Z

The next release will need to be breaking, as the return type of the old/current API functions will need to be changed to NamedTuples. In practice, this should be unnoticed for most cases, since I expect users are already using syntax that will destructure the return values. But, it is technically breaking.

Wait, so you imagine that the functions will always return named tuples? I imagine this will be massively breaking, as

The NamedTuple will always return the data first, not indices
The order of the named tuple fields/values changes based on the order it called in
The named tuples will have variable length, based on how many fields are added thus far.

In other words neither the order nor the length of the named tuple will match the old returned tuple. This will essentially break all code as far I as can tell. We could try to reorder the named tuple in every function to match the old API, but that still has the issue of having variable length (old tuple is same length always, new named tuple varies depending on previously added fields). If we are going to this level of breakage, then the ideal/value of "don't break things" goes completely out the window, and we might as well change anything we feel is better.

What I has in mind was that if you pass in distrinct vectors, a tuple is returned, maintaining the old API. If you pass in a named tuple, a named tuple is returned. This should allow both API's to stay around side by side.

halleysfifthinc · 2023-09-26T18:03:14Z

Hmmmm yeah it seems I hadn't fully thought that through. NamedTuples iterate/destructure by order, but they can destructure by name.

The order of the NamedTuple should be consistent, and except for the peak heights/values atm, as the PR stands now, it has a required order for proms and widths of (:proms, :widths, :edges), since the width calculation requires the prominences.

Here's what I suggest:

You currently have data as the first field, but I suggest we standardize on an order of (inds, vals, data, proms, widths, edges).

Then we can update findmaxima/findminima to return a NamedTuple (;inds, vals, data). That is non breaking since it keeps the current order and the extra data field will be ignored by any current syntax (either slurping then indexing, where index 3 doesn't currently exist so won't conflict, or by destructuring, where slurping is opt in, e.g. idxs, vals = findmaxima(x, w; strict) destructures by order and ignores everything after the first two indices). This also solves the entry point for the NamedTuple API when saving findpeaks for a separate PR.

The remaining old API will be left as-is.

src/api_rework.jl

KronosTheLate · 2023-09-27T17:51:07Z

I was very surprised to see that any remaining elements are simply ignored. That solves the problem I thought we had, which was that the length of the returned (named) tuple would matter to the variables being assigned by destructuring.

But there is another issue. The current signatures are as follows:

peakproms -> (peaks, proms)
peakwidths -> (peaks, widths, leftedge, rightedge)
peakheights -> (peaks, heights)

If we settle on an established order of (indices, heights, data, proms, widths, edges), only peakheights still works (the other cases would, with the new "established order", return heights where proms or widths, leftedge, rightedge was expected).
To fix this, we would have to change the order of the named tuple as needed to match the old API, which just seems a bit like bending the new API out of shape to match the old API. By "bending out of shape" is refer to the constant reordering, and the following 2 choices I don't like that we would have to make:

Put data on the third index, in between different peak features. It just feels off.
Keep leftedge and rightedge separate. I liked the change of putting them both into edges

It just seems more simple to just make it so that if you pass separate vectors in, you get a tuple of vectors out. And if you pass a NamedTuple in, you get a named tuple out. In other words, let the old and new API live side by side without crossover. It seems minimally disruptive to code using the old API, while allowing the new API to be "done right", or at least without historic constraints. Would that be so bad?

KronosTheLate · 2023-10-02T18:26:27Z

Any new thoughts? Any options you feel very strongly about?

halleysfifthinc · 2023-10-09T16:39:19Z

I'm not following the first bit of your comment, so perhaps I wasn't clear when talking about a standard order; I was meaning only for the NamedTuple. So (;indices, heights, data, proms, widths, edges) for the NamedTuple API.

Other than being consistent (for type reasons), I really don't think the NamedTuple order is important, since the main interaction should normally be via named indexing or property access. Therefore, compromising by having data as the third argument¹ allows transparently converting findmaxima/minima to return a NamedTuple (;inds, vals, data) like I suggested in my previous comment. And again, the rest of the old API can be left as is, and the new NamedTuple API can append the various new data following the new standard order.

Re: your point 2 on leaving the edges separate or zipping them, I'm ambivalent. On balance, it does seem likely that any use for edges will want both sides; but I've never needed the edges before so I don't have any personal experience to have more of an opinion.

As I see it, the only reason to include data at all is to enable piping. I don't imagine the data field will be directly accessed by users (vs by the functions) very often. ↩

KronosTheLate · 2023-10-11T09:27:58Z

I think we may have talked partially past each other, but I think I understand now. The current suggestions as I understand it:

findmaxima and findminima go from returning (idxs, vals) to returning (indices=idxs, heights=vals, data=data). Adding the field data is crucial for the named tuple based API. While technically breaking, it will not break usage as in retval = findmaxima(signal); retval[1]; retval[2], nor in a, b = findmaxima(signal), so breakage should be minimal.
The functions return tuples when given individual vectors, and named tuples when given named tuples. In the showcase below, all arguments can be appended with "!" with the same signature:
1. peakproms(peaks, x) -> (peaks, proms) and peakproms(pks::NamedTuple)->pks
2. peakwidths(peaks, x, proms) -> (peaks, widths, leftedge, rightedge) and peakproms(pks::NamedTuple)->pks
3. peakheights(peaks, heights) -> (peaks, heights) and peakheights(pks::NamedTuple)->pks
The named tuple pks will always be initially created by findmaxima/findminima, initializing it with the fields in the order indices, heights, data. Subsequent functions will append fields "to the end", meaning that the order of the named tuple is dependent on the order in which other fields, such as proms, are added.

I like everything about this proposal. It is minimally breaking, while allowing both APIs for a smooth transition. I would like to define findpeaks, but it makes sense to separate that into its own PR, and agree not to implement it here.

halleysfifthinc · 2023-10-11T16:52:30Z

Yep that is all correct.

KronosTheLate · 2023-10-11T17:12:40Z

Great! So that means that we are free to go ahead with implementation ^_^

KronosTheLate · 2024-01-25T10:24:01Z

As a general note, we seem to be quite misaligned in what we think is the best docstring or code. I value readability quite a lot for both, where you seem to value technical precision and performance quite a bit over readability. I can just say that as a user, I was put off by the amount of documented functions in the API reference, and the length of each docstring. It made it seem like an hour-long project to learn how to calculate a peak, and filter by it's width. I think that it is actually very easy to get started, and that this is lost in the amount of very good technical documentation.

I also feel like a docstring does not have to inform the user about everything - that is what the online documentation is for. The docstring is, to me, a quick way to look up method signatures, and what each argument does exactly. It is more of a how to use, rather than how it works, and all finicky details. So simple and short language, and signatures without syntax that I only learned after more than a year of normal use, are really important in making a package accessible, the way I see it.

With that off my chest, I am ready to wrap this up, hoping that you can take some of it into consideration. I do not think that we will agree on the details, so discussing them is often a little frustrating and takes a lot of effort. I have already put about 5x more work into this than I was prepared for, and it has dragged on for more than half a year. This discussion is extremely scattered, and I am not really finding it that productive to work out every discussion. So feel free to wrap this up in any way you see fit - my goal is just to get a named tuple API, so that I do not have to juggle multiple vectors and pass them in correctly myself.

KronosTheLate · 2024-02-18T09:31:23Z

It has been 24 days since my last comment. My goal with it was to allow this PR to reach some conclusion, and be merged. Are you okay driving it home from here, or would you need something more from me?

halleysfifthinc · 2024-02-19T18:32:53Z

Hi! Yes I can take it from here; I've just been quite busy recently. I really appreciate and respect your effort throughout this PR! Sorry that we have such diametrically opposed ideals regarding docstrings/documentation.

KronosTheLate · 2024-02-20T12:08:24Z

Ah okay, no problem! After all, it is your package, and so if there are disagreements it is only fair that you have the final say. I feel like the discussion was productive, it just took more effort than I was prepared for xD

Yhea it seems like we have some different ideals, but I absolutely see where you come from, and so I have no issues letting you make the remaining calls from here. Good luck!

…gument validation

KronosTheLate added 8 commits September 14, 2023 15:40

Ignore autogenerated .vscode folder

baa439e

Make parent file for rework, define "findpeaks"

d235238

implement api_rework as discussed

ccb918e

Remove test code from Peaks.jl

5cb66c3

Update readme to temp state

80b8280

Add sentence about mutation to todo section

7609635

add sentence about minima finding

24e3892

Remove temporary content from readme

aac8d5a

halleysfifthinc requested changes Sep 22, 2023

View reviewed changes

src/api_rework.jl Outdated Show resolved Hide resolved

src/api_rework.jl Outdated Show resolved Hide resolved

src/api_rework.jl Outdated Show resolved Hide resolved

src/api_rework.jl Outdated Show resolved Hide resolved

make findpeaks a oneliner

9ccd30a

halleysfifthinc reviewed Sep 26, 2023

View reviewed changes

src/api_rework.jl Outdated Show resolved Hide resolved

Don't discard initially calculated heights

4a0d9dc

KronosTheLate added 4 commits September 27, 2023 20:35

Export filterpeaks, improve internals in new API

0e20f16

Change linebreaks in "filterpeaks!"

369c072

Make use of "strict" in all functions

080e970

Fix some bugs

37839c8

KronosTheLate added 2 commits October 31, 2023 13:11

export old functions

6edefcf

improvements, polish docstrings

169e5dd

KronosTheLate added 2 commits January 25, 2024 11:03

swap check order filterpeaks!

9848205

Swap check order filterpeaks!

42aba3c

halleysfifthinc added 20 commits February 26, 2024 14:14

Merge branch 'master' into api_rework

83c53cb

[nfc] whitespace changes

36b71c7

Update/rewrite docstrings and doctests

df11d3a

Don't need to depwarn old kwargs for new functions

bd468cc

Rename up => hi

40ea9a3

Update peakproms!/peakwidths! to use ismaxima/isminima for ar…

9b7da88

…gument validation

Rewrite error for bad min/max order

5493b52

Fix unnecessary Missing union when input array doesn't have missings

598477c

Update doctest outputs

d5b4281

Update deprecated kwargs in tests

d805b04

Test doctests when running tests

5c98dab

Test depwarns

e791292

Test error for when first peak is not an extrema

1c7180d

Confirm error when namedtuple has widths OR edges

66df7ad

Confirm error in plotpeaks when first peak isnt an extrema

4939c3d

Update docstrings/doctests for utils functions

862ad61

Add filterpeaks! to the docs

adc1439

Add julia cache to CI docs build

925c9af

Dont try linking to Base docs (for the moment)

b2ec0a6

fix errant ref

f8b6f0e

halleysfifthinc marked this pull request as ready for review February 28, 2024 02:34

halleysfifthinc self-requested a review February 28, 2024 02:34

halleysfifthinc approved these changes Feb 28, 2024

View reviewed changes

halleysfifthinc merged commit a3f2ddd into halleysfifthinc:master Feb 28, 2024
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Api rework #34

Api rework #34

KronosTheLate commented Sep 22, 2023 •

edited by halleysfifthinc

Loading

halleysfifthinc left a comment

KronosTheLate commented Sep 24, 2023

halleysfifthinc commented Sep 25, 2023

KronosTheLate commented Sep 25, 2023

halleysfifthinc commented Sep 25, 2023

KronosTheLate commented Sep 26, 2023

halleysfifthinc commented Sep 26, 2023

KronosTheLate commented Sep 27, 2023 •

edited

Loading

KronosTheLate commented Oct 2, 2023

halleysfifthinc commented Oct 9, 2023

KronosTheLate commented Oct 11, 2023

halleysfifthinc commented Oct 11, 2023

KronosTheLate commented Oct 11, 2023

KronosTheLate commented Jan 25, 2024

KronosTheLate commented Feb 18, 2024

halleysfifthinc commented Feb 19, 2024

KronosTheLate commented Feb 20, 2024

Api rework #34

Api rework #34

Conversation

KronosTheLate commented Sep 22, 2023 • edited by halleysfifthinc Loading

halleysfifthinc left a comment

Choose a reason for hiding this comment

KronosTheLate commented Sep 24, 2023

Responses to concrete points

halleysfifthinc commented Sep 25, 2023

KronosTheLate commented Sep 25, 2023

halleysfifthinc commented Sep 25, 2023

KronosTheLate commented Sep 26, 2023

halleysfifthinc commented Sep 26, 2023

KronosTheLate commented Sep 27, 2023 • edited Loading

KronosTheLate commented Oct 2, 2023

halleysfifthinc commented Oct 9, 2023

Footnotes

KronosTheLate commented Oct 11, 2023

halleysfifthinc commented Oct 11, 2023

KronosTheLate commented Oct 11, 2023

KronosTheLate commented Jan 25, 2024

KronosTheLate commented Feb 18, 2024

halleysfifthinc commented Feb 19, 2024

KronosTheLate commented Feb 20, 2024

KronosTheLate commented Sep 22, 2023 •

edited by halleysfifthinc

Loading

KronosTheLate commented Sep 27, 2023 •

edited

Loading