Api rework #34
Thanks for working on this; I like the new piping workflow this will enable.
A couple overall comments before addressing the bullet points you raised:
- Where possible, please use/replicate the existing language/terminology I've chosen in the rest of the package, e.g. `peaks` for referring to the indices of peaks, `minprom`/`maxprom` (instead of just `min`/`max`, which is less descriptive and shadows the `min`/`max` functions), etc.
- My goals for this package/API are to be maximally flexible in how it can be and is used; this directs my responses to your questions.
- Which functions should be exported, and what should they be named? Is it possible to maintain compatibility with the old API for a transition phase?
The existing API should continue to be exported, for backwards compatibility and flexibility. Not everyone likes piping syntax, and the piping API depends on the existing functions, so there is no point in not exporting them. Additionally, some of the exported functions are not being "replaced" by the new piping functions (e.g. `findnextmaxima`, `ismaxima`, etc.). The new functions can be added as methods to the existing functions.
- Do we need a non-mutating API?
In the context of piping, I agree it doesn't seem useful. But in keeping with the existing API, I think it makes sense to have copying/non-mutating options.
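The usual Julia way to offer both a mutating and a non-mutating variant without duplicating logic is to implement the `!` version and have the plain version copy its input first. The sketch below is hypothetical: `keeptall!`, `keeptall` and the `minheight` keyword are illustrative names, not Peaks.jl functions.

```julia
# Hypothetical example: `keeptall!` / `keeptall` are illustrative, not Peaks.jl API.

# Mutating variant: remove (in place) every index whose value is below `minheight`.
function keeptall!(indices::AbstractVector{<:Integer}, x::AbstractVector; minheight)
    filter!(i -> x[i] >= minheight, indices)
    return indices
end

# Non-mutating variant: copy the input once, then reuse the mutating method.
keeptall(indices::AbstractVector{<:Integer}, x::AbstractVector; kwargs...) =
    keeptall!(copy(indices), x; kwargs...)

x = [0.0, 3.0, 1.0, 5.0, 0.5]
inds = [2, 4]
kept = keeptall(inds, x; minheight = 4.0)   # `inds` is left untouched
```

Only the mutating method contains real logic, so the extra maintenance cost of the copying variant is a one-line forwarding method.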
- Should Peaks.jl provide minima-finding functionality?
Yes. Although reversing the sign is easy and fairly obvious, it adds a copy/allocation that can be avoided with minima-focused functions.
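One way to get minima-focused functions without duplicating every implementation is to pass the comparison as an argument. This is a hypothetical sketch (`localextrema`, `localmaxima` and `localminima` are illustrative names, not Peaks.jl functions) that only handles strict interior extrema and ignores plateaus and edges; it shows that `localminima(x)` returns the same indices as `localmaxima(-x)` without allocating the negated copy.

```julia
# Hypothetical sketch, not Peaks.jl API.
# Returns indices i (interior points only) where x[i] is a strict local extremum
# according to `better`: pass `>` to find maxima, `<` to find minima.
function localextrema(x::AbstractVector, better)
    inds = Int[]
    for i in firstindex(x)+1:lastindex(x)-1
        if better(x[i], x[i-1]) && better(x[i], x[i+1])
            push!(inds, i)
        end
    end
    return inds
end

localmaxima(x) = localextrema(x, >)
localminima(x) = localextrema(x, <)   # no `-x` copy is allocated

x = [0.0, 2.0, 1.0, 3.0, -1.0, 0.5]
localminima(x)   # [3, 5]
```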
- Create great docstrings for all exported functions
Signatures, explanations and examples should be added to the existing docstrings (since they will be methods of the main function names).
For tests, I suggest chaining together existing tests/values.
Thanks for the thorough feedback. It is nice to have my suggestions taken seriously ^_^ I have (quite) a few responses to your initial comment 😅. I don't know if this is the best format, but I will reply to them one by one. My suspicion is that for some points we will simply disagree. I am in more of a "tear it down and build a new one" mindset when it comes to the API, whereas you want to avoid breakage. Your approach is more sensible, but I would also find it a shame if the package were to permanently make some suboptimal API choice(s) to avoid breakage, particularly now that we will make significant changes anyway. But let's discuss particulars, and see if we can reach a conclusion on the way. For the points raised so far, I believe we can have our cake and eat it too, reaching both our goals.
Responses to concrete points
I changed the terminology intentionally, as I feel it is an improvement. I thought this would have to be significantly breaking, but I see now that a non-breaking rework implementing my changes may actually be possible. I will now argue why the changes you gave as examples are valuable improvements, IMO:
My goal with the rework is to make it more intuitive and user-friendly. If we both manage to reach our goals, this will be the best peak-finding package the world has seen 😄
I feel like the new function names will need to be the same as the old ones, as those are often the best names. But perhaps you are saying that we can add methods that take a named tuple as the first argument, and "unpack" the named tuple into the old method of the same function? So that what is added here is mainly new methods, which would make it non-breaking? That sounds great for compatibility, so perfect for a transition phase. I do feel that documenting two very different APIs is extra work, and likely to be confusing to new users. It will also complicate/duplicate the internals slightly, increasing the maintenance burden. Do we agree that whatever API we agree on, the old one will be deprecated in the long term?
My proposal is not about the piping syntax; that was more of an afterthought. In the example code I posted in the original issue, the first examples show a non-piping syntax, which was the original idea. The main point of the API rework is working with named tuples instead of individual vectors.
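The named-tuple idea can be sketched as functions that take a NamedTuple, compute one more feature, and return a NamedTuple with that field merged in, so the caller never passes individual vectors around. The functions `locatepeaks` and `withheights` are hypothetical illustrations, not the Peaks.jl API, and the peak detection is deliberately naive (strict interior maxima only).

```julia
# Hypothetical sketch: `locatepeaks` / `withheights` are illustrative, not Peaks.jl API.

# Step 1: find strict interior local maxima and bundle them with the data they index.
function locatepeaks(data::AbstractVector)
    indices = [i for i in firstindex(data)+1:lastindex(data)-1
               if data[i] > data[i-1] && data[i] > data[i+1]]
    return (; indices, data)
end

# Step 2: read what is needed from the NamedTuple and merge a new field into it.
withheights(pks::NamedTuple) = merge(pks, (; heights = pks.data[pks.indices]))

# Works with plain calls, withheights(locatepeaks(data)), or with piping:
pks = [0.0, 2.0, 1.0, 3.0, 0.0] |> locatepeaks |> withheights
pks.heights   # [2.0, 3.0]
```

Because each step only adds fields, later steps can rely on earlier results being present under known names.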
I would easily pay the price (on behalf of the user) of an extra allocation, if the alternative is that all functionality needs to be duplicated for the minima-focused functions. I just do not feel motivated to write that much internal code and bloat the codebase to avoid a single initial allocation for a small fraction of uses. Furthermore, I can hardly see an application where it is critical to avoid that allocation. With that said, if you are motivated to put in that work, the code bloat is likely not a large problem, so I would not have an issue with having the functionality.
I agree, that was always an oversimplification that I compromised on; a location-specific word is more correct. However, I do prefer
That is incorrect.

```julia
julia> f(x; min) = min(x, min)
f (generic function with 1 method)

julia> f(2; min=1)
ERROR: MethodError: objects of type Int64 are not callable
Maybe you forgot to use an operator such as *, ^, %, / etc. ?
Stacktrace:
 [1] f(x::Int64; min::Int64)
   @ Main .\REPL[1]:1
 [2] top-level scope
   @ REPL[4]:1
```
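For contrast, giving the keyword a distinct name, as the package's existing `minprom` does, leaves `Base.min` visible inside the function body. The function name `g` is just an illustration.

```julia
# The keyword is named `minprom`, so `min` below still refers to Base.min.
g(x; minprom) = min(x, minprom)

g(2; minprom = 1)   # returns 1
```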
However,
Yes please. Way back in the git history, you can find an example of how I deprecated a kwarg (
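A common Julia pattern for renaming a keyword argument is to keep accepting the old name, emit a deprecation warning with `Base.depwarn`, and forward the value to the new name. The sketch below is hypothetical: `countabove`, the old keyword `threshold` and the new keyword `minheight` are illustrative names, not Peaks.jl API, and it is not necessarily how the earlier deprecation in the package's history was done.

```julia
# Hypothetical example: `countabove`, `threshold` (old) and `minheight` (new)
# are illustrative names, not Peaks.jl API.
function countabove(x::AbstractVector; minheight = nothing, threshold = nothing)
    if threshold !== nothing
        # Shown when deprecation warnings are enabled (e.g. `julia --depwarn=yes`, or in tests)
        Base.depwarn("keyword `threshold` is deprecated, use `minheight` instead", :countabove)
        minheight === nothing ||
            throw(ArgumentError("pass only one of `threshold` and `minheight`"))
        minheight = threshold
    end
    minheight === nothing && throw(ArgumentError("`minheight` is required"))
    return count(v -> v >= minheight, x)
end

countabove([1, 5, 3]; minheight = 3)   # new keyword
countabove([1, 5, 3]; threshold = 3)   # old keyword still works, with a deprecation warning
```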
Yes exactly.
Any time peak-finding is done in a (relatively) hot loop, an extra allocation is an avoidable performance cost. As the maintainer, I accept the cost of dealing with the additional code. 🤪
No. While they are redundant in some sense, there is no runtime cost to the additional methods, and I don't think that the additional code will be too much work to maintain. I also don't think users will be overly confused by the documentation of the NamedTuple methods, because it is common for Julia to have functions with multiple methods where each method has slightly different argument types.
Conclusions:
I'm ambivalent about method docstring order; I'm not sure how configurable that is. I guess it is order dependent? You are welcome to convert the examples in the readme to the new API. The next release will need to be breaking, as the return type of the old/current API functions will need to be changed to NamedTuples. In practice, this should be unnoticed for most cases, since I expect users are already using syntax that will destructure the return values. But, it is technically breaking.
Wait, so you imagine that the functions will always return named tuples? I imagine this will be massively breaking, as
In other words, neither the order nor the length of the named tuple will match the old returned tuple. This will essentially break all code, as far as I can tell. We could try to reorder the named tuple in every function to match the old API, but that still has the issue of variable length (the old tuple always has the same length, while the new named tuple varies depending on previously added fields). If we are going to this level of breakage, then the ideal/value of "don't break things" goes completely out the window, and we might as well change anything we feel is better. What I had in mind was that if you pass in distinct vectors, a tuple is returned, maintaining the old API. If you pass in a named tuple, a named tuple is returned. This should allow both APIs to stay around side by side.
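The "vectors in, Tuple out; NamedTuple in, NamedTuple out" contract maps directly onto Julia's multiple dispatch: the NamedTuple method unpacks its fields, calls the vector method, and merges the result back in. `getheights` is a hypothetical function used only to illustrate the pattern; it is not part of Peaks.jl.

```julia
# Hypothetical sketch: `getheights` is illustrative, not Peaks.jl API.

# Old-style method: separate vectors in, plain Tuple out.
function getheights(indices::AbstractVector{<:Integer}, data::AbstractVector)
    return (indices, data[indices])
end

# New-style method: NamedTuple in, NamedTuple out, forwarding to the vector method.
function getheights(pks::NamedTuple)
    _, heights = getheights(pks.indices, pks.data)
    return merge(pks, (; heights))
end

data = [0.0, 2.0, 1.0, 3.0, 0.0]
old = getheights([2, 4], data)                            # Tuple
new = getheights((indices = [2, 4], data = data))         # NamedTuple
```

Code written against the old method keeps receiving a plain `Tuple`, so it is unaffected by the new method existing alongside it.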
Hmmmm, yeah, it seems I hadn't fully thought that through. NamedTuples iterate/destructure by order, but they can also be destructured by name. The order of the NamedTuple should be consistent, and except for the peak heights/values atm, as the PR stands now, it has a required order for proms and widths of
Here's what I suggest: You currently have
Then we can update
The remaining old API will be left as-is.
I was very surprised to see that any remaining elements are simply ignored. That solves the problem I thought we had, which was that the length of the returned (named) tuple would matter to the variables being assigned by destructuring. But there is another issue. The current signatures are as follows:
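Both destructuring behaviours mentioned above can be checked directly: positional destructuring takes elements in order and silently ignores any extra ones, while property destructuring (`(; a, b) = nt`, available from Julia 1.7) picks fields by name regardless of their position. The field values below are arbitrary example data.

```julia
nt = (indices = [2, 4], heights = [2.0, 3.0], proms = [1.0, 3.0])

# Positional destructuring: first two elements, the remaining `proms` is ignored.
inds, hts = nt

# Property destructuring by name (Julia 1.7+): order in `nt` does not matter.
(; proms, indices) = nt
```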
If we settle on an established order of
It just seems simpler to make it so that if you pass separate vectors in, you get a tuple of vectors out. And if you pass a NamedTuple in, you get a named tuple out. In other words, let the old and new API live side by side without crossover. It seems minimally disruptive to code using the old API, while allowing the new API to be "done right", or at least without historic constraints. Would that be so bad?
Any new thoughts? Any options you feel very strongly about?
I'm not following the first bit of your comment, so perhaps I wasn't clear when talking about a standard order; I meant it only for the NamedTuple. So
Other than being consistent (for type reasons), I really don't think the NamedTuple order is important, since the main interaction should normally be via named indexing or property access. Therefore, compromising by having
Re: your point 2 on leaving the edges separate or zipping them, I'm ambivalent. On balance, it does seem likely that any use for edges will want both sides; but I've never needed the edges before, so I don't have any personal experience to base a stronger opinion on.
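The "type reasons" for a consistent order: the field names, in order, are part of a NamedTuple's type, so two NamedTuples with the same fields in a different order have different concrete types, even though access by name behaves identically. A function that returns fields in varying orders would therefore not have a single concrete return type. The values below are arbitrary.

```julia
a = (indices = [2, 4], proms = [1.0, 3.0])
b = (proms = [1.0, 3.0], indices = [2, 4])

typeof(a) == typeof(b)   # false: field order is part of the type
a.proms == b.proms       # true: named access does not depend on order
```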
I think we may have talked partially past each other, but I think I understand now. The current suggestions, as I understand them:
I like everything about this proposal. It is minimally breaking, while allowing both APIs for a smooth transition. I would like to define
Yep, that is all correct.
Great! So that means that we are free to go ahead with implementation ^_^
As a general note, we seem to be quite misaligned in what we think is the best docstring or code. I value readability quite a lot for both, whereas you seem to value technical precision and performance quite a bit over readability. I can just say that as a user, I was put off by the number of documented functions in the API reference, and the length of each docstring. It made it seem like an hour-long project to learn how to calculate a peak and filter it by its width. I think that it is actually very easy to get started, and that this is lost in the amount of very good technical documentation. I also feel like a docstring does not have to inform the user about everything; that is what the online documentation is for. The docstring is, to me, a quick way to look up method signatures, and what each argument does exactly. It is more about how to use something than about how it works and all the finicky details. So simple and short language, and signatures without syntax that I only learned after more than a year of normal use, are really important in making a package accessible, the way I see it.
With that off my chest, I am ready to wrap this up, hoping that you can take some of it into consideration. I do not think that we will agree on the details, so discussing them is often a little frustrating and takes a lot of effort. I have already put about 5x more work into this than I was prepared for, and it has dragged on for more than half a year. This discussion is extremely scattered, and I am not really finding it that productive to work out every discussion. So feel free to wrap this up in any way you see fit; my goal is just to get a named tuple API, so that I do not have to juggle multiple vectors and pass them in correctly myself.
It has been 24 days since my last comment. My goal with it was to allow this PR to reach some conclusion, and be merged. Are you okay driving it home from here, or would you need something more from me?
Hi! Yes, I can take it from here; I've just been quite busy recently. I really appreciate and respect your effort throughout this PR! Sorry that we have such diametrically opposed ideals regarding docstrings/documentation.
Ah okay, no problem! After all, it is your package, so if there are disagreements it is only fair that you have the final say. I feel like the discussion was productive; it just took more effort than I was prepared for xD Yeah, it seems like we have some different ideals, but I absolutely see where you are coming from, and so I have no issues letting you make the remaining calls from here. Good luck!
This draft PR represents an effort to rework the API of Peaks.jl, based on the discussion in #24. As of the time of writing, the PR is mainly a proof of concept and a test of the user-facing API. Many changes are still missing; they are listed below:
ToDo
- `peakproms!(deepcopy(pks))`.
- Make all docstrings use `# Optional keyword arguments` instead of `[]`.
- Because `hasfield(::NamedTuple)` is not defined, we should change `field` to `property` or `feature` everywhere. We could also talk of keys and values.
- `hasfield(::NamedTuple, ::Symbol)` exists/works from Julia v1.6?
- Make a documentation page (Documenter.jl is great), ensuring that all relevant functions are discoverable.
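Two of the ToDo items can be illustrated with plain Base Julia. `hasfield` expects a type rather than a value, so on a NamedTuple instance the usual checks are `haskey` or `hasproperty` (or `hasfield(typeof(nt), name)`); `hasfield` and `hasproperty` both exist since Julia 1.2, so they are available on v1.6. Separately, a mutating function changes the vectors stored inside a NamedTuple in place, which is why a non-mutating call such as `peakproms!(deepcopy(pks))` needs `deepcopy`, so that the vectors themselves are copied. Here `filter!` stands in for a mutating peak function, and the data is arbitrary.

```julia
pks = (indices = [2, 4], heights = [2.0, 3.0])

# Checking whether a feature is present on a NamedTuple value:
haskey(pks, :heights)             # true
hasproperty(pks, :proms)          # false
hasfield(typeof(pks), :indices)   # true (`hasfield` takes the type)

# Mutating the vectors inside a NamedTuple affects every reference to them;
# `deepcopy` copies the vectors too, so the original stays intact.
original = (indices = [2, 4],)
working = deepcopy(original)
filter!(i -> i > 2, working.indices)   # stand-in for a mutating peak function
original.indices                       # still [2, 4]
```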