Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

standardize and export getter names #190

Merged
merged 9 commits into from
Jan 22, 2023
Merged

standardize and export getter names #190

merged 9 commits into from
Jan 22, 2023

Conversation

cecileane
Copy link
Member

@crsl4 @pbastide this is ready for discussion, started on #80. I went with these principles:

  1. use verbs, like get... or is... or has....
  2. use edge as suffix if the function returns an edge, instead of a node. No suffix if the function returns a node, to get shorter names.
  3. use minor as a suffix if the function returns the minor instead of major edge. No suffix for major, again for shorter names, also because there's no minor option in some cases (e.g. parent node of a tree node).
  4. use a verb and a preposition if the function has 2 arguments, to follow the pattern I see in other julia functions like isa, indexin, typeof etc.

Questions:

  • Do you agree that we should export them? Personally, think it's time and they would get used. example: PhyNEST.jl requires PhyloNetworks, and they needed to define their own GetChild and childnode functions.
  • Any recommendations to change the names? Should we adopt different principles? Should the order of suffixes edge and minor be switched, when both are used?
  • Any other function we should define?
  • Any other function we should rename? For example, getOtherNode(edge, node)?

What I'll do later: think about descendants and functions to get node ages or distances between nodes.

Below is a recap of the name changes so far.

new functions:

For most of them, the goal is to avoid using the internal structure (e.g. node.leaf), which is not recommended. These function will be inlined by the compiler, so there is no cost to performance

getroot(network)
isrootof(node, network)
isleaf(node)
isexternal(edge)
isparentof(node, edge)
ischildof(node, edge)
hassinglechild(node) # used to get "the" child of a node, typically a hybrid node

existing functions, renamed and exported:

getchild(edge) # replaces getChild
getchild(node) # new method
getchildren(node)  # replaces getChildren
getchildedge(node) # replaced getChildEdge
getparents(node) # replaces getParents
getparent(edge) # replaces getParent
getparent(node) # replaces getMajorParent
getparentminor(node) # replaces getMinorParent
getparentedge(node)  # replaces getMajorParentEdge
getparentedgeminor(node) # replaces getMinorParentEdge
getpartneredge(edge) # replaces getPartner

@codecov
Copy link

codecov bot commented Dec 15, 2022

Codecov Report

Merging #190 (1112fbc) into master (4ee407a) will increase coverage by 0.02%.
The diff coverage is 97.66%.

@@            Coverage Diff             @@
##           master     #190      +/-   ##
==========================================
+ Coverage   86.27%   86.29%   +0.02%     
==========================================
  Files          29       29              
  Lines       12561    12560       -1     
==========================================
+ Hits        10837    10839       +2     
+ Misses       1724     1721       -3     
Impacted Files Coverage Δ
src/addHybrid_snaq.jl 60.33% <0.00%> (ø)
src/types.jl 82.71% <ø> (ø)
src/pairwiseDistanceLS.jl 95.70% <83.33%> (ø)
src/auxiliary.jl 90.67% <98.68%> (+0.53%) ⬆️
src/addHybrid.jl 94.62% <100.00%> (ø)
src/bootstrap.jl 70.26% <100.00%> (ø)
src/compareNetworks.jl 99.32% <100.00%> (ø)
src/deleteHybrid.jl 84.34% <100.00%> (ø)
src/graph_components.jl 99.15% <100.00%> (ø)
src/interop.jl 100.00% <100.00%> (ø)
... and 7 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@cecileane
Copy link
Member Author

I went with standard deprecations without forcing warnings even when depwarn=no because doing so is not recommended.

  • for individual analyses, users will be able to reproduce their analyses even if they used internal functions like PhyloNetworks.getChild, assuming that they keep their Manifest file and re-instantiate their project from the Manifest.
  • for packages that depend on PhyloNetworks (like PhyloPlots or PhyNEST), semantic versioning will prevent them from updating to the new version of PhyloNetworks without the developers intentionally increasing their "compat" entry for PhyloNetworks. When the developers will do so, they will first run their package's tests, and will see the deprecation warnings, so they will know what new function names to use.

We'll need to remember to increase the minor version before this change makes it to a new release.

@cecileane
Copy link
Member Author

For users, the documentation has information

@cecileane
Copy link
Member Author

Any other thoughts on this: about the new names? about exporting them?

@pbastide
Copy link
Member

pbastide commented Jan 5, 2023

Thank you Cécile for working on this !

I like the new names, and I agree that we should export them. The standard deprecation mechanism you chose seem indeed appropriate.

I have only maybe one comment about the functions : there are a lot of them, which could make them difficult to discover for the users. You already link some of them in the doc with a "See also" paragraph. Maybe we could increase the links there ? Eg refer to getchild in getparent, and vice versa ?

An other approach we could take is to use keywords arguments. So eg instead of having 4 functions: getparent, getparentminor, getparentedge, getparentedgeminor, we could have only one function with two keywords:

getparent(node; parent="node", type="major")

So that default getparent is the same as before, but we would use getparent(node, type="minor") to get the minor parent node for instance, etc.
The question then is how to choose the right keywords, these could also be booleans, eg

getparent(node; edge=false, major=true)

With this semantic, we have only one function, and all the features are easily discoverable (all are documented in the same docstring).

However, some drawbacks are that (1) the function is more complex; (2) there might be some strange combinations (eg getchild(edge, edge=true) does not exist with the named functions) and (3) it makes it a bit more difficult to replace existing functions (eg getMinorParent would need to be replaced by getparent(node, major=false)), and the call is maybe less transparent.

Anyway, these are just some thoughts, and I'm happy with the current naming scheme.

@cecileane
Copy link
Member Author

Thank you Paul!

I have only maybe one comment about the functions : there are a lot of them, which could make them difficult to discover for the users. You already link some of them in the doc with a "See also" paragraph. Maybe we could increase the links there ? Eg refer to getchild in getparent, and vice versa ?

Ah yes absolutely (duh!!). It would greatly help discoverability.

An other approach we could take is to use keywords arguments. So eg instead of having 4 functions: getparent, getparentminor, getparentedge, getparentedgeminor, we could have only one function with two keywords:

getparent(node; parent="node", type="major")

That's a very neat idea. My main worry is performance. Code that has forks / branches (depending on keywords) could become slower. Multiple dispatch is super-efficient in Julia, so having 2 methods for parent(edge) and parent(node) is extremely efficient. But with forks, I'm afraid we would lose efficiency. I haven't tried to implement the idea and run benchmarks, though.

With this semantic, we have only one function, and all the features are easily discoverable (all are documented in the same docstring).

Aha, that's another great idea: we could, in fact, document all these functions in a single docstring. That could actually be much better than many different docstrings that refer to each other. For short docstrings at least.

@cecileane
Copy link
Member Author

There could be a small overhead to use keyword arguments instead of different function names, based on the silly example below, so I'd rather stick to different functions.

I consolidated the docstrings (7 fewer than earlier). You can check a preview of the documentation starting around here.

I also fixed a function that would have bugged on networks with a hybrid ladder (sorry for adding unrelated changes!), and I plan to improve the docstring for the function that calculates the hardwired-cluster "distance".

function foo(x; type=:add, what=:one)
  if what == :one
    if type == :add
      fun = addone
    else
      fun = subone
    end
  else
    if type == :add
      fun = addtwo
    else
      fun = subtwo
    end
  end
  return fun(x)
end

@noinline function addone(x) return x .+ 1; end
@noinline function addtwo(x) return x .+ 2; end
@noinline function subone(x) return x .- 1; end
@noinline function subtwo(x) return x .- 2; end

foo(5.5)
@benchmark addone(rand(1)) # mean:  89.525 ns
@benchmark foo(rand(1))    # mean: 100.739 ns

@cecileane cecileane merged commit 948975d into master Jan 22, 2023
@cecileane cecileane deleted the dev branch January 22, 2023 16:23
@pbastide
Copy link
Member

Thank you for checking the keyword option, I hadn't think of the computational overhead.
The separate functions with the common docstrings seems like the best of both worlds.

Thanks !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants