-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Full package review for v0.3.0 #394
Conversation
…t_params_* functions
… is_epidist_params
…parameter matching
…d dists with strict matching
…st for exponential dist
Co-authored-by: Kelly McCain <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @joshwlambert for opening the full package review! It's really good work and a big achievement 🙌
I honestly do not have much to remark. I left some comments for your consideration, but nothing major. I am sure I missed minor issues that can still be improved, but it looks good overall. It is a lot of code so I am hoping for some more eyes on this because I'll inevitably miss aspects (I too, get tired 😄 ).
I will leave you with two more general questions:
- Is there a general testing strategy that you applied? It is a lot right now and it would help me understand whether there are any additional strategic considerations to make. For example, did you ensure for every positive (success) test you also had a negative (failure) test to ensure it is not giving a false negative?
- You know you may get comments on the use of the for loops, instead of vectorizing 😄 I honestly do not mind, but to preempt this question, could you expand on the use of for loops instead of vectorizing in some places?
#' @return A boolean `logical` whether the object is a valid `<epiparameter>` | ||
#' object. | ||
#' @export | ||
test_epiparameter <- function(x) { # nolint cyclocomp_linter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've seen this nolint
a few times now, may it be easier to be more explicit by setting the cyclocomp
parameter to allow deeper complexity? Or leave a comment why this nolint is relevant specifically?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like having the lintr
check the cyclomatic complexity as I feel it does help me not to write overly complex functions. The default is 15 (https://lintr.r-lib.org/reference/cyclocomp_linter.html) which I think I'll leave the same for now. As you can see I don't comply with the cyclocomp linter too strictly by occasionally using # nolint cyclocomp_linter
.
I think the best approach would either be to comply with the linter more strictly and remove the # nolint
flags or use them sparingly like the current approach. I think all other Epiverse-TRACE packages use the default linter so it would be good to try and stay consistent across the organisation.
lapply(lst, function(x) { | ||
if (nse_subject %in% names(x)) { | ||
# <epiparameter> is only nested once so no need for recursive search | ||
eval(expr = condition, envir = x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels slightly vulnerable to evaluate an expression without adding some checks to it. It is an opportunity for some code to be injected unknowingly. If you can add a check to ensure it's (semi) in the format you would expect, that would make the function more robust as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this suggestion, but nothing comes immediately to mind as to how we could test this conditions before they get evaluated. Do you have ideas for how to do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @joshwlambert. I just wanted to get down some, predominantly high-level, comments about the epiparameter class before any CRAN release. As I had limited time I have had to focus on R/epiparameter.R.
I'd like to take a further look at the rest of the package and the vignettes (at a glance they look very thorough) but this won't happen in the near future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General comments on this file:
- At present
new_epiparameter()
is only used withinepiparameter()
. Is it worth pulling the functionality in toepiparameter()
itself. A minimal constructor is normally useful when you need to quickly recreate an object after a method has dropped it's class. chkDots()
seems to be used in some methods but not others.- I see you use
assert_epiparameter()
at the start of some epiparameter methods in other files (e.g.convert_params_to_summary_stats.epiparameter()
). Is this a "belts and braces" approach or do you think there is something fragile about the epiparameter class that makes it likely to be broken by users? - As discussed offline, I wonder if there is an intermediate more tidy-like tabular data structure that would be useful. I'm not sure but leaving here for my own pondering ...
library(jsonlite)
library(dplyr)
dat <- read_json(
path = system.file(
"extdata",
"parameters.json",
package = "epiparameter",
mustWork = TRUE
)
)
out <- lapply(
dat,
function(x) {
x[4:9] <- lapply(x[4:9], list)
as_tibble(x)
}
)
(out <- bind_rows(out))
#> # A tibble: 125 × 9
#> disease pathogen epi_name probability_distribu…¹ summary_statistics
#> <chr> <chr> <chr> <list> <list>
#> 1 Adenovirus Adenovi… incubat… <named list [2]> <named list [8]>
#> 2 Human Coronavirus Human_C… incubat… <named list [2]> <named list [8]>
#> 3 SARS SARS-Co… incubat… <named list [2]> <named list [8]>
#> 4 Influenza Influen… incubat… <named list [2]> <named list [8]>
#> 5 Influenza Influen… incubat… <named list [2]> <named list [8]>
#> 6 Influenza Influen… incubat… <named list [2]> <named list [8]>
#> 7 Measles Measles… incubat… <named list [2]> <named list [8]>
#> 8 Parainfluenza Parainf… incubat… <named list [2]> <named list [8]>
#> 9 RSV RSV incubat… <named list [2]> <named list [8]>
#> 10 Rhinovirus Rhinovi… incubat… <named list [2]> <named list [8]>
#> # ℹ 115 more rows
#> # ℹ abbreviated name: ¹probability_distribution
#> # ℹ 4 more variables: citation <list>, metadata <list>,
#> # method_assessment <list>, notes <list>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EDIT: Updated the comment above as forgot to save it before submitting review.
} | ||
} | ||
|
||
if (epi_name == "offspring_distribution") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
epi_name
appears to be free form but this is quite specific. Is it a remnant from past design? Is it worth being more restrictive on values allowed for epi_name
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have made some improvements to this in PR #401 to try and match non-delay distributions, which currently include offspring distributions and case fatality risks.
The data dictionary used in the JSON validation workflow uses an enum
field to check that the epi_name
is within a predefined set https://github.com/epiverse-trace/epiparameter/blob/main/inst/extdata/data_dictionary.json#L23.
However, this is a temporary solution and needs to be improved to be more scalable as more non-delay distributions are added to the package, and to accommodate users creating <epiparameter>
objects with a variety of parameter names. We could impose that the epi_name
argument must also be within the same set as defined in the data dictionary in when specified by the user in epiparameter()
but I don't think there would be that much benefit, while it could hinder users and put more development burden adding more parameter types.
Happy to discuss this further to find a more optimal solution. This can optionally be filed as an issue to continue discussion after the package review.
) | ||
|
||
# call epiparameter validator | ||
assert_epiparameter(epiparameter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really needed? It seems like you have done this validation within this function itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this is superfluous. I like having it to ensure the object created is valid. I think it is impossible to get to the assert_epiparameter()
call with an invalid <epiparameter>
object, but just as an extra layer of defence I've added it.
As the time to run this extra function is negligible I do not think there is much reason to remove, but if users were creating many thousand/million <epiparameter>
objects this would be a good first thing to remove for a speed gain.
If there are other reasons to remove that I've overlooked please let me know.
Thank you both for helpful comments and suggestions @chartgerink & @TimTaylor! I've made several PRs with improvements resulting from this review which are either linked in my responses, or linked at the bottom of this PR (or both). In response to some other questions and comments:
There currently is not a testing strategy. I usually try and write unit tests for both exported and internal functions for expected behaviour under most common usage scenarios, and then a couple of failure expectations to check the function errors nicely. Given the namespace of this package is quite large (relative to other Epiverse-TRACE package), I've thought about moving to only testing exported functions. Some tests have been added to check that issues are resolved correctly, so it's been a fairly organic process of adding tests over time. I'd be happy to discuss possible testing strategies to make the process a bit more formal.
There is often not a formal logic on when to use vectorisation versus loops. Usually when I write code I will write loops (that's just how my brain works). Then I'll usually see how the code works once drafted and see if it can be optimised, usually by vectorising loops. Sometimes I've found I'm not able to vectorise so the loop remains, others I've found the loop more readable so left as is. If there are specific functions that contain either loops or vectorised statements that you think could be converted to the other, please raise them as an issue and I'd be happy to update (e.g, epiverse-trace/simulist#150).
Currently, the split in functionality is
Yes on the latter. It is incase an
This relates to #362, I will copy over the code chunk to that issue to continue discussion there. I won't make any changes with regard to this point before the release, but can give it some thought over the next development cycle. |
This PR is to provide a platform to review the entirety of the package.
Once this review concludes I will release v0.3.0 on GitHub and submit to CRAN.
Please see the
NEWS.md
file for an overview of changes between v0.2.0 and v0.3.0. If you would prefer to review with a partial package review only showing the changes between v0.2.0 and v0.3.0 please let me know and I can open one.This PR is unconventional as it is not intended for merging or for additional commits (unless minor) and instead comments will be converted to issues and these will be addressed in their own PRs.