Full package review for v0.3.0 #394

joshwlambert · 2024-10-09T16:56:39Z

This PR is to provide a platform to review the entirety of the package.

Once this review concludes I will release v0.3.0 on GitHub and submit to CRAN.

Please see the NEWS.md file for an overview of changes between v0.2.0 and v0.3.0. If you would prefer a partial package review showing only the changes between v0.2.0 and v0.3.0, please let me know and I can open one.

This PR is unconventional: it is not intended for merging or for additional commits (unless minor). Instead, comments will be converted to issues, and these will be addressed in their own PRs.


chartgerink

Thanks @joshwlambert for opening the full package review! It's really good work and a big achievement 🙌

I honestly do not have much to remark on. I left some comments for your consideration, but nothing major. I am sure I missed minor issues that could still be improved, but it looks good overall. It is a lot of code, so I am hoping for some more eyes on this because I'll inevitably miss aspects (I, too, get tired 😄).

I will leave you with two more general questions:

Is there a general testing strategy that you applied? It is a lot right now and it would help me understand whether there are any additional strategic considerations to make. For example, did you ensure for every positive (success) test you also had a negative (failure) test to ensure it is not giving a false negative?
You know you may get comments on the use of the for loops, instead of vectorizing 😄 I honestly do not mind, but to preempt this question, could you expand on the use of for loops instead of vectorizing in some places?


chartgerink · 2024-10-14T14:55:42Z

R/epiparameter.R

+#' @return A boolean `logical` whether the object is a valid `<epiparameter>`
+#' object.
+#' @export
+test_epiparameter <- function(x) { # nolint cyclocomp_linter


I've seen this nolint a few times now. Might it be easier to be explicit by raising the cyclocomp_linter complexity limit to allow deeper complexity? Or to leave a comment explaining why this nolint is needed specifically?

I like having lintr check the cyclomatic complexity, as it helps me avoid writing overly complex functions. The default limit is 15 (https://lintr.r-lib.org/reference/cyclocomp_linter.html), which I'll keep for now. As you can see, I don't comply with the cyclocomp linter strictly, occasionally using # nolint cyclocomp_linter.

I think the best approach is either to comply with the linter more strictly and remove the # nolint flags, or to keep using them sparingly as at present. I think all other Epiverse-TRACE packages use the default linter settings, so it would be good to stay consistent across the organisation.


chartgerink · 2024-10-14T15:12:21Z

R/epiparameter_db.R

+    lapply(lst, function(x) {
+      if (nse_subject %in% names(x)) {
+        # <epiparameter> is only nested once so no need for recursive search
+        eval(expr = condition, envir = x)


Evaluating an expression without adding any checks to it feels slightly vulnerable: it is an opportunity for code to be injected unknowingly. If you can add a check to ensure it's (semi) in the format you would expect, that would also make the function more robust.

I like this suggestion, but nothing comes immediately to mind as to how we could check these conditions before they are evaluated. Do you have ideas for how to do this?
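One possible direction, sketched under assumptions: if `condition` is a captured, unevaluated expression (e.g. from `substitute()`), the names of the functions it calls and the variables it uses can be inspected with base R's `all.names()` and `all.vars()` before `eval()` runs. The helper `.check_condition()`, its operator whitelist and the toy entry fields are hypothetical and not part of epiparameter. A whitelist like this narrows what can be evaluated, but it is not a complete sandbox.

```r
# Hypothetical helper, not part of epiparameter: check a captured filter
# condition against a whitelist before it is passed to eval().
.check_condition <- function(condition, allowed_vars) {
  # Operators and functions a filter condition is allowed to call
  allowed_funs <- c(
    "==", "!=", "<", ">", "<=", ">=", "&", "&&", "|", "||", "!",
    "%in%", "(", "c", "-", "is.na"
  )
  if (!is.call(condition) && !is.name(condition)) {
    stop("`condition` must be an unevaluated expression.", call. = FALSE)
  }
  vars <- all.vars(condition)
  # all.names() returns function and variable names; dropping the variables
  # leaves the names of the functions being called
  funs <- setdiff(all.names(condition), vars)
  bad_funs <- setdiff(funs, allowed_funs)
  if (length(bad_funs) > 0) {
    stop("Disallowed function(s) in condition: ", toString(bad_funs),
         call. = FALSE)
  }
  bad_vars <- setdiff(vars, allowed_vars)
  if (length(bad_vars) > 0) {
    stop("Unknown field(s) in condition: ", toString(bad_vars), call. = FALSE)
  }
  invisible(TRUE)
}

# Usage with a toy entry (made-up fields)
entry <- list(disease = "Influenza", epi_name = "incubation period")
cond <- quote(disease == "Influenza")
.check_condition(cond, allowed_vars = names(entry))
eval(cond, envir = entry, enclos = baseenv())

# Rejected: calls a function outside the whitelist
try(.check_condition(quote(system("ls")), allowed_vars = names(entry)))
```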


TimTaylor

Hi @joshwlambert. I just wanted to note down some predominantly high-level comments about the epiparameter class before any CRAN release. As I had limited time, I have had to focus on R/epiparameter.R.

I'd like to take a further look at the rest of the package and the vignettes (at a glance they look very thorough) but this won't happen in the near future.

TimTaylor · 2024-10-16T14:28:05Z

R/epiparameter.R

General comments on this file:

At present new_epiparameter() is only used within epiparameter(). Is it worth pulling the functionality into epiparameter() itself? A minimal constructor is normally useful when you need to quickly recreate an object after a method has dropped its class.

chkDots() seems to be used in some methods but not others.

I see you use assert_epiparameter() at the start of some epiparameter methods in other files (e.g. convert_params_to_summary_stats.epiparameter()). Is this a "belts and braces" approach or do you think there is something fragile about the epiparameter class that makes it likely to be broken by users?

As discussed offline, I wonder if there is an intermediate more tidy-like tabular data structure that would be useful. I'm not sure but leaving here for my own pondering ...

library(jsonlite)
library(dplyr)

dat <- read_json(
  path = system.file(
    "extdata",
    "parameters.json",
    package = "epiparameter",
    mustWork = TRUE
  )
)

out <- lapply(
  dat,
  function(x) {
    x[4:9] <- lapply(x[4:9], list)
    as_tibble(x)
  }
)

(out <- bind_rows(out))
#> # A tibble: 125 × 9
#>    disease           pathogen epi_name probability_distribu…¹ summary_statistics
#>    <chr>             <chr>    <chr>    <list>                 <list>
#>  1 Adenovirus        Adenovi… incubat… <named list [2]>       <named list [8]>
#>  2 Human Coronavirus Human_C… incubat… <named list [2]>       <named list [8]>
#>  3 SARS              SARS-Co… incubat… <named list [2]>       <named list [8]>
#>  4 Influenza         Influen… incubat… <named list [2]>       <named list [8]>
#>  5 Influenza         Influen… incubat… <named list [2]>       <named list [8]>
#>  6 Influenza         Influen… incubat… <named list [2]>       <named list [8]>
#>  7 Measles           Measles… incubat… <named list [2]>       <named list [8]>
#>  8 Parainfluenza     Parainf… incubat… <named list [2]>       <named list [8]>
#>  9 RSV               RSV      incubat… <named list [2]>       <named list [8]>
#> 10 Rhinovirus        Rhinovi… incubat… <named list [2]>       <named list [8]>
#> # ℹ 115 more rows
#> # ℹ abbreviated name: ¹probability_distribution
#> # ℹ 4 more variables: citation <list>, metadata <list>,
#> #   method_assessment <list>, notes <list>

EDIT: Updated the comment above, as I forgot to save it before submitting the review.

TimTaylor · 2024-10-16T14:31:02Z

R/epiparameter.R

+    }
+  }
+
+  if (epi_name == "offspring_distribution") {


epi_name appears to be free-form, but this check is quite specific. Is it a remnant of a past design? Is it worth being more restrictive about the values allowed for epi_name?

I have made some improvements to this in PR #401 to try to match non-delay distributions, which currently include offspring distributions and case fatality risks.

The data dictionary used in the JSON validation workflow uses an enum field to check that epi_name is within a predefined set (https://github.com/epiverse-trace/epiparameter/blob/main/inst/extdata/data_dictionary.json#L23).

However, this is a temporary solution and needs to be made more scalable as more non-delay distributions are added to the package, and to accommodate users creating <epiparameter> objects with a variety of parameter names. We could require that the epi_name argument supplied by the user to epiparameter() is within the same set defined in the data dictionary, but I don't think there would be much benefit, and it could hinder users and add development burden every time a parameter type is added.

Happy to discuss this further to find a more optimal solution. This can optionally be filed as an issue to continue discussion after the package review.
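To make the trade-off concrete, here is a hypothetical sketch of a strict and a permissive check on `epi_name`. The `known_epi_names` vector is illustrative only and is not the enum in `inst/extdata/data_dictionary.json`; both check functions are made up for this example.

```r
# Illustrative set only; NOT the enum in inst/extdata/data_dictionary.json
known_epi_names <- c(
  "incubation period", "serial interval", "generation time",
  "onset to death", "offspring distribution", "case fatality risk"
)

# Strict option (hypothetical): error for anything outside the set.
# Note that match.arg() also accepts unambiguous partial matches.
check_epi_name_strict <- function(epi_name) {
  match.arg(epi_name, choices = known_epi_names)
}

# Permissive option (hypothetical): normalise and accept any name, but
# tell the user when it is not one the package recognises
check_epi_name_soft <- function(epi_name) {
  epi_name <- tolower(gsub("_", " ", epi_name, fixed = TRUE))
  if (!epi_name %in% known_epi_names) {
    message("`epi_name` \"", epi_name, "\" is not a recognised parameter name.")
  }
  epi_name
}

check_epi_name_strict("serial interval")
check_epi_name_soft("Offspring_Distribution")
check_epi_name_soft("latent period")  # accepted, with a message
```

The permissive version keeps the current flexibility for users while still surfacing likely typos, at the cost of not guaranteeing that downstream code recognises the name.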


TimTaylor · 2024-10-16T15:09:26Z

R/epiparameter.R

+  )
+
+  # call epiparameter validator
+  assert_epiparameter(epiparameter)


Is this really needed? It seems like you have done this validation within this function itself.

I agree this is superfluous, but I like having it to ensure the object created is valid. I think it is impossible to reach the assert_epiparameter() call with an invalid <epiparameter> object, but I've added it as an extra layer of defence.

As the time to run this extra function is negligible, I do not see much reason to remove it, but if users were creating many thousands or millions of <epiparameter> objects, this would be a good first thing to remove for a speed gain.

If there are other reasons to remove that I've overlooked please let me know.


joshwlambert · 2024-10-22T09:54:19Z

Thank you both for the helpful comments and suggestions @chartgerink & @TimTaylor! I've opened several PRs with improvements resulting from this review; these are linked in my responses, at the bottom of this PR, or both.

In response to some other questions and comments:

Is there a general testing strategy that you applied? It is a lot right now and it would help me understand whether there are any additional strategic considerations to make. For example, did you ensure for every positive (success) test you also had a negative (failure) test to ensure it is not giving a false negative?

There is currently no formal testing strategy. I usually try to write unit tests for both exported and internal functions covering expected behaviour in the most common usage scenarios, and then a couple of failure expectations to check that the function errors informatively.

Given the namespace of this package is quite large (relative to other Epiverse-TRACE packages), I've thought about moving to only testing exported functions. Some tests have been added to check that issues are resolved correctly, so adding tests has been a fairly organic process over time. I'd be happy to discuss possible testing strategies to make the process a bit more formal.
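As an illustration of pairing success and failure expectations, here is a sketch for a hypothetical checker `is_probability()` (not an epiparameter function). It uses base R `stopifnot()` so it runs without extra packages; in the package test suite the same checks would typically be written with testthat's `expect_true()`, `expect_false()` and `expect_error()`.

```r
# Hypothetical checker used only to illustrate paired tests
is_probability <- function(x) {
  is.numeric(x) && length(x) == 1 && !is.na(x) && x >= 0 && x <= 1
}

assert_probability <- function(x) {
  if (!is_probability(x)) {
    stop("`x` must be a single number between 0 and 1.", call. = FALSE)
  }
  invisible(x)
}

# Positive (success) expectations, including the boundaries
stopifnot(is_probability(0.5), is_probability(0), is_probability(1))

# Negative (failure) expectations: these would catch a checker that
# wrongly returns TRUE for everything
stopifnot(
  !is_probability(1.5), !is_probability(-0.1), !is_probability("0.5"),
  !is_probability(NA_real_), !is_probability(c(0.1, 0.2))
)

# Error path: the assertion fails with an informative message
err <- tryCatch(assert_probability(2), error = function(e) e)
stopifnot(inherits(err, "error"),
          grepl("between 0 and 1", conditionMessage(err)))
```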

You know you may get comments on the use of the for loops, instead of vectorizing 😄 I honestly do not mind, but to preempt this question, could you expand on the use of for loops instead of vectorizing in some places?

There is often no formal logic to when I use vectorisation versus loops. When I write code I usually start with loops (that's just how my brain works). Once the code is drafted, I see whether it can be optimised, usually by vectorising loops. Sometimes I've found I'm not able to vectorise, so the loop remains; other times I've found the loop more readable, so left it as is.

If there are specific functions containing loops or vectorised statements that you think should be converted to the other style, please raise them as an issue and I'd be happy to update them (e.g. epiverse-trace/simulist#150).
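For readers weighing the two styles, here is a small sketch of the same extraction written as a pre-allocated `for` loop and with `vapply()`. The `entries` list is made-up toy data, not entries from the parameter database. `vapply()` still iterates internally, so the benefit is mainly conciseness and the type guarantee from `FUN.VALUE`, rather than speed.

```r
# Toy data, not entries from the epiparameter database
entries <- list(
  list(disease = "Influenza", mean = 1.4),
  list(disease = "Measles", mean = 12.5),
  list(disease = "RSV")  # no mean reported
)

# Loop version: pre-allocate the output, then fill it in
get_means_loop <- function(entries) {
  means <- numeric(length(entries))
  for (i in seq_along(entries)) {
    m <- entries[[i]][["mean"]]
    means[i] <- if (is.null(m)) NA_real_ else m
  }
  means
}

# Functional version: vapply() checks each result is a single double
get_means_vapply <- function(entries) {
  vapply(
    entries,
    function(e) if (is.null(e[["mean"]])) NA_real_ else e[["mean"]],
    FUN.VALUE = numeric(1)
  )
}

get_means_loop(entries)
get_means_vapply(entries)
```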

At present new_epiparameter() is only used within epiparameter(). Is it worth pulling the functionality into epiparameter() itself? A minimal constructor is normally useful when you need to quickly recreate an object after a method has dropped its class.

Currently, the functionality is split: epiparameter() does the input checking and handles the parameter uncertainty after the <epiparameter> object has been constructed in new_epiparameter(), and new_epiparameter() calculates the distribution parameters from summary statistics, if available. I could potentially merge them as, like you say, new_epiparameter() is only ever called within epiparameter(). I'd be interested to hear your opinions on when to have a minimal class constructor and whether it should strictly only apply the class attribute. I saw you had a similar issue in {incidence2} (reconverse/incidence2#103), so it would be good to get your thoughts, as I assume a similar design choice could be taken across packages.
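For reference, here is a sketch of the constructor / validator / helper split commonly described for S3 classes (e.g. in Advanced R), using a hypothetical `<toy_param>` class rather than `<epiparameter>`. The minimal constructor only applies structure and class, which keeps it cheap for restoring a class that a method has dropped, while all checks live in the validator.

```r
# Minimal constructor: no checks, just structure and class. Cheap enough to
# call whenever a method needs to restore a dropped class.
new_toy_param <- function(x = list()) {
  stopifnot(is.list(x))
  structure(x, class = "toy_param")
}

# Validator: the (possibly expensive) checks live here
validate_toy_param <- function(x) {
  if (!is.character(x$disease) || length(x$disease) != 1) {
    stop("`disease` must be a single string.", call. = FALSE)
  }
  if (!is.numeric(x$mean) || length(x$mean) != 1 || is.na(x$mean) ||
      x$mean <= 0) {
    stop("`mean` must be a single positive number.", call. = FALSE)
  }
  x
}

# User-facing helper: builds the object, then validates it
toy_param <- function(disease, mean) {
  validate_toy_param(new_toy_param(list(disease = disease, mean = mean)))
}

p <- toy_param("Influenza", 1.4)
stripped <- unclass(p)               # e.g. an operation that drops the class
restored <- new_toy_param(stripped)  # restore without re-running checks
inherits(restored, "toy_param")
```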

chkDots() seems to be used in some methods but not others.

chkDots() should only be used in methods where extra arguments should not be passed via the dots (...). I will check whether it is being used consistently and whether any other methods should call it.
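A minimal sketch of `chkDots()` in a method that accepts no extra arguments, using a hypothetical generic `summarise_toy()` and class `<toy_param>`. `base::chkDots()` warns when anything is passed through `...`, rather than letting it be silently ignored.

```r
# Hypothetical generic and method, for illustration only
summarise_toy <- function(x, ...) UseMethod("summarise_toy")

summarise_toy.toy_param <- function(x, ...) {
  # This method uses no extra arguments, so warn if any are supplied
  chkDots(...)
  paste0(x$disease, ": mean = ", x$mean)
}

p <- structure(list(disease = "Influenza", mean = 1.4), class = "toy_param")
summarise_toy(p)              # no warning
summarise_toy(p, digits = 2)  # warns that `digits` will be disregarded
```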

I see you use assert_epiparameter() at the start of some epiparameter methods in other files (e.g. convert_params_to_summary_stats.epiparameter()). Is this a "belts and braces" approach or do you think there is something fragile about the epiparameter class that makes it likely to be broken by users?

Yes, the latter. It is in case an <epiparameter> object is dispatched on after it has been invalidated by a user, somewhere between being imported/constructed and the generic being called. I currently haven't implemented any subsetting or assignment operator methods for <epiparameter> (e.g. $, [, etc.), so this is to offset that.
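One hypothetical alternative to re-validating at the start of each method is to validate on assignment, sketched here with a toy `<toy_param>` class and a `$<-` method (neither is part of epiparameter). A complete version would need similar methods for `[[<-` and `[<-`, and users could still bypass it with `unclass()`.

```r
# Hypothetical validator for a toy class
validate_toy <- function(x) {
  if (!is.numeric(x$mean) || length(x$mean) != 1 || is.na(x$mean) ||
      x$mean <= 0) {
    stop("<toy_param> is invalid: `mean` must be a single positive number.",
         call. = FALSE)
  }
  invisible(x)
}

# Validate on every `$<-` so invalid objects cannot be created this way
`$<-.toy_param` <- function(x, name, value) {
  x <- unclass(x)          # avoid recursing into this method
  x[[name]] <- value       # `name` arrives as a character string
  x <- structure(x, class = "toy_param")
  validate_toy(x)
  x
}

p <- structure(list(disease = "Influenza", mean = 1.4), class = "toy_param")
p$mean <- 2        # valid: accepted
try(p$mean <- -1)  # invalid: error, and `p` is left unchanged
```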

As discussed offline, I wonder if there is an intermediate more tidy-like tabular data structure that would be useful. I'm not sure but leaving here for my own pondering ...

This relates to #362, I will copy over the code chunk to that issue to continue discussion there. I won't make any changes with regard to this point before the release, but can give it some thought over the next development cycle.

joshwlambert and others added 30 commits June 10, 2024 15:33

Update CITATION.cff

e344b23

update clean_epidist_params to use switch and do.call instead of S3 dispatch

61f46c5

updated clean_epidist_params functions documentation

577ef07

update clean_epidist_params calls in accessors and new_epidist

fc5fbb0

updated clean_epidist_params tests

ece1bdc

add bullet point to design principles vignette on S3 dispatch vs switch and do.call

fe9a010

add bullet point on dot prefix for internal functions to design principles vignette

759f1d7

add dot prefix to clean_epidist_params and clean_epidist_params_* functions

03cf0e8

update .clean_epidist_params documentation to group all .clean_epidist_params_* functions

27c730f

add probability distribution name for more informative error in .clean_epidist_params

bd6036e

add "lambda" as possible parameterisation for poisson distribution in is_epidist_params

b4f80d1

fixed error messages from .clean_epidist_params in unit tests

aede4e9

enforce stricter parameter matching in is_epidist_params

a301d67

remove .clean_epidist_params_weibull as it only has one parameterisation

7327502

simplify .clean_epidist_params_* functions logic with after stricter parameter matching

7df71f3

fix is_epidist_params and .clean_epidist_params to work with truncated dists with strict matching

d164ba3

only append mean to prob_dist_params if in set of dists in new_epidist

f38e099

slightly increase tolerance in expect_equal for .read.epidist_db test

c402a9f

updated .clean_epidist_params documentation

00ccde8

add exponential distribution to is_epidist_params, relates #328

2bde263

added .clean_epidist_params_exp function, WIP #328

496447e

added exp option to switch in create_epidist_prob_dist, WIP #328

601340a

added exponential option to switch in family.epidist, closes #328

c59c5b5

added unit tests for .clean_epidist_params and create_epidist_prob_dist for exponential dist

57d1af3

refactor epireview_to_epidist and update as_epidist documentation, relates #327

665c5a1

added unit test for as_epidist from issue #327

25ab487

linting epireview_to_epidist

9838987

fixed region in epireview_to_epidist when location or country are NA

a10fae6

add sd to summary stats when given as distribution parameter in epireview_to_epidist

22ce679

add unit test for as_epidist lasssa incubation period

09b196c

joshwlambert and others added 16 commits October 8, 2024 12:55

update data_from_epireview vignette

d674f87

Co-authored-by: Kelly McCain <[email protected]>

remove epireview::get_parameter from data_from_epireview vignette

380f375

clarify PERG work in data_from_epireview vignette

c316732

rename epi_dist arg to epi_name, closes #386

1006506

renamed epi_distribution with epi_name in parameter DB and data dictionary

8c89ee4

updated sysdata

aff941a

Automatic readme update

6da54ec

add units to DB entries, closes #343

ccd68e2

add units to data dictionary, relates #343

fa9ae5e

add units to create_metadata, relates #343

25a00d3

use units in .epireview_to_epiparameter

ae6289f

update plot.epiparameter to plot units if given, relates #343

25bb82e

update sysdata

c1a75db

add plot.epiparameter test for plotting units, relates #343

e634b71

ensure multi-row epireview entries have the same units in coercion

30e10f1

Update NEWS.md for v0.3.0 release (#392)

42fbb22

* updated NEWS with v0.3.0 bullets
* updated WORDLIST

joshwlambert added the Pkg review label Oct 9, 2024

chartgerink self-requested a review October 14, 2024 12:37

chartgerink approved these changes Oct 14, 2024


TimTaylor assigned TimTaylor and unassigned TimTaylor Oct 16, 2024

TimTaylor self-requested a review October 16, 2024 08:57

TimTaylor reviewed Oct 16, 2024


This was referenced Oct 21, 2024

Add deprecation details to epidist_db() #399

Merged

Fix string matching for method assessment in new_epiparameter() #401

Merged

Refactor test_epiparameter() & assert_epiparameter() #402

Merged

Simplify <epiparameter> distribution methods #403

Merged

joshwlambert closed this Oct 22, 2024

joshwlambert mentioned this pull request Oct 22, 2024

Add package reviews from v0.3.0 review #406

Merged
