Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Full package review for v0.2.0 #341

Merged
merged 28 commits into from
Jul 10, 2024
Merged

Full package review for v0.2.0 #341

merged 28 commits into from
Jul 10, 2024

Conversation

joshwlambert
Copy link
Member

This PR is to provide a platform to review the entirety of the package.

Once this review concludes I will release v0.2.0 on GitHub.

Please see the NEWS.md file for an overview of changes between v0.1.0 and v0.2.0.

This PR is unconventional as it is not intended for merging or for additional commits (unless minor) and instead comments will be converted to issues and these will be addressed in their own PRs.

Copy link
Contributor

@CarmenTamayo CarmenTamayo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the vignette data_from_epireview.Rmd and everything worked as expected when following along the steps, I've left some comments with questions and feedback. The main issue that I'd include in my wishlist would be to ideally find a way to simplify the process of adding bibliographic information to epidists. I think that once the par_type column is simplified, this process will be easier, and not necessary to use grepl. The process of subsetting multirow entries also would ideally be simplified for the user but I understand that it's difficult to do so given the structure of epireview.
I really like the interface to visualise the parameters in database.Rmd, maybe in the future it could be expanded to have a few more relevant columns, e.g., whether entries are parameterised.

vignettes/articles/data_from_epireview.Rmd Outdated Show resolved Hide resolved
vignettes/articles/data_from_epireview.Rmd Outdated Show resolved Hide resolved
vignettes/articles/data_from_epireview.Rmd Outdated Show resolved Hide resolved
vignettes/articles/data_from_epireview.Rmd Outdated Show resolved Hide resolved
vignettes/articles/data_from_epireview.Rmd Outdated Show resolved Hide resolved
vignettes/articles/data_from_epireview.Rmd Outdated Show resolved Hide resolved
vignettes/articles/data_from_epireview.Rmd Outdated Show resolved Hide resolved
vignettes/articles/data_from_epireview.Rmd Show resolved Hide resolved
vignettes/articles/data_from_epireview.Rmd Outdated Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is really nice, I rendered it in R, how will it be displayed to the users? in the list of vignettes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand your question. It can be rendered locally with vignette("database", package = "epiparameter") or viewed on the {epiparameter} website. It is listed in the vignettes so will be findable by users using vignette(package = "epiparameter").

inst/CITATION Outdated Show resolved Hide resolved
R/coercion.R Outdated Show resolved Hide resolved
R/coercion.R Outdated Show resolved Hide resolved
R/plot.R Outdated Show resolved Hide resolved
R/epidist.R Outdated Show resolved Hide resolved
R/print.R Outdated Show resolved Hide resolved
README.Rmd Outdated Show resolved Hide resolved
R/parameter_tbl.R Outdated Show resolved Hide resolved
Copy link
Member

@jamesmbaazam jamesmbaazam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{epiparameter} v0.1.0.9000 full package review

Thanks @joshwlambert for your hard work on this package. This review provided me an opportunity to update myself with the latest developments. I have left some line comments and will leave my long form comments here for your consideration.

README

  • It would be helpful to link the database to the README so people can access it immediately they visit the package's page.
  • You point to a googlesheet where people can contribute new parameters to the database. Could you link the data dictionary in the sheet so new contributors know what the column names mean?

Getting Started Vignette

  • The Getting Started vignette contains a lot of useful information but needs a bit of re-organising to make it accessible to the user. Currently, I fear it's a bit too technical. I would propose restructuring it as follows:
    • Introductory paragraph
    • Use case of epiparameter
    • Installing and loading epiparameter
    • The full database/library of parameters:
      • Here, introduce epidist_db() (with defaults) and parameter_tbl() (See comments about parameter_tbl() below).
    • Interacting with the library:
      • Here, introduce the <epidist> class as the single entry of the database. (In my opinion, it would be helpful to leave out a lot of the details about the class and methods and focus on "functions" since that's what most users know. This way, it will remove distractions and barriers to understanding and focus on directing users on how to use the package rather than how it works in the background (the object orientation bits). The technical details can/should be moved to the Design Vignette. If you agree, I'd be happy to help with this.)
      • How to subset entries. Here, introduce epidist_db() with single_epidist = TRUE as well as the subset argument.
    • Working with a single entry:
      • density functions,
      • plotting function, etc.
      • extracting parameter values
      • accessors (currently left out but important).
      • converting or extracting parameter values from summary statistics and vice versa.
    • How to add your own entry + caveats.

parameter_tbl()

  • I think there is duplicated functionality between parameter_tbl(), the <epidist> print() method, and the database. From what I understand, parameter_tbl() is a way to subset the database and return a data.frame of the results. However, epidist_db() with single_epidist = FALSE returns a similar result but as a list. Moreover, it seems parameter_tbl() is simply a subset of the database. I think it would be helpful to have a single function that returns a consistent result of subsetting the database.

  • If you are keeping parameter_tbl(), consider adding an html argument that creates an html widget for casting to the Viewer as an HTML page, similar to stringr::str_view(), which uses stringr:::str_view_widget(). This would be helpful for people who want to see the results in a more user-friendly format than what the console provides.

Conversion and extraction functions

  • Add a @seealso tag linking convert_summary_stats_to_params() and convert_params_to_summary_stats().
  • Consider splitting extract_params() into two functions:
    • extract_params_from_range() for type = "range so that it can probably be extensible to the various types of ranges (interquartile, etc), and
    • extract_params_from_quantiles() where type = "percentile" or "quantile".
  • In the vignette Parameter extraction and conversion in {epiparameter}:
    • The use case section is really insightful but is placed too far down the vignette to motivate the functions discussed. I would suggest to start the vignette with the use case that points out the limitation in reporting unidentified distributions of percentiles and ranges (here, giving examples of studies that did that), motivate for the functionalities epiparameter provides to solve that, showcase the functionalities, then end the vignette showing how the parameters from the studies you pointed out in the beginning can be converted to a usable format.
    • The conversion examples for negative binomial all return Inf for the dispersion parameter. Could you provide an example where the dispersion is finite?
    • Do you need cold folding in that vignette?

Exported functions

  • Are all the exported object checkers needed by the user or are some only for internal use?
  • You probably shouldn't export the format() methods for <epidist> and <vb_epidist> as users are not expected to interact with it directly.

joshwlambert and others added 16 commits July 9, 2024 14:04
Co-authored-by: Hugo Gruson <[email protected]>
@joshwlambert
Copy link
Member Author

It would be helpful to link the database to the README so people can access it immediately they visit the package's page.

Added in be5bab3.

You point to a googlesheet where people can contribute new parameters to the database. Could you link the data dictionary in the sheet so new contributors know what the column names mean?

Added a link to the data dictionary in the google sheet.

Installing and loading epiparameter

I don't think installation should be in the Get Started vignette. It's stated in the README which is a more typical place to find them.

The Getting Started vignette contains a lot of useful information but needs a bit of re-organising to make it accessible to the user. Currently, I fear it's a bit too technical. I would propose restructuring it as follows:

Thanks for this! I've done some of the refactoring in b6b3787. I'll continue working to improve the documentation and make it more user friendly in the next version.

I think there is duplicated functionality between parameter_tbl(), the <epidist> print() method, and the database. From what I understand, parameter_tbl() is a way to subset the database and return a data.frame of the results. However, epidist_db() with single_epidist = FALSE returns a similar result but as a list. Moreover, it seems parameter_tbl() is simply a subset of the database. I think it would be helpful to have a single function that returns a consistent result of subsetting the database.

You're correct in thinking that parameter_tbl() outputs a subset of what epidist_db() outputs. But I think the value add of having this function, and why I don't see them as duplicating functionality, is to offer a way to work with the epidemiological parameter data in tabular form. This allows users who are familiar with tidyverse tools to work with this data more easily and should hopefully allow them to more easily subset the list of <epidist> objects to select the one(s) they want to use.

In an ideal world I think we could have the <epidist> class appear as a data.frame to the user but be a list behind the scenes. IIRC this is done in the {phylobase} package (although with S4 in that case). This would then reduce the need to supplementary helper functions such as parameter_tbl(). If you think this would be a good direction to move in let me know and I'll give it some thought. The downside of this approach is it would be a lot of development and maintenance.

If you are keeping parameter_tbl(), consider adding an html argument that creates an html widget for casting to the Viewer as an HTML page, similar to stringr::str_view(), which uses stringr:::str_view_widget(). This would be helpful for people who want to see the results in a more user-friendly format than what the console provides.

I haven't used stringr::str_view() before, could you please raise this as an issue and explain what use case this would address. Thanks!

Add a @seealso tag linking convert_summary_stats_to_params() and convert_params_to_summary_stats().

Resolved in d74c15d.

Consider splitting extract_params() into two functions:

  • extract_params_from_range() for type = "range" so that it can probably be extensible to the various types of ranges (interquartile, etc), and
  • extract_params_from_quantiles() where type = "percentile" or "quantile".

The extraction functions are going to be extended in the near future (see recent suggestions on #1) I'll consider this change when I make the other changes.

Are all the exported object checkers needed by the user or are some only for internal use?

Usually I write them for use in other functions in {epiparameter}, but I've also thought that they are potentially handy functions for a user that wants to quickly change the contents of an <epidist>. I'll leave them as exported for now, if there are any that you think should be unexported let me know.

You probably shouldn't export the format() methods for <epidist> and <vb_epidist> as users are not expected to interact with it directly.

Usually the format() methods are exported in other packages (e.g. {dplyr}) and format() methods are also available in base R for several base classes, therefore I'll leave them in for now, but could be a contender to unexport if we want to slim down the NAMESPACE.

@joshwlambert
Copy link
Member Author

Thanks everyone for providing helpful comments and suggestions! I've committed a lot of the suggested changes on the review branch (this PR) so I'll redirect this PR to main and merge the changes.

I'll then move forward with the v0.2.0 release (#339).

@joshwlambert joshwlambert changed the base branch from empty to main July 9, 2024 16:45
@joshwlambert joshwlambert merged commit bd50cc6 into main Jul 10, 2024
8 checks passed
@joshwlambert joshwlambert deleted the review branch July 10, 2024 14:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants