Option to restrict BiocManager::available() to Bioconductor packages #120

Closed
kevinrue opened this issue Nov 3, 2021 · 6 comments

kevinrue commented Nov 3, 2021

Right now, BiocManager::available() returns 21,757 package names to me, including CRAN.

I have

> getOption("repos")
                                                BioCsoft                                                  BioCann 
           "https://bioconductor.org/packages/3.14/bioc" "https://bioconductor.org/packages/3.14/data/annotation" 
                                                 BioCexp                                            BioCworkflows 
"https://bioconductor.org/packages/3.14/data/experiment"       "https://bioconductor.org/packages/3.14/workflows" 
                                               BioCbooks                                                     CRAN 
          "https://bioconductor.org/packages/3.14/books"                               "https://cran.rstudio.com" 

Would it make sense, then, to at least add an option to restrict those package names to only those available from Bioconductor repositories?

Alternatively, could the function return a data.frame with a column for package name and another one for the repository where that package is available from?
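
As a rough illustration of the data.frame idea above, here is a minimal sketch. It assumes the input is the character matrix returned by available.packages(), which carries "Package" and "Repository" columns; the helper name package_sources() is hypothetical and not part of BiocManager.

```r
# Hypothetical helper (not part of BiocManager): turn the matrix returned by
# available.packages() into a two-column data.frame of package and repository.
package_sources <- function(db) {
    data.frame(
        package = unname(db[, "Package"]),
        repository = unname(db[, "Repository"]),
        stringsAsFactors = FALSE
    )
}

# Usage (requires network access and the BiocManager package):
# db <- available.packages(repos = BiocManager::repositories())
# head(package_sources(db))
```

The repository column holds the URL each package would be fetched from, so Bioconductor packages could then be picked out by matching on "bioconductor.org".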

Context

I wanted to illustrate BiocManager::available() in the bioc-project Carpentries lesson (in development) to showcase one way of listing packages available from Bioconductor.
However, it feels confusing if the function also includes any package available from a repository listed in getOption("repos").

I can see why BiocManager::available() would - by default - list all the packages that it could potentially install. I just think that it would be nice to also flag/highlight/filter those that come from Bioconductor, as distinct from those that would come from other repositories.

kevinrue changed the title from "Option to restrict BiocManager::available()" to "Option to restrict BiocManager::available() to Bioconductor packages" Nov 3, 2021
Collaborator

mtmorgan commented Nov 3, 2021

I think you want to use BiocManager::repositories() to illustrate the repositories that are listed, as opposed to getOption("repos") (which in a naive session returns c(CRAN = "@CRAN@")).

Also, it might be useful to point out the pattern= argument, particularly useful to navigate annotation resources (this was the original use case motivating it)

> BiocManager::available(pattern = "*Mmusculus")

 [1] "BSgenome.Mmusculus.UCSC.mm10"        "BSgenome.Mmusculus.UCSC.mm10.masked"
 [3] "BSgenome.Mmusculus.UCSC.mm39"        "BSgenome.Mmusculus.UCSC.mm8"
 [5] "BSgenome.Mmusculus.UCSC.mm8.masked"  "BSgenome.Mmusculus.UCSC.mm9"
 [7] "BSgenome.Mmusculus.UCSC.mm9.masked"  "EnsDb.Mmusculus.v75"
 [9] "EnsDb.Mmusculus.v79"                 "PWMEnrich.Mmusculus.background"
[11] "TxDb.Mmusculus.UCSC.mm10.ensGene"    "TxDb.Mmusculus.UCSC.mm10.knownGene"
[13] "TxDb.Mmusculus.UCSC.mm39.refGene"    "TxDb.Mmusculus.UCSC.mm9.knownGene"
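
To make the effect of pattern= concrete without network access, here is a minimal sketch. It assumes the pattern is treated as a regular expression matched against package names; the helper name filter_by_pattern() is hypothetical, and the actual implementation inside BiocManager may differ.

```r
# Hypothetical helper (not part of BiocManager): keep only the package names
# that match a regular expression, returned in sorted order.
filter_by_pattern <- function(pkgs, pattern) {
    sort(grep(pattern, pkgs, value = TRUE))
}

# A small hand-written vector standing in for the full list of package names.
pkgs <- c("EnsDb.Mmusculus.v79", "a4Base", "BSgenome.Mmusculus.UCSC.mm10")
mouse <- filter_by_pattern(pkgs, "Mmusculus")
print(mouse)
```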

I guess the approach to finding Bioconductor packages would be

> db = available.packages(repos = BiocManager::repositories()["BioCsoft"])
> dim(db)
[1] 1948   17
> head(rownames(db))
[1] "a4"          "a4Base"      "a4Classif"   "a4Core"      "a4Preproc"
[6] "a4Reporting"

But back to the feature request, does a 'Bioconductor' package include just the software packages, or software + annotation + experiment? If there is some desire to allow selection, I'd be concerned about how a 'naive' user would know how to identify the repositories they are interested in obtaining information about (and whether that's any easier than using available.packages()).

Author

kevinrue commented Nov 3, 2021

Actually, I appreciate the point you make about BiocManager::repositories(). It prompted me to carefully read the help page again, because I always thought that the function was meant to return the list of Bioconductor repositories, prefixed to whatever is already present in options("repos"). Instead I understand now that it is meant to return "current Bioconductor and CRAN repositories" (as stated in the title of the help page!). Multiple times, I tried to read the body of the BiocManager::repositories function to clarify that but got lost in the various internal functions that are called. Thanks for clarifying that for me.

Back to the feature request, for me, 'Bioconductor' is anything from the BioC repositories:

 "BioCsoft"      "BioCann"       "BioCexp"       "BioCworkflows" "BioCbooks"

I didn't really think of it, and I guess that could be open for discussion, but this just makes sense to me.

I'm not sure how 'naive' I would expect users to be if they query and navigate packages programmatically from the console. Entirely 'naive' users would probably use the biocViews page first, or more likely Google, and word-of-mouth from colleagues, supervisors, etc.
If they got to the point of querying packages through BiocManager, I would then expect the user experience to be similar to biomaRt or ExperimentHub (i.e., listing available resources and then querying specific ones)

  • first, use BiocManager::repositories() to get the list of Bioconductor repositories (URL and name)
  • then, use BiocManager::available(repos = c("BioCsoft")) for instance, to query only software packages
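
The two-step workflow above can be sketched with existing functions. Note that BiocManager::available() does not take a repos= argument in the discussion above, so this sketch uses utils::available.packages() instead. The helper name available_in() and its fetch parameter are hypothetical, introduced only so the example can run without network access.

```r
# Hypothetical helper (not part of BiocManager): list the package names
# available from a chosen subset of named repositories.
#   repo_names: names to select, e.g. "BioCsoft"
#   repos:      named character vector, e.g. BiocManager::repositories()
#   fetch:      function that queries the repositories (assumed to accept a
#               'repos' argument, like utils::available.packages)
available_in <- function(repo_names, repos, fetch = utils::available.packages) {
    unknown <- setdiff(repo_names, names(repos))
    if (length(unknown) > 0L) {
        stop("unknown repository name(s): ", paste(unknown, collapse = ", "))
    }
    db <- fetch(repos = repos[repo_names])
    sort(unique(rownames(db)))
}

# Usage (requires network access and the BiocManager package):
# available_in("BioCsoft", BiocManager::repositories())
# available_in(c("BioCann", "BioCexp"), BiocManager::repositories())
```

Rejecting unknown names up front is a design choice: a typo such as "BioCSoft" would otherwise silently select nothing.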

But reaching that point in this discussion, and bearing in mind that this whole discussion is prompted by my effort to write an introduction to Bioconductor for 'naive' users, I guess my feature request is not so much about the lesson anymore as my own naive expectations when using BiocManager::available() for the first time myself.


PS: I did see the trick about pattern= which is convenient for the families of Bioconductor packages that have particular naming conventions. Unfortunately, that leaves a lot of packages out of reach as their name alone doesn't reveal which repository they are distributed by.


PS2: as a relatively experienced programmer, I appreciate learning about the db = available.packages(repos = ...) trick. I guess that's where my motivation to write a Carpentries lesson for 'naive' users made me feel that an additional option to BiocManager::available() would be more accessible than an - admittedly not that complicated - bit of extra code.

kevinrue added a commit to carpentries-incubator/bioc-project that referenced this issue Nov 3, 2021
Author

kevinrue commented Nov 3, 2021

I've taken on board the feedback to update the episode here: https://carpentries-incubator.github.io/bioc-project/03-installing-bioconductor/index.html

It is currently recompiling, so make sure to give it a few minutes.

I kept it rather simple, as the lesson is meant to reflect the current state of the project rather than motivate any change to it.

Namely

I think you want to use BiocManager::repositories() to illustrate the repositories that are listed, as opposed to getOption("repos") (which in a naive session returns c(CRAN = "@CRAN@")).

Great suggestion. Done.

Also, it might be useful to point out the pattern= argument, particularly useful to navigate annotation resources (this was the original use case motivating it)

Great suggestion. I've added an example to point out the pattern argument. This episode is early in the context of the lesson, but the use case for annotations is easy to fit in at this point already.

I guess the approach to finding Bioconductor packages would be [...]

I've left that one out, at least for the time being. As I mentioned above, this lesson is meant for 'naive' users, and considering that I've never explored the package repository that way myself, I don't think it is something that is worth including in the lesson. That said, this is a lesson in development, and I am more than happy to get any feedback to improve it.

Contributor

LiNk-NY commented Nov 3, 2021

Hi Kevin,

This looks like a good resource.

Perhaps you would like to mention the intent of the BiocManager::available() function and also add a "going further" section that shows the code for restricting packages to a certain aspect of Bioconductor (e.g., BioCsoft). Making users aware of additional installation functions is good, but for naive users it may lead to installation troubles down the line. The general recommendation is to use BiocManager::install() over any other installation mechanism because it ensures proper versioning of Bioc packages.

Note. There's a minor typo in the key points (BiocManagerBiocManager::version()).

Best regards,
Marcel

Author

kevinrue commented Nov 3, 2021

Thanks for the additional feedback and suggestion. It's all work in progress and an exercise of balance between 1) combining the many bits of information that already exist in the various help pages, vignettes, etc. and 2) not going into excessive and overwhelming details that would otherwise sidetrack from the key points of each episode.

A "going further" section could indeed be a good place to store this. The proof will be in the pudding when the lesson is actually delivered to participants: what is essential, what is extra, what is overkill.

I can't wait to see how the complete lesson will look in the end. I've got a vision to some extent, but it sort of evolves as I write individual episodes and figure out how to make it all flow nicely.

Thanks!

Contributor

LiNk-NY commented Nov 4, 2021

I've updated the documentation to make this a bit more clear. 733a58a
The functionality to show what packages are available in each branch of Bioconductor probably belongs in biocViews.
Thanks!

@LiNk-NY LiNk-NY closed this as completed Nov 4, 2021
zkamvar pushed a commit to fishtree-attempt/bioc-project that referenced this issue Feb 15, 2023
zkamvar pushed a commit to carpentries-incubator/bioc-project that referenced this issue Feb 17, 2023