diff --git a/.github/workflows/test-coverage.yaml b/.github/workflows/test-coverage.yaml
index 98c089ca..d78a35fb 100644
--- a/.github/workflows/test-coverage.yaml
+++ b/.github/workflows/test-coverage.yaml
@@ -71,3 +71,7 @@ jobs:
         with:
           name: coverage-test-failures
           path: ${{ runner.temp }}/package
+      - name: Upload coverage reports to Codecov with GitHub Action
+        uses: codecov/codecov-action@v4.2.0
+        env:
+          CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}
\ No newline at end of file
diff --git a/DESCRIPTION b/DESCRIPTION
index 792a3be2..2bc35a7e 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,7 +1,7 @@
 Package: MiscMetabar
 Type: Package
 Title: Miscellaneous Functions for Metabarcoding Analysis
-Version: 0.9.1
+Version: 0.9.2
 Authors@R: person("Adrien", "Taudière", email = "adrien.taudiere@zaclys.net",
     role = c("aut", "cre", "cph"), comment = c(ORCID = "0000-0003-1088-1182"))
 Description: Facilitate the description, transformation, exploration, and reproducibility of metabarcoding analyses. 'MiscMetabar' is mainly built on top of the 'phyloseq', 'dada2' and 'targets' R packages. It helps to build reproducible and robust bioinformatics pipelines in R. 'MiscMetabar' makes ecological analysis of alpha and beta-diversity easier, more reproducible and more powerful by integrating a large number of tools. Important features are described in Taudière A. (2023) .
diff --git a/NEWS.md b/NEWS.md
index c2ea00c0..08abd78b 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,21 +1,28 @@
-# MiscMetabar 0.9.1 (in development)
+# MiscMetabar 0.9.2 (in development)
+
+- Add param `default_fun` in function `merge_samples2()` in order to replace the default function that change the sample data in case of merging. A useful parameter is `default_fun=diff_fct_diff_class`.
+- Add param `kruskal_test` to `hill_pq()` function to prevent user to mis-interpret Tuckey HSD result (and letters) if the global effect of the tested factor on Hill diversity is non significant.
+- Add param `vioplot` to hill_pq() function to allow violin plot instead of boxplot.
+- Modify `rarefy_sample_count_by_modality` to debug the case of modality with level of length one.
+
+# MiscMetabar 0.9.1
 
 ## New functions
 
-- Add functions [taxa_as_rows()] and [taxa_as_columns()] to replace verbose called to [clean_pq()]
-- Add function [ggscatt_pq()] to plot and test for effect of a numerical columns in sam_data on Hill number. Its the equivalent for numerical variables of [ggbetween_pq()] which focus on the effect of a factor.
-- Add functions [var_par_pq()] , [var_par_rarperm_pq()] and [plot_var_part_pq()] to compute the partition of the variation of community and plot it. It introduce the notion of `rarperm` part in the function name. It refers to the fact that this function compute permutation of samples depth rarefaction to measure the variation due to the random process in rarefaction.
-- Add function [hill_test_rarperm_pq()] to test the effect of a factor on hill diversity accounting for the variation due to random nature of the rarefaction by sample depth.
-- Add function [rarefy_sample_count_by_modality()] to equalize the number of samples for each levels of a modality (factor)
-- Add function [accu_plot_balanced_modality()] to plot accumulation curves with balanced modality (same number of samples per level) and depth rarefaction (same number of sequences per sample)
-- Add function [adonis_rarperm_pq()] to compute multiple Permanova analyses on different sample depth rarefaction.
-- Add function [ggaluv_pq()] to plot taxonomic distribution in alluvial fashion with ggplot2 (using the [ggalluvial] package)
-- Add function [glmutli_pq()] to use automated model selection and multimodel inference with (G)LMs for phyloseq object
+- Add functions `taxa_as_rows()` and `taxa_as_columns()` to replace verbose called to `clean_pq()`
+- Add function `ggscatt_pq()` to plot and test for effect of a numerical columns in sam_data on Hill number. Its the equivalent for numerical variables of `ggbetween_pq()` which focus on the effect of a factor.
+- Add functions `var_par_pq()` , `var_par_rarperm_pq()` and `plot_var_part_pq()` to compute the partition of the variation of community and plot it. It introduce the notion of `rarperm` part in the function name. It refers to the fact that this function compute permutation of samples depth rarefaction to measure the variation due to the random process in rarefaction.
+- Add function `hill_test_rarperm_pq()` to test the effect of a factor on hill diversity accounting for the variation due to random nature of the rarefaction by sample depth.
+- Add function `rarefy_sample_count_by_modality()` to equalize the number of samples for each levels of a modality (factor)
+- Add function `accu_plot_balanced_modality()` to plot accumulation curves with balanced modality (same number of samples per level) and depth rarefaction (same number of sequences per sample)
+- Add function `adonis_rarperm_pq()` to compute multiple Permanova analyses on different sample depth rarefaction.
+- Add function `ggaluv_pq()` to plot taxonomic distribution in alluvial fashion with ggplot2 (using the [ggalluvial] package)
+- Add function `glmutli_pq()` to use automated model selection and multimodel inference with (G)LMs for phyloseq object
 
 ## New parameters
 
-- Add param `taxa_ranks` in function [psmelt_samples_pq()] to group results by samples AND taxonomic ranks.
-- Add param `hill_scales` in functions [hill_tuckey_pq()] and [hill_pq()] to choose the level of the hill number.
+- Add param `taxa_ranks` in function `psmelt_samples_pq()` to group results by samples AND taxonomic ranks.
+- Add param `hill_scales` in functions `hill_tuckey_pq()` and `hill_p()` to choose the level of the hill number.
 - Add param `na_remove` in function `hill_pq()` to remove samples with NA in the factor fact.
@@ -31,7 +38,7 @@
 - Replace param `variable` by `fact` in function `ggbetween_pq()` and `hill_pq()` (keeping the variable option in `hill_pq()` for backward compatibility)
 - Fix a bug in the class of the return object of function `chimera_removal_vs()`. Now it return a matrix to be able to be parsed on to [dada2::getUniques()]
 
-# MiscMetabar 0.7 (in development)
+# MiscMetabar 0.7
 
 - Add functions `chimera_detection_vs()` and `chimera_removal_vs()` to process chimera detection and removal using [vsearch](https://github.com/torognes/vsearch) software
 - Add functions `filter_trim()`, `sample_data_with_new_names()` and `rename_samples()` to facilitate the use of [targets](https://books.ropensci.org/targets/) for bioinformatic pipeline.
diff --git a/R/Deseq2_edgeR.R b/R/Deseq2_edgeR.R
index e5f64d03..a124d800 100644
--- a/R/Deseq2_edgeR.R
+++ b/R/Deseq2_edgeR.R
@@ -1,7 +1,8 @@
 ################################################################################
 #' Plot edgeR results for a phyloseq or a edgeR object.
 #'
-#' `r lifecycle::badge("maturing")`
+#'
+#' lifecycle-maturing
 #'
 #' @inheritParams clean_pq
 #' @param contrast (required):This argument specifies what comparison
@@ -121,7 +122,9 @@ plot_edgeR_pq <-
 ################################################################################
 #' Plot DESeq2 results for a phyloseq or a DESeq2 object.
 #' @description
-#' `r lifecycle::badge("experimental")`
+#'
+#'
+#' lifecycle-experimental
 #'
 #' @param data (required) a \code{\link{phyloseq-class}} or a
 #' \code{\link[DESeq2]{DESeqDataSet-class}} object.
@@ -367,7 +370,6 @@ plot_deseq2_pq <-
 ################################################################################
 #' Convert phyloseq OTU count data into DGEList for edgeR package
 #'
-#'
 #' @inheritParams clean_pq
 #'
 #' @param group (required) A character vector or factor giving the experimental
diff --git a/R/alpha_div_test.R b/R/alpha_div_test.R
index 0a9bab27..ba20b85f 100644
--- a/R/alpha_div_test.R
+++ b/R/alpha_div_test.R
@@ -2,7 +2,9 @@
 #' Calculate hill number and compute Tuckey post-hoc test
 #' @description
 #'
-#' `r lifecycle::badge("maturing")`
+#'
+#' lifecycle-maturing
+#'
 #' Note that, by default, this function use a sqrt of the read numbers in the linear
 #' model in order to correct for uneven sampling depth.
 #' @aliases hill_tuckey_pq
@@ -104,7 +106,9 @@ hill_tuckey_pq <- function(
 #' with different rarefaction even depth
 #'
 #' @description
-#' `r lifecycle::badge("experimental")`
+#'
+#'
+#' lifecycle-experimental
 #'
 #' @inheritParams clean_pq
 #' @param fact (required) Name of the factor in `physeq@sam_data` used to plot
@@ -268,7 +272,9 @@ hill_test_rarperm_pq <- function(physeq,
 ################################################################################
 #' Automated model selection and multimodel inference with (G)LMs for phyloseq
 #' @description
-#' `r lifecycle::badge("experimental")`
+#'
+#'
+#' lifecycle-experimental
 #'
 #' @inheritParams clean_pq
 #' @param formula (required) a formula for [glmulti::glmulti()]
diff --git a/R/beta_div_test.R b/R/beta_div_test.R
index 78fd74d4..60efda8d 100644
--- a/R/beta_div_test.R
+++ b/R/beta_div_test.R
@@ -1,7 +1,9 @@
 ################################################################################
 #' @title Performs graph-based permutation tests on phyloseq object
 #' @description
-#' `r lifecycle::badge("maturing")`
+#'
+#'
+#' lifecycle-maturing
 #'
 #' A wrapper of [phyloseqGraphTest::graph_perm_test()] for quick plot with
 #' important statistics
@@ -95,7 +97,9 @@ graph_test_pq <- function(physeq,
 ################################################################################
 #' @title Permanova on a phyloseq object
 #' @description
-#' `r lifecycle::badge("experimental")`
+#'
+#'
+#' lifecycle-experimental
 #'
 #' A wrapper for the [vegan::adonis2()] function in the case of `physeq` object.
 #' @inheritParams clean_pq
@@ -217,7 +221,9 @@ adonis_pq <- function(physeq,
 #' Permanova (adonis) on permutations of rarefaction even depth
 #'
 #' @description
-#' `r lifecycle::badge("experimental")`
+#'
+#'
+#' lifecycle-experimental
 #'
 #' @inheritParams adonis_pq
 #' @param nperm (int) The number of permutations to perform.
@@ -329,7 +335,9 @@ adonis_rarperm_pq <- function(physeq,
 #' @title Compute and test local contributions to beta diversity (LCBD) of
 #' samples
 #' @description
-#' `r lifecycle::badge("experimental")`
+#'
+#'
+#' lifecycle-experimental
 #'
 #' A wrapper for the [adespatial::beta.div()] function in the case of `physeq`
 #' object.
@@ -376,7 +384,9 @@ LCBD_pq <- function(physeq,
 ################################################################################
 #' @title Plot and test local contributions to beta diversity (LCBD) of samples
 #' @description
-#' `r lifecycle::badge("experimental")`
+#'
+#'
+#' lifecycle-experimental
 #'
 #' A wrapper for the [adespatial::beta.div()] function in the case of `physeq`
 #' object.
@@ -526,7 +536,9 @@ plot_LCBD_pq <- function(physeq,
 ################################################################################
 #' @title Plot species contributions to beta diversity (SCBD) of samples
 #' @description
-#' `r lifecycle::badge("experimental")`
+#'
+#'
+#' lifecycle-experimental
 #'
 #' A wrapper for the [adespatial::beta.div()] function in the case of `physeq`
 #' object.
@@ -592,7 +604,9 @@ plot_SCBD_pq <- function(physeq,
 ################################################################################
 #' @title Test and plot multipatt result
 #' @description
-#' `r lifecycle::badge("experimental")`
+#'
+#'
+#' lifecycle-experimental
 #'
 #' A wrapper for the [indicspecies::multipatt()] function in the case of
 #' `physeq` object.
@@ -673,7 +687,9 @@ multipatt_pq <- function(physeq,
 ################################################################################
 #' Run ANCOMBC2 on phyloseq object
 #' @description
-#' `r lifecycle::badge("experimental")`
+#'
+#'
+#' lifecycle-experimental
 #'
 #' A wrapper for the [ANCOMBC::ancombc2()] function
 #'
@@ -755,7 +771,9 @@ ancombc_pq <- function(physeq, fact, levels_fact = NULL, tax_level = "Class", ..
 ################################################################################
 #' Filter ancombc_pq results
 #' @description
-#' `r lifecycle::badge("experimental")`
+#'
+#'
+#' lifecycle-experimental
 #'
 #' @param ancombc_res (required) the result of the ancombc_pq function
 #' For the moment only bimodal factors are possible.
@@ -833,7 +851,9 @@ signif_ancombc <- function(ancombc_res,
 ################################################################################
 #' Plot ANCOMBC2 result for phyloseq object
 #' @description
-#' `r lifecycle::badge("experimental")`
+#'
+#'
+#' lifecycle-experimental
 #'
 #' @inheritParams clean_pq
 #' @param ancombc_res (required) the result of the ancombc_pq function
@@ -998,7 +1018,9 @@ plot_ancombc_pq <-
 #' Show taxa which are present in only one given level of a modality
 #'
 #' @description
-#' `r lifecycle::badge("experimental")`
+#'
+#'
+#' lifecycle-experimental
 #'
 #' @inheritParams clean_pq
 #' @param modality (required) The name of a column present in the `@sam_data` slot
@@ -1052,7 +1074,9 @@ taxa_only_in_one_level <- function(physeq,
 #' Distribution of sequences across a factor for one taxon
 #'
 #' @description
-#' `r lifecycle::badge("experimental")`
+#'
+#'
+#' lifecycle-experimental
 #'
 #' @inheritParams clean_pq
 #' @param fact (required) Name of the factor in `physeq@sam_data` used to plot
@@ -1112,7 +1136,10 @@ distri_1_taxa <- function(physeq, fact, taxa_name, digits = 2) {
 ################################################################################
 #' Partition the Variation of a phyloseq object by 2, 3, or 4 Explanatory Matrices
 #' @description
-#' `r lifecycle::badge("experimental")`
+#'
+#'
+#' lifecycle-experimental
+#'
 #' The function partitions the variation in otu_table using
 #' distance (Bray per default) with respect to two, three, or four explanatory
 #' tables, using
@@ -1173,7 +1200,7 @@ var_par_pq <-
       dist_physeq <- phyloseq::distance(physeq, method = dist_method)
     }
-    for (i in 1:length(list_component)) {
+    for (i in seq_along(list_component)) {
       assign(
         names(list_component)[i],
         as.data.frame(unclass(physeq@sam_data[, list_component[[i]]]))
@@ -1210,7 +1237,7 @@ var_par_pq <-
     if (dbrda_computation) {
       res_varpart$dbrda_result <- list()
-      for (i in 1:length(list_component)) {
+      for (i in seq_along(list_component)) {
         res_varpart$dbrda_result[[i]] <- anova(vegan::dbrda(
           as.formula(paste0(
@@ -1231,13 +1258,14 @@ var_par_pq <-
 ################################################################################
 #' Partition the Variation of a phyloseq object with rarefaction permutations
 #' @description
-#' `r lifecycle::badge("experimental")`
+#'
+#'
+#' lifecycle-experimental
 #'
 #' This is an extension of the function [var_par_pq()]. The main addition is
 #' the computation of nperm permutations with rarefaction even depth by
 #' sample. The return object
 #'
-#'
 #' @inheritParams clean_pq
 #' @param list_component (required) A named list of 2, 3 or four vectors with
 #' names from the `@sam_data` slot.
diff --git a/R/blast.R b/R/blast.R
index 40861fbc..fedd0474 100644
--- a/R/blast.R
+++ b/R/blast.R
@@ -2,7 +2,8 @@
 #' Blast some sequence against `refseq` slot of a \code{\link{phyloseq-class}}
 #' object.
 #'
-#' `r lifecycle::badge("maturing")`
+#'
+#' lifecycle-maturing
 #'
 #' @param physeq (required): a \code{\link{phyloseq-class}} object obtained
 #' using the `phyloseq` package.
@@ -179,7 +180,8 @@ blast_to_phyloseq <- function(physeq,
 #' Blast all sequence of `refseq` slot of a \code{\link{phyloseq-class}}
 #' object against a custom database.
 #'
-#' `r lifecycle::badge("experimental")`
+#'
+#' lifecycle-experimental
 #'
 #' @inheritParams blast_to_phyloseq
 #' @param fasta_for_db path to a fasta file to make the blast database
@@ -333,7 +335,8 @@ blast_pq <- function(physeq,
 ################################################################################
 #' Filter undesirable taxa using blast against a custom database.
 #'
-#' `r lifecycle::badge("experimental")`
+#'
+#' lifecycle-experimental
 #'
 #' @inheritParams blast_to_phyloseq
 #' @param fasta_for_db path to a fasta file to make the blast database
@@ -420,7 +423,8 @@ filter_asv_blast <- function(physeq,
 #' Blast some sequence against sequences from of a \code{\link{derep-class}}
 #' object.
 #'
-#' `r lifecycle::badge("experimental")`
+#'
+#' lifecycle-experimental
 #'
 #' @inheritParams blast_to_phyloseq
 #' @param derep The result of `dada2::derepFastq()`. A list of `derep-class`
@@ -594,7 +598,9 @@ blast_to_derep <- function(derep,
 #' Add information from [blast_pq()] to the `tax_table` slot of a *phyloseq* object
 #'
 #' @description
-#' `r lifecycle::badge("experimental")`
+#'
+#'
+#' lifecycle-experimental
 #'
 #' Basically a wrapper of [blast_pq()] with option `unique_per_seq = TRUE` and
 #' `score_filter = FALSE`.
diff --git a/R/controls.R b/R/controls.R
index 3393f33c..951e9177 100644
--- a/R/controls.R
+++ b/R/controls.R
@@ -2,7 +2,8 @@
 #' Search for exact matching of sequences using complement,
 #' reverse and reverse-complement
 #'
-#' `r lifecycle::badge("experimental")`
+#'
+#' lifecycle-experimental
 #'
 #' @inheritParams clean_pq
 #' @param seq2search A DNAStringSet object of sequences to search for.
@@ -50,7 +51,9 @@ search_exact_seq_pq <- function(physeq, seq2search) {
 #' distance for all samples
 #' @description
-#' `r lifecycle::badge("experimental")`
+#'
+#'
+#' lifecycle-experimental
 #'
 #' Compute distance among positive controls,
 #' i.e. samples which are duplicated
@@ -123,7 +126,8 @@ dist_pos_control <- function(physeq, samples_names, method = "bray") {
 ################################################################################
 #' Subset taxa using a taxa control (e.g. truffle root tips) through 3 methods.
 #'
-#' `r lifecycle::badge("experimental")`
+#'
+#' lifecycle-experimental
 #'
 #' @aliases subset_taxa_tax_control
 #' @inheritParams clean_pq
@@ -146,7 +150,6 @@ dist_pos_control <- function(physeq, samples_names, method = "bray") {
 #'
 #' @examples
 #'
-#'
 #' subset_taxa_tax_control(data_fungi,
 #'   as.numeric(data_fungi@otu_table[, 300]),
 #'   min_diff_for_cutoff = 2
diff --git a/R/dada_phyloseq.R b/R/dada_phyloseq.R
index 32d096e7..74cedf81 100644
--- a/R/dada_phyloseq.R
+++ b/R/dada_phyloseq.R
@@ -6,7 +6,10 @@ if (getRversion() >= "2.15.1") {
 #' Add dna in `refseq` slot of a `physeq` object using taxa names and renames taxa
 #' using ASV_1, ASV_2, …
 #'
-#' `r lifecycle::badge("stable")`
+#' @description
+#'
+#'
+#' lifecycle-stable
 #'
 #' @inheritParams clean_pq
 #'
@@ -36,7 +39,6 @@ add_dna_to_phyloseq <- function(physeq) {
 #' (i) taxa names in refseq, taxonomy table and otu_table and between
 #' (ii) sample names in sam_data and otu_table.
 #'
-#'
 #' @param physeq (required): a \code{\link{phyloseq-class}} object obtained
 #' using the `phyloseq` package.
 #' @param remove_empty_samples (logical) Do you want to remove samples
@@ -203,7 +205,9 @@ clean_pq <- function(physeq,
 #' from various objects including dada-class and derep-class.
 #'
 #' @description
-#' `r lifecycle::badge("maturing")`
+#'
+#'
+#' lifecycle-maturing
 #'
 #' * List of fastq and fastg.gz files -> nb of reads and samples
 #' * List of dada-class -> nb of reads, clusters (ASV) and samples
@@ -427,7 +431,9 @@ track_wkflow <- function(list_of_objects,
 #' for each sample
 #'
 #' @description
-#' `r lifecycle::badge("experimental")`
+#'
+#'
+#' lifecycle-experimental
 #'
 #' Contrary to [track_wkflow()], only phyloseq object are possible.
 #' More information are available in the manual of the function [track_wkflow()]
@@ -476,7 +482,9 @@ track_wkflow_samples <- function(list_pq_obj, ...) {
 #' or a list of DNA sequences
 #'
 #' @description
-#' `r lifecycle::badge("maturing")`
+#'
+#'
+#' lifecycle-maturing
 #'
 #' @inheritParams clean_pq
 #' @param dna_seq You may directly use a character vector of DNA sequences
@@ -643,7 +651,8 @@ asv2otu <- function(physeq = NULL,
 ################################################################################
 #' Save phyloseq object in the form of multiple csv tables.
 #'
-#' `r lifecycle::badge("maturing")`
+#'
+#' lifecycle-maturing
 #'
 #' @inheritParams clean_pq
 #' @param path a path to the folder to save the phyloseq object
@@ -864,14 +873,15 @@ write_pq <- function(physeq,
 #' A wrapper of write_pq to save in all three possible formats
 #'
 #' @details
-#' `r lifecycle::badge("maturing")`
+#'
+#'
+#' lifecycle-maturing
 #'
 #' Write :
 #' - 4 separate tables
 #' - 1 table version
 #' - 1 RData file
 #'
-#'
 #' @inheritParams clean_pq
 #' @param path a path to the folder to save the phyloseq object
 #' @param ... Other arguments passed on to [write_pq()] or [utils::write.table()] function.
@@ -903,7 +913,8 @@ save_pq <- function(physeq, path = NULL, ...) {
 #' Read phyloseq object from multiple csv tables and a phylogenetic tree
 #' in Newick format.
 #'
-#' `r lifecycle::badge("maturing")`
+#'
+#' lifecycle-maturing
 #'
 #' @param path (required) a path to the folder to read the phyloseq object
 #' @param taxa_are_rows (default to FALSE) see ?phyloseq for details
@@ -993,7 +1004,9 @@ read_pq <- function(path = NULL,
 #' Lulu reclustering of class `physeq`
 #'
 #' @description
-#' `r lifecycle::badge("experimental")`
+#'
+#'
+#' lifecycle-experimental
 #'
 #' See https://www.nature.com/articles/s41467-017-01312-x for more information
 #' on the method.
@@ -1143,7 +1156,9 @@ lulu_pq <- function(physeq,
 #' MUMU reclustering of class `physeq`
 #'
 #' @description
-#' `r lifecycle::badge("experimental")`
+#'
+#'
+#' lifecycle-experimental
 #'
 #' See https://www.nature.com/articles/s41467-017-01312-x for more information
 #' on the original method LULU. This is a wrapper of
@@ -1326,7 +1341,9 @@ mumu_pq <- function(physeq,
 #' Verify the validity of a phyloseq object
 #'
 #' @description
-#' `r lifecycle::badge("maturing")`
+#'
+#'
+#' lifecycle-maturing
 #'
 #' Mostly for internal use in MiscMetabar functions.
 #'
@@ -1385,7 +1402,9 @@ verify_pq <- function(
 #' Subset samples using a conditional boolean vector.
 #'
 #' @description
-#' `r lifecycle::badge("experimental")`
+#'
+#'
+#' lifecycle-experimental
 #'
 #' The main objective of this function is to complete the
 #' [phyloseq::subset_samples()] function by propose a more easy
@@ -1438,13 +1457,14 @@ subset_samples_pq <- function(physeq, condition) {
 #' Subset taxa using a conditional named boolean vector.
 #'
 #' @description
-#' `r lifecycle::badge("experimental")`
+#'
+#'
+#' lifecycle-experimental
 #'
 #' The main objective of this function is to complete the
 #' [phyloseq::subset_taxa()] function by propose a more easy way of
 #' subset_taxa using a named boolean vector. Names must match taxa_names.
 #'
-#'
 #' @inheritParams clean_pq
 #' @param condition A named boolean vector to subset taxa. Length must fit
 #' the number of taxa and names must match taxa_names. Can also be a
@@ -1550,7 +1570,9 @@ subset_taxa_pq <- function(physeq,
 #' Select one sample from a physeq object
 #'
 #' @description
-#' `r lifecycle::badge("experimental")`
+#'
+#'
+#' lifecycle-experimental
 #'
 #' Mostly for internal used, for example in function [track_wkflow_samples()].
 #'
@@ -1612,7 +1634,9 @@ select_one_sample <- function(physeq, sam_name, silent = FALSE) {
 #' Add new taxonomic rank to a phyloseq object.
 #'
 #' @description
-#' `r lifecycle::badge("experimental")`
+#'
+#'
+#' lifecycle-experimental
 #'
 #' One of main use of this function is to add taxonomic assignment from
 #' a new database.
@@ -1650,7 +1674,9 @@ add_new_taxonomy_pq <- function(physeq, ref_fasta, suffix = NULL, ...) {
 #' Summarize information from sample data in a table
 #'
 #' @description
-#' `r lifecycle::badge("experimental")`
+#'
+#'
+#' lifecycle-experimental
 #'
 #' A wrapper for the [gtsummary::tbl_summary()] function in the case of `physeq`
 #' object.
@@ -1701,10 +1727,11 @@ tbl_sum_samdata <- function(physeq, remove_col_unique_value = TRUE, ...) {
 #' Add information about Guild for FUNGI the FUNGuild databse
 #'
 #' @description
-#' `r lifecycle::badge("experimental")`
 #'
-#' Please cite Nguyen et al. 2016 (\doi{doi:10.1016/j.funeco.2015.06.006})
+#'
+#'
+#' lifecycle-experimental
 #'
+#' Please cite Nguyen et al. 2016 (\doi{doi:10.1016/j.funeco.2015.06.006})
 #'
 #' @inheritParams clean_pq
 #' @param taxLevels Name of the 7 columns in tax_table required by funguild
@@ -1773,7 +1800,9 @@ add_funguild_info <- function(physeq,
 #' created with [add_funguild_info()]
 #'
 #' @description
-#' `r lifecycle::badge("experimental")`
+#'
+#'
+#' lifecycle-experimental
 #'
 #' @inheritParams clean_pq
 #' @param levels_order (Default NULL) A character vector to
@@ -1915,7 +1944,9 @@ plot_guild_pq <-
 #' Build phylogenetic trees from refseq slot of a phyloseq object
 #'
 #' @description
-#' `r lifecycle::badge("experimental")`
+#'
+#'
+#' lifecycle-experimental
 #'
 #' This function build tree phylogenetic tree and if nb_bootstrap is
 #' set, it build also the 3 corresponding bootstrapped tree.
@@ -2080,7 +2111,9 @@ build_phytree_pq <- function(physeq,
 #' Test if the mean number of sequences by samples is link to the modality of
 #' a factor
 #' @description
-#' `r lifecycle::badge("experimental")`
+#'
+#'
+#' lifecycle-experimental
 #'
 #' The aim of this function is to provide a warnings if samples depth significantly
 #' vary among the modalities of a factor present in the `sam_data` slot.
@@ -2122,7 +2155,9 @@ are_modality_even_depth <- function(physeq, fact, boxplot = FALSE) {
 #' Reorder taxa in otu_table/tax_table/refseq slot of a phyloseq object
 #'
 #' @description
-#' `r lifecycle::badge("experimental")`
+#'
+#'
+#' lifecycle-experimental
 #'
 #' Note that the taxa order in a physeq object with a tree is locked by
 #' the order of leaf in the phylogenetic tree.
@@ -2191,7 +2226,9 @@ reorder_taxa_pq <- function(physeq, names_ordered, remove_phy_tree = FALSE) {
 ################################################################################
 #' @title Add information to sample_data slot of a phyloseq-class object
 #' @description
-#' `r lifecycle::badge("experimental")`
+#'
+#'
+#' lifecycle-experimental
 #'
 #' Warning: The value nb_seq and nb_otu may be outdated if you transform your
 #' phyloseq object, e.g. using the [subset_taxa_pq()] function
@@ -2252,7 +2289,9 @@ add_info_to_sam_data <- function(physeq,
 #' or the `refseq` slot of a phyloseq-class object
 #'
 #' @description
-#' `r lifecycle::badge("stable")`
+#'
+#'
+#' lifecycle-stable
 #'
 #' Internally used in [vsearch_clustering()], [swarm_clustering()] and
 #' [asv2otu()].
@@ -2315,11 +2354,14 @@ physeq_or_string_to_dna <- function(physeq = NULL,
 ################################################################################
 #' Remove primers using [cutadapt](https://github.com/marcelm/cutadapt/)
 #'
-#'
 #' @description
-#' `r lifecycle::badge("experimental")`
 #'
-#' You need to install Cutadapt
+#'
+#'
+#' lifecycle-experimental
+#'
+#' You need to install [Cutadapt](https://cutadapt.readthedocs.io/).
+#' See also https://github.com/VascoElbrecht/JAMP/blob/master/JAMP/R/Cutadapt.R for another call to cutadapt
+#' from R
 #'
 #' @param path_to_fastq (Required) A path to a folder with fastq files. See
 #' [list_fastq_files()] for help.
@@ -2336,7 +2378,9 @@ physeq_or_string_to_dna <- function(physeq = NULL,
 #' to run cutadapt. For examples, "source ~/miniconda3/etc/profile.d/conda.sh && conda activate cutadaptenv &&" allow to bypass the conda init which asks to restart the shell
 #'
 #' @description
-#' `r lifecycle::badge("experimental")`
+#'
+#'
+#' lifecycle-experimental
 #'
 #' @return a list of command and
 #' @export
@@ -2348,7 +2392,6 @@ physeq_or_string_to_dna <- function(physeq = NULL,
 #'   folder_output = tempdir()
 #' )
 #'
-#'
 #' cutadapt_remove_primers(
 #'   system.file("extdata",
 #'     package = "dada2"
@@ -2370,7 +2413,6 @@ physeq_or_string_to_dna <- function(physeq = NULL,
 #'   cmd_is_run = FALSE
 #' )
 #'
-#'
 #' unlink(tempdir(), recursive = TRUE)
 #' }
 #' @details
@@ -2469,14 +2511,13 @@ cutadapt_remove_primers <- function(path_to_fastq,
 }
 ################################################################################
-
-################################################################################
-
 ################################################################################
 #' List the taxa that founded only in one given level of a modality
 #'
 #' @description
-#' `r lifecycle::badge("experimental")`
+#'
+#'
+#' lifecycle-experimental
 #'
 #' @inheritParams clean_pq
 #' @param modality (required) The name of a column present in the `@sam_data` slot
@@ -2532,7 +2573,9 @@ taxa_only_in_one_level <- function(physeq,
 ################################################################################
 #' Normalize OTU table using samples depth
 #' @description
-#' `r lifecycle::badge("experimental")`
+#'
+#'
+#' lifecycle-experimental
 #'
 #' This function implement the method proposed by
 #' McKnight et al. 2018 (\doi{doi:10.5061/dryad.tn8qs35})
@@ -2590,7 +2633,9 @@ normalize_prop_pq <- function(physeq, base_log = 2, constante = 10000, digits =
 #' Build a sample information tibble from physeq object
 #'
 #' @description
-#' `r lifecycle::badge("experimental")`
+#'
+#'
+#' lifecycle-experimental
 #'
 #' Hill numbers are the number of equiprobable species giving the same diversity
 #' value as the observed distribution.
@@ -2714,7 +2759,10 @@ psmelt_samples_pq <-
 #' Force taxa to be in columns in the otu_table of a physeq object
 #'
 #' @description
-#' `r lifecycle::badge("maturing")`
+#'
+#'
+#' lifecycle-maturing
+#'
 #' @inheritParams clean_pq
 #' @author Adrien Taudière
 #' @export
@@ -2736,7 +2784,10 @@ taxa_as_columns <- function(physeq) {
 #' Force taxa to be in columns in the otu_table of a physeq object
 #'
 #' @description
-#' `r lifecycle::badge("maturing")`
+#'
+#'
+#' lifecycle-maturing
+#'
 #' @inheritParams clean_pq
 #' @author Adrien Taudière
 #' @export
@@ -2757,7 +2808,9 @@ taxa_as_rows <- function(physeq) {
 ################################################################################
 #' Rarefy (equalize) the number of samples per modality of a factor
 #' @description
-#' `r lifecycle::badge("experimental")`
+#'
+#'
+#' lifecycle-experimental
 #'
 #' @inheritParams clean_pq
 #' @param fact (required): The variable to rarefy. Must be present in
@@ -2799,16 +2852,25 @@ rarefy_sample_count_by_modality <-
       )
       message("...")
     }
-    mod <- physeq@sam_data[[fact]]
+    mod <- as.factor(physeq@sam_data[[fact]])
     n_mod <- table(mod)
     samples_names <- sample_names(physeq)
     samp_to_keep <- c()
-    for (modality in levels(as.factor(mod))) {
+    for (modality in levels(mod)) {
+      vec_samp_mod <- c(as.numeric(grep(modality, mod)))
+
+      # To bypass the pb of vector of length 1
+      # We build a vector of two equal values and we will take only one
+      # It is cause by range base behavior:
+      # 'If x has length 1, is numeric (in the sense of is.numeric) and x >= 1, sampling via sample takes place from 1:x.'
+      if(length(vec_samp_mod)==1){
+        vec_samp_mod <- c(vec_samp_mod, vec_samp_mod)
+      }
       samp_to_keep <- c(
         samp_to_keep,
         sample(
-          as.numeric(grep(modality, mod)),
+          vec_samp_mod,
           size = min(n_mod),
           replace = FALSE
         )
@@ -2816,6 +2878,17 @@ rarefy_sample_count_by_modality <-
     }
     new_physeq <- subset_samples_pq(physeq, 1:nsamples(physeq) %in% samp_to_keep)
+
+    if (length(table(new_physeq@sam_data[[fact]])) != length(table(mod))) {
+      warning(
+        paste0(
+          "The number of final levels (sam_data of the output phyloseq
+          object) is not equal to the inital (sam_data of the input
+          phyloseq object) number of levels in the factor: '",
+          fact , "'"
+        )
+      )}
+
     return(new_physeq)
   }
 ################################################################################
diff --git a/R/data.R b/R/data.R
index cafef09a..c161a2c7 100644
--- a/R/data.R
+++ b/R/data.R
@@ -41,7 +41,6 @@
 #' Obtain using `data_fungi_mini <- subset_taxa(data_fungi, Phylum == "Basidiomycota")`
 #' and then `data_fungi_mini <- subset_taxa_pq(data_fungi_mini, colSums(data_fungi_mini@otu_table) > 5000)`
 #'
-#'
 #' @format A physeq object containing 45 taxa with references sequences
 #' described by 14 taxonomic ranks and 137 samples described by 7 sample variables:
 #' - *X*: the name of the fastq-file
diff --git a/R/funguild.R b/R/funguild.R
index a778bee1..6f6f47ea 100644
--- a/R/funguild.R
+++ b/R/funguild.R
@@ -1,6 +1,9 @@
 #' Retrieve the FUNGuild database
 #' @description
-#' `r lifecycle::badge("stable")`
+#'
+#'
+#' lifecycle-stable
+#'
 #' The original function and documentation was written by Brendan Furneaux
 #' in the [FUNGuildR](https://github.com/brendanf/FUNGuildR/) package.
 #'
@@ -54,12 +57,13 @@ get_funguild_db <- function(db_url = "http://www.stbates.org/funguild_db_2.php")
 #' Assign Guilds to Organisms Based on Taxonomic Classification
 #'
 #' @description
-#' `r lifecycle::badge("stable")`
+#'
+#'
+#' lifecycle-stable
 #'
 #' The original function and documentation was written by Brendan Furneaux
 #' in the [FUNGuildR](https://github.com/brendanf/FUNGuildR/) package.
 #'
-#'
 #' These functions have identical behavior if supplied with a database; however
 #' they download the database corresponding to their name by default.
 #'
diff --git a/R/krona.R b/R/krona.R
index 8c4e0e58..f910703f 100644
--- a/R/krona.R
+++ b/R/krona.R
@@ -1,7 +1,9 @@
 ################################################################################
 #' Make Krona files using [KronaTools](https://github.com/marbl/Krona/wiki).
 #' @description
-#' `r lifecycle::badge("maturing")`
+#'
+#'
+#' lifecycle-maturing
 #'
 #' Need the installation of kronatools on the computer ([installation instruction](https://github.com/marbl/Krona/wiki/Installing)).
 #'
@@ -96,7 +98,9 @@ krona <-
 ###############################################################################
 #' Merge Krona files using [KronaTools](https://github.com/marbl/Krona/wiki).
 #' @description
-#' `r lifecycle::badge("maturing")`
+#'
+#'
+#' lifecycle-maturing
 #'
 #' Need the installation of kronatools on the computer
 #' ([installation instruction](https://github.com/marbl/Krona/wiki/Installing)).
diff --git a/R/lulu.R b/R/lulu.R
index 4eae8ed5..28a97777 100644
--- a/R/lulu.R
+++ b/R/lulu.R
@@ -1,9 +1,12 @@
 #' Post Clustering Curation of Amplicon Data.
#' #' @description -#' `r lifecycle::badge("stable")` #' -#' The original function and documentation was written by Tobias Guldberg Frøslev +#' +#' +#' lifecycle-stable +#' +#' The original function and documentation was written by Tobias Guldberg Frøslev #' in the [lulu](https://github.com/tobiasgf/lulu) package. #' #' This algorithm \code{lulu} consumes an OTU table and a matchlist, and diff --git a/R/miscellanous.R b/R/miscellanous.R index 84ad4be5..3945a182 100644 --- a/R/miscellanous.R +++ b/R/miscellanous.R @@ -5,7 +5,9 @@ #' that appended during PCR or NGS pipeline. #' #' @description -#' `r lifecycle::badge("maturing")` +#' +#' +#' lifecycle-maturing #' #' @inheritParams clean_pq #' @param min_number (int) the minimum number of sequences to put @@ -33,7 +35,9 @@ as_binary_otu_table <- function(physeq, min_number = 1) { #' Compute paired distances among matrix (e.g. otu_table) #' #' @description -#' `r lifecycle::badge("experimental")` +#' +#' +#' lifecycle-experimental #' #' @note the first column of the first matrix is compare to the first column of #' the second matrix, the second column of the first matrix is compare to the @@ -92,7 +96,9 @@ dist_bycol <- function(x, ################################################################################ #' List the size of all objects of the GlobalEnv. #' @description -#' `r lifecycle::badge("stable")` +#' +#' +#' lifecycle-stable #' #' Code from https://tolstoy.newcastle.edu.au/R/e6/help/09/01/1121.html #' @@ -115,7 +121,9 @@ all_object_size <- function() { #' @inheritParams clean_pq #' @param remove_space (logical; default TRUE): do we remove space? 
#' @description -#' `r lifecycle::badge("maturing")` +#' +#' +#' lifecycle-maturing #' #' @author Adrien Taudière #' @@ -138,7 +146,9 @@ simplify_taxo <- function(physeq, remove_space = TRUE) { #' #' @param file_path (required): path to a file #' @description -#' `r lifecycle::badge("maturing")` +#' +#' +#' lifecycle-maturing #' #' @author Adrien Taudière #' @@ -159,7 +169,9 @@ get_file_extension <- function(file_path) { #' @param accuracy number of digits (number of digits after zero) #' @param add_symbol if set to TRUE add the % symbol to the value #' @description -#' `r lifecycle::badge("maturing")` +#' +#' +#' lifecycle-maturing #' #' @author Adrien Taudière #' diff --git a/R/plot_functions.R b/R/plot_functions.R index a01d6577..29715f3a 100644 --- a/R/plot_functions.R +++ b/R/plot_functions.R @@ -2,7 +2,9 @@ #' Plot the result of a mt test [phyloseq::mt()] #' #' @description -#' `r lifecycle::badge("maturing")` +#' +#' +#' lifecycle-maturing #' #' @param mt (required) Result of a mt test from the function [phyloseq::mt()]. #' @param alpha (default: 0.05) Choose the cut off p-value to plot taxa. @@ -55,7 +57,10 @@ plot_mt <- #' Plot accumulation curves for \code{\link{phyloseq-class}} object #' #' @description -#' `r lifecycle::badge("maturing")` +#' +#' +#' lifecycle-maturing +#' #' @inheritParams clean_pq #' @param fact (required) Name of the factor in `physeq@sam_data` used to plot #' different lines @@ -277,7 +282,9 @@ accu_plot <- ################################################################################ #' Plot accumulation curves with balanced modality and depth rarefaction #' @description -#' `r lifecycle::badge("experimental")` +#' +#' +#' lifecycle-experimental #' #' This function (i) rarefy (equalize) the number of samples per modality of a #' factor and (ii) rarefy the number of sequences per sample (depth). 
The @@ -453,7 +460,10 @@ accu_plot_balanced_modality <- function(physeq, #' accumulation curves #' #' @description -#' `r lifecycle::badge("experimental")` +#' +#' +#' lifecycle-experimental +#' #' @param res_accuplot the result of the function accu_plot() #' @param threshold the proportion of ASV to obtain in each samples #' @@ -496,7 +506,10 @@ accu_samp_threshold <- function(res_accuplot, threshold = 0.95) { ################################################################################ #' Plot OTU circle for \code{\link{phyloseq-class}} object #' @description -#' `r lifecycle::badge("maturing")` +#' +#' +#' lifecycle-maturing +#' #' @inheritParams clean_pq #' @param fact (required) Name of the factor to cluster samples by modalities. #' Need to be in \code{physeq@sam_data}. @@ -712,7 +725,10 @@ circle_pq <- ################################################################################ #' Sankey plot of \code{\link{phyloseq-class}} object #' @description -#' `r lifecycle::badge("maturing")` +#' +#' +#' lifecycle-maturing +#' #' @inheritParams clean_pq #' @param fact Name of the factor to cluster samples by modalities. #' Need to be in \code{physeq@sam_data}. @@ -930,7 +946,10 @@ sankey_pq <- ################################################################################ #' Venn diagram of \code{\link{phyloseq-class}} object #' @description -#' `r lifecycle::badge("maturing")` +#' +#' +#' lifecycle-maturing +#' #' @inheritParams clean_pq #' @param fact (required): Name of the factor to cluster samples by modalities. #' Need to be in \code{physeq@sam_data}. @@ -1133,7 +1152,9 @@ venn_pq <- #' Venn diagram of \code{\link{phyloseq-class}} object using #' `ggVennDiagram::ggVennDiagram` function #' @description -#' `r lifecycle::badge("maturing")` +#' +#' +#' lifecycle-maturing #' #' Note that you can use ggplot2 function to customize the plot #' for ex. 
`+ scale_fill_distiller(palette = "BuPu", direction = 1)` @@ -1175,7 +1196,7 @@ venn_pq <- #' ggvenn_pq(data_fungi, fact = "Height") + #' ggplot2::scale_fill_distiller(palette = "BuPu", direction = 1) #' pl <- ggvenn_pq(data_fungi, fact = "Height", split_by = "Time") -#' for (i in 1:length(pl)) { +#' for (i in seq_along(pl)) { #' p <- pl[[i]] + #' scale_fill_distiller(palette = "BuPu", direction = 1) + #' theme(plot.title = element_text(hjust = 0.5, size = 22)) @@ -1298,7 +1319,9 @@ ggvenn_pq <- function(physeq = NULL, ################################################################################ #' Multiple plot function #' @description -#' `r lifecycle::badge("stable")` +#' +#' +#' lifecycle-stable #' # ggplot objects can be passed in ..., or to plotlist (as a list of ggplot # objects) @@ -1367,7 +1390,10 @@ multiplot <- ################################################################################ #' Graphical representation of hill number 0, 1 and 2 across a factor #' @description -#' `r lifecycle::badge("experimental")` +#' +#' +#' lifecycle-experimental +#' #' Hill numbers are the number of equiprobable species giving the same #' diversity value as the observed distribution. The Hill number 0 #' correspond to Species richness), the Hill number 1 to @@ -1391,12 +1417,12 @@ multiplot <- #' @param color_fac (optional): The variable to color the barplot. For ex. #' same as fact. Not very useful because ggplot2 plot colors can be #' change using `scale_color_XXX()` function. -#' @param letters (optional, default=FALSE): If set to TRUE, the plot +#' @param letters (optional, default FALSE): If set to TRUE, the plot #' show letters based on p-values for comparison. Use the #' \code{\link[multcompView]{multcompLetters}} function from the package #' multcompLetters. BROKEN for the moment. Note that na values in The #' variable param need to be removed (see examples) to use letters. 
-#' @param add_points (logical): add jitter point on boxplot +#' @param add_points (logical, default FALSE): add jitter point on boxplot #' @param add_info (logical, default TRUE) Do we add a subtitle with #' information about the number of samples per modality ? #' @param one_plot (logical, default FALSE) If TRUE, return a unique @@ -1404,6 +1430,13 @@ multiplot <- #' Note that if letters and one_plot are both TRUE, tuckey HSD results #' are discarded from the unique plot. In that case, use one_plot = FALSE #' to see the tuckey HSD results in the fourth plot of the resulting list. +#' @param kruskal_test (logical, default TRUE) Do we test for global effect of +#' our factor on each hill scales values? When kruskal_test is TRUE, the +#' resulting test value are add in each plot in subtitle (unless add_info is +#' FALSE). Moreover, if at +#' least one hill scales is not significantly link to fact (pval>0.05), +#' a message is prompt saying that Tuckey HSD plot is not informative for +#' those Hill scales and letters are not printed. #' @param plot_with_tuckey (logical, default TRUE). If one_plot is set to #' TRUE and letters to FALSE, allow to discard the tuckey plot part with #' plot_with_tuckey = FALSE @@ -1416,8 +1449,11 @@ multiplot <- #' @param na_remove (logical, default TRUE) Do we remove samples with NA in #' the factor fact ? Note that na_remove is always TRUE when using #' letters = TRUE +#' @param vioplot (logical, default FALSE) Do we plot violin plot instead of +#' boxplot ? #' @return Either an unique ggplot2 object (if one_plot is TRUE) or -#' a list of 4 ggplot2 plot: +#' a list of n+1 ggplot2 plot (with n the number of hill scale value). 
+#' For example, with the default scale value: #' - plot_Hill_0 : the boxplot of Hill number 0 (= species richness) #' against the variable #' - plot_Hill_1 : the boxplot of Hill number 1 (= Shannon index) @@ -1438,7 +1474,8 @@ multiplot <- #' if (requireNamespace("multcompView")) { #' p2 <- hill_pq(data_fungi, "Time", #' correction_for_sample_size = FALSE, -#' letters = TRUE, add_points = TRUE, plot_with_tuckey = FALSE +#' letters = TRUE, add_points = TRUE, +#' plot_with_tuckey = FALSE #' ) #' if (requireNamespace("patchwork")) { #' patchwork::wrap_plots(p2, guides = "collect") @@ -1448,8 +1485,10 @@ multiplot <- #' data_fungi_modif@otu_table[data_fungi_modif@sam_data$Height == "High", ] <- #' data_fungi_modif@otu_table[data_fungi_modif@sam_data$Height == "High", ] + #' sample(c(rep(0, ntaxa(data_fungi_modif) / 2), rep(100, ntaxa(data_fungi_modif) / 2))) -#' p3 <- hill_pq(data_fungi_modif, "Height", letters = TRUE) -#' p3[[1]] +#' p3 <- hill_pq(data_fungi_modif, "Height", +#' letters = TRUE, vioplot = TRUE, +#' add_points = TRUE +#' ) #' } #' } #' @seealso [psmelt_samples_pq()] and [ggbetween_pq()] @@ -1461,10 +1500,12 @@ hill_pq <- function(physeq, letters = FALSE, add_points = FALSE, add_info = TRUE, + kruskal_test = TRUE, one_plot = FALSE, plot_with_tuckey = TRUE, correction_for_sample_size = TRUE, - na_remove = TRUE) { + na_remove = TRUE, + vioplot = FALSE) { if (!is.null(variable)) { if (!is.null(fact)) { stop( @@ -1514,11 +1555,37 @@ hill_pq <- function(physeq, correction_for_sample_size = correction_for_sample_size ) p_list <- list() + + if (kruskal_test) { + kt_res <- list() + for (i in seq_along(hill_scales)) { + kt_res[[i]] <- kruskal.test(df_hill[, paste0("Hill_", hill_scales[[i]])], df_hill[, fact]) + } + if (sum(sapply(kt_res, function(x) { + x$p.value > 0.05 + })) > 0) { + message(paste0(sum(sapply(kt_res, function(x) { + x$p.value > 0.05 + })), " out of ", length(kt_res), " Hill scales do not show any global trends with you factor ", fact, ". 
Tuckey HSD plot is not informative for those Hill scales. Letters are not printed for those Hill scales")) + } + } + for (i in seq_along(hill_scales)) { - p_list[[i]] <- - ggplot(df_hill, aes(group = !!var, .data[[paste0("Hill_", hill_scales[[i]])]])) + - geom_boxplot(outlier.size = 2, aes(colour = as.factor(!!color_fac), y = !!var)) + - labs(x = paste0("Hill_", hill_scales[[i]])) + if (vioplot) { + p_list[[i]] <- + ggplot(df_hill, aes( + x = .data[[paste0("Hill_", hill_scales[[i]])]], + y = !!var + )) + + geom_violin(aes(colour = as.factor(!!color_fac))) + + labs(x = paste0("Hill_", hill_scales[[i]])) + } else { + p_list[[i]] <- + ggplot(df_hill, aes(group = !!var, x = .data[[paste0("Hill_", hill_scales[[i]])]])) + + geom_boxplot(outlier.size = 2, aes(colour = as.factor(!!color_fac), y = !!var)) + + labs(x = paste0("Hill_", hill_scales[[i]])) + } + if (add_points) { p_list[[i]] <- p_list[[i]] + geom_jitter(aes(y = !!var, colour = as.factor(!!color_fac)), alpha = 0.5) @@ -1533,9 +1600,21 @@ hill_pq <- function(physeq, collapse = " - '" ) ) - + if (kruskal_test) { + subtitle_plot <- paste0( + subtitle_plot, "\n", + paste0( + " Hill ", hill_scales[[i]], + " -- Kruskal-Wallis chi-squared =", + round(kt_res[[i]]$statistic, 2), + "; df = ", kt_res[[i]]$parameter, + "; p.value =", format.pval(kt_res[[i]]$p.value, 2) + ) + ) + } p_list[[i]] <- p_list[[i]] + labs(subtitle = subtitle_plot) } + if (letters) { data_h <- p_var$data[grep(paste0("Hill_", hill_scales[[i]]), p_var$data[, 5]), ] @@ -1549,20 +1628,22 @@ hill_pq <- function(physeq, data_letters <- p_list[[i]]$data %>% group_by(!!var) %>% summarize(pos_letters = max(.data[[paste0("Hill_", hill_scales[[i]])]]) + 1) %>% - inner_join(dt) - - p_list[[i]] <- p_list[[i]] + - geom_label( - data = data_letters, - aes( - x = pos_letters, - label = Letters - ), - y = ggplot_build(p_list[[i]])$data[[1]]$y, - size = 4, - stat = "unique", - parse = TRUE - ) + inner_join(dt, by = join_by(!!fact)) + + if (!kruskal_test | 
kt_res[[i]]$p.value < 0.05) { + p_list[[i]] <- p_list[[i]] + + geom_label( + data = data_letters, + aes( + x = pos_letters, + label = Letters, + ), + y = unique(ggplot_build(p_list[[i]])$data[[1]]$y), + size = 4, + stat = "unique", + parse = TRUE + ) + } } } @@ -1586,7 +1667,9 @@ hill_pq <- function(physeq, #' Box/Violin plots for between-subjects comparisons of Hill Number #' #' @description -#' `r lifecycle::badge("experimental")` +#' +#' +#' lifecycle-experimental #' #' Note that contrary to [hill_pq()], this function does not take into #' account for difference in the number of sequences per samples/modalities. @@ -1689,7 +1772,10 @@ ggbetween_pq <- ################################################################################ #' Summarize a \code{\link{phyloseq-class}} object using a plot. #' @description -#' `r lifecycle::badge("maturing")` +#' +#' +#' lifecycle-maturing +#' #' @inheritParams clean_pq #' @param add_info Does the bottom down corner contain #' extra informations? @@ -1887,7 +1973,10 @@ summary_plot_pq <- function(physeq, ################################################################################ #' rotl wrapper for phyloseq data #' @description -#' `r lifecycle::badge("experimental")` +#' +#' +#' lifecycle-experimental +#' #' Make a phylogenetic tree using the ASV names of a physeq object and the #' Open Tree of Life tree. #' @@ -1948,7 +2037,9 @@ rotl_pq <- function(physeq, ################################################################################ #' Heat tree from `metacoder` package using `tax_table` slot #' @description -#' `r lifecycle::badge("maturing")` +#' +#' +#' lifecycle-maturing #' #' Note that the number of ASV is store under the name `n_obs` #' and the number of sequences under the name `nb_sequences` @@ -2022,7 +2113,10 @@ heat_tree_pq <- function(physeq, taxonomic_level = NULL, ...) 
{ ################################################################################ #' Visualization of two samples for comparison #' @description -#' `r lifecycle::badge("maturing")` +#' +#' +#' lifecycle-maturing +#' #' @inheritParams clean_pq #' @param fact (default: NULL) Name of the factor in `physeq@sam_data`. #' If left to NULL use the `left_name` and `right_name` parameter as modality. @@ -2297,7 +2391,9 @@ biplot_pq <- function(physeq, ################################################################################ #' Visualization of a collection of couples of samples for comparison #' @description -#' `r lifecycle::badge("experimental")` +#' +#' +#' lifecycle-experimental #' #' This allow to plot all the possible [biplot_pq()] combination #' using one factor. @@ -2385,7 +2481,9 @@ multi_biplot_pq <- function(physeq, #' Plot taxonomic distribution in function of a factor with stacked bar in % #' #' @description -#' `r lifecycle::badge("experimental")` +#' +#' +#' lifecycle-experimental #' #' An alternative to `phyloseq::plot_bar()` function. #' @@ -2580,7 +2678,9 @@ plot_tax_pq <- #' one sample factor #' #' @description -#' `r lifecycle::badge("experimental")` +#' +#' +#' lifecycle-experimental #' #' Note that lvl3 need to be nested in lvl2 which need to be nested #' in lvl1 @@ -2750,7 +2850,8 @@ tsne_pq <- ################################################################################ #' Plot a tsne low dimensional representation of a phyloseq object #' -#' `r lifecycle::badge("experimental")` +#' +#' lifecycle-experimental #' #' Partially inspired by `phylosmith::tsne_phyloseq()` function developed by Schuyler D. Smith. 
#' @@ -2849,7 +2950,10 @@ plot_tsne_pq <- function(physeq, ################################################################################ #' Scaling with ranked subsampling (SRS) curve of phyloseq object #' @description -#' `r lifecycle::badge("experimental")` +#' +#' +#' lifecycle-experimental +#' #' @inheritParams clean_pq #' @param clean_pq (logical): Does the phyloseq #' object is cleaned using the [clean_pq()] function? @@ -2882,7 +2986,9 @@ SRS_curve_pq <- function(physeq, clean_pq = FALSE, ...) { ################################################################################ #' iNterpolation and EXTrapolation of Hill numbers (with iNEXT) #' @description -#' `r lifecycle::badge("experimental")` +#' +#' +#' lifecycle-experimental #' #' @inheritParams clean_pq #' @param merge_sample_by (default: NULL) if not `NULL` samples of @@ -2946,7 +3052,9 @@ iNEXT_pq <- function(physeq, ################################################################################ #' Make upset plot for phyloseq object. #' @description -#' `r lifecycle::badge("experimental")` +#' +#' +#' lifecycle-experimental #' #' Alternative to venn plot. #' @@ -3014,7 +3122,6 @@ iNEXT_pq <- function(physeq, #' ) #' ) #' -#' #' upset_pq( #' data_fungi_mini, #' fact = "Time", @@ -3044,7 +3151,6 @@ iNEXT_pq <- function(physeq, #' ) #' ) #' -#' #' upset_pq( #' subset_taxa(data_fungi_mini, Phylum == "Basidiomycota"), #' fact = "Time", @@ -3155,7 +3261,9 @@ upset_pq <- function(physeq, ################################################################################ #' Test for differences between intersections #' @description -#' `r lifecycle::badge("experimental")` +#' +#' +#' lifecycle-experimental #' #' @inheritParams upset_pq #' @param var_to_test (default c("OTU")) : a vector of column present in @@ -3230,7 +3338,10 @@ upset_test_pq <- #' Compute different functions for different class of vector. 
#' #' @description -#' `r lifecycle::badge("experimental")` +#' +#' +#' lifecycle-experimental +#' #' Mainly an internal function useful in "sapply(..., tapply)" methods #' #' @param x : a vector @@ -3356,7 +3467,9 @@ diff_fct_diff_class <- ################################################################################ #' Plot the distribution of sequences or ASV in one taxonomic levels #' @description -#' `r lifecycle::badge("experimental")` +#' +#' +#' lifecycle-experimental #' #' @inheritParams clean_pq #' @param fact Name of the factor to cluster samples by modalities. @@ -3413,7 +3526,9 @@ tax_bar_pq <- ################################################################################ #' Ridge plot of a phyloseq object #' @description -#' `r lifecycle::badge("experimental")` +#' +#' +#' lifecycle-experimental #' #' @inheritParams clean_pq #' @param fact (required) Name of the factor in `physeq@sam_data` used to plot @@ -3437,7 +3552,7 @@ tax_bar_pq <- #' \donttest{ #' if (requireNamespace("ggridges")) { #' ridges_pq(data_fungi_mini, "Time", alpha = 0.5, scale = 0.9) -#' ridges_pq(data_fungi_mini, "Sample_names", log10trans = TRUE) +#' ridges_pq(data_fungi_mini, "Sample_names", log10trans = TRUE) + facet_wrap("~Height") #' #' ridges_pq(data_fungi_mini, #' "Time", @@ -3452,6 +3567,7 @@ ridges_pq <- function(physeq, fact, nb_seq = TRUE, log10trans = TRUE, + tax_level = "Class", ...) { psm <- psmelt(physeq) psm <- psm %>% filter(Abundance > 0) @@ -3461,18 +3577,18 @@ ridges_pq <- function(physeq, } if (nb_seq) { p <- ggplot(psm, aes(y = factor(.data[[fact]]), x = Abundance)) + - ggridges::geom_density_ridges(aes(fill = Class), ...) + + ggridges::geom_density_ridges(aes(fill = .data[[tax_level]]), ...) 
+ xlim(c(0, NA)) } else { psm_asv <- psm %>% - group_by(.data[[fact]], OTU, Class) %>% + group_by(.data[[fact]], OTU, .data[[tax_level]]) %>% summarise("count" = n()) p <- ggplot(psm_asv, aes(y = factor(.data[[fact]]), x = count)) + ggridges::geom_density_ridges( - aes(fill = Class), + aes(fill = .data[[tax_level]]), ... ) + xlim(c(0, NA)) @@ -3487,7 +3603,9 @@ ridges_pq <- function(physeq, #' Plot treemap of 2 taxonomic levels #' #' @description -#' `r lifecycle::badge("experimental")` +#' +#' +#' lifecycle-experimental #' #' Note that lvl2need to be nested in lvl1 #' @@ -3617,7 +3735,9 @@ treemap_pq <- function(physeq, ################################################################################ #' Plot the partition the variation of a phyloseq object #' @description -#' `r lifecycle::badge("experimental")` +#' +#' +#' lifecycle-experimental #' #' @param res_varpart (required) the result of the functions [var_par_pq()] #' or [var_par_rarperm_pq()] @@ -3735,14 +3855,14 @@ plot_var_part_pq <- if (show_dbrda_signif) { if (is.null(res_varpart$dbrda_result_prop_pval_signif)) { cond <- - c(1:length(res_varpart$dbrda_result))[sapply(res_varpart$dbrda_result, function(x) { + seq_along(res_varpart$dbrda_result)[sapply(res_varpart$dbrda_result, function(x) { x$`Pr(>F)`[[1]] < show_dbrda_signif_pval })] res_varpart$Xnames[cond] <- paste0(res_varpart$Xnames[cond], "*") } else { cond <- - c(1:length(res_varpart$dbrda_result))[res_varpart$dbrda_result_prop_pval_signif >= + seq_along(res_varpart$dbrda_result)[res_varpart$dbrda_result_prop_pval_signif >= min_prop_pval_signif_dbrda] res_varpart$Xnames[cond] <- paste0(res_varpart$Xnames[cond], "*") @@ -3798,7 +3918,9 @@ plot_var_part_pq <- #' Hill diversity of phyloseq object #' #' @description -#' `r lifecycle::badge("experimental")` +#' +#' +#' lifecycle-experimental #' #' Basically a wrapper of function [ggstatsplot::ggscatterstats()] for #' object of class phyloseq and Hill number. 
@@ -3876,7 +3998,9 @@ ggscatt_pq <- function(physeq, #' Alluvial plot for taxonomy and samples factor vizualisation #' #' @description -#' `r lifecycle::badge("experimental")` +#' +#' +#' lifecycle-experimental #' #' Basically a wrapper of [ggalluvial](https://corybrunson.github.io/ggalluvial/index.html) #' package diff --git a/R/speedyseq_functions.R b/R/speedyseq_functions.R index 3bded139..41503cba 100644 --- a/R/speedyseq_functions.R +++ b/R/speedyseq_functions.R @@ -1,7 +1,9 @@ #' Merge taxa in groups (vectorized version) #' #' @description -#' `r lifecycle::badge("stable")` +#' +#' +#' lifecycle-stable #' #' Firstly release in the [speedyseq](https://github.com/mikemc/speedyseq/) R #' package by Michael R. McLaren. @@ -309,7 +311,9 @@ bad_flush_right <- function(x, bad = "BAD", na_bad = FALSE, k = length(x)) { #' Merge samples by a sample variable or factor #' @description -#' `r lifecycle::badge("stable")` +#' +#' +#' lifecycle-stable #' #' Firstly release in the [speedyseq](https://github.com/mikemc/speedyseq/) R #' package by Michael R. McLaren. @@ -333,7 +337,9 @@ bad_flush_right <- function(x, bad = "BAD", na_bad = FALSE, k = length(x)) { #' `unique_or_na` #' @param reorder Logical specifying whether to reorder the new (merged) #' samples by name -#' +#' @param default_fun Default functions if funs is not set. Per default +#' the function unique_or_na is used. See `diff_fct_diff_class()` for +#' a useful alternative. 
#' @export #' @return A new phyloseq-class, otu_table or sam_data object depending on #' the class of the x param @@ -357,7 +363,8 @@ setGeneric( group, fun_otu = sum, funs = list(), - reorder = FALSE) { + reorder = FALSE, + default_fun = unique_or_na) { standardGeneric("merge_samples2") } ) @@ -366,7 +373,8 @@ setGeneric( setMethod( "merge_samples2", signature("phyloseq"), - function(x, group, fun_otu = sum, funs = list(), reorder = FALSE) { + function(x, group, fun_otu = sum, funs = list(), reorder = FALSE, + default_fun = unique_or_na) { if (length(group) == 1) { stopifnot(group %in% sample_variables(x)) group <- sample_data(x)[[group]] @@ -385,7 +393,7 @@ setMethod( reorder = reorder ) if (!is.null(access(x, "sam_data"))) { - sam.merged <- merge_samples2(sample_data(x), group, funs = funs) + sam.merged <- merge_samples2(sample_data(x), group, funs = funs, default_fun = default_fun) } else { sam.merged <- NULL } @@ -403,7 +411,8 @@ setMethod( setMethod( "merge_samples2", signature("otu_table"), - function(x, group, fun_otu = sum, reorder = FALSE) { + function(x, group, fun_otu = sum, reorder = FALSE, + default_fun = unique_or_na) { stopifnot(identical(length(group), nsamples(x))) # Work with samples as rows, and remember to flip back at end if needed needs_flip <- taxa_are_rows(x) @@ -449,7 +458,8 @@ setMethod( setMethod( "merge_samples2", signature("sample_data"), - function(x, group, funs = list(), reorder = FALSE) { + function(x, group, funs = list(), reorder = FALSE, + default_fun = unique_or_na) { if (length(group) == 1) { stopifnot(group %in% sample_variables(x)) group <- x[[group]] @@ -469,7 +479,7 @@ setMethod( # For vars in the funs, run f through as_mapper; else, use the default f funs <- purrr::map2( var_in_funs, names(var_in_funs), - ~ if (.x) purrr::as_mapper(funs[[.y]]) else unique_or_na + ~ if (.x) purrr::as_mapper(funs[[.y]]) else default_fun ) ## Merge variable values, creating a new sample_data object with one row ## per group. 
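Review note on the `default_fun` change above: the `sample_data` method now falls back to a caller-supplied function instead of a hard-coded `unique_or_na` for every variable not named in `funs`. The following is a minimal base-R sketch of that fallback pattern, not the package's implementation. All names ending in `_sketch` are hypothetical, and `unique_or_na_sketch()` is only an assumed approximation of what the real default does.

```r
# Hypothetical stand-in for the package default: keep a value only when the
# whole group agrees on it, otherwise return NA.
unique_or_na_sketch <- function(x) {
  ux <- unique(x)
  if (length(ux) == 1) ux else NA
}

# Hypothetical alternative default: average numeric variables and fall back
# to the stand-in above for everything else.
mean_or_unique_sketch <- function(x) {
  if (is.numeric(x)) mean(x) else unique_or_na_sketch(x)
}

# Merge the rows of `df` by `group`. Variables named in `funs` use their own
# function; every other variable uses `default_fun`.
merge_rows_sketch <- function(df, group, funs = list(),
                              default_fun = unique_or_na_sketch) {
  groups <- split(seq_len(nrow(df)), group)
  merged <- lapply(names(df), function(v) {
    f <- if (v %in% names(funs)) funs[[v]] else default_fun
    unname(sapply(groups, function(idx) f(df[[v]][idx])))
  })
  names(merged) <- names(df)
  data.frame(merged, row.names = names(groups), stringsAsFactors = FALSE)
}

sam <- data.frame(
  site = c("A", "A", "B", "B"),
  depth = c(10, 20, 30, 30),
  stringsAsFactors = FALSE
)
grp <- c("g1", "g1", "g2", "g2")

# With the default, depth for g1 is NA because 10 and 20 disagree.
by_default <- merge_rows_sketch(sam, grp)

# With another default, numeric variables are averaged instead.
by_mean <- merge_rows_sketch(sam, grp, default_fun = mean_or_unique_sketch)
```

Passing the fallback as an argument keeps the per-variable `funs` list as the more specific override, which matches the order of precedence in the `purrr::map2()` call of the diff.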
diff --git a/R/table_functions.R b/R/table_functions.R index 76c29399..f5136c65 100644 --- a/R/table_functions.R +++ b/R/table_functions.R @@ -1,7 +1,10 @@ ################################################################################ #' Make a datatable with the taxonomy of a \code{\link{phyloseq-class}} object #' @description -#' `r lifecycle::badge("maturing")` +#' +#' +#' lifecycle-maturing +#' #' @inheritParams clean_pq #' @param abundance (default: TRUE) Does the number of sequences is print #' @param taxonomic_level (default: NULL) a vector of selected taxonomic @@ -10,7 +13,6 @@ #' OTU abundancy by level of the modality #' @param ... Other argument for the datatable function #' -#' #' @author Adrien Taudière #' @return A datatable #' @export @@ -93,7 +95,9 @@ tax_datatable <- function(physeq, #' Compare samples in pairs using diversity and number of ASV including #' shared ASV. #' @description -#' `r lifecycle::badge("experimental")` #' For the moment refseq slot need to be not Null. +#' +#' +#' lifecycle-experimental #' For the moment refseq slot need to be not Null. #' #' @inheritParams clean_pq #' @param bifactor (required) a factor (present in the `sam_data` slot of @@ -268,7 +272,10 @@ compare_pairs_pq <- function(physeq = NULL, ################################################################################ #' Create an visualization table to describe taxa distribution across a modality #' @description -#' `r lifecycle::badge("maturing")` +#' +#' +#' lifecycle-maturing +#' #' @inheritParams clean_pq #' @param modality (required) The name of a column present in the `@sam_data` slot #' of the physeq object. Must be a character vector or a factor. 
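Review note on the length-one workaround added to `rarefy_sample_count_by_modality()` earlier in this diff: the comment there quotes the documented behavior of base `sample()`, which treats a single number `x >= 1` as the range `1:x`. The sketch below illustrates that behavior and two ways around it, assuming a modality that matches exactly one sample. `only_row` and `resample_sketch()` are hypothetical names used for illustration only.

```r
# Index-based sampling that never reinterprets a single number as 1:x.
# The same pattern is suggested in the examples of ?sample.
resample_sketch <- function(x, size) {
  x[sample.int(length(x), size = size)]
}

only_row <- 7 # a modality that matches a single sample, at position 7

# sample(only_row, 1) draws from 1:7, so it can return any of 1..7.
draws <- replicate(200, sample(only_row, 1))

# Workaround used in the diff: duplicate the value, then draw one.
dup_draw <- sample(c(only_row, only_row), size = 1, replace = FALSE)

# Alternative: sample positions instead of values.
idx_draw <- resample_sketch(only_row, size = 1)
```

Both `dup_draw` and `idx_draw` always equal 7, while `draws` may contain other positions. The index-based variant avoids the special case entirely, at the cost of a small helper.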
diff --git a/R/targets_misc.R b/R/targets_misc.R index 60799044..2c49726a 100644 --- a/R/targets_misc.R +++ b/R/targets_misc.R @@ -1,7 +1,9 @@ ################################################################################ #' List fastq files #' @description -#' `r lifecycle::badge("maturing")` +#' +#' +#' lifecycle-maturing #' #' @param path path to files (required) #' @param paired_end do you have paired_end files? (default TRUE) @@ -52,7 +54,9 @@ list_fastq_files <- #' Rename samples of an otu_table #' #' @description -#' `r lifecycle::badge("experimental")` +#' +#' +#' lifecycle-experimental #' #' @inheritParams clean_pq #' @param names_of_samples (required) The new names of the samples @@ -92,14 +96,15 @@ rename_samples_otu_table <- function(physeq, names_of_samples) { #' [targets](https://books.ropensci.org/targets/) pipeline #' #' @description -#' `r lifecycle::badge("maturing")` +#' +#' +#' lifecycle-maturing #' #' This function filter and trim (with parameters passed on to #' [dada2::filterAndTrim()] function) forward sequences or paired end #' sequence if 'rev' parameter is set. It return the list of files to #' subsequent analysis in a targets pipeline. #' -#' #' @param fw (required) a list of forward fastq files #' @param rev a list of reverse fastq files for paired end trimming #' @param output_fw Path to output folder for forward files. 
By default, @@ -140,7 +145,6 @@ rename_samples_otu_table <- function(physeq, names_of_samples) { #' derep_rv_pe #' @author Adrien Taudière #' -#' #' @seealso [dada2::filterAndTrim()] filter_trim <- function(fw = NULL, @@ -212,7 +216,9 @@ filter_trim <- #' optional order #' #' @description -#' `r lifecycle::badge("maturing")` +#' +#' +#' lifecycle-maturing #' #' @param file_path (required) a path to the sample_data file #' @param names_of_samples (required) a vector of sample names @@ -248,7 +254,8 @@ sample_data_with_new_names <- function(file_path, #' Rename the samples of a phyloseq slot #' #' @description -#' `r lifecycle::badge("maturing")` +#' +#' lifecycle-maturing #' #' @param phyloseq_component (required) one of otu_table or sam_data slot of a #' phyloseq-class object diff --git a/R/vsearch.R b/R/vsearch.R index 7c095ec4..1cbbbf98 100644 --- a/R/vsearch.R +++ b/R/vsearch.R @@ -2,7 +2,8 @@ #' Search for a list of sequence in a fasta file against physeq reference #' sequences using [vsearch](https://github.com/torognes/vsearch) #' -#' `r lifecycle::badge("maturing")` +#' +#' lifecycle-maturing #' #' @inheritParams clean_pq #' @param seq2search (required if path_to_fasta is NULL) Either (i) a DNAstringSet object @@ -128,7 +129,9 @@ vs_search_global <- function(physeq, #' or cluster a list of DNA sequences using SWARM #' #' @description -#' `r lifecycle::badge("maturing")` +#' +#' +#' lifecycle-maturing #' #' @inheritParams clean_pq #' @param dna_seq NOT WORKING FOR THE MOMENT @@ -330,7 +333,9 @@ swarm_clustering <- function(physeq = NULL, #' or cluster a list of DNA sequences using vsearch software #' #' @description -#' `r lifecycle::badge("maturing")` +#' +#' +#' lifecycle-maturing #' #' @inheritParams clean_pq #' @param dna_seq You may directly use a character vector of DNA sequences @@ -473,7 +478,8 @@ vsearch_clustering <- function(physeq = NULL, #' Search for a list of sequence in an object to remove chimera taxa #' using 
[vsearch](https://github.com/torognes/vsearch) #' -#' `r lifecycle::badge("experimental")` +#' +#' lifecycle-experimental #' #' @param object (required) A phyloseq-class object or one of dada, derep, #' data.frame or list coercible to sequences table using the @@ -598,8 +604,8 @@ chimera_removal_vs <- #' Detect for chimera taxa using [vsearch](https://github.com/torognes/vsearch) #' -#' `r lifecycle::badge("experimental")` -#' +#' +#' lifecycle-experimental #' #' @param seq2search (required) a list of DNA sequences coercible by function #' [Biostrings::DNAStringSet()] diff --git a/README.md b/README.md index 1f1ba3bc..b3e4de53 100644 --- a/README.md +++ b/README.md @@ -217,7 +217,8 @@ make install # as root or sudo conda create -n cutadaptenv cutadapt ``` -
+
diff --git a/_pkgdown.yml b/_pkgdown.yml index 2c1d9a25..9445c8a9 100644 --- a/_pkgdown.yml +++ b/_pkgdown.yml @@ -21,7 +21,7 @@ navbar: href: articles/filter.html - text: Reclustering href: articles/Reclustering.html - - text: Tree visualization + - text: Tree building and visualization href: articles/tree_visualization.html - text: ------- - text: Diversity analysis @@ -30,6 +30,10 @@ navbar: - text: Beta-diversity href: articles/beta-div.html - text: ------- + - text: Bioinformatics actions + - text: Fastq quality check + href: articles/fastq_quality_check.html + - text: ------- - text: Examples with published dataset - text: Tengeler href: articles/tengeler.html diff --git a/docs/404.html b/docs/404.html index 9366bbb7..19fce329 100644 --- a/docs/404.html +++ b/docs/404.html @@ -16,7 +16,7 @@ - + Contributor Covenant Code of Conduct • MiscMetabarContributor Covenant Code of Conduct • MiscMetabar Skip to contents -