-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug in ses.pd function #17
Comments
Thanks for reporting this. This is indeed a bug, I'll push an updated version of the package with this function fixed to CRAN soon, and make an announcement about it. |
Hello, I remain perplexed about how this bug happened and would welcome any feedback or insights. All functions in picante were tested against various example data sets and I did not catch this error. Looking at the unpatched version of the code that caused the error ses.pd <-
function (samp, tree, null.model = c("taxa.labels", "richness", "frequency", "sample.pool",
"phylogeny.pool", "independentswap", "trialswap"), runs = 999, iterations = 1000,
...)
{
pd.obs <- as.vector(pd(samp, tree, ...)$PD)
null.model <- match.arg(null.model)
pd.rand <- switch(null.model,
taxa.labels = t(replicate(runs, as.vector(pd(samp, tipShuffle(tree), ...)$PD))),
richness = t(replicate(runs, as.vector(pd(randomizeMatrix(samp,null.model="richness"), tree, ...)$PD))),
frequency = t(replicate(runs, as.vector(pd(randomizeMatrix(samp,null.model="frequency"), tree, ...)$PD))),
sample.pool = t(replicate(runs, as.vector(pd(randomizeMatrix(samp,null.model="richness"), tree, ...)$PD))),
phylogeny.pool = t(replicate(runs, as.vector(pd(randomizeMatrix(samp,null.model="richness"),
tipShuffle(tree), ...)$PD))),
independentswap = t(replicate(runs, as.vector(pd(randomizeMatrix(samp,null.model="independentswap", iterations), tree, ...)$PD))),
trialswap = t(replicate(runs, as.vector(pd(randomizeMatrix(samp,null.model="trialswap", iterations), tree, ...)$PD)))
)
pd.rand.mean <- apply(X = pd.rand, MARGIN = 2, FUN = mean, na.rm=TRUE)
pd.rand.sd <- apply(X = pd.rand, MARGIN = 2, FUN = sd, na.rm=TRUE)
pd.obs.z <- (pd.obs - pd.rand.mean)/pd.rand.sd
pd.obs.rank <- apply(X = rbind(pd.obs, pd.rand), MARGIN = 2,
FUN = rank)[1, ]
pd.obs.rank <- ifelse(is.na(pd.rand.mean),NA,pd.obs.rank)
data.frame(ntaxa=specnumber(samp),pd.obs, pd.rand.mean, pd.rand.sd, pd.obs.rank,
pd.obs.z, pd.obs.p=pd.obs.rank/(runs+1),runs=runs, row.names = row.names(samp))
} If ses.pd was called with the argument The patched version of |
Just a minor comment on Steve clarifications. The issue concerns the null PD values rather than the observed PD. As such, if include.root = TRUE within ses.pd, then the observed PD values will include the distance from the most recent common ancestor of the species in the communities and the root of the tree (as expected), BUT the null PD values will not (resulting in biases SES.PD values, particularly in low richness communities). However, when include.root = FALSE, then such distance is not included neither in the observed PD or the null PD values. Therefore, analyses conducted with include.root = FALSE are not affected, but note that the default setting of the function is include.root = TRUE. |
Hi everyone,
I write to report a bug in the ses.pd function. As you may know, the arguments of ses.pd are passed through the function pd, which means that one can specify whether to include the distance from the most recent common ancestor of the species and the root of the tree (MRCA – root distance). The default setting (in the pd function) is "include.root = TRUE". However, I have noted that ses.pd computes null PD values without including the MRCA – root distance regardless of the logical value that is specified in the include.root argument. This is, if include.root = TRUE (default = TRUE), the observed PD will include the MRCA – root distance, but the null PD values will not. For example:
require(picante)
data(phylocom)
phylocom$sample -> Sample
phylocom$phylo -> Tree
prune.sample(Sample,Tree)->Tree_F
Sample <- Sample[,Tree_F$tip.label]
set.seed(12345) # fix the seed to make the analyses reproducible
ses.pd(Sample,Tree_F,null.model="taxa.labels",runs=999,include.root=FALSE) ->
No_root_1 # the MRCA – root distance should be excluded from all the computations
set.seed(12345) # back to the same seed
ses.pd(Sample,Tree_F,null.model="taxa.labels",runs=999,include.root=TRUE) ->
root_1 # the MRCA – root distance should be included in all the computations
No_root_1
root_1
A quick look to the results reveals that as expected, the observed PD values between "No_root_1" and "root_1" are different for communities "clump1" and "clump2a" (the only two communities where the root of the tree is not traversed by the minimum spanning path connecting the species in the phylogeny), because in "No_root_1" the include.root argument was set to FALSE. HOWEVER, the descriptors of the null distributions in "No_root_1" and "root_1" are identical, and this is because the null PD values are computed without including the MRCA – root distance in all cases regardless of the logical value that is specified in the include.root argument.
The good news are that this anomaly only significantly affects communities with low species richness (less than four species). For more details, you may take a look to the following link, which includes a preprint with detailed information on the issue (a step by step tutorial) along with a simulation exercise (which informs about the communities that may be affected) and two alternatives to fix the problem (all the R code is fully available in the Supplementary Material of the preprint): https://www.biorxiv.org/content/10.1101/579300v1
Rafael Molina Venegas
The text was updated successfully, but these errors were encountered: