Nils Eling edited this page Oct 16, 2018 · 1 revision

This list compiles some of the most frequent questions we receive about BASiCS:

  1. Can BASiCS be used in the context of unsupervised analyses?

No. BASiCS has been designed in the context of supervised analyses where the groups of cells to be analysed are known a priori (such as experimental conditions or cell types). Therefore, BASiCS should not be used in unsupervised analyses where the aim is to discover new cell populations through clustering.

  1. Can I use BASiCS if my data does not include spike-in genes?

Yes. The latest version of BASiCS no longer requires spike-in genes. When technical spike-in genes are not available, BASiCS uses a horizontal integration strategy that borrows information across multiple technical replicates. Here, these are interpreted as multiple sets of samples from the same cell population.

  1. Do I need to perform quality control of my data before running BASiCS?

Yes. To use BASiCS, you need to remove poor-quality cells beforehand. Moreover, we recommend filtering out very lowly expressed transcripts, for which the data are less informative. The inclusion criteria may be data-specific. In BASiCS, the BASiCS_Filter function can help with this step. Alternatively, the scater package provides enhanced functionality for the pre-processing of scRNA-seq datasets.
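
As a rough illustration of the two filtering steps described above, here is a minimal sketch in base R. It does not use BASiCS_Filter or scater; the toy count matrix, the threshold names and the threshold values are all hypothetical and only meant to show the logic.

```r
# Toy count matrix: genes in rows, cells in columns (values are made up).
counts <- matrix(
  c(10, 12, 0,  9,
     0,  1, 0,  0,
    25, 30, 1, 28,
     5,  7, 0,  6),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("Gene", 1:4), paste0("Cell", 1:4))
)

# Hypothetical thresholds; sensible values depend on the dataset.
min_counts_per_cell <- 5  # minimum total count for a cell to be kept
min_mean_per_gene   <- 1  # minimum average count for a gene to be kept

# Step 1: remove poor-quality cells (here: cells with very low total counts).
keep_cells <- colSums(counts) >= min_counts_per_cell
counts_qc  <- counts[, keep_cells, drop = FALSE]

# Step 2: remove very lowly expressed genes among the remaining cells.
keep_genes <- rowMeans(counts_qc) >= min_mean_per_gene
counts_qc  <- counts_qc[keep_genes, , drop = FALSE]

dim(counts_qc)  # 3 genes x 3 cells remain in this toy example
```

In this toy example, Cell3 is dropped because its total count is 1, and Gene2 is then dropped because its average count across the remaining cells is below 1. Real datasets usually call for additional, data-specific criteria.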

  1. How many iterations should I use for the BASiCS_MCMC function?

The number of iterations required to achieve convergence can be dataset-specific. However, with the remaining settings left at their defaults, we have observed that N = 20000, Thin = 10 and Burn = 10000 perform well across a range of datasets.
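
To make these settings concrete, the following base R sketch works out how many posterior draws they would leave per parameter. It assumes that the sampler discards the first Burn iterations and then stores every Thin-th one; the divisibility checks are likewise an assumption about what the sampler expects, so please check the BASiCS_MCMC documentation for the exact requirements.

```r
N    <- 20000  # total number of MCMC iterations
Thin <- 10     # thinning period: keep every 10th iteration
Burn <- 10000  # burn-in iterations discarded at the start

# Sanity checks (assumed requirements, not taken from the package itself).
stopifnot(Burn < N, N %% Thin == 0, Burn %% Thin == 0)

# Number of posterior draws stored per parameter under the assumption above.
n_draws <- (N - Burn) / Thin
n_draws  # 1000
```

Under that assumption, the suggested settings yield 1000 stored draws per parameter. Convergence should still be assessed for each dataset.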

  1. Is there a minimum number of cells required for the analysis?

In principle, no. However, BASiCS is a complex high-dimensional model and small sample sizes can lead to unstable inference. Our simulations suggest that using fewer than n = 40 cells is not recommended. In those situations, we observe biased posterior estimates for gene-specific parameters. This behaviour is more pronounced for lowly expressed genes.
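
A quick way to spot problematic groups before fitting the model is to count the cells in each one. The sketch below uses base R only. The group labels and sizes are hypothetical, and it assumes that the n = 40 guideline is applied separately to each group of cells being analysed.

```r
# Hypothetical group labels, one per cell (for example, experimental conditions).
group <- rep(c("Control", "Treated", "Rare"), times = c(120, 85, 25))

min_cells <- 40  # guideline from the simulations mentioned above

# Count cells per group and flag groups below the guideline.
cells_per_group <- table(group)
too_small <- names(cells_per_group)[as.vector(cells_per_group) < min_cells]
too_small  # "Rare"
```

Groups flagged in this way may need more cells, or their estimates should be interpreted with caution.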

  1. Is there a maximum number of cells that can be used for the analysis?

In principle, no. However, we observed that running times increase linearly with respect to the number of cells n. Therefore, you should expect long running times for large sample sizes.

  1. What if BASiCS_MCMC is interrupted by an error?

Our simulations and case studies suggest that errors raised while the MCMC sampler is running typically occur when quality control (see the question on quality control above) has been too liberal or has not been applied at all. If this happens, please apply a more stringent filter to cells and/or genes.
