Skip to content

Commit

Permalink
minor formatting changes
Browse files Browse the repository at this point in the history
  • Loading branch information
andrewGhazi committed Sep 9, 2024
1 parent 2a6f2c5 commit 7f89188
Showing 1 changed file with 54 additions and 4 deletions.
58 changes: 54 additions & 4 deletions episodes/large_data.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -66,7 +66,9 @@ set, as provided by the

```{r tenx-brain}
library(TENxBrainData)
sce.brain <- TENxBrainData20k()
sce.brain
```

Expand All @@ -79,7 +81,9 @@ This avoids the need for large RAM availability during analyses.

```{r tenx-brain-size}
counts(sce.brain)
object.size(counts(sce.brain))
file.info(path(counts(sce.brain)))$size
```

Expand All @@ -94,7 +98,9 @@ new file at every operation, which would unnecessarily require time-consuming di

```{r}
tmp <- counts(sce.brain)
tmp <- log2(tmp + 1)
tmp
```

Expand All @@ -110,8 +116,11 @@ function that we used in the other workflows.

```{r, message = FALSE}
library(scater)
is.mito <- grepl("^mt-", rowData(sce.brain)$Symbol)
qcstats <- perCellQCMetrics(sce.brain, subsets = list(Mt = is.mito))
qcstats
```

Expand Down Expand Up @@ -148,9 +157,10 @@ by indicating the `BPPARAM` argument in `bplapply`.

```{r}
param <- MulticoreParam(workers = 1)
bplapply(
X = c(4, 9, 16, 25),
FUN = function(x) { sqrt(x) },
FUN = sqrt,
BPPARAM = param
)
```
Expand All @@ -166,10 +176,15 @@ calculations on a Unix system:

```{r parallel-mc, message = FALSE}
library(MouseGastrulationData)
library(scran)
sce <- WTChimeraData(samples = 5, type = "processed")
sce <- logNormCounts(sce)
dec.mc <- modelGeneVar(sce, BPPARAM = MulticoreParam(2))
dec.mc
```

Expand All @@ -190,6 +205,7 @@ details).
```{r batch-param, eval=FALSE}
# 2 hours, 8 GB, 1 CPU per task, for 10 tasks.
rs <- list(walltime = 7200, memory = 8000, ncpus = 1)
bpp <- BatchtoolsParam(10, cluster = "slurm", resources = rs)
```

Expand Down Expand Up @@ -240,7 +256,9 @@ graph-based clustering using the Louvain algorithm for community detection:

```{r pca-and-cluster}
library(bluster)
sce <- runPCA(sce)
colLabels(sce) <- clusterCells(sce, use.dimred = "PCA",
BLUSPARAM = NNGraphParam(cluster.fun = "louvain"))
```
Expand All @@ -256,17 +274,21 @@ approximation can be largely ignored.

```{r cluster-annoy}
library(scran)
library(BiocNeighbors)
clusters <- clusterCells(sce, use.dimred = "PCA",
BLUSPARAM = NNGraphParam(cluster.fun = "louvain",
BNPARAM = AnnoyParam()))
table(exact = colLabels(sce), approx = clusters)
```

The similarity of the two clusterings can be quantified by calculating the pairwise Rand index:

```{r check-annoy}
rand <- pairwiseRand(colLabels(sce), clusters, mode = "index")
stopifnot(rand > 0.8)
```

Expand All @@ -280,11 +302,17 @@ the biological conclusions.

```{r bad-approx}
set.seed(1000)
y1 <- matrix(rnorm(50000), nrow = 1000)
y2 <- matrix(rnorm(50000), nrow = 1000)
Y <- rbind(y1, y2)
exact <- findKNN(Y, k = 20)
approx <- findKNN(Y, k = 20, BNPARAM = AnnoyParam())
mean(exact$index != approx$index)
```

Expand All @@ -307,11 +335,15 @@ library(BiocSingular)
# As the name suggests, it is random, so we need to set the seed.
set.seed(101000)
r.out <- runPCA(sce, ncomponents = 20, BSPARAM = RandomParam())
str(reducedDim(r.out, "PCA"))
set.seed(101001)
i.out <- runPCA(sce, ncomponents = 20, BSPARAM = IrlbaParam())
str(reducedDim(i.out, "PCA"))
```

Expand All @@ -337,10 +369,13 @@ error in PC1 coordinates compared to the RSVD results.
This code block calculates the exact PCA coordinates. Another thing to note: PC vectors are only identified up to a sign flip. We can see that the RSVD PC1 vector points in the
```{r}
set.seed(123)
e.out <- runPCA(sce, ncomponents = 20, BSPARAM = ExactParam())
str(reducedDim(e.out, "PCA"))
reducedDim(e.out, "PCA")[1:5,1:3]
reducedDim(r.out, "PCA")[1:5,1:3]
```

Expand Down Expand Up @@ -409,6 +444,7 @@ We then proceed by loading all required packages and installing the PBMC dataset

```{r seurat-data, eval = FALSE}
library(SeuratData)
InstallData("pbmc3k")
```

Expand All @@ -418,9 +454,13 @@ We then load the dataset as an `SeuratObject` and convert it to a
```{r seurat_singlecell, eval = FALSE}
# Use PBMC3K from SeuratData
pbmc <- LoadData(ds = "pbmc3k", type = "pbmc3k.final")
pbmc <- UpdateSeuratObject(pbmc)
pbmc
pbmc.sce <- as.SingleCellExperiment(pbmc)
pbmc.sce
```

Expand All @@ -429,8 +469,11 @@ we demonstrate this here on the wild-type chimera mouse gastrulation dataset.

```{r sce2sobj, message = FALSE, eval = FALSE}
sce <- WTChimeraData(samples = 5, type = "processed")
assay(sce) <- as.matrix(assay(sce))
sce <- logNormCounts(sce)
sce
```

Expand All @@ -439,7 +482,9 @@ the `as.Seurat` function.

```{r, warning = FALSE, eval = FALSE}
sobj <- as.Seurat(sce)
Idents(sobj) <- "celltype.mapped"
sobj
```

Expand Down Expand Up @@ -477,6 +522,7 @@ package.
```{r read-h5ad, message = FALSE, warning = FALSE}
example_h5ad <- system.file("extdata", "krumsiek11.h5ad",
package = "zellkonverter")
readH5AD(example_h5ad)
```

Expand All @@ -486,6 +532,7 @@ chimera mouse gastrulation dataset.

```{r write-h5ad, message = FALSE}
out.file <- tempfile(fileext = ".h5ad")
writeH5AD(sce, file = out.file)
```

Expand Down Expand Up @@ -562,15 +609,18 @@ Use the function `system.time` to obtain the runtime of each job.
sce.brain = logNormCounts(sce.brain)
system.time({i.out <- runPCA(sce.brain, ncomponents = 20,
system.time({i.out <- runPCA(sce.brain,
ncomponents = 20,
BSPARAM = ExactParam(),
BPPARAM = SerialParam())})
system.time({i.out <- runPCA(sce.brain, ncomponents = 20,
system.time({i.out <- runPCA(sce.brain,
ncomponents = 20,
BSPARAM = ExactParam(),
BPPARAM = MulticoreParam(workers = 2))})
system.time({i.out <- runPCA(sce.brain, ncomponents = 20,
system.time({i.out <- runPCA(sce.brain,
ncomponents = 20,
BSPARAM = ExactParam(),
BPPARAM = MulticoreParam(workers = 3))})
Expand Down

0 comments on commit 7f89188

Please sign in to comment.