drisso committed Jun 17, 2016
2 parents e6b18ba + 8b24a1e commit d6870e8
Showing 1 changed file with 22 additions and 56 deletions.
78 changes: 22 additions & 56 deletions vignettes/slingshot.Rmd
@@ -11,12 +11,12 @@ vignette: >

<!--
%\VignetteEngine{knitr::rmarkdown}
%\VignetteIndexEntry{clusterExperiment Vignette}
%\VignetteIndexEntry{slingshot Vignette}
-->

```{r options, include=FALSE, cache=FALSE, results='hide', message=FALSE}
## change cache to FALSE
knitr::opts_chunk$set(fig.align="center", cache=TRUE, cache.path = "clusterExperimentTutorial_cache/", fig.path="clusterExperimentTutorial_figure/",error=FALSE, #make it stop on error
knitr::opts_chunk$set(fig.align="center", cache=TRUE, cache.path = "slingshotTutorial_cache/", fig.path="slingshotTutorial_figure/",error=FALSE, #make it stop on error
fig.width=6,fig.height=6,autodep=TRUE,out.width="600px",out.height="600px", results="markup", echo=TRUE, eval=TRUE)
#knitr::opts_knit$set(stop_on_error = 2L) #really make it stop
#knitr::dep_auto()
@@ -54,7 +54,7 @@ We will take our inputs from the previous sections: the normalized counts matrix

```{r datain, eval=TRUE}
## data(normClust) ...eventually
# clus.labels <- clusterMatrix(ce)[,1] is this right? documentation says rows = clusters
# clus <- clusterMatrix(ce)[,1] is this right?
## for now
load('~/Projects/oe_p63/E4_scone_none_fq_ruv1_bio_nobatch_1Kgl05.Rda')
@@ -69,69 +69,40 @@ clus <- mergeCl[! mergeCl %in% c(4,20)]

The `get_lineages` function takes as input an `n x p` matrix and a vector of clustering results of length `n`. It then maps connections between adjacent clusters using a minimum spanning tree (MST) and identifies paths through these connections that represent potential lineages.
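
As a quick check on these inputs (a sketch only, not evaluated; `X` and `clus` are the objects constructed above):

```{r input_check, eval=FALSE}
# X should be an n x p matrix of coordinates and clus a length-n vector
# of cluster labels, one label per cell.
dim(X)
length(clus) == nrow(X)
table(clus)
```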

This analysis can be performed in an entirely unsupervised manner or in a semi-supervised manner by specifying known beginning and end point clusters. We recommend that you specify a beginning cluster; this will have no effect on how the clusters are connected, but it will allow for nicer curves in datasets with a branching structure. Pre-specified end point clusters will be constrained to only one connection.
This analysis can be performed in an entirely unsupervised manner or in a semi-supervised manner by specifying known beginning and end point clusters. We recommend that you specify a root cluster; this will have no effect on how the clusters are connected, but it will allow for nicer curves in datasets with a branching structure. Pre-specified end point clusters will be constrained to only one connection.

```{r unsup_lines}
l1 <- get_lineages(X, clus)
#plot_tree(X, clus, l1, threeD = TRUE)
plot_tree(X, clus, l1, dim = 2)
# plot_tree(X, clus, l1, threeD = TRUE)
plot_tree(X, clus, l1, dim = 3)
```

Running `get_lineages` with no supervision produces the connections shown above. Since no root cluster was specified, `slingshot` picked one of the leaf-node clusters as the starting point, based on a simple parsimony rule. The root cluster is the leaf-node cluster connected by a green line.

```{r sup_lines_start}
l2 <- get_lineages(X, clus, start.clus = '10')
#plot_tree(X, clus, l2, threeD = TRUE)
plot_tree(X, clus, l1, dim = 2)
# plot_tree(X, clus, l2, threeD = TRUE)
plot_tree(X, clus, l2, dim = 3)
```

When we specify a root cluster, we get the same connections; the only difference is which line is drawn in green.

```{r sup_lines_end}
l3 <- get_lineages(X, clus, start.clus = '10', end.clus = '17')
#plot_tree(X, clus, l3, threeD = TRUE)
plot_tree(X, clus, l1, dim = 2)
```


In the call above, we have set the following parameters using a single value.

* `clusterFunction` is set to "hierarchical01" to use hierarchical clustering to cluster the co-clustering matrix of the subsamplings.
* `alphas` is set to 0.3.
* `subsample` and `sequential` are set to TRUE to perform subsampling and sequential clustering.

The parameters with a range of values are the following.

* `dimReduce`: use either PCA or most variable genes for clustering.
* `nPCADims`: use either 10 or 50 PCs for PCA.
* `nVarDims`: use either top 500 or 1000 most variable genes.
* `ks`: use between 5 and 15 as the value of `k` in `kmeans`.

As we can see from the output, we generated many different clusterings. One way to visualize them is through the `plotClusters` function.

```{r plotClusterEx1, eval=FALSE}
defaultMar <- par("mar")
plotCMar <- c(1.1,8.1,4.1,1.1)
par(mar=plotCMar)
plotClusters(ce, main="Clusters from clusterMany", axisLine=-1)
# plot_tree(X, clus, l3, threeD = TRUE)
plot_tree(X, clus, l3, dim = 3)
```

This plot shows the samples in the columns, and different clusterings on the rows. Each sample is color coded based on its clustering for that row, where the colors have been chosen to try to match up clusters across different clusterings that show large overlap. Moreover, the samples have been ordered so that each subsequent clustering (starting at the top and going down) will try to order the samples to keep the clusters together, without rearranging the clustering blocks of the previous clustering/row.

We can see that some clusters are fairly stable across different choices of dimensions while others can vary dramatically. Notice that some samples are white. This indicates that they have the value -1, meaning they were not clustered. This is from our choices to require at least 5 samples to make a cluster.

To retrieve the actual results of each clustering, we can use the `clusterMatrix` and `primaryClusters` functions.

```{r clusterMatrix, eval=FALSE}
head(clusterMatrix(ce)[,1:3])
table(primaryCluster(ce))
```
Here we demonstrate the ability to specify end point clusters, which places a constraint on the connections. The MST is now drawn subject to the constraint that the given end point clusters must be leaves. Pre-specified end point clusters are connected by red lines.

After a call to `clusterMany` the primary clusters are simply defined as the first parameter combinations (i.e., the first column of `clusterMatrix`). We can change this, if we want, say, to select the third clustering as our preferred choice.
There are a few additional arguments we could have passed to `get_lineages` for greater control (a hypothetical call using some of them is sketched after this list):

* `dist.fun` is a function for computing distances between clusters. The default is squared distance between cluster centers normalized by their joint covariance matrix.
* `omega` is a granularity parameter, allowing the user to set an upper limit on connection distances. It takes values between 0 and 1 (or `Inf`), representing a percentage of the largest observed distance.
* `distout` is a logical value, indicating whether the user wants the pairwise cluster distance matrix to be returned with the output.
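
As a rough illustration, here is a hypothetical call using two of these arguments; the argument names come from the list above, but the values are only examples and the default `dist.fun` is left in place.

```{r lineage_args, eval=FALSE}
# Hypothetical call (not evaluated); omega and distout values are illustrative only.
l4 <- get_lineages(X, clus, start.clus = '10',
                   omega = 0.9,    # cap connection distances at 90% of the largest observed distance
                   distout = TRUE) # also return the pairwise cluster distance matrix
```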

```{r setCluster, eval=FALSE}
primaryClusterIndex(ce) <- 3
ce
```
After constructing the MST, `get_lineages` identifies paths through the tree to designate as lineages. At this stage, a lineage will consist of an ordered set of cluster names, starting with the root cluster and ending with a leaf. The output of `get_lineages` is a list of these vectors, along with some additional information on how they were constructed.
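
A quick way to get a feel for this output (a sketch; the exact element names depend on the package version):

```{r lineage_output, eval=FALSE}
# Peek at the structure of the list returned by get_lineages.
names(l2)
str(l2, max.level = 1)
```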

# Step 2: Find a consensus with `combineMany`
# Step 2: Construct smooth lineages and order cells with `get_curves`

To find a consensus clustering across the many different clusterings created by `clusterMany`, the function `combineMany` can be used next.

@@ -151,7 +122,7 @@ plotClusters(ce)

The `proportion` argument regulates how often two samples need to be in the same cluster across parameters in order to be placed together in the combined clustering. Decreasing the value of `proportion` results in fewer "unclustered" (i.e., -1) samples. Another parameter that controls the number of unassigned samples is `minSize`, which discards combined clusters with fewer than `minSize` samples.
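
As a minimal sketch of this step (assuming the `combineMany` interface described above; the `proportion` and `minSize` values are only illustrative):

```{r combineMany_sketch, eval=FALSE}
# Consensus clustering across the clusterMany results; values are examples only.
ce <- combineMany(ce, proportion = 0.7, minSize = 5)
plotClusters(ce)
```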

# Step 3: Merge clusters together with `makeDendrogram` and `mergeClusters`
# Step 3: Find temporally expressed genes

It is not uncommon for `combineMany` to result in too many small clusters, which in practice are too closely related to be useful. Since our final goal is to find gene markers for each cluster, we argue that we can merge clusters that show little or no differential expression (DE) between them.
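
A minimal sketch of this merging step, assuming the `makeDendrogram`/`mergeClusters` interface from `clusterExperiment`; the merge method and cutoff are only illustrative.

```{r merge_sketch, eval=FALSE}
# Build a cluster dendrogram, then merge clusters with little DE between them.
ce <- makeDendrogram(ce)
ce <- mergeClusters(ce, mergeMethod = "adjP", cutoff = 0.05)
```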

@@ -185,11 +156,6 @@ Finally, we can do a heatmap visualizing this final step of clustering.
plotHeatmap(ce, clusterSamplesData="dendrogramValue", breaks=.99)
```

# Step 4: Find marker genes with `getBestFeatures`

## Limma with voom weights

## Account for zero-inflation with MAST

# Session Info

