Skip to content

Commit

Permalink
Merge pull request #119 from nidhisoley/main
Browse files Browse the repository at this point in the history
HW4_png_md_nsoley1
  • Loading branch information
JEFworks authored Feb 15, 2024
2 parents 40289fb + aaf8f31 commit 5d355a0
Show file tree
Hide file tree
Showing 2 changed files with 96 additions and 0 deletions.
96 changes: 96 additions & 0 deletions _posts/2024-02-13-nsoley1.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,96 @@
---
layout: post
title: "Spatially Resolved Gene Expression Analysis using KMeans clustering"
author: Nidhi Soley
jhed: nsoley1
categories: [ HW4 ]
image: homework/hw4/hw4_nsoley1.png
featured: false
---

#### Describe your figure briefly so we know what you are depicting (you no longer need to use precise data visualization terms as you have been doing). Write a description to convince me that your cluster interpretation is correct. Your description may reference papers and content that allowed you to interpret your cell cluster as a particular cell-type. You must provide attribution to external resources referenced. Links are fine; formatted references are not required.

Firstly I applied KMeans clustering on the normalized gene expression data, with random K=3. Then, I chose a cluster which was scattered at few spots in spatial position (blue in color), not together like cluster 1 and 2. I was interested to see which gene is highly expressed in this cluster. I performed t-test to see the upregulated genes in this cluster of interest. I found that DSP (pvalue 2.519682e-99) was the top gene that was upregulated. Further, I visualized DSP in the physical space and in PCA space.

1. **Spatial Distribution of Clusters**: The first panel shows the spatial distribution of spots with multiple cell-types within the tissue. Each spot is represented by a point on the plot, with colors hue indicating the clusters assigned by KMeans clustering. Notably, cluster 1 (shown in red) appears scattered across the tissue, contrasting with the more cohesive distributions of clusters two and three.
2. **Spatial Distribution of Cluster of interest**: The second panel shows the spatial distribution of the spots with multiple cell-types for cluster 1.
3. **Cluster of interest in PCA space:** The spots belonging to cluster 1, are all grouped together in the reduced dimensionality space (blue), along negative PC1, and PC2.
4. **Differentially expressed genes in cluster 1:** I used t-test to find the statistically significant upregulated genes in cluster 1. The plot shows negative log10 pvalue plot for top 20 genes, DSP is the top gene with highest pvalue.
5. **Gene Expression in Physical Space**: The expression pattern of the gene DSP across the spatial coordinates shows cells corresponding to cluster 1, which were identified as scattered in spatial position (red in first/second plot), exhibit high expression of DSP (yellow) at the same spatial location based on x,y coordinates.
6. **Gene Expression in PCA Space**: The last panel illustrates the expression pattern of DSP projected onto a reduced-dimensional space obtained through PCA. Spots with high expression of DSP group together in the PCA space, and are in the similar position in PCA space as the position of cluster 1 in PCA (blue in plot 3). The yellow spots are clustered together on the negative PC1, and PC2 axis. This clustering pattern implies that the yellow spots are likely to represent a subpopulation of cells that exhibit high DSP expression, possibly associated with a particular cell type or functional state.

By integrating spatial information with gene expression analysis, I inferred that cluster 1 likely represents spots containing multiple cell types characterized by high expression of the DSP gene. The known biology of DSP, or desmoplakin, indicates its involvement in cell-cell adhesion, particularly within desmosomes, structures found in tissues subjected to mechanical stress, such as the skin and heart [1][2]. Hence, I speculate that the cell types within cluster 1 may correspond to cell type like- epithelial cells in the skin, and cardiac muscle cells (e.g., cardiomyocytes).

[1] Green KJ, Simpson CL. Desmosomes: new perspectives on a classic. J Invest Dermatol. 2007;127(11):2499-2515. <doi:10.1038/sj.jid.5700915>

[2] McGrath, J. A., & Mellerio, J. E. (2010). Ectodermal dysplasia-skin fragility syndrome. Dermatologic clinics, *28*(1), 125–129. <https://doi.org/10.1016/j.det.2009.10.014>

#### You must include the entire code you used to generate the figure so that it can be reproduced.

```{r}
library(viridis)
library(patchwork)
library(ggplot2)
data <- read.csv('/Users/nidhisoley/Desktop/GDataViz/eevee.csv.gz', row.names=1)
pos <- data[, c("aligned_x", "aligned_y")]
gexp <- data[, 4:ncol(data)]
gexpnorm <- log10(gexp/rowSums(gexp) * mean(rowSums(gexp)) + 1)
pcs <- prcomp(gexpnorm)
#K-means
clusters <- as.factor(kmeans(gexpnorm, centers = 3)$cluster)
p1 <- ggplot(data.frame(pos, Cluster = clusters)) +
geom_point(aes(x = aligned_x, y = aligned_y, col = Cluster), size = 3) +
theme_bw() +
labs(title = "Clusters in Physical Space")
#cluster of interest
cluster_of_interest <- 1
df <- data.frame(data, kmeans=(clusters))
df1 <- df[df$kmeans == cluster_of_interest,]
p2 <-ggplot(df1) + geom_point(aes(x = aligned_x, y=aligned_y, col=kmeans),
size=3) + theme_bw() +
labs(title='Cluster of interest:1 in Physical Space')+
labs(color='Cluster')
#pca
pca_df <- data.frame(PC1 = pcs$x[, 1], PC2 = pcs$x[, 2], Cluster = clusters)
cluster_1_df <- pca_df[pca_df$Cluster == cluster_of_interest, ]
p3 <- ggplot(pca_df, aes(x = PC1, y = PC2)) +
geom_point(data = cluster_1_df, aes(col = "Cluster 1"), size = 2) +
geom_point(data = pca_df[pca_df$Cluster != cluster_of_interest, ], aes(col = "Other Clusters"), size = 1) +
scale_color_manual(values = c("Cluster 1" = "blue", "Other Clusters" = "gray")) +
theme_bw() +
labs(title = "Cluster 1 in PCA space",
color = "Cluster") +
guides(color = guide_legend(title = NULL))
#t-test
cluster_1_genes <- colnames(gexpnorm)[clusters == cluster_of_interest]
results <- sapply(cluster_1_genes, function(g) {
t.test(gexpnorm[clusters == cluster_of_interest, g],
gexpnorm[clusters != cluster_of_interest, g],
alternative = 'greater')$p.val
})
significant_genes <- names((sort(results[results < 0.05])[1:19]))
significant_genes_df <- data.frame(Gene = significant_genes, p_value = results[significant_genes])
significant_genes_df$log_p_value <- -log10(significant_genes_df$p_value)
p4 <- ggplot(significant_genes_df, aes(x = reorder(Gene, log_p_value), y = log_p_value)) +
geom_bar(stat = "identity", fill = "skyblue") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0.5)) +
labs(title = "Top Differentially Expressed Genes in Cluster 1",
x = "Genes",
y = "-log10(p-value)")
#DSP
gene_of_interest <- 'DSP'
p5 <- ggplot(data.frame(pos, gene = gexpnorm[, gene_of_interest])) +
geom_point(aes(x = aligned_x, y = aligned_y, col = gene), size = 3) +
scale_color_viridis(option = 'plasma')+
theme_bw() +
labs(title = paste("Expression of", gene_of_interest, "in Physical Space"))+
labs(color='log10(DSP)')
p6 <- ggplot(data.frame(pcs$x, gene = gexpnorm[, gene_of_interest])) +
geom_point(aes(x = PC1, y = PC2, col = gene), size = 2) +
scale_color_viridis(option = 'plasma')+
theme_bw() +
labs(title = paste("Expression of", gene_of_interest, "in PCA space"))+
labs(color='log10(DSP)')
p1 +p2+p3+p4+p5+p6
plot_layout(ncol = 3)
```
Binary file added homework/hw4/hw4_nsoley1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

0 comments on commit 5d355a0

Please sign in to comment.