Experiment Replication and Expansion – Clustering Cancer Gene Expression Data: A Comparative Study

Abstract:

Clustering algorithms have been used to help discover cancer subtypes, but the different algorithms had not been systematically compared before the replicated paper [1]. This study provides a comprehensive comparison of the algorithms to guide future algorithm selection in cancer subtype research. It evaluates and compares the clustering of cancer gene expression data using seven clustering algorithms and eight proximity measures, with the corrected Rand index (cR) used to assess clustering performance. The replicated results differ from those of the original study: k-means outperformed the other methods, and the finite mixture of Gaussians ranked second, for Affymetrix datasets. For cDNA datasets, spectral clustering and shared nearest neighbors performed best. Furthermore, Manhattan distance yielded the best mean cR indices for Affymetrix datasets. Analysis with PCA reduced performance, likely due to information loss. Tissue type and microarray technology also influenced clustering results: blood tissue datasets achieved better classification and higher cR indices than brain tissue datasets, and cDNA datasets were generally classified better than Affymetrix datasets. On a selected blood tissue dataset (cDNA and Affymetrix), k-means clustering and PCA showed better classification performance than hierarchical clustering. This study recapitulates and expands on the original study by examining the impact on classification performance of an additional proximity measure (Manhattan distance), exploring datasets from different microarray platforms, and analyzing datasets containing diverse tissue types.
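
For readers unfamiliar with the corrected Rand index, the following is a minimal Python sketch of the standard Hubert-Arabie formulation, which scores agreement between a clustering and known class labels while correcting for chance. It is an illustration only: the function name `corrected_rand` and the toy labels are assumptions, and the analysis in this repository (an R Markdown file) may compute cR differently.

```python
from collections import Counter
from math import comb


def corrected_rand(labels_true, labels_pred):
    """Corrected (adjusted) Rand index between two partitions.

    Hypothetical helper for illustration; not taken from the repository.
    Returns 1.0 for identical partitions and values near 0.0 for
    agreement no better than chance (negative values are possible).
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare; treat as perfect agreement

    # Contingency counts: cell sizes, plus row and column marginals.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of sample pairs placed together in both, or in each, partition.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2        # upper bound on agreement
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (sum_cells - expected) / (max_index - expected)


# Toy example: cluster labels are arbitrary, so a relabelled but
# identical partition still scores 1.0.
known_subtypes = [0, 0, 1, 1]
cluster_labels = [1, 1, 0, 0]
print(corrected_rand(known_subtypes, cluster_labels))
```

Because the index depends only on which samples are grouped together, it can compare clusterings produced by different algorithms or proximity measures against the same reference labels.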

Repository files:
Garcia_Liz_Final.Rmd
QCB 455 Final paper.pdf
README.md

lizgarciao/qcb-project
