
Experiment Replication and Expansion – Clustering Cancer Gene Expression Data: A Comparative Study

lizgarciao/qcb-project

Abstract:

Clustering algorithms have been used to help discover cancer subtypes, but the different algorithms had not been systematically compared before the study replicated here [1]. This study provides a comprehensive comparison of these algorithms to guide algorithm selection in future cancer subtype research. It evaluates and compares the clustering of cancer gene expression data using seven clustering algorithms and eight proximity measures. Clustering performance was assessed with the corrected Rand index (cR). The replicated results differ from those of the original study: for Affymetrix datasets, k-means outperformed the other methods and the finite mixture of Gaussians ranked second, whereas for cDNA datasets, spectral clustering and shared nearest neighbor clustering performed best. Manhattan distance also yielded the best mean cR values for Affymetrix datasets. In addition, applying PCA reduced performance, likely because of information loss. Tissue type and microarray technology also influenced clustering results: blood tissue datasets achieved better classification and higher cR values than brain tissue datasets, and cDNA datasets generally showed better classification than Affymetrix datasets. On a selected blood tissue dataset (cDNA and Affymetrix), k-means clustering and PCA gave better classification performance than hierarchical clustering. This study recapitulates the original study and extends it by examining how an additional proximity measure (Manhattan distance) affects classification performance, exploring datasets from different microarray platforms, and analyzing datasets from diverse tissue types.
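The corrected Rand index compares a clustering with the known class labels while correcting for chance agreement: 1 means perfect agreement (up to relabeling of clusters), values near 0 are what random partitions would give, and negative values mean worse than chance. The sketch below is not this project's code. It assumes the Hubert and Arabie formulation (the usual form of cR), labels given as plain Python sequences, and expression profiles given as equal-length lists of numbers; the function names `corrected_rand` and `manhattan` are hypothetical.

```python
from collections import Counter
from math import comb


def corrected_rand(labels_true, labels_pred):
    """Corrected (adjusted) Rand index of Hubert and Arabie between two partitions.

    Hypothetical helper, written for illustration only.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n < 2:
        raise ValueError("need at least two objects")
    # Contingency table n_ij: objects in class i of one partition and cluster j of the other
    contingency = Counter(zip(labels_true, labels_pred))
    row_sums = Counter(labels_true)
    col_sums = Counter(labels_pred)
    # Number of object pairs placed together in both partitions
    index = sum(comb(c, 2) for c in contingency.values())
    sum_rows = sum(comb(c, 2) for c in row_sums.values())
    sum_cols = sum(comb(c, 2) for c in col_sums.values())
    # Expected index under random labeling with the same cluster sizes
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Only happens when both partitions are identical and trivial
        # (one cluster each, or all singletons)
        return 1.0
    return (index - expected) / (max_index - expected)


def manhattan(x, y):
    """Manhattan (L1, city-block) distance between two expression profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    return sum(abs(a - b) for a, b in zip(x, y))
```

For example, `corrected_rand([0, 0, 1, 1], [1, 1, 0, 0])` is 1.0 because only the cluster names differ. With scikit-learn installed, `sklearn.metrics.adjusted_rand_score` should return the same values and is the more practical choice on real datasets.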