Non-early T-cell precursor T-cell acute lymphoblastic leukemia (non-ETP ALL) annotation (SCPCP000003) #630
UTSouthwesternDSSR started this conversation in Propose a new analysis
-
I found that the PATDFE (participant_id) sample has 1,857 cells (~15% of total cells) with a mitochondrial gene content of at least 25%. Mitochondrial genes are defined here as any gene whose name begins with 'MT-'. My understanding is that these could be low-quality or dying cells. Is there any reason we are keeping them? In addition, they are all labeled as "Unclassified cell" by both SingleR and CellAssign.
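For anyone who wants to reproduce this kind of check, here is a minimal sketch of the filter. It assumes that "at least 25%" refers to the fraction of a cell's total counts that come from genes whose names start with 'MT-'. The function names, gene list, and toy counts are all hypothetical, and Python is used only for illustration since the analysis itself is planned in R.

```python
def mito_fraction(counts, gene_names, prefix="MT-"):
    """Fraction of each cell's total counts that come from mitochondrial genes.

    counts: list of per-cell count lists (cells x genes), in gene_names order.
    """
    # Indices of genes whose names start with the mitochondrial prefix
    mito_idx = [i for i, gene in enumerate(gene_names) if gene.startswith(prefix)]
    fractions = []
    for cell in counts:
        total = sum(cell)
        mito = sum(cell[i] for i in mito_idx)
        # Guard against empty cells to avoid division by zero
        fractions.append(mito / total if total > 0 else 0.0)
    return fractions


def flag_high_mito(fractions, threshold=0.25):
    """True for cells at or above the (assumed) 25% threshold."""
    return [f >= threshold for f in fractions]


# Toy data: 3 hypothetical cells x 4 genes
gene_names = ["MT-CO1", "MT-ND1", "CD3E", "CD7"]
counts = [
    [30, 20, 40, 10],
    [5, 5, 60, 30],
    [10, 15, 50, 25],
]
fractions = mito_fraction(counts, gene_names)
flags = flag_high_mito(fractions)
print(sum(flags), "of", len(flags), "cells flagged")
```

The threshold is kept as a parameter so the number of flagged cells can be compared across cutoffs before deciding whether to remove them.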
-
Proposed analysis
We plan to annotate cell types for the non-early T-cell precursor T-cell acute lymphoblastic leukemia (non-ETP ALL) samples (n=11) in SCPCP000003. Our analysis involves removing low-quality cells and doublets, annotating cell types, and identifying tumor cells.
Scientific goals
The goal of this analysis is to curate a validated cell type annotation for non-ETP ALL samples in the portal. Specifically, we aim to generate the following outcomes: (i) lists of marker genes to identify cell types in non-ETP ALL; (ii) separation of tumor cells from normal cells; (iii) refined annotation of cell types among normal cells; (iv) annotation of sub-groups among tumor cells, if applicable.
Methods or approach
1. We will start with the processed count matrices provided by ALSF.
2. We will remove doublets, if any are present, from each sample using available tools like DoubletFinder.
3. We will first perform an automated cell type annotation using SingleR with publicly available references, including datasets from the literature. Then we will run PCA, clustering, and UMAP visualization, where we expect cells of the same type to cluster together.
4. The annotation generated in step 3 will then be complemented with manually curated marker gene lists, using tools like enrichr or cellassign. In addition, we will use the cell surface protein expression from CITE-seq to confirm the annotation.
5. We will use CNV-based tools like CopyKat to classify tumor and normal cells. Cells with no inferred copy number alterations will be annotated as normal cells.
6. Finally, we will merge the whole cohort (i.e. similar sample types) and perform PCA, clustering, and UMAP visualization. We expect normal cell types to cluster together, and tumor cells to be separated by inter-sample heterogeneity. We will fine-tune the cell type annotation if any cell groups cluster with other cell types.
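The labeling rule in the CNV-based step (cells without inferred copy number alterations are annotated as normal) could be sketched roughly as below. This is an illustrative sketch only: the call strings "diploid", "aneuploid", and "not.defined", the cell identifiers, and the "unresolved" category are assumptions for the example, and the real output of the chosen CNV tool should be checked before relying on them. Python is used for illustration; the analysis itself is planned in R.

```python
from collections import Counter

# Assumed per-cell call strings; the actual tool output may use other values.
NORMAL_CALL = "diploid"
TUMOR_CALL = "aneuploid"


def label_cells(cnv_calls):
    """Map per-cell CNV calls to normal / tumor / unresolved labels.

    cnv_calls: dict of cell ID -> call string from a CNV inference tool.
    """
    labels = {}
    for cell_id, call in cnv_calls.items():
        if call == NORMAL_CALL:
            # No inferred copy number alterations: annotate as normal
            labels[cell_id] = "normal"
        elif call == TUMOR_CALL:
            labels[cell_id] = "tumor"
        else:
            # Cells the tool could not classify are kept separate (hypothetical category)
            labels[cell_id] = "unresolved"
    return labels


# Toy input with hypothetical cell IDs
cnv_calls = {
    "cell_1": "diploid",
    "cell_2": "aneuploid",
    "cell_3": "aneuploid",
    "cell_4": "not.defined",
}
labels = label_cells(cnv_calls)
summary = Counter(labels.values())
print(dict(summary))
```

Keeping unclassified cells in their own category, rather than folding them into normal or tumor, leaves room to resolve them later with the marker-based and CITE-seq evidence.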
Existing modules
This module is based on the existing annotation workflow for Ewing sarcoma samples in #292, with some modifications. We expect to follow the same workflow as outlined in #628 and #629 wherever applicable.
Input data
This analysis will use processed count matrices stored as SingleCellExperiment objects from SCPCP000003.
Scientific literature
Based on our initial literature review, the following datasets might be useful for curating marker gene lists and/or cell type references:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10635362/
https://www.nature.com/articles/s41598-023-39152-z
https://www.nature.com/articles/s41375-018-0127-8
Other details
This analysis will be performed on our local machine and HPC, and will be conducted in R. We plan to share the annotation within two months.