Non-early T-cell precursor T-cell acute lymphoblastic leukemia (non-ETP ALL) annotation (SCPCP000003) #630

UTSouthwesternDSSR · 2024-07-18T21:18:12Z

Proposed analysis

We plan to annotate cell types for the non-early T-cell precursor T-cell acute lymphoblastic leukemia (non-ETP ALL) samples (n=11) in SCPCP000003. Our analysis involves removing low-quality cells and doublets, annotating cell types, and identifying tumor cells.

Scientific goals

The goal of this analysis is to curate a validated cell type annotation for non-ETP ALL samples in the portal. Specifically, we aim to generate the following outcomes: (i) lists of marker genes to identify cell types in non-ETP ALL; (ii) separation of tumor cells from normal cells; (iii) refined annotation of cell types among normal cells; (iv) annotation of sub-groups among tumor cells, if applicable.

Methods or approach

1. We will start with the processed count matrices provided by ALSF.
2. We will remove doublets, if any are present, from each sample using available tools such as DoubletFinder.
3. We will first perform automated cell type annotation using SingleR with publicly available references, including datasets from the literature. We will then run PCA, clustering, and UMAP visualization, where we expect cells of the same type to cluster together.
4. The annotation generated in step 3 will then be complemented with manually curated marker gene lists, using tools such as Enrichr or CellAssign. In addition, we will use cell surface protein expression from CITE-seq to confirm the annotation.
5. We will use CNV-based tools such as CopyKAT to classify tumor and normal cells. Cells with no inferred copy number alterations will be annotated as normal cells.
6. Finally, we will merge the whole cohort (i.e., similar sample types) and perform PCA, clustering, and UMAP visualization. We expect normal cells of each type to cluster together, and tumor cells to separate by sample because of inter-sample heterogeneity. We will fine-tune the cell type annotation if any cells group with other cell types.
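
As a rough illustration of how the automated cell type annotation and the CNV-based tumor/normal call might be combined into one label per cell, here is a minimal sketch. It is written in Python purely for illustration (the analysis itself is planned in R), and the function name, the `"altered"`/`"unaltered"` call values, and the `"unknown"` fallback are all hypothetical placeholders, not the output format of CopyKAT or any other tool.

```python
def combine_annotations(celltype_labels, cnv_calls):
    """Merge per-cell cell type labels with per-cell CNV calls.

    celltype_labels: dict mapping barcode -> cell type label (e.g., from a reference-based annotation)
    cnv_calls: dict mapping barcode -> "altered" or "unaltered" (hypothetical values)
    Returns a dict mapping barcode -> final label.
    """
    final = {}
    for barcode, celltype in celltype_labels.items():
        call = cnv_calls.get(barcode, "undetermined")
        if call == "altered":
            # Inferred copy number alterations: treat as tumor
            final[barcode] = "tumor"
        elif call == "unaltered":
            # No inferred alterations: keep the normal cell type label
            final[barcode] = celltype
        else:
            # No usable CNV call for this cell
            final[barcode] = "unknown"
    return final


# Toy example with made-up barcodes and labels
celltype_labels = {"cell_A": "T cell", "cell_B": "T cell", "cell_C": "B cell"}
cnv_calls = {"cell_A": "altered", "cell_B": "unaltered"}
final_labels = combine_annotations(celltype_labels, cnv_calls)
```

In this toy example, `cell_C` has no CNV call and is therefore left as `"unknown"` rather than being assumed normal.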

Existing modules

This module is based on the existing annotation workflow for Ewing sarcoma samples in #292, with some modifications. We expect to follow the same workflow as outlined in #628 and #629, where applicable.

Input data

This analysis will use the processed count matrices, stored as SingleCellExperiment objects, from SCPCP000003.

Scientific literature

Based on our initial literature review, the following datasets may be useful for curating marker gene lists and/or cell type references:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10635362/
https://www.nature.com/articles/s41598-023-39152-z
https://www.nature.com/articles/s41375-018-0127-8

Other details

This analysis will be performed on our local machine and HPC cluster, and will be conducted in R. We plan to share the annotation within two months.

UTSouthwesternDSSR · 2024-07-23T20:50:05Z

Author

I found that the sample with participant_id PATDFE has 1,857 cells (~15% of total cells) with a mitochondrial gene content of at least 25%. Mitochondrial genes are defined as any gene whose name begins with 'MT-'. My understanding is that these could be low-quality or dying cells. Is there a reason we are keeping them? In addition, they are all labeled as "Unclassified cell" by both SingleR and CellAssign.
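
For readers who want to reproduce this kind of check, here is a minimal sketch of the calculation. It assumes that "25% mitochondrial" means the fraction of a cell's total counts that come from genes prefixed with 'MT-', which is one common reading but is not spelled out above. It is written in Python for illustration only (the analysis itself is in R), and the function names and toy counts are hypothetical.

```python
def mito_fraction(counts):
    """Return the fraction of a cell's total counts that come from 'MT-' genes.

    counts: dict mapping gene symbol -> count for a single cell
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    mito = sum(n for gene, n in counts.items() if gene.startswith("MT-"))
    return mito / total


def flag_high_mito(cells, threshold=0.25):
    """Return barcodes of cells whose mitochondrial fraction is >= threshold."""
    return [bc for bc, counts in cells.items() if mito_fraction(counts) >= threshold]


# Toy data with made-up barcodes and counts
cells = {
    "cell_A": {"MT-CO1": 30, "MT-ND1": 10, "CD3E": 60},  # 40 of 100 counts are mitochondrial
    "cell_B": {"MT-CO1": 5, "CD3E": 95},                 # 5 of 100
    "cell_C": {"MT-ND1": 25, "CD7": 75},                 # 25 of 100, exactly at the threshold
}
flagged = flag_high_mito(cells)
```

With the toy data, `cell_A` and `cell_C` are flagged; `cell_C` is included because the comparison is "at least 25%".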

sjspielman Jul 24, 2024
Maintainer

I've had a look at the processed library file for this sample (library ID SCPCL000076; sample ID SCPCS000090) to get a sense of what you're seeing here. Indeed, something seems to be up with the filtering (but it's being fixed soon!):

By default, we use miQC to filter out low-quality cells in the processed ScPCA objects. In circumstances when the miQC model fails for whatever reason, we instead use a minimum gene cutoff to filter cells. It looks like miQC indeed failed on this dataset, so filtering was not performed as robustly as it could have been. These cells are labeled as "Unclassified cells" for reasons related to this failed filtering.
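
To make the fallback behavior concrete, here is a minimal sketch of the "try the model, otherwise use a minimum gene cutoff" logic. This is not the ScPCA pipeline code: it is a Python illustration, the function names are hypothetical, the model-based filter is a stand-in rather than miQC itself, and the cutoff of 200 genes is an assumed value for the example only.

```python
def filter_cells(cells, model_filter, min_genes):
    """Apply a model-based filter, falling back to a minimum gene cutoff if it fails.

    cells: dict mapping barcode -> dict with at least a "genes_detected" entry
    model_filter: callable returning the barcodes to keep; raises RuntimeError on failure
    min_genes: fallback cutoff on the number of detected genes (assumed value)
    Returns (kept_barcodes, method_used).
    """
    try:
        return model_filter(cells), "model"
    except RuntimeError:
        # Fallback: keep any cell with enough detected genes, regardless of MT content
        kept = [bc for bc, c in cells.items() if c["genes_detected"] >= min_genes]
        return kept, "min_gene_cutoff"


def failing_model(cells):
    """Stand-in for a model-based filter that could not be fit."""
    raise RuntimeError("mixture model could not be fit")


# Toy data with made-up values
cells = {
    "cell_A": {"genes_detected": 1500, "mito_fraction": 0.03},
    "cell_B": {"genes_detected": 900, "mito_fraction": 0.40},  # high MT content
    "cell_C": {"genes_detected": 50, "mito_fraction": 0.10},
}
kept, method = filter_cells(cells, failing_model, min_genes=200)
```

In this toy example the fallback keeps `cell_B` despite its high mitochondrial fraction, which is the kind of less robust filtering described above.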

Generally speaking, I would also recommend looking in the associated QC and cell type reports for the files you're going to be analyzing since some of this information will be explained and contextualized in there as well, along with other QC metrics and results you might find useful to see all in one spot. You can get these reports using the --include-reports option when running the download-data.py script, or you can download them for samples of interest directly from the ScPCA Portal.

In particular as related to your question, the QC report provides many plots and metrics about the underlying data processing, including filtering. The automated methods (like miQC) that we use in the ScPCA pipeline tend to work well, but in a few cases they can fail and don't quite work as expected. We therefore provide visualizations in the QC report to help guide you and contextualize the results.

In the miQC plot in this report, for example, you can see that many more cells than anticipated are being kept, and you are likely correct that many more should have been filtered out. While generally we prefer that you use our processed objects, this is a case where you may have a solid rationale to perform additional filtering and remove cells that are likely dead or dying, based on MT reads.

All this said, we have actually very recently reprocessed this project (and other projects). Looking at the new version of this library's results, miQC did not fail on this library. As such, the filtering problems on this library should be fixed in the next data release - the very low-quality cells you are seeing in the object will not be present in the next data release, and cell annotations are largely available for all cells.

We plan to make this new data release very soon! When we make the release, we'll let you know via our announcements page, so please stay tuned there for when you can start using it!
