From d635f6dfde4364b282f129e83b1c4abba192ff67 Mon Sep 17 00:00:00 2001
From: csmagnano <chrismagnano@gmail.com>
Date: Wed, 21 Feb 2024 10:27:51 -0500
Subject: [PATCH] Scaling back - removing HCA lesson temporarily

---
 config.yaml      |   1 -
 episodes/hca.Rmd | 395 -----------------------------------------------
 2 files changed, 396 deletions(-)
 delete mode 100644 episodes/hca.Rmd

diff --git a/config.yaml b/config.yaml
index 78dfe50..65d6cd1 100644
--- a/config.yaml
+++ b/config.yaml
@@ -65,7 +65,6 @@ episodes:
 - cell_type_annotation.Rmd
 - multi-sample.Rmd
 - large_data.Rmd
-- hca.Rmd
 
 # Information for Learners
 learners:
diff --git a/episodes/hca.Rmd b/episodes/hca.Rmd
deleted file mode 100644
index 1b8e990..0000000
--- a/episodes/hca.Rmd
+++ /dev/null
@@ -1,395 +0,0 @@
----
-title: Accessing data from the Human Cell Atlas (HCA)
-teaching: 10 # Minutes of teaching in the lesson
-exercises: 2 # Minutes of exercises in the lesson
----
-
-:::::::::::::::::::::::::::::::::::::: questions 
-
-- TODO
-
-::::::::::::::::::::::::::::::::::::::::::::::::
-
-::::::::::::::::::::::::::::::::::::: objectives
-
-- TODO
-
-::::::::::::::::::::::::::::::::::::::::::::::::
-
-```{r load-styles, include=FALSE}
-library(BiocStyle)
-```
-
-# HCA Project
-
-The Human Cell Atlas (HCA) is a large project that aims to learn from and map
-every cell type in the human body. The project extracts spatial and molecular
-characteristics in order to understand cellular function and networks. It is an
-international collaborative that charts healthy cells in the human body at all
-ages. There are about 37.2 trillion cells in the human body. To read more about
-the project, head over to their website at https://www.humancellatlas.org.
-
-# CELLxGENE
-
-CELLxGENE is a database and a suite of tools that help scientists to find,
-download, explore, analyze, annotate, and publish single cell data. It includes
-several analytic and visualization tools to help you to discover single cell
-data patterns. To see the list of tools, browse to
-https://cellxgene.cziscience.com/.
-
-# CELLxGENE | Census
-
-The Census provides efficient computational tooling to access, query, and
-analyze all single-cell RNA data from CZ CELLxGENE Discover. Using a new access
-paradigm of cell-based slicing and querying, you can interact with the data
-through TileDB-SOMA, or get slices in AnnData or Seurat objects, thus
-accelerating your research by significantly minimizing data harmonization at
-https://chanzuckerberg.github.io/cellxgene-census/.
-
-# The CuratedAtlasQueryR Project
-
-To systematically characterize the immune system across tissues, demographics
-and multiple studies, single cell transcriptomics data was harmonized from the
-CELLxGENE database. Data from 28,975,366 cells that cover 156 tissues (excluding
-cell cultures), 12,981 samples, and 324 studies were collected. The metadata was
-standardized, including sample identifiers, tissue labels (based on anatomy) and
-age. Also, the gene-transcript abundance of all samples was harmonized by
-putting values on the positive natural scale (i.e. non-logarithmic).
-
-To model the immune system across studies, we adopted a consistent immune
-cell-type ontology appropriate for lymphoid and non-lymphoid tissues. We applied
-a consensus cell labeling strategy between the Seurat blueprint and Monaco
-[-@Monaco2019] to minimize biases in immune cell classification from
-study-specific standards.
-
-`CuratedAtlasQueryR` supports data access and programmatic exploration of the
-harmonized atlas. Cells of interest can be selected based on ontology, tissue of
-origin, demographics, and disease. For example, the user can select CD4 T helper
-cells across healthy and diseased lymphoid tissue. The data for the selected
-cells can be downloaded locally into popular single-cell data containers. Pseudo
-bulk counts are also available to facilitate large-scale, summary analyses of
-transcriptional profiles. This platform offers a standardized workflow for
-accessing atlas-level datasets programmatically and reproducibly.
-
-```{r,echo=FALSE}
-knitr::include_graphics(
-    "figures/HCA_sccomp_SUPPLEMENTARY_technical_cartoon_curatedAtlasQuery.png"
-)
-```
-
-# Data Sources in R / Bioconductor
-
-There are a few options to access single cell data with R / Bioconductor.
-
-| Package | Target | Description |
-|---------|-------------|---------|
-| [hca](https://bioconductor.org/packages/hca) | [HCA Data Portal API](https://www.humancellatlas.org/data-portal/) | Project, Sample, and File level HCA data |
-| [cellxgenedp](https://bioconductor.org/packages/cellxgenedp) | [CellxGene](https://cellxgene.cziscience.com/) | Human and mouse SC data including HCA |
-| [CuratedAtlasQueryR](https://stemangiola.github.io/CuratedAtlasQueryR/) | [CellxGene](https://cellxgene.cziscience.com/) | fine-grained query capable CELLxGENE data including HCA |
-
-# Installation
-
-```{r,eval=FALSE}
-if (!requireNamespace("BiocManager", quietly = TRUE))
-    install.packages("BiocManager")
-
-BiocManager::install("stemangiola/CuratedAtlasQueryR")
-```
-
-# Package load 
-
-```{r,include=TRUE,results="hide",message=FALSE,warning=FALSE}
-library(CuratedAtlasQueryR)
-library(dplyr)
-```
-
-# HCA Metadata
-
-The metadata allows the user to get a lay of the land of what is available
-via the package. In this example, we are using the sample database URL which
-allows us to get a small and quick subset of the available metadata.
-
-```{r}
-metadata <- get_metadata(remote_url = CuratedAtlasQueryR::SAMPLE_DATABASE_URL)
-```
-
-Get a view of the first 10 columns in the metadata with `glimpse`
-
-```{r}
-metadata |>
-  select(1:10) |>
-  glimpse()
-```
-
-# A note on the piping operator
-
-The vignette materials provided by `CuratedAtlasQueryR` show the use of the
-'native' R pipe (implemented after R version `4.1.0`). For those not familiar
-with the pipe operator (`|>`), it allows you to chain functions by passing the
-left-hand side (LHS) to the first input (typically) on the right-hand side
-(RHS). 
-
-In this example, we are extracting the `iris` data set from the `datasets`
-package and 'then' taking a subset where the sepal lengths are greater than 5
-and 'then' summarizing the data for each level in the `Species` variable with a
-`mean`. The pipe operator can be read as 'then'.
-
-```{r}
-data("iris", package = "datasets")
-
-iris |>
-  subset(Sepal.Length > 5) |>
-  aggregate(. ~ Species, data = _, mean)
-```
-
-# Summarizing the metadata
-
-For each distinct tissue and dataset combination, count the number of datasets
-by tissue type. 
-
-```{r}
-metadata |>
-  distinct(tissue, dataset_id) |> 
-  count(tissue)
-```
-
-# Columns available in the metadata
-
-```{r}
-head(names(metadata), 10)
-```
-
-# Available assays
-
-```{r}
-metadata |>
-    distinct(assay, dataset_id) |>
-    count(assay)
-```
-
-# Available organisms
-
-```{r}
-metadata |>
-    distinct(organism, dataset_id) |>
-    count(organism)
-```
-
-## Download single-cell RNA sequencing counts 
-
-The data can be provided as either "counts" or counts per million "cpm" as given
-by the `assays` argument in the `get_single_cell_experiment()` function. By
-default, the `SingleCellExperiment` provided will contain only the 'counts'
-data.
-
-### Query raw counts
-
-```{r}
-single_cell_counts <- 
-    metadata |>
-    dplyr::filter(
-        ethnicity == "African" &
-        stringr::str_like(assay, "%10x%") &
-        tissue == "lung parenchyma" &
-        stringr::str_like(cell_type, "%CD4%")
-    ) |>
-    get_single_cell_experiment()
-
-single_cell_counts
-```
-
-### Query counts scaled per million
-
-This is helpful if just few genes are of interest, as they can be compared
-across samples.
-
-```{r}
-metadata |>
-  dplyr::filter(
-      ethnicity == "African" &
-      stringr::str_like(assay, "%10x%") &
-      tissue == "lung parenchyma" &
-      stringr::str_like(cell_type, "%CD4%")
-  ) |>
-  get_single_cell_experiment(assays = "cpm")
-```
-
-### Extract only a subset of genes
-
-```{r}
-single_cell_counts <-
-    metadata |>
-    dplyr::filter(
-        ethnicity == "African" &
-        stringr::str_like(assay, "%10x%") &
-        tissue == "lung parenchyma" &
-        stringr::str_like(cell_type, "%CD4%")
-    ) |>
-    get_single_cell_experiment(assays = "cpm", features = "PUM1")
-
-single_cell_counts
-```
-
-### Extracting counts as a Seurat object
-
-If needed, the H5 `SingleCellExperiment` can be converted into a Seurat object.
-Note that it may take a long time and use a lot of memory depending on how many
-cells you are requesting.
-
-```{r,eval=FALSE}
-single_cell_counts <-
-    metadata |>
-    dplyr::filter(
-        ethnicity == "African" &
-        stringr::str_like(assay, "%10x%") &
-        tissue == "lung parenchyma" &
-        stringr::str_like(cell_type, "%CD4%")
-    ) |>
-    get_seurat()
-
-single_cell_counts
-```
-
-## Save your `SingleCellExperiment`
-
-### Saving as HDF5 
-
-The recommended way of saving these `SingleCellExperiment` objects, if
-necessary, is to use `saveHDF5SummarizedExperiment` from the `HDF5Array`
-package.
-
-```{r, eval=FALSE}
-single_cell_counts |> saveHDF5SummarizedExperiment("single_cell_counts")
-```
-
-# Exercises
-
-
-
-
-
-
-
-
-:::::::::::::::::::::::::::::::::: challenge
-
-#### Exercise 1
-
-Use `count` and `arrange` to get the number of cells per tissue in descending
-order.
-
-:::::::::::::: solution
-
-```{r,eval=FALSE}
-metadata |>
-    count(tissue) |>
-    arrange(-n)
-```
-
-:::::::::::::::::::::::
-
-:::::::::::::::::::::::::::::::::::::::::::::
-
-:::::::::::::::::::::::::::::::::: challenge
-
-#### Exercise 2
-
-Use `dplyr`-isms to group by `tissue` and `cell_type` and get a tally of the
-highest number of cell types per tissue combination. What tissue has the most
-numerous type of cells?
-
-:::::::::::::: solution
-
-```{r,eval=FALSE}
-metadata |>
-    group_by(tissue, cell_type) |>
-    count() |>
-    arrange(-n)
-```
-:::::::::::::::::::::::
-
-:::::::::::::::::::::::::::::::::::::::::::::
-
-:::::::::::::::::::::::::::::::::: challenge
-
-#### Exercise 3
-
-Spot some differences between the `tissue` and `tissue_harmonised` columns.
-Use `count` to summarise.
-
-:::::::::::::: solution
-
-```{r}
-metadata |>
-    count(tissue) |>
-    arrange(-n)
-
-metadata |>
-    count(tissue_harmonised) |>
-    arrange(-n)
-```
-
-:::::::::::::::::::::::
-
-:::::::::::::::::::::::
-
-::: callout
-
-To see the full list of curated columns in the metadata, see the Details section
-in the `?get_metadata` documentation page.
-
-:::
-
-
-:::::::::::::::::::::::::::::::::: challenge
-
-#### Exercise 4
-
-Spot some differences between the `tissue` and `tissue_harmonised` columns.
-Now that we are a little familiar with navigating the metadata, let's obtain
-a `SingleCellExperiment` of 10X scRNA-seq counts of `cd8 tem` `lung` cells for
-females older than `80` with `COVID-19`. Note: Use the harmonized columns, where
-possible.
-
-:::::::::::::: solution
-
-```{r}
-metadata |> 
-    dplyr::filter(
-        sex == "female" &
-        age_days > 80 * 365 &
-        stringr::str_like(assay, "%10x%") &
-        disease == "COVID-19" &  
-        tissue_harmonised == "lung" & 
-        cell_type_harmonised == "cd8 tem"
-    ) |>
-    get_single_cell_experiment()
-```
-
-:::::::::::::::::::::::
-
-:::::::::::::::::::::::
-
-
-::::::::::::::::::::::::::::::::::::: keypoints 
-
-- TODO
-
-::::::::::::::::::::::::::::::::::::::::::::::::
-
-# Session Info
-
-```{r}
-sessionInfo()
-```
-
-# Acknowledgements
-
-Thank you to [Stefano Mangiola](https://github.com/stemangiola) and his team for
-developing
-[CuratedAtlasQueryR](https://github.com/stemangiola/CuratedAtlasQueryR) and
-graciously providing the content from their vignette. Make sure to keep an eye
-out for their publication for proper citation. Their bioRxiv paper can be found
-at <https://www.biorxiv.org/content/10.1101/2023.06.08.542671v1>.
-
-# References