From 7675dabcc156623ce0c00552a1fd696a1537b0db Mon Sep 17 00:00:00 2001
From: Andrew Ghazi <6763470+andrewGhazi@users.noreply.github.com>
Date: Mon, 30 Sep 2024 11:12:25 -0400
Subject: [PATCH] hca_refresh question titles

---
 episodes/hca.Rmd | 32 ++++++++++++++++++++------------
 1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/episodes/hca.Rmd b/episodes/hca.Rmd
index 1f000ac..a0487b3 100644
--- a/episodes/hca.Rmd
+++ b/episodes/hca.Rmd
@@ -110,7 +110,7 @@ metadata |>
     glimpse()
 ```
 
-## A note on the piping operator
+## A note on the pipe operator
 
 The vignette materials provided by `CuratedAtlasQueryR` show the use of the
 'native' R pipe (implemented after R version `4.1.0`). For those not familiar
@@ -182,9 +182,9 @@ For the sake of demonstration, we'll focus this small subset of samples:
 sample_subset = metadata |>
     filter(
         ethnicity == "African" &
-        stringr::str_like(assay, "%10x%") &
+        grepl("10x", assay) &
         tissue == "lung parenchyma" &
-        stringr::str_like(cell_type, "%CD4%")
+        grepl("CD4", cell_type)
     )
 ```
 
@@ -246,7 +246,7 @@ single_cell_counts |> saveHDF5SummarizedExperiment("single_cell_counts")
 
 :::::::::::::::::::::::::::::::::: challenge
 
-#### Exercise 1
+#### Exercise 1: Basic counting + piping
 
 Use `count` and `arrange` to get the number of cells per tissue in descending
 order.
@@ -264,18 +264,20 @@ metadata |>
 
 :::::::::::::::::::::::::::::::::: challenge
 
-#### Exercise 2
+#### Exercise 2: Tissue & type counting
 
-Use `dplyr`-isms to group by `tissue` and `cell_type` and get a tally of the
-highest number of cell types per tissue combination. What tissue has the most
-numerous type of cells?
+`count()` can group by multiple factors by simply adding another grouping column
+as an additional argument. Get a tally of the highest number of cell types per
+tissue combination. What tissue has the most numerous type of cells?
 
 :::::::::::::: solution
 
 ```{r,eval=FALSE}
 metadata |>
     count(tissue, cell_type) |>
-    arrange(-n)
+    arrange(-n) |>
+    head(n = 1)
+
 ```
 
 :::::::::::::::::::::::
@@ -283,7 +285,7 @@ metadata |>
 
 :::::::::::::::::::::::::::::::::: challenge
 
-#### Exercise 3
+#### Exercise 3: Comparing metadata categories
 
 Spot some differences between the `tissue` and `tissue_harmonised` columns.
 Use `count` to summarise.
@@ -299,6 +301,10 @@ metadata |>
     count(tissue_harmonised) |>
     arrange(-n)
 ```
+
+For example you can see that `tissue_harmonised` merges the `cortex of kidney`
+and `kidney` groups in `tissue`.
+
 
 To see the full list of curated columns in the metadata, see the Details section
 in the `?get_metadata` documentation page.
@@ -308,7 +314,7 @@ in the `?get_metadata` documentation page.
 
 :::::::::::::::::::::::::::::::::: challenge
 
-#### Exercise 4
+#### Exercise 4: Highly specific cell groups
 
 Now that we are a little familiar with navigating the metadata, let's obtain
 a `SingleCellExperiment` of 10X scRNA-seq counts of `cd8 tem` `lung` cells for
@@ -322,13 +328,15 @@ metadata |>
     filter(
         sex == "female" &
         age_days > 80 * 365 &
-        stringr::str_like(assay, "%10x%") &
+        grepl("10x", assay) &
         disease == "COVID-19" &
         tissue_harmonised == "lung" &
         cell_type_harmonised == "cd8 tem"
     ) |>
     get_single_cell_experiment()
 ```
+
+You can see we don't get very many cells given the strict set of conditions we used.
 :::::::::::::::::::::::
 
 :::::::::::::::::::::::::::::::::::::::::::::