diff --git a/cache/unnamed-chunk-12_f33932eecda61abc5403607ab86d8040.RData b/cache/unnamed-chunk-12_f33932eecda61abc5403607ab86d8040.RData
deleted file mode 100644
index d4e4e71..0000000
Binary files a/cache/unnamed-chunk-12_f33932eecda61abc5403607ab86d8040.RData and /dev/null differ
diff --git a/cache/unnamed-chunk-12_ff65ff8596c60eb7d0bdd5aeedfe9718.RData b/cache/unnamed-chunk-12_ff65ff8596c60eb7d0bdd5aeedfe9718.RData
new file mode 100644
index 0000000..ef1f0c5
Binary files /dev/null and b/cache/unnamed-chunk-12_ff65ff8596c60eb7d0bdd5aeedfe9718.RData differ
diff --git a/cache/unnamed-chunk-12_f33932eecda61abc5403607ab86d8040.rdb b/cache/unnamed-chunk-12_ff65ff8596c60eb7d0bdd5aeedfe9718.rdb
similarity index 100%
rename from cache/unnamed-chunk-12_f33932eecda61abc5403607ab86d8040.rdb
rename to cache/unnamed-chunk-12_ff65ff8596c60eb7d0bdd5aeedfe9718.rdb
diff --git a/cache/unnamed-chunk-12_f33932eecda61abc5403607ab86d8040.rdx b/cache/unnamed-chunk-12_ff65ff8596c60eb7d0bdd5aeedfe9718.rdx
similarity index 100%
rename from cache/unnamed-chunk-12_f33932eecda61abc5403607ab86d8040.rdx
rename to cache/unnamed-chunk-12_ff65ff8596c60eb7d0bdd5aeedfe9718.rdx
diff --git a/cache/unnamed-chunk-19_7c3ec703dee5d30ad3a69d7268a9ea30.RData b/cache/unnamed-chunk-19_b12335fcd30dae61f9fefec6176876c8.RData
similarity index 95%
rename from cache/unnamed-chunk-19_7c3ec703dee5d30ad3a69d7268a9ea30.RData
rename to cache/unnamed-chunk-19_b12335fcd30dae61f9fefec6176876c8.RData
index 86e1fe6..6869650 100644
Binary files a/cache/unnamed-chunk-19_7c3ec703dee5d30ad3a69d7268a9ea30.RData and b/cache/unnamed-chunk-19_b12335fcd30dae61f9fefec6176876c8.RData differ
diff --git a/cache/unnamed-chunk-19_7c3ec703dee5d30ad3a69d7268a9ea30.rdb b/cache/unnamed-chunk-19_b12335fcd30dae61f9fefec6176876c8.rdb
similarity index 100%
rename from cache/unnamed-chunk-19_7c3ec703dee5d30ad3a69d7268a9ea30.rdb
rename to cache/unnamed-chunk-19_b12335fcd30dae61f9fefec6176876c8.rdb
diff --git a/cache/unnamed-chunk-19_7c3ec703dee5d30ad3a69d7268a9ea30.rdx b/cache/unnamed-chunk-19_b12335fcd30dae61f9fefec6176876c8.rdx
similarity index 100%
rename from cache/unnamed-chunk-19_7c3ec703dee5d30ad3a69d7268a9ea30.rdx
rename to cache/unnamed-chunk-19_b12335fcd30dae61f9fefec6176876c8.rdx
diff --git a/cache/unnamed-chunk-21_81720bcf9caf24fb5eaa8c1c7e296ee8.RData b/cache/unnamed-chunk-21_81720bcf9caf24fb5eaa8c1c7e296ee8.RData
new file mode 100644
index 0000000..23da18f
Binary files /dev/null and b/cache/unnamed-chunk-21_81720bcf9caf24fb5eaa8c1c7e296ee8.RData differ
diff --git a/cache/unnamed-chunk-21_e3e49bf8cb680c8be603eec8724ca55b.rdb b/cache/unnamed-chunk-21_81720bcf9caf24fb5eaa8c1c7e296ee8.rdb
similarity index 100%
rename from cache/unnamed-chunk-21_e3e49bf8cb680c8be603eec8724ca55b.rdb
rename to cache/unnamed-chunk-21_81720bcf9caf24fb5eaa8c1c7e296ee8.rdb
diff --git a/cache/unnamed-chunk-21_e3e49bf8cb680c8be603eec8724ca55b.rdx b/cache/unnamed-chunk-21_81720bcf9caf24fb5eaa8c1c7e296ee8.rdx
similarity index 100%
rename from cache/unnamed-chunk-21_e3e49bf8cb680c8be603eec8724ca55b.rdx
rename to cache/unnamed-chunk-21_81720bcf9caf24fb5eaa8c1c7e296ee8.rdx
diff --git a/cache/unnamed-chunk-21_e3e49bf8cb680c8be603eec8724ca55b.RData b/cache/unnamed-chunk-21_e3e49bf8cb680c8be603eec8724ca55b.RData
deleted file mode 100644
index 1feebd5..0000000
Binary files a/cache/unnamed-chunk-21_e3e49bf8cb680c8be603eec8724ca55b.RData and /dev/null differ
diff --git a/hca.md b/hca.md
index 74298ba..34b63e9 100644
--- a/hca.md
+++ b/hca.md
@@ -125,7 +125,7 @@ $ sample_id_db "0c1d320a7d0cbbc281a535912722d272",
 $ `_sample_name` "BPH340PrSF_Via___transition zone of…
 ```
 
-## A note on the piping operator
+## A note on the pipe operator
 
 The vignette materials provided by `CuratedAtlasQueryR` show the use of the
 'native' R pipe (implemented after R version `4.1.0`). For those not familiar
@@ -253,9 +253,9 @@ For the sake of demonstration, we'll focus this small subset of samples:
 sample_subset = metadata |>
     filter(
         ethnicity == "African" &
-        stringr::str_like(assay, "%10x%") &
+        grepl("10x", assay) &
         tissue == "lung parenchyma" &
-        stringr::str_like(cell_type, "%CD4%")
+        grepl("CD4", cell_type)
     )
 ```
 
@@ -367,7 +367,7 @@ single_cell_counts |> saveHDF5SummarizedExperiment("single_cell_counts")
 
 :::::::::::::::::::::::::::::::::: challenge
 
-#### Exercise 1
+#### Exercise 1: Basic counting + piping
 
 Use `count` and `arrange` to get the number of cells per tissue in descending
 order.
@@ -386,11 +386,11 @@ metadata |>
 
 :::::::::::::::::::::::::::::::::: challenge
 
-#### Exercise 2
+#### Exercise 2: Tissue & type counting
 
-Use `dplyr`-isms to group by `tissue` and `cell_type` and get a tally of the
-highest number of cell types per tissue combination. What tissue has the most
-numerous type of cells?
+`count()` can group by multiple factors by simply adding another grouping column
+as an additional argument. Get a tally of the highest number of cell types per
+tissue combination. What tissue has the most numerous type of cells?
 
 :::::::::::::: solution
 
@@ -398,7 +398,8 @@ numerous type of cells?
 ``` r
 metadata |>
     count(tissue, cell_type) |>
-    arrange(-n)
+    arrange(-n) |>
+    head(n = 1)
 ```
 
 :::::::::::::::::::::::
@@ -406,7 +407,7 @@ metadata |>
 
 :::::::::::::::::::::::::::::::::: challenge
 
-#### Exercise 3
+#### Exercise 3: Comparing metadata categories
 
 Spot some differences between the `tissue` and `tissue_harmonised` columns.
 Use `count` to summarise.
@@ -423,6 +424,10 @@ metadata |>
     count(tissue_harmonised) |>
     arrange(-n)
 ```
+
+For example you can see that `tissue_harmonised` merges the `cortex of kidney`
+and `kidney` groups in `tissue`.
+
 To see the full list of curated columns in the metadata, see the Details section
 in the `?get_metadata` documentation page.
 
@@ -432,7 +437,7 @@ in the `?get_metadata` documentation page.
 
 :::::::::::::::::::::::::::::::::: challenge
 
-#### Exercise 4
+#### Exercise 4: Highly specific cell groups
 
 Now that we are a little familiar with navigating the metadata, let's obtain
 a `SingleCellExperiment` of 10X scRNA-seq counts of `cd8 tem` `lung` cells for
@@ -447,7 +452,7 @@ metadata |>
     filter(
         sex == "female" &
         age_days > 80 * 365 &
-        stringr::str_like(assay, "%10x%") &
+        grepl("10x", assay) &
         disease == "COVID-19" &
         tissue_harmonised == "lung" &
         cell_type_harmonised == "cd8 tem"
@@ -469,6 +474,8 @@ reducedDimNames(0):
 mainExpName: NULL
 altExpNames(0):
 ```
+
+You can see we don't get very many cells given the strict set of conditions we used.
 :::::::::::::::::::::::
 
 :::::::::::::::::::::::::::::::::::::::::::::
diff --git a/md5sum.txt b/md5sum.txt
index 94030b4..c01b248 100644
--- a/md5sum.txt
+++ b/md5sum.txt
@@ -9,7 +9,7 @@
 "episodes/cell_type_annotation.Rmd" "dc23fda097f772bec1b7172277298221" "site/built/cell_type_annotation.md" "2024-09-30"
 "episodes/multi-sample.Rmd" "2d38d9903358ea8a8067abd82a1f1f54" "site/built/multi-sample.md" "2024-09-24"
 "episodes/large_data.Rmd" "b9710492c6792ea435778c4e42f27e02" "site/built/large_data.md" "2024-09-24"
-"episodes/hca.Rmd" "e01d3fd1e07f158bed08b72d657ae1d1" "site/built/hca.md" "2024-09-24"
+"episodes/hca.Rmd" "20f753a47fcae8ed5d0631fbc582f549" "site/built/hca.md" "2024-09-30"
 "instructors/instructor-notes.md" "205339793f625a1844a768dea8e4a9c8" "site/built/instructor-notes.md" "2024-09-24"
 "learners/reference.md" "40fc1d0be2412d2d9d434a5bc84e4de8" "site/built/reference.md" "2024-09-24"
 "learners/setup.md" "25772142a26fe3c0cebbe650f5683269" "site/built/setup.md" "2024-09-24"