hca_refresh question titles #45

Merged 1 commit on Sep 30, 2024
32 changes: 20 additions & 12 deletions episodes/hca.Rmd
@@ -110,7 +110,7 @@ metadata |>
glimpse()
```

-## A note on the piping operator
+## A note on the pipe operator

The vignette materials provided by `CuratedAtlasQueryR` show the use of the
'native' R pipe (implemented after R version `4.1.0`). For those not familiar
@@ -182,9 +182,9 @@ For the sake of demonstration, we'll focus this small subset of samples:
sample_subset = metadata |>
filter(
ethnicity == "African" &
-stringr::str_like(assay, "%10x%") &
+grepl("10x", assay) &
tissue == "lung parenchyma" &
-stringr::str_like(cell_type, "%CD4%")
+grepl("CD4", cell_type)
)
```
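
The replacement of `stringr::str_like()` with `grepl()` in this hunk relies on base R substring matching, so the `%` wildcards on either side of the search term are no longer needed. A minimal sketch of how the new conditions behave, assuming some made-up `assay` labels (these values are invented for illustration and are not taken from the atlas metadata):

```r
# Hypothetical assay labels, invented for this illustration only.
assay <- c("10x 3' v3", "Smart-seq2", "10x 5' v1", "microwell-seq")

# grepl() returns TRUE wherever the pattern occurs anywhere in the string.
# By default the pattern is a regular expression and matching is case-sensitive.
is_10x <- grepl("10x", assay)
print(is_10x)

# fixed = TRUE treats the pattern as a literal string instead of a regex,
# which is useful when the search term contains characters such as "." or "(".
is_10x_literal <- grepl("10x", assay, fixed = TRUE)
print(is_10x_literal)
```

Because `grepl()` returns a logical vector of the same length as its input, it can be combined with other conditions inside `filter()` using `&`, as the hunk above does.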

@@ -246,7 +246,7 @@ single_cell_counts |> saveHDF5SummarizedExperiment("single_cell_counts")

:::::::::::::::::::::::::::::::::: challenge

-#### Exercise 1
+#### Exercise 1: Basic counting + piping

Use `count` and `arrange` to get the number of cells per tissue in descending
order.
@@ -264,26 +264,28 @@ metadata |>

:::::::::::::::::::::::::::::::::: challenge

-#### Exercise 2
+#### Exercise 2: Tissue & type counting

-Use `dplyr`-isms to group by `tissue` and `cell_type` and get a tally of the
-highest number of cell types per tissue combination. What tissue has the most
-numerous type of cells?
+`count()` can group by multiple factors by simply adding another grouping column
+as an additional argument. Get a tally of the highest number of cell types per
+tissue combination. What tissue has the most numerous type of cells?

:::::::::::::: solution

```{r,eval=FALSE}
metadata |>
count(tissue, cell_type) |>
-arrange(-n)
+arrange(-n) |>
+head(n = 1)

```
:::::::::::::::::::::::

:::::::::::::::::::::::::::::::::::::::::::::
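
To make the revised Exercise 2 solution concrete, here is a small sketch of the same `count()` / `arrange()` / `head()` chain on a toy data frame. The `toy_metadata` object and its values are hypothetical stand-ins for the real `metadata` table; only the column names `tissue` and `cell_type` come from the lesson.

```r
library(dplyr)

# Toy stand-in for the atlas metadata; one row per cell.
# These values are invented for illustration only.
toy_metadata <- data.frame(
  tissue    = c("blood", "blood", "blood", "lung", "lung", "kidney"),
  cell_type = c("T cell", "T cell", "B cell", "T cell", "macrophage", "T cell")
)

top_combination <- toy_metadata |>
  count(tissue, cell_type) |>  # one row per tissue/cell_type pair, tally in column n
  arrange(-n) |>               # largest tally first
  head(n = 1)                  # keep only the most numerous combination

print(top_combination)
```

In this toy table the `blood` / `T cell` pair appears twice and every other pair once, so it is the single row that survives `head(n = 1)`.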

:::::::::::::::::::::::::::::::::: challenge

-#### Exercise 3
+#### Exercise 3: Comparing metadata categories

Spot some differences between the `tissue` and `tissue_harmonised` columns.
Use `count` to summarise.
@@ -299,6 +301,10 @@ metadata |>
count(tissue_harmonised) |>
arrange(-n)
```

+For example you can see that `tissue_harmonised` merges the `cortex of kidney`
+and `kidney` groups in `tissue`.

To see the full list of curated columns in the metadata, see the Details section
in the `?get_metadata` documentation page.

@@ -308,7 +314,7 @@ in the `?get_metadata` documentation page.

:::::::::::::::::::::::::::::::::: challenge

-#### Exercise 4
+#### Exercise 4: Highly specific cell groups

Now that we are a little familiar with navigating the metadata, let's obtain
a `SingleCellExperiment` of 10X scRNA-seq counts of `cd8 tem` `lung` cells for
@@ -322,13 +328,15 @@ metadata |>
filter(
sex == "female" &
age_days > 80 * 365 &
-stringr::str_like(assay, "%10x%") &
+grepl("10x", assay) &
disease == "COVID-19" &
tissue_harmonised == "lung" &
cell_type_harmonised == "cd8 tem"
) |>
get_single_cell_experiment()
```

+You can see we don't get very many cells given the strict set of conditions we used.
:::::::::::::::::::::::

:::::::::::::::::::::::::::::::::::::::::::::
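
The note added to the Exercise 4 solution, that strict conditions leave few cells, can be illustrated on a toy data frame. This is only a sketch: `toy_metadata` and its values are hypothetical, and only three of the exercise's conditions are reproduced.

```r
library(dplyr)

# Toy stand-in for the atlas metadata; values are invented for illustration.
toy_metadata <- data.frame(
  sex      = c("female", "female", "male", "female"),
  age_days = c(85 * 365, 40 * 365, 90 * 365, 82 * 365),
  assay    = c("10x 3' v3", "10x 3' v2", "10x 3' v3", "Smart-seq2")
)

n_matching <- toy_metadata |>
  filter(
    sex == "female" &
      age_days > 80 * 365 &   # older than 80 years, expressed in days
      grepl("10x", assay)     # any assay label containing "10x"
  ) |>
  count() |>                  # with no grouping columns, tallies all remaining rows
  pull(n)

print(n_matching)
```

Each condition joined with `&` can only remove rows, so every extra requirement shrinks the result; here only one of the four toy rows satisfies all three.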