Minor edits to content (#6)
jkanche authored Jul 17, 2024
1 parent dacc205 commit dafe033
Showing 7 changed files with 45 additions and 27 deletions.
14 changes: 13 additions & 1 deletion _quarto.yml
@@ -8,15 +8,27 @@ execute:

website:
  title: "Bioc2024: Interoperability between R and Python using BiocPy"
  description: "Explore the data structures and packages available in [BiocPy](https://github.com/biocpy), a project that aims to facilitate Bioconductor workflows in Python"
  repo-url: https://github.com/BiocPy/BiocWorkshop2024
  repo-actions: [issue]
  favicon: ./assets/short.png
  image: ./assets/full.png
  back-to-top-navigation: true
  sidebar:
    # search: true
    logo: ./assets/logo.png
    contents:
      - index.qmd
      - section: "Tutorials"
        contents:
          - tutorials/genomic_ranges.qmd
          - tutorials/annotate_cell_types.qmd
          - tutorials/sessioninfo.qmd
    tools:
      - icon: github
        href: https://github.com/BiocPy/
  page-footer:
    center:
      - text: "(c) Jayaram Kancherla & Aaron Lun."

# navbar:
# left:
Binary file added assets/workshop.png
12 changes: 7 additions & 5 deletions index.qmd
@@ -1,4 +1,6 @@
# Welcome
# Interoperability between R and Python using BiocPy

## Welcome

Welcome to our workshop on exploring the data structures and packages
available in [BiocPy](https://github.com/biocpy), a project that aims
@@ -14,15 +16,15 @@ in the same manner as in R/Bioconductor.
All packages in BiocPy are published to PyPI, and the code is open-source
on [GitHub](https://github.com/BiocPy).

![](./assets/workshop.png)


### Core contributors

- [Jayaram Kancherla](https://github.com/jkanche)
- [Aaron Lun](https://github.com/LTLA)

Always looking for more contributions from the community to improve our packages! Checkout the issues or discussion in our GitHub organization.

----
We are looking for more contributions from the community to improve our packages! If you are interested, please check out the issues or discussion in our GitHub organization.

## Other resources

@@ -31,4 +33,4 @@ Always looking for more contributions from the community to improve our packages

## Developer notes

This is a reproducible Quarto book with reusable snippets. To learn more about Quarto books visit <https://quarto.org/docs/books>. Check out [Session Info](./chapters/sessioninfo.qmd) for more information.
This is a reproducible Quarto book with reusable snippets. To learn more about Quarto books visit <https://quarto.org/docs/books>. Check out [sessioninfo](./tutorials/sessioninfo.qmd) for more information.
2 changes: 1 addition & 1 deletion rpackages.R
@@ -3,4 +3,4 @@ library(BiocManager)
BiocManager::install(
c("scRNAseq", "celldex", "SingleR", "scuttle", "reticulate",
"rmarkdown", "knitr", "downlit", "xml2", "ggplot2", "edgeR",
"AnnotationHub", "TxDb.Hsapiens.UCSC.hg38.refGene"))
"AnnotationHub"))
33 changes: 20 additions & 13 deletions tutorials/annotate_cell_types.qmd
@@ -42,11 +42,11 @@ This will install the `scRNAseq`, `celldex`, `SingleR`, packages from Bioconduct

:::

## Accessing and Exploring Single-Cell Datasets
## 1. Accessing and Exploring Single-Cell Datasets

Now that we have the necessary packages installed, let's explore the `scrnaseq` package and learn how to access public single-cell RNA-seq datasets. Datasets published to the `scrnaseq` package are decorated with metadata such as the study title, species, number of cells, etc., to facilitate discovery. Let's see how we can list and search for datasets.

### List All Datasets
### 1.1 List All Datasets

The `list_datasets()` function in Python or `surveyDatasets()` in R will display all available datasets published to the `scRNAseq` collection along with their metadata.

@@ -68,9 +68,9 @@ head(all_ds[, c("name", "title", "version")], 3)

:::

This R|Python code lists all available datasets in the `scrnaseq` package and displays their names, titles, and versions.
This lists all available datasets in the `scrnaseq` package and displays their names, titles, and versions.

### Search for Datasets
### 1.2 Search for Datasets

You can also search for datasets based on metadata using `search_datasets()` in Python or `searchDatasets()` in R. This supports both simple text queries and complex boolean expressions.

@@ -94,7 +94,7 @@ head(pancreas_ds[, c("name", "title", "version")], 3)

This R|Python code searches for datasets containing the term "pancreas" and displays their names, titles, and versions.

#### Advanced Searches
#### 1.2.1 Advanced Searches

For more complex searches involving boolean operations, use `define_text_query()` in Python or `defineTextQuery()` in R. Here's an example to find datasets using the mouse reference genome (`GRCm38`) and containing the words `neuro` or `pancrea`.

@@ -131,13 +131,13 @@ head(res[,c("name", "title", "version")], 3)
```
:::

This R|Python code performs a complex search to find datasets tagged as "mouse" in the reference genome field and containing the keywords "neuro" or "pancrea".
This performs a complex search to find datasets tagged as "mouse" in the reference genome field and containing the keywords "neuro" or "pancrea".

::: {.callout-important}
Once a dataset is identified, always list the name and version of the dataset in your scripts for reproducibility.
:::

## Download dataset
## 2. Download dataset

After identifying a dataset of interest, use `fetch_dataset()` in Python or `fetchDataset()` in R to download the dataset. This will load the dataset as a `SingleCellExperiment` object.

@@ -165,7 +165,7 @@ sce

:::

### Side-quest on `SingleCellExperiment` in Python
### 2.1 Side-quest on `SingleCellExperiment` in Python

The Python implementation of the `SingleCellExperiment` class adheres to Bioconductor's specification and offers similar interface and methods. Our goal is to make it simple for analysts to switch between R and Python. A key difference is the shift from functional to an object-oriented paradigm.

@@ -216,13 +216,13 @@ print("coerce to AnnData: ", sce.to_anndata())

:::

## Annotate Cell Types
## 3. Annotate Cell Types

We can now annotate cell types by using reference datasets and matching cells based on their expression profiles. In this tutorial, we will use [singleR](https://github.com/SingleR-inc/SingleR) in R or its Python equivalent [singler](https://github.com/BiocPy/singler).

Before running the `singler` algorithm, we need to download an appropriate reference dataset from the `celldex` package.

### Access Reference Datasets from `celldex`
### 3.1 Access Reference Datasets from `celldex`

Similar to the `scRNAseq` package, the `celldex` package provides access to the collection of reference expression datasets with curated cell type labels, for use in procedures like automated annotation of single-cell data or deconvolution of bulk RNA-seq to reference datasets. These datasets are also stored in language-agnostic representations for use in downstream analyses.

@@ -281,7 +281,7 @@ table(cell_labels$labels)

:::

## Analyze Single-cell RNA-seq datasets
## 4. Analyze Single-cell RNA-seq datasets

![single-cell-methods](../assets/single-cell-space.jpg)

@@ -299,7 +299,7 @@ results = scranpy.analyze_sce(sce)
print(results.tsne)
```

### Seems like magic?
### 4.1 Seems like magic?

Running the `analyze_sce()` function uses the default parameters to run the single-cell workflow. If you want to customize or want to have fine-grained control on the analysis steps, set the parameter `dry_run=True`.

@@ -315,7 +315,7 @@ print(scranpy.analyze_sce(sce, dry_run=True))
Users can also run individual steps from the analysis without having to perform the full analysis, e.g. compute log normalized counts or find markers, etc.
:::

## Visualize Results
## 5. Visualize Results

I can't have a tutorial without a section on visualization or figures.

@@ -346,6 +346,13 @@ During the QC step, some cells were filtered, hence we filter the matches and th
We'll leave this as an exercise for the reader to change the order of steps: 1) run the dataset through the QC step 2) filter cells, and then 3) annotate using singleR.
:::

## 6. Exercises

1. Share or Upload your datasets to scrna-seq, Instructions to upload are available in their respective [R/Bioc](https://bioconductor.org/packages/release/data/experiment/html/scRNAseq.html) and [Python](https://github.com/BiocPy/scrnaseq) packages.
2. Explore top markers for each cluster identified by scranpy.
3. Perform multi-modal analysis (scranpy supports RNA, ADT, CRISPR).
4. save your results and explore in [Kana](https://github.com/kanaverse/kana).

Congratulations! You have now completed the tutorial on accessing single-cell datasets using `scRNAseq` and `ArtifactDB`, and annotating cell types using reference datasets from `celldex`. For more detailed usage and advanced analyses, refer to the respective documentation of these packages.

By integrating R and Python workflows, you can leverage the strengths of both languages and perform comprehensive single-cell analysis. Keep exploring and happy analyzing!
9 changes: 3 additions & 6 deletions tutorials/genomic_ranges.qmd
@@ -40,10 +40,7 @@ BiocManager::install(c("AnnotationHub"),

## 1. Save Annotations as RDS

Let's download the human reference genome and save the exon positions grouped by transcripts.
We need to do a bit of pre-processing to get this information.

For the purpose of the tutorial, we'll limit the exons to chromosome 22.
Let's download the human reference genome and save the exon positions grouped by transcripts. For the purpose of the tutorial, we'll limit the exons to chromosome 22.

::: {.panel-tabset}

@@ -183,7 +180,7 @@ print(promoters)
```

:::{.callout-note}
Please be aware that because gene symbols may not be unique, this GenomicRanges object might contain duplicates. You might want to resolve duplicate symbols by making the symbols unique. We will leave this as an exercise for the reader.
Please be aware that because gene symbols may not be unique, this `GenomicRanges` object might contain duplicates. You might want to resolve duplicate symbols by making the symbols unique. We will leave this as an exercise for the reader.
:::

:::
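
The duplicate-symbol exercise mentioned in the note above could be approached along the following lines. This is a minimal sketch in plain Python and is not part of this commit or of the BiocPy API: `make_unique` is a hypothetical helper, and the `.` separator is an assumption chosen to loosely mirror the suffixing style of R's `make.unique`.

```python
from collections import Counter


def make_unique(symbols, sep="."):
    """Return a copy of `symbols` where repeated entries get a numeric suffix.

    Hypothetical helper for illustration only; the first occurrence of a
    symbol is kept unchanged and later repeats become "SYMBOL.1", "SYMBOL.2", ...
    """
    seen = Counter()       # next suffix to try for each symbol
    taken = set(symbols)   # every name that is already in use
    result = []
    for symbol in symbols:
        count = seen[symbol]
        seen[symbol] += 1
        if count == 0:
            # First occurrence: keep the original symbol.
            result.append(symbol)
            continue
        candidate = f"{symbol}{sep}{count}"
        # Skip suffixes that would collide with an existing name.
        while candidate in taken:
            count += 1
            candidate = f"{symbol}{sep}{count}"
        seen[symbol] = count + 1
        taken.add(candidate)
        result.append(candidate)
    return result


unique_symbols = make_unique(["TP53", "BRCA1", "TP53", "TP53"])
print(unique_symbols)
```

The resulting list could then be assigned as the names of the ranges, assuming the object in use exposes a way to set names.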
@@ -311,7 +308,7 @@ print(f"Percentage of peaks overlapping with exons: {percent_overlapping:.2f}%")

:::

This analysis can provide insights into whether the protein of interest (captured by the ChIP-seq) tends to bind within gene bodies, potentially influencing gene expression, splicing, or other co-transcriptional processes.
This analysis can provide insights into whether the protein of interest (captured by the ChIP-seq: "EZH2") tends to bind within gene bodies, potentially influencing gene expression, splicing, or other co-transcriptional processes.
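
To make the overlap percentage concrete, here is a small self-contained sketch of the arithmetic behind it. It is not the tutorial's code, which works on `GenomicRanges` objects: the function name `percent_peaks_in_exons`, the half-open `(start, end)` tuples and the toy coordinates are all assumptions made for illustration.

```python
def percent_peaks_in_exons(peaks, exons):
    """Return the percentage of peaks that overlap at least one exon.

    Hypothetical helper: intervals are half-open (start, end) tuples that
    are assumed to lie on the same chromosome and strand.
    """
    if not peaks:
        return 0.0
    hits = sum(
        # Two half-open intervals overlap when each starts before the other ends.
        any(p_start < e_end and e_start < p_end for e_start, e_end in exons)
        for p_start, p_end in peaks
    )
    return 100.0 * hits / len(peaks)


# Toy coordinates, invented for this example.
toy_peaks = [(100, 200), (300, 400), (500, 600), (700, 800)]
toy_exons = [(150, 250), (590, 650)]
print(f"{percent_peaks_in_exons(toy_peaks, toy_exons):.2f}%")
```

The nested loop is quadratic in the number of intervals, which is fine for a toy example; real peak sets are better served by the indexed overlap queries that the range classes provide.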

## 4. Advanced Operations

2 changes: 1 addition & 1 deletion tutorials/sessioninfo.qmd
@@ -1,4 +1,4 @@
# Session Info! {.unnumbered}
# Session Info {.unnumbered}

The code base for this repository is available at [https://github.com/BiocPy/tutorial](https://github.com/BiocPy/tutorial).
