setup python reqs and finish section on workflow
jkanche committed Feb 21, 2024
1 parent e1ed8b1 commit ff3578a
Showing 2 changed files with 85 additions and 7 deletions.
88 changes: 82 additions & 6 deletions chapters/workflow.qmd
@@ -2,20 +2,96 @@
engine: knitr
---

# Interchange data between Python and R

In this section, we demonstrate a workflow that stores genomic data in language-agnostic formats, making datasets and analysis results easy to access from multiple programming languages such as R and Python. This functionality is provided by the [ArtifactDB](https://github.com/artifactdb) framework.

To get started, we download the Zilionis lung dataset from the [scRNAseq](https://bioconductor.org/packages/release/data/experiment/html/scRNAseq.html) package and then store it in a language-agnostic format using the [alabaster suite](https://github.com/ArtifactDB/alabaster.base) of R packages.

```{r}
library(scRNAseq)
library(alabaster)

# Download the Zilionis lung dataset as a SingleCellExperiment
sce <- ZilionisLungData()

# Save it to ./zilinoislung in a language-agnostic format
saveObject(sce, path=file.path(getwd(), "zilinoislung"))
```

:::{.callout-note}
You can also save this dataset as an RDS object and access it in Python. Check out the [interop with R](./interop.qmd) section for more details.
:::

We can now load this dataset in Python using the [dolomite suite](https://github.com/ArtifactDB/dolomite-base) of Python packages. dolomite and alabaster are both part of the ArtifactDB ecosystem: alabaster reads and writes these language-agnostic artifacts from R, and dolomite does the same from Python.
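To load an artifact, a reader has to know what kind of object a directory holds. As a minimal sketch, assuming each saved directory contains an `OBJECT` file with JSON metadata that includes a `type` field (as written by recent alabaster and dolomite releases), type-based dispatch could look like this. The registry and loader functions are hypothetical stand-ins, not dolomite's actual internals.

```python
import json
import os
import tempfile

# Hypothetical loaders standing in for the real readers (e.g. those in dolomite_sce).
def load_single_cell_experiment(path):
    return f"SingleCellExperiment loaded from {path}"

def load_data_frame(path):
    return f"DataFrame loaded from {path}"

# Hypothetical registry mapping a recorded object type to its loader.
REGISTRY = {
    "single_cell_experiment": load_single_cell_experiment,
    "data_frame": load_data_frame,
}

def read_object_type(path):
    """Return the `type` recorded in the directory's OBJECT file (assumed to be JSON)."""
    with open(os.path.join(path, "OBJECT")) as handle:
        return json.load(handle)["type"]

def dispatch_read(path):
    """Look up the recorded type and call the matching loader."""
    object_type = read_object_type(path)
    if object_type not in REGISTRY:
        raise NotImplementedError(f"no loader registered for type '{object_type}'")
    return REGISTRY[object_type](path)

# Usage: fake a saved directory that contains only an OBJECT file.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "OBJECT"), "w") as handle:
        json.dump({"type": "single_cell_experiment"}, handle)
    recorded_type = read_object_type(tmp)
    result = dispatch_read(tmp)

print(recorded_type)
print(result)
```

Recording the type in a plain JSON file, rather than in a language-specific serialization, is part of what allows R and Python readers to interpret the same directory.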

```{python}
from dolomite_base import read_object

# Read the SingleCellExperiment saved from R
data = read_object("./zilinoislung")
print(data)
```

To illustrate this workflow, we will use a [CellTypist](https://github.com/Teichlab/celltypist) model to annotate cell types in this dataset. CellTypist operates on `AnnData` objects, so we first convert the `SingleCellExperiment`:

```{python}
adata = data.to_anndata()
```
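One detail to keep in mind during this conversion is orientation: a `SingleCellExperiment` stores features (genes) in rows and cells in columns, while `AnnData` stores cells in rows (`obs`) and genes in columns (`var`), so assay matrices are transposed on the way across. The toy illustration below uses plain nested lists; the gene names and cell barcodes are made up.

```python
# Toy counts in SingleCellExperiment orientation: rows are genes, columns are cells.
genes = ["GENE_A", "GENE_B", "GENE_C"]  # hypothetical feature names
cells = ["cell_1", "cell_2"]            # hypothetical cell barcodes
sce_counts = [
    [5, 0],  # GENE_A
    [1, 3],  # GENE_B
    [0, 2],  # GENE_C
]

def transpose(matrix):
    """Swap rows and columns of a rectangular nested list."""
    return [list(column) for column in zip(*matrix)]

# AnnData orientation: rows are cells (obs), columns are genes (var).
adata_X = transpose(sce_counts)
for cell, row in zip(cells, adata_X):
    print(cell, dict(zip(genes, row)))
```

Transposing twice returns the original layout, which is the same reason the conversion back with `SingleCellExperiment.from_anndata()` later in this section restores the genes-by-cells orientation.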

Before annotating, let's download the Human Lung Atlas model from CellTypist.

```{python}
import celltypist
from celltypist import models

model_name = "Human_Lung_Atlas.pkl"

# Download only the Human Lung Atlas model instead of every available model
models.download_models(model=model_name)
model = models.Model.load(model=model_name)
print(model)
```
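CellTypist expects the expression matrix to be log1p-normalized to 10,000 counts per cell. If the converted `AnnData` holds raw counts, it should be normalized before annotating, for example with scanpy's `sc.pp.normalize_total(adata, target_sum=1e4)` followed by `sc.pp.log1p(adata)`. The arithmetic behind that step is sketched below in plain Python on a made-up matrix; it illustrates the transformation and is not a replacement for those functions.

```python
import math

TARGET_SUM = 10_000  # CellTypist expects counts scaled to 10,000 per cell

def log1p_normalize(cells_by_genes, target_sum=TARGET_SUM):
    """Scale each cell (row) to `target_sum` total counts, then apply natural log(1 + x)."""
    normalized = []
    for row in cells_by_genes:
        total = sum(row)
        if total == 0:
            # An empty cell has nothing to scale; keep its zeros.
            normalized.append([0.0] * len(row))
            continue
        normalized.append([math.log1p(value * target_sum / total) for value in row])
    return normalized

# Made-up raw counts: 2 cells (rows) x 3 genes (columns), AnnData orientation.
raw = [[5, 1, 0], [0, 3, 2]]
norm = log1p_normalize(raw)
for row in norm:
    print([round(value, 3) for value in row])

# Sanity check: undoing log1p recovers 10,000 counts per non-empty cell.
print([round(sum(math.expm1(value) for value in row)) for row in norm])
```

The final check mirrors the kind of validation worth running on real data: applying `expm1` to a normalized cell and summing should give back roughly 10,000.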

Now let's annotate our dataset.

```{python}
predictions = celltypist.annotate(adata, model=model_name, majority_voting=True)
print(predictions.predicted_labels)
```

:::{.callout-note}
The CellTypist steps above are based on the official [CellTypist tutorial](https://colab.research.google.com/github/Teichlab/celltypist/blob/main/docs/notebook/celltypist_tutorial.ipynb#scrollTo=postal-chicken).
:::
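With `majority_voting = True`, CellTypist refines the per-cell predictions by over-clustering the cells and then giving each cluster the label that is most frequent among its cells. The sketch below shows only that voting step in plain Python, assuming the cluster assignments are already known; the labels and cluster IDs are made up, and CellTypist's own implementation (which also computes the over-clustering) differs in detail. The `min_prop` argument loosely mirrors CellTypist's option of the same name.

```python
from collections import Counter, defaultdict

def majority_vote(labels, clusters, min_prop=0.0):
    """Assign every cell the most common label of its cluster.

    Clusters whose dominant label covers less than `min_prop` of their cells
    are marked "Heterogeneous".
    """
    # Group the per-cell labels by cluster.
    members = defaultdict(list)
    for label, cluster in zip(labels, clusters):
        members[cluster].append(label)

    # Pick the dominant label of each cluster, subject to the proportion threshold.
    cluster_label = {}
    for cluster, cluster_labels in members.items():
        label, count = Counter(cluster_labels).most_common(1)[0]
        share = count / len(cluster_labels)
        cluster_label[cluster] = label if share >= min_prop else "Heterogeneous"

    # Broadcast the cluster-level label back to every cell.
    return [cluster_label[cluster] for cluster in clusters]

# Made-up per-cell predictions and over-clustering assignments.
predicted = ["AT2", "AT2", "Macrophage", "Macrophage", "Macrophage", "AT2"]
over_clusters = ["0", "0", "0", "1", "1", "1"]
print(majority_vote(predicted, over_clusters))
print(majority_vote(predicted, over_clusters, min_prop=0.7))
```

Voting at the cluster level smooths out isolated misclassified cells, at the cost of assuming that each over-clustered group is reasonably homogeneous.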

Next, we retrieve an `AnnData` object with the predicted labels embedded in its `obs` data frame.

```{python}
adata = predictions.to_adata()
```

We can now reverse the workflow and save this object in an ArtifactDB format from Python. First, though, the object needs to be converted to a `SingleCellExperiment`.

```{python}
from singlecellexperiment import SingleCellExperiment
sce = SingleCellExperiment.from_anndata(adata)
print(sce)
```

We then use dolomite to save it in a language-agnostic format.

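```{python}
import dolomite_base
import dolomite_sce  # registers save_object() support for SingleCellExperiment

# Save the annotated SingleCellExperiment (not an undefined `df`)
dolomite_base.save_object(sce, "./zilinoislung_with_celltypist")
```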

Finally, we read the object back into R.

```{r}
# Read the object saved from Python
sce_with_celltypist <- readObject(file.path(getwd(), "zilinoislung_with_celltypist"))
sce_with_celltypist
```

And that's it. With two pairs of generics, `saveObject()` (R) and `save_object()` (Python) for writing, and `readObject()` (R) and `read_object()` (Python) for reading, you can store most Bioconductor objects in language-agnostic formats and load them in either language.

----

## Further reading

- ArtifactDB GitHub organization: https://github.com/ArtifactDB
4 changes: 3 additions & 1 deletion requirements.txt
@@ -21,4 +21,6 @@ mudata
delayedarray[dask]
joblib
dolomite
hdf5array
celltypist
rpy2
