Commit

hdf5 exercise
andrewGhazi committed Aug 21, 2024
1 parent 045370a commit 85de456
Showing 1 changed file with 31 additions and 15 deletions.
episodes/large_data.Rmd: 46 changes (31 additions, 15 deletions)
@@ -362,7 +362,7 @@ data.frame(approx_error = error) |>
geom_histogram()
```

-It's almost never .001 in this case.
+It's almost never more than .001 in this case.

:::

@@ -448,14 +448,14 @@ sobj

[Scanpy](https://scanpy.readthedocs.io) is a scalable toolkit for analyzing
single-cell gene expression data built jointly with
-[anndata](https://anndata.readthedocs.io/). It includes
-preprocessing, visualization, clustering, trajectory inference and differential
-expression testing. The Python-based implementation efficiently deals with
-datasets of more than one million cells. Scanpy is developed and maintained by
-the [Theis lab]() and is released under a
-[BSD-3-Clause license](https://github.com/scverse/scanpy/blob/master/LICENSE).
-Scanpy is part of the [scverse](https://scverse.org/), a Python-based ecosystem
-for single-cell omics data analysis.
+[anndata](https://anndata.readthedocs.io/). It includes preprocessing,
+visualization, clustering, trajectory inference and differential expression
+testing. The Python-based implementation efficiently deals with datasets of more
+than one million cells. Scanpy is developed and maintained by the [Theis lab]()
+and is released under a [BSD-3-Clause
+license](https://github.com/scverse/scanpy/blob/master/LICENSE). Scanpy is part
+of the [scverse](https://scverse.org/), a Python-based ecosystem for single-cell
+omics data analysis.

At the core of scanpy's single-cell functionality is the `anndata` data structure,
scanpy's integrated single-cell data container, which is conceptually very similar
@@ -486,7 +486,7 @@ We can also write a `SingleCellExperiment` to an H5AD file with the
chimera mouse gastrulation dataset.

```{r write-h5ad, message = FALSE}
-out.file <- tempfile(pattern = ".h5ad")
+out.file <- tempfile(fileext = ".h5ad")
writeH5AD(sce, file = out.file)
```

@@ -507,10 +507,10 @@ sessionInfo()

#### Exercise 1: Out of memory representation

-Write the counts matrix of the wild-type chimera
-mouse gastrulation dataset to an HDF5 file. Create another counts matrix that
-reads the data from the HDF5 file. Compare memory usage of holding the entire
-matrix in memory as opposed to holding the data out of memory.
+Write the counts matrix of the wild-type chimera mouse gastrulation dataset to
+an HDF5 file. Create another counts matrix that reads the data from the HDF5
+file. Compare memory usage of holding the entire matrix in memory as opposed to
+holding the data out of memory.

:::::::::::::: hint

@@ -521,7 +521,23 @@ function for writing to HDF5 from the `r Biocpkg("HDF5Array")` package.

:::::::::::::: solution

-TODO
+```{r}
+wt_out = tempfile(fileext = ".h5")
+wt_counts = counts(WTChimeraData())
+writeHDF5Array(wt_counts,
+               name = "wt_counts",
+               file = wt_out)
+oom_wt = HDF5Array(wt_out, "wt_counts")
+object.size(wt_counts)
+object.size(oom_wt)
+```

:::::::::::::::::::::::

:::::::::::::::::::::::::::::::::::::::::::::