
Datasets

Paul Hoffman edited this page Mar 20, 2020 · 1 revision

Dataset Packages

SeuratData uses R packages to bundle and distribute datasets in an R-native manner. Because datasets are distributed as R packages, neither SeuratData nor the user needs to know where the data are stored; R itself handles downloading and storing them. This setup also lets the user load a dataset from any working directory, since R handles loading the data as well.

Storing the Data

Datasets can be bundled with packages in two forms: a Seurat object loadable through the data mechanism, or an h5Seurat file accessible with LoadData.
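As a rough illustration of the data mechanism, the sketch below loads a packaged object by name into a private environment and returns it. The helper `load_packaged_object` is hypothetical (it is not part of SeuratData), and the object name `cbmc` in the commented lines is an assumption about how the dataset package names its object. Because dataset packages may not be installed, the runnable part uses `iris` from base R's `datasets` package as a stand-in.

```r
# Hypothetical helper: load an object shipped in a package's data/ directory
# without touching the caller's global environment.
load_packaged_object <- function(name, package) {
  env <- new.env()
  utils::data(list = name, package = package, envir = env)
  get(name, envir = env)
}

# Stand-in demonstration with a dataset that ships with base R.
example_object <- load_packaged_object("iris", "datasets")
print(dim(example_object))

# With a dataset package installed, the same call would look like this
# (the object name "cbmc" is an assumption):
# cbmc <- load_packaged_object("cbmc", "cbmc.SeuratData")
# The other form would be loaded through SeuratData itself:
# cbmc <- SeuratData::LoadData("cbmc")
```

Loading into a fresh environment keeps the example free of side effects; calling `data()` directly at the console works just as well for interactive use.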

Documentation and Citations

Datasets should be documented using standard Roxygen syntax.
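A minimal sketch of what such documentation might look like for a dataset object is shown here. The object name `cbmc` is an assumption, the wording is illustrative only, and the `@source` line is a placeholder to be replaced with the dataset's real origin.

```r
#' scRNAseq and 13-antibody sequencing of CBMCs
#'
#' Human CBMCs (cord blood) profiled with CITE-seq.
#'
#' @format A Seurat object with 8,617 cells
#' @source Placeholder: describe or link the original source of the data
"cbmc"
```

Documenting the object by quoting its name is the usual Roxygen convention for datasets, since there is no function definition to attach the comment block to.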

Dataset packages should also include citation information in a CITATION file located at inst/CITATION. The citation should list the original source of the dataset.
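The following sketch shows one way a CITATION file could be written and checked. Every bibliographic value in it is a hypothetical placeholder, not a real reference, and the file is written to a temporary directory rather than to inst/CITATION so the example runs anywhere. It relies only on `bibentry()` and `readCitationFile()` from the `utils` package.

```r
# Contents of a CITATION file. All field values are placeholders and must be
# replaced with the original source of the dataset.
citation_code <- '
bibentry(
  bibtype = "Article",
  title   = "Placeholder title of the original study",
  author  = person(given = "Given", family = "Family"),
  journal = "Placeholder Journal",
  year    = "2000"
)
'

# Write the file to a temporary location (in a package: inst/CITATION).
citation_file <- file.path(tempdir(), "CITATION")
writeLines(citation_code, citation_file)

# Parse the file the way R does when citation() is called on a package.
cit <- utils::readCitationFile(citation_file, meta = list(Encoding = "UTF-8"))
print(cit, style = "text")
```

Once the file is installed with a package, `citation("name_of_dataset.SeuratData")` returns the same entry.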

Metadata and Other Information

Dataset-level metadata, as opposed to the cell-level metadata stored in the Seurat object, is stored in the dataset package's DESCRIPTION file.

| Key | Value |
| --- | --- |
| Package | Name of the package; should be name_of_dataset.SeuratData |
| Date | Date the package was built, in YYYY-MM-DD format; used for versioning |
| Type | Should be Package |
| Title | Short description of the dataset |
| Version | Version of Seurat the dataset was built under |
| Author or Authors@R | Name(s) and contact information of the dataset package builders and maintainers |
| Description | ... |
| License | License of the data, typically a Creative Commons license (e.g. CC BY 4.0) |
| Encoding | Character encoding used by the package, typically UTF-8 |
| LazyData | ... |
| RoxygenNote | Version of Roxygen the dataset documentation was generated with |
| Suggests | Packages and package versions used to generate the dataset; should include a version of Seurat |

The DESCRIPTION for the CBMC dataset provided by SeuratData is as follows:

Package: cbmc.SeuratData
Date: 2019-07-17
Type: Package
Title: scRNAseq and 13-antibody sequencing of CBMCs
Version: 3.0.0
Authors@R: c(
    person(given = 'Satija', family = 'Lab', email = '[email protected]', role = c('aut', 'cre'))
    )
Description:
    species: human
    system: CBMC (cord blood)
    ncells: 8617
    tech: CITE-seq
    default.dataset: raw
License: CC BY 4.0
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.1.1
Suggests:
    Seurat (>= 3.0.0)
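To illustrate how metadata in this layout can be read back, the sketch below parses a subset of the CBMC DESCRIPTION with base R's `read.dcf` and splits the Description field into key/value pairs. This is an assumed approach for illustration, not necessarily how SeuratData reads the file; the variable names are hypothetical.

```r
# A subset of the CBMC DESCRIPTION shown above, written to a temporary file.
description_lines <- c(
  "Package: cbmc.SeuratData",
  "Date: 2019-07-17",
  "Type: Package",
  "Title: scRNAseq and 13-antibody sequencing of CBMCs",
  "Version: 3.0.0",
  "Description:",
  "    species: human",
  "    system: CBMC (cord blood)",
  "    ncells: 8617",
  "    tech: CITE-seq",
  "    default.dataset: raw",
  "License: CC BY 4.0"
)
description_file <- tempfile("DESCRIPTION")
writeLines(description_lines, description_file)

# read.dcf returns a character matrix with one column per field.
desc <- read.dcf(description_file)

# Split the multi-line Description field into "key: value" entries.
entries <- trimws(strsplit(desc[1, "Description"], "\n", fixed = TRUE)[[1]])
entries <- entries[nzchar(entries)]
keys <- trimws(sub(":.*$", "", entries))
values <- trimws(sub("^[^:]*:", "", entries))
metadata <- setNames(values, keys)

print(metadata)
```

For an installed dataset package, `utils::packageDescription()` returns the same fields without having to locate the file by hand.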