Skip to content

Latest commit

 

History

History
61 lines (47 loc) · 3.26 KB

File metadata and controls

61 lines (47 loc) · 3.26 KB

Overview

We expect a custom_selections HDF5 group at the root of the file, containing information about the custom selections. The group itself contains the parameters and results subgroups.

Definitions:

  • num_cells: number of cells remaining after QC filtering. This is typically determined from the cell_filtering step.
  • rna_available: whether RNA data is available. This is typically determined from the inputs step.
  • rna_num_features: number of features in the RNA data. This is typically determined from the inputs step.
  • adt_available: whether ADT data is available. This is typically determined from the inputs step.
  • adt_num_features: number of features in the ADT data. This is typically determined from the inputs step.
  • crispr_available: whether CRISPR data is available. This is typically determined from the inputs step.
  • crispr_num_features: number of features in the ADT data. This is typically determined from the inputs step.

Parameters

parameters should contain:

  • selections: a group defining the custom selections. Each child is named after a user-created selection. Each child is an integer dataset of arbitrary length containing the indices of the selected cells. Note that indices refer to the dataset after QC filtering and should be less than num_cells. Indices in each dataset should also be unique and sorted.
  • lfc_threshold: a scalar float specifying the log-fold change threshold used to compute effect sizes. This should be non-negative.
  • compute_auc: a scalar integer to be interpreted as a boolean, indicating whether AUCs were computed.

Results

results should contain per_selection, a group containing the marker results for each selection after a comparison to a group containing all other cells. Each child of per_selection corresponds to a selection and is itself a group named after that selection. Each per_selection/<selection_name> group contains further subgroups, one named after each modality:

  • If rna_available = true, there should be an "RNA" subgroup.
  • If adt_available = true, there should be an "ADT" subgroup.
  • If crispr_available = true, there should be a "CRISPR" subgroup.

Each modality-specific subgroup contains the statistics for that modality:

  • means: a float dataset of length equal to the number of features in that modality (as determined from the relevant *_num_features), containing the mean expression of each gene in the selection.
  • detected: a float dataset of length equal to the number of features in that modality, containing the proportion of cells with detected expression of each gene in the selection.
  • lfc: a float dataset of length equal to the number of features in that modality, containing the log-fold change in the selection compared to all other cells.
  • delta_detected: same as lfc, but for the delta-detected, i.e., difference in the percentage of detected expression.
  • cohen: same as lfc, but for Cohen's d.
  • auc: same as lfc, but for the AUCs. This may be omitted if compute_auc is falsey.

History

Updated in version 3.0, with the following changes from the previous version:

  • Added the log-FC threshold parameter.
  • Allow AUCs to be optional, depending on whether they were computed.