We expect a custom_selections
HDF5 group at the root of the file, containing information about the custom selections.
The group itself contains the parameters
and results
subgroups.
Definitions:
num_cells
: number of cells remaining after QC filtering. This is typically determined from thecell_filtering
step.rna_available
: whether RNA data is available. This is typically determined from theinputs
step.rna_num_features
: number of features in the RNA data. This is typically determined from theinputs
step.adt_available
: whether ADT data is available. This is typically determined from theinputs
step.adt_num_features
: number of features in the ADT data. This is typically determined from theinputs
step.crispr_available
: whether CRISPR data is available. This is typically determined from theinputs
step.crispr_num_features
: number of features in the ADT data. This is typically determined from theinputs
step.
parameters
should contain:
selections
: a group defining the custom selections. Each child is named after a user-created selection. Each child is an integer dataset of arbitrary length containing the indices of the selected cells. Note that indices refer to the dataset after QC filtering and should be less thannum_cells
. Indices in each dataset should also be unique and sorted.lfc_threshold
: a scalar float specifying the log-fold change threshold used to compute effect sizes. This should be non-negative.compute_auc
: a scalar integer to be interpreted as a boolean, indicating whether AUCs were computed.
results
should contain per_selection
, a group containing the marker results for each selection after a comparison to a group containing all other cells.
Each child of per_selection
corresponds to a selection and is itself a group named after that selection.
Each per_selection/<selection_name>
group contains further subgroups, one named after each modality:
- If
rna_available = true
, there should be an"RNA"
subgroup. - If
adt_available = true
, there should be an"ADT"
subgroup. - If
crispr_available = true
, there should be a"CRISPR"
subgroup.
Each modality-specific subgroup contains the statistics for that modality:
means
: a float dataset of length equal to the number of features in that modality (as determined from the relevant*_num_features
), containing the mean expression of each gene in the selection.detected
: a float dataset of length equal to the number of features in that modality, containing the proportion of cells with detected expression of each gene in the selection.lfc
: a float dataset of length equal to the number of features in that modality, containing the log-fold change in the selection compared to all other cells.delta_detected
: same aslfc
, but for the delta-detected, i.e., difference in the percentage of detected expression.cohen
: same aslfc
, but for Cohen's d.auc
: same aslfc
, but for the AUCs. This may be omitted ifcompute_auc
is falsey.
Updated in version 3.0, with the following changes from the previous version:
- Added the log-FC threshold parameter.
- Allow AUCs to be optional, depending on whether they were computed.