We expect a `marker_detection` HDF5 group at the root of the file, containing marker statistics for each cluster.
The group itself contains the `parameters` and `results` subgroups.

Definitions:

- `num_clusters`: number of clusters in the analysis. This is typically determined from the `choose_clustering` step.
- `rna_available`: whether RNA data is available. This is typically determined from the `inputs` step.
- `rna_num_features`: number of features in the RNA data. This is typically determined from the `inputs` step.
- `adt_available`: whether ADT data is available. This is typically determined from the `inputs` step.
- `adt_num_features`: number of features in the ADT data. This is typically determined from the `inputs` step.
- `crispr_available`: whether CRISPR data is available. This is typically determined from the `inputs` step.
- `crispr_num_features`: number of features in the CRISPR data. This is typically determined from the `inputs` step.

`parameters` will contain:

- `lfc_threshold`: a scalar float specifying the log-fold change threshold used to compute effect sizes. This should be non-negative.
- `compute_auc`: a scalar integer to be interpreted as a boolean, indicating whether AUCs were computed.

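
As an illustration of the constraints above, the following is a minimal sketch of a parameter check. It assumes the two scalars have already been read out of the `parameters` group into a plain dict; the function names `check_parameters` and `auc_was_computed` are hypothetical, and treating any non-zero `compute_auc` as true is an assumption rather than something the specification states.

```python
import math
import numbers


def check_parameters(params):
    """Return a list of problems found in the marker-detection parameters.

    `params` is a plain dict standing in for the scalar datasets of the
    `parameters` group (an assumption made for illustration).
    """
    problems = []

    lfc = params.get("lfc_threshold")
    # bool is a subclass of int in Python, so exclude it explicitly.
    # Any real number is accepted here as a stand-in for a float scalar.
    if not isinstance(lfc, numbers.Real) or isinstance(lfc, bool):
        problems.append("lfc_threshold should be a scalar float")
    elif math.isnan(lfc) or lfc < 0:
        problems.append("lfc_threshold should be non-negative")

    auc = params.get("compute_auc")
    if not isinstance(auc, numbers.Integral) or isinstance(auc, bool):
        problems.append("compute_auc should be a scalar integer")

    return problems


def auc_was_computed(params):
    """Interpret the integer flag as a boolean (assumed: non-zero is true)."""
    return params["compute_auc"] != 0
```

An empty list from `check_parameters` means that both values satisfy the constraints listed above.
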
`results` should contain `per_cluster`, a group containing the marker results for each cluster.
Each child of `per_cluster` corresponds to a cluster and is itself a group named after its cluster index (0 to `num_clusters - 1`).
Each `per_cluster/<cluster_index>` group contains further subgroups, one named after each modality:

- If `rna_available = true`, there should be an `"RNA"` subgroup.
- If `adt_available = true`, there should be an `"ADT"` subgroup.
- If `crispr_available = true`, there should be a `"CRISPR"` subgroup.

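
The layout described above can be sketched as a list of expected group paths. This is only an illustration: the helper name `expected_modality_groups` is hypothetical, and the availability flags are assumed to have been obtained already from the `inputs` step.

```python
def expected_modality_groups(num_clusters, rna_available, adt_available, crispr_available):
    """List the modality subgroups that should exist under `per_cluster`."""
    flags = [
        ("RNA", rna_available),
        ("ADT", adt_available),
        ("CRISPR", crispr_available),
    ]
    modalities = [name for name, available in flags if available]

    # One group per cluster index (0 to num_clusters - 1), each holding
    # one subgroup per available modality.
    return [
        f"marker_detection/results/per_cluster/{cluster}/{modality}"
        for cluster in range(num_clusters)
        for modality in modalities
    ]
```

For example, with two clusters and only RNA and CRISPR data available, this yields four paths: the `RNA` and `CRISPR` subgroups for cluster `0`, followed by those for cluster `1`.
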
Each modality-specific subgroup contains the statistics for that modality:

- `means`: a float dataset of length equal to the number of features for this modality (as determined from the relevant `*_num_features`), containing the mean expression of each feature in the current cluster.
- `detected`: a float dataset of length equal to the number of features, containing the proportion of cells with detected expression of each feature in the current cluster.
- `lfc`: a group containing statistics for the log-fold changes from all pairwise comparisons involving the current cluster. This contains:
  - `min`: a float dataset of length equal to the number of features, containing the minimum log-fold change across all pairwise comparisons for each feature.
  - `mean`: a float dataset of length equal to the number of features, containing the mean log-fold change across all pairwise comparisons for each feature.
  - `min_rank`: a float dataset of length equal to the number of features, containing the minimum rank of the log-fold changes across all pairwise comparisons for each feature.
- `delta_detected`: same as `lfc`, but for the delta-detected (i.e., the difference in the proportion of cells with detected expression).
- `cohen`: same as `lfc`, but for Cohen's d.
- `auc`: same as `lfc`, but for the AUCs. This may be omitted if `compute_auc` is false.

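
The structure of a modality-specific subgroup can be checked along the following lines. This is a minimal sketch under stated assumptions: nested dicts and lists stand in for HDF5 groups and datasets (h5py groups offer a similar mapping-style interface), the name `check_modality_group` is hypothetical, and requiring `auc` whenever `compute_auc` is true is an interpretation of the rule above rather than an explicit statement in the specification.

```python
SUMMARIES = ("min", "mean", "min_rank")


def check_modality_group(group, num_features, compute_auc):
    """Return a list of problems found in one modality subgroup.

    `group` is a nested dict standing in for an HDF5 group, with lists
    standing in for float datasets (assumptions for illustration).
    """
    problems = []

    def check_dataset(parent, name, label):
        if name not in parent:
            problems.append(f"missing {label}")
        elif len(parent[name]) != num_features:
            problems.append(f"{label} has wrong length")

    for name in ("means", "detected"):
        check_dataset(group, name, name)

    effects = ["lfc", "delta_detected", "cohen"]
    if compute_auc:
        # Only demanded when AUCs were computed; otherwise it may be absent.
        effects.append("auc")

    for effect in effects:
        if effect not in group:
            problems.append(f"missing {effect}")
            continue
        for summary in SUMMARIES:
            check_dataset(group[effect], summary, f"{effect}/{summary}")

    return problems
```

The `num_features` argument would be the relevant `*_num_features` value for the modality being checked.
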
Updated in version 3.0, with the following changes from the previous version:
- Added the log-FC threshold parameter.
- Allow AUCs to be optional, depending on whether they were computed.