Overview

We expect a marker_detection HDF5 group at the root of the file, containing marker statistics for each cluster. The group itself contains the parameters and results subgroups.

Definitions:

  • num_clusters: number of clusters in the analysis. This is typically determined from the choose_clustering step.
  • rna_available: whether RNA data is available. This is typically determined from the inputs step.
  • rna_num_features: number of features in the RNA data. This is typically determined from the inputs step.
  • adt_available: whether ADT data is available. This is typically determined from the inputs step.
  • adt_num_features: number of features in the ADT data. This is typically determined from the inputs step.
  • crispr_available: whether CRISPR data is available. This is typically determined from the inputs step.
  • crispr_num_features: number of features in the CRISPR data. This is typically determined from the inputs step.

Parameters

parameters will contain:

  • lfc_threshold: a scalar float specifying the log-fold change threshold used to compute effect sizes. This should be non-negative.
  • compute_auc: a scalar integer to be interpreted as a boolean, indicating whether AUCs were computed.
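As a rough illustration, the two parameters could be checked as follows once their values have been read from the file. This is a minimal sketch in plain Python: the helper name check_parameters is hypothetical, and it assumes the scalars have already been extracted from the parameters group by whatever HDF5 reader is in use.

```python
def check_parameters(lfc_threshold, compute_auc):
    """Validate the two scalars from `parameters` (hypothetical helper).

    Assumes both values were already read from the HDF5 file as Python
    or NumPy scalars; no file access happens here.
    """
    threshold = float(lfc_threshold)
    # `not (x >= 0)` is true for negative values and also for NaN.
    if not threshold >= 0:
        raise ValueError("lfc_threshold should be non-negative")

    # compute_auc is stored as an integer but interpreted as a boolean.
    return {"lfc_threshold": threshold, "compute_auc": bool(int(compute_auc))}
```

For example, check_parameters(0.5, 1) reports a threshold of 0.5 with AUCs computed, while a negative threshold raises a ValueError.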

Results

results should contain per_cluster, a group containing the marker results for each cluster. Each child of per_cluster corresponds to a cluster and is itself a group named after its cluster index (0 to num_clusters - 1). Each per_cluster/<cluster_index> group contains further subgroups, one named after each modality:

  • If rna_available = true, there should be an "RNA" subgroup.
  • If adt_available = true, there should be an "ADT" subgroup.
  • If crispr_available = true, there should be a "CRISPR" subgroup.
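The layout described above can be sketched by listing the modality subgroups that a file would be expected to contain. The function name expected_modality_groups is hypothetical and is used only for illustration; the paths follow from the group names given in this section.

```python
def expected_modality_groups(num_clusters, rna_available, adt_available, crispr_available):
    """List the expected per-cluster modality groups (hypothetical helper)."""
    flags = (("RNA", rna_available), ("ADT", adt_available), ("CRISPR", crispr_available))
    modalities = [name for name, available in flags if available]

    paths = []
    # Clusters are named by their index, from 0 to num_clusters - 1.
    for cluster in range(num_clusters):
        for modality in modalities:
            paths.append(f"marker_detection/results/per_cluster/{cluster}/{modality}")
    return paths
```

With two clusters and only RNA and CRISPR data available, this would list four groups: RNA and CRISPR under cluster 0, then RNA and CRISPR under cluster 1.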

Each modality-specific subgroup contains the statistics for that modality:

  • means: a float dataset of length equal to the number of features for this modality (as determined from the relevant *_num_features), containing the mean expression of each feature in the current cluster.
  • detected: a float dataset of length equal to the number of features, containing the proportion of cells with detected expression of each feature in the current cluster.
  • lfc: a group containing statistics for the log-fold changes from all pairwise comparisons involving the current cluster. This contains:
    • min: a float dataset of length equal to the number of features, containing the minimum log-fold change across all pairwise comparisons for each feature.
    • mean: a float dataset of length equal to the number of features, containing the mean log-fold change across all pairwise comparisons for each feature.
    • min_rank: a float dataset of length equal to the number of features, containing the minimum rank of the log-fold changes across all pairwise comparisons for each feature.
  • delta_detected: same as lfc, but for the delta-detected (i.e., the difference in the proportion of cells with detected expression).
  • cohen: same as lfc, but for Cohen's d.
  • auc: same as lfc, but for the AUCs. This may be omitted if compute_auc is false.
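The contents of a single modality subgroup could be checked along the following lines. This is a sketch under stated assumptions: it is written against plain nested dictionaries standing in for an HDF5 group, with lists standing in for datasets, and the helper name check_modality_group is hypothetical. It only checks presence and length, not the float type of each dataset.

```python
SUMMARIES = ("min", "mean", "min_rank")

def check_modality_group(group, num_features, compute_auc):
    """Return a list of problems found in one modality subgroup (hypothetical helper).

    `group` is assumed to be a nested mapping that mimics the HDF5 layout.
    """
    problems = []

    def check_dataset(parent, name, label):
        if name not in parent:
            problems.append(f"missing {label}")
        elif len(parent[name]) != num_features:
            problems.append(f"{label} has length {len(parent[name])}, expected {num_features}")

    for name in ("means", "detected"):
        check_dataset(group, name, name)

    # auc is only required when AUCs were computed.
    effects = ["lfc", "delta_detected", "cohen"]
    if compute_auc:
        effects.append("auc")

    for effect in effects:
        if effect not in group:
            problems.append(f"missing {effect}")
            continue
        for summary in SUMMARIES:
            check_dataset(group[effect], summary, f"{effect}/{summary}")
    return problems
```

An empty list means the subgroup has every expected dataset at the expected length. The sketch returns all problems rather than stopping at the first, which tends to be more useful when inspecting a malformed file.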

History

Updated in version 3.0, with the following changes from the previous version:

  • Added the lfc_threshold parameter, which specifies the log-fold change threshold.
  • Allow AUCs to be optional, depending on whether they were computed.