Fleshed out the rationale for using LOWEST scaling across blocks.
LTLA committed Dec 20, 2024
1 parent 68ff38d commit 2e80a2d
Showing 1 changed file with 10 additions and 6 deletions.
16 changes: 10 additions & 6 deletions include/scran_norm/center_size_factors.hpp
@@ -27,17 +27,21 @@ enum class CenterBlockMode : char { PER_BLOCK, LOWEST };
  */
  struct CenterSizeFactorsOptions {
  /**
- * Strategy for handling blocks in `compute_blocked()`.
+ * Strategy for handling blocks in `center_size_factors_blocked()`.
  *
  * With the `PER_BLOCK` strategy, size factors are scaled separately for each block so that they have a mean of 1 within each block.
  * The scaled size factors are identical to those obtained by separate invocations of `center_size_factors()` on the size factors for each block.
- * This can be desirable to ensure consistency with independent analyses of each block - otherwise, the centering would depend on the size factors across all blocks.
+ * This can be desirable to ensure consistency with independent analyses of each block - otherwise, the centering would depend on the size factors in other blocks.
  * However, any systematic differences in the size factors between blocks are lost, i.e., systematic changes in coverage between blocks will not be normalized.
  *
- * With the `LOWEST` strategy, we compute the mean size factor for each block and we divide all size factors by the minimum mean.
- * In effect, we normalize across blocks by downscale all blocks to match the coverage of the lowest-coverage block.
- * This is useful for datasets with highly heterogeneous coverage of different blocks as it avoids egregious upscaling of low-coverage blocks.
- * (By contrast, downscaling is always safe as it simply discards information across all blocks by shrinking log-fold changes towards zero at low expression.)
+ * With the `LOWEST` strategy, we compute the mean size factor for each block and we divide all size factors by the lowest mean.
+ * Here, our normalization strategy involves downscaling all blocks to match the coverage of the lowest-coverage block.
+ * This is useful for datasets with highly variable coverage between different blocks as it avoids egregious upscaling of low-coverage blocks.
+ * Specifically, strong upscaling allows the log-transformation to ignore any shrinkage from the pseudo-count.
+ * This is problematic as it inflates differences between cells at log-values derived from low counts, increasing noise and overstating log-fold changes.
+ * Downscaling is safer as it allows the pseudo-count to shrink the log-differences between cells towards zero at low counts,
+ * effectively sacrificing some information in the higher-coverage batches so that they can be compared to the low-coverage batches
+ * (which is preferable to exaggerating the informativeness of the latter for comparison to the former).
  */
  CenterBlockMode block_mode = CenterBlockMode::LOWEST;
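
The two strategies described in the comment above can be sketched in a few lines. This is an illustration only and not the scran_norm implementation: the helper names `block_means`, `center_per_block` and `center_lowest` are hypothetical, and the sketch assumes that every block contains at least one cell and that all size factors are positive.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of scran_norm): mean size factor of each block.
// Assumes block IDs lie in [0, num_blocks) and every block has at least one cell.
std::vector<double> block_means(const std::vector<double>& sf, const std::vector<int>& block, int num_blocks) {
    assert(sf.size() == block.size());
    std::vector<double> sum(num_blocks, 0.0);
    std::vector<std::size_t> count(num_blocks, 0);
    for (std::size_t i = 0; i < sf.size(); ++i) {
        sum[block[i]] += sf[i];
        ++count[block[i]];
    }
    for (int b = 0; b < num_blocks; ++b) {
        assert(count[b] > 0);
        sum[b] /= static_cast<double>(count[b]);
    }
    return sum;
}

// PER_BLOCK-style centering: each block is divided by its own mean,
// so every block ends up with a mean size factor of 1.
std::vector<double> center_per_block(std::vector<double> sf, const std::vector<int>& block, int num_blocks) {
    const auto means = block_means(sf, block, num_blocks);
    for (std::size_t i = 0; i < sf.size(); ++i) {
        sf[i] /= means[block[i]];
    }
    return sf;
}

// LOWEST-style centering: all size factors are divided by the smallest block mean,
// so the lowest-coverage block has a mean of 1 and the other blocks keep their relative scale.
std::vector<double> center_lowest(std::vector<double> sf, const std::vector<int>& block, int num_blocks) {
    const auto means = block_means(sf, block, num_blocks);
    const double lowest = *std::min_element(means.begin(), means.end());
    for (double& x : sf) {
        x /= lowest;
    }
    return sf;
}
```

For size factors {1, 3, 10, 30} with block assignments {0, 0, 1, 1}, the block means are 2 and 20. Per-block centering yields {0.5, 1.5, 0.5, 1.5}, which discards the tenfold coverage difference between blocks, whereas the lowest-mean centering yields {0.5, 1.5, 5, 15}, which preserves it.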

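The argument about upscaling and the pseudo-count can be made concrete with a small calculation. The sketch below assumes a log-transformation of the form log2(count / size_factor + pseudo_count) with a pseudo-count of 1; the commit does not spell out this formula, so both the formula and the helper names `log_norm` and `log_difference` are assumptions made for illustration.

```cpp
#include <cmath>

// Assumed form of the log-transformation: log2(count / size_factor + pseudo_count).
// Hypothetical helper, not taken from scran_norm.
double log_norm(double count, double size_factor, double pseudo_count = 1.0) {
    return std::log2(count / size_factor + pseudo_count);
}

// Difference in log-values between two cells that share the same size factor.
double log_difference(double count_a, double count_b, double size_factor) {
    return log_norm(count_a, size_factor) - log_norm(count_b, size_factor);
}
```

Under these assumptions, two cells with counts of 1 and 0 differ by exactly 1 on the log2 scale when the size factor is 1. If the same cells are strongly upscaled with a size factor of 0.1, the difference becomes log2(11), roughly 3.46, so a single count is treated as a large fold change. With a size factor of 10 (downscaling), the difference shrinks to log2(1.1), roughly 0.14, which is the shrinkage towards zero that the comment describes.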