add alternative ssGSEA #258

GWMcElfresh · 2024-06-08T06:03:47Z

Hi all,

This is an alternative way to score pathways in transcriptomics data.

It's based on the Wasserstein statistic (earth mover's distance) between ranks. This statistic measures how "perturbed" the gene ranks for a specific pathway are compared to the ranks of the whole transcriptome. It is very straightforward to visualize and compute for whole-transcriptome data (pseudobulk, spatial, bulk).

The distance is the integral between these two curves:
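As a rough illustration of that integral, here is a minimal Python sketch (not the code in this PR). It assumes the two curves are the empirical CDFs of the pathway's gene ranks and of all gene ranks in one sample. The function names `ecdf_area_distance` and `pathway_score` and the dict-based input are hypothetical.

```python
from bisect import bisect_right


def ecdf_area_distance(a, b):
    """1-Wasserstein distance between two 1-D samples,
    computed as the integral of |F_a(x) - F_b(x)| over x."""
    a, b = sorted(a), sorted(b)
    grid = sorted(set(a) | set(b))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        # Both empirical CDFs are constant on [left, right).
        f_a = bisect_right(a, left) / len(a)
        f_b = bisect_right(b, left) / len(b)
        total += abs(f_a - f_b) * (right - left)
    return total


def pathway_score(expression, pathway):
    """expression: dict of gene -> value for one sample (assumed input shape).
    pathway: iterable of gene names."""
    # Rank genes from lowest (1) to highest (n); ties keep insertion order.
    ordered = sorted(expression, key=expression.get)
    ranks = {gene: i + 1 for i, gene in enumerate(ordered)}
    pathway_ranks = [ranks[g] for g in pathway if g in ranks]
    return ecdf_area_distance(pathway_ranks, list(ranks.values()))


sample = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0}
print(pathway_score(sample, ["C", "D"]))            # top-ranked genes
print(pathway_score(sample, ["A", "B", "C", "D"]))  # pathway == transcriptome
```

A pathway whose genes sit at one end of the ranking gets a larger distance, and a pathway identical to the background gets zero.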

It is more difficult to visualize the distance in single-cell RNA-seq, because the distance depends strongly on the number of genes expressed within the cell (for us, nFeature_RNA).

I think this can be improved by subsetting to just the expressed genes:

and then normalizing (perhaps to the maximum number of genes across all cells? TBD; I need to experiment with more single-cell data).
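One possible reading of this idea, sketched in Python under stated assumptions: genes with zero counts are dropped before ranking, and the distance is divided by a scale factor. The normalization is still undecided above, so the `scale` argument (for example, the maximum number of expressed genes across all cells) is only a placeholder, and `expressed_gene_score` is a hypothetical name, not a function in this PR.

```python
from bisect import bisect_right


def wasserstein_1d(a, b):
    """Area between the empirical CDFs of two 1-D samples."""
    a, b = sorted(a), sorted(b)
    grid = sorted(set(a) | set(b))
    area = 0.0
    for left, right in zip(grid, grid[1:]):
        gap = bisect_right(a, left) / len(a) - bisect_right(b, left) / len(b)
        area += abs(gap) * (right - left)
    return area


def expressed_gene_score(counts, pathway, scale=None):
    """counts: dict of gene -> count for one cell (assumed input shape).
    scale: normalization constant; defaults to this cell's number of
    expressed genes, which is an assumption rather than a settled choice."""
    # Keep only genes detected in this cell, then rank within that subset.
    expressed = {g: c for g, c in counts.items() if c > 0}
    ordered = sorted(expressed, key=expressed.get)
    ranks = {g: i + 1 for i, g in enumerate(ordered)}
    pathway_ranks = [ranks[g] for g in pathway if g in ranks]
    if not pathway_ranks:
        return None  # no pathway gene is expressed in this cell
    distance = wasserstein_1d(pathway_ranks, list(ranks.values()))
    return distance / (scale if scale is not None else len(ranks))


cell = {"A": 0, "B": 0, "C": 1, "D": 2, "E": 3, "F": 4}
print(expressed_gene_score(cell, ["E", "F"]))           # scaled by 4 expressed genes
print(expressed_gene_score(cell, ["E", "F"], scale=8))  # scaled by a shared maximum
```

Passing the same `scale` for every cell keeps scores comparable across cells with different numbers of detected genes, which is the dependence on nFeature_RNA described above.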

I'll add more details later!

-GW

GWMcElfresh · 2024-06-23T19:12:19Z

Just for benchmarking: scoring RIRA's T cells with MSigDB's hallmark gene sets takes ~17 hours on 4 cores. BB01 (timecourse) takes about 14 hours on 4 cores.

bbimber · 2024-10-09T04:07:00Z

@GWMcElfresh: I didn't review this super carefully, but if you feel like this is good, you can merge it.

GWMcElfresh · 2024-10-09T04:39:26Z

> @GWMcElfresh: I didn't review this super carefully, but if you feel like this is good, you can merge it.

Thanks. I think the Wasserstein-based measures are probably well tested (in theory, not in code), but the dts-based stuff (tested in code, not in theory) still passes with a magnitude something like 60k higher than I think it should be.

Still cooking a bit, but I might pull back on the scope of this function and only lightly support dts as an experimental capability, with Wasserstein defaults. The speedup from a6fa79c is really hard to turn down.

GWMcElfresh added 8 commits June 7, 2024 22:57

add alternative ssGSEA

24a4b95

Merge branch 'master' into WassersteinSsgsea

1b19ae0

Delete tests/testdata/tmpoutput/diversity_output.csv

6248719

separate scoring function, add docs, improve sanity checks

05391d3

simplify dependencies, add to DESCRIPTION, improve aesthetics

3cc877d

update docs

dded55f

export function, add tests

61af002

Merge branch 'master' into WassersteinSsgsea

4248e0e

GWMcElfresh marked this pull request as ready for review June 21, 2024 15:24

GWMcElfresh added 10 commits July 20, 2024 21:28

swap from rank distributions to proper ecdfs based on ranks

ab6b6ac

update docs, fix number of genes sampled in ecdf

2d17e84

update test

8d27be3

speed up wasserstein distance caclulation

a6fa79c

Merge branch 'master' into WassersteinSsgsea

ca93222

Merge branch 'master' into WassersteinSsgsea

b235c5f

Merge branch 'master' into WassersteinSsgsea

29de15b

Merge branch 'master' into WassersteinSsgsea

9882787

Merge branch 'master' into WassersteinSsgsea

dfcbf66

Merge branch 'master' into WassersteinSsgsea

3413d1d

GWMcElfresh added 2 commits October 11, 2024 15:37

Merge branch 'master' into WassersteinSsgsea

8ea5b93

Merge branch 'master' into WassersteinSsgsea

b11eb43
