up vs downregulated genes in a geneset #18

redst4r · 2023-05-25T06:37:42Z

The geneset class supports sets where some genes are positively associated with a signature (UP) and others are negatively associated (DN).

Do we even utilize that in the scoring?

One way (seen in some papers):

- calculate a score s_up for positive genes in the set
- calculate a score s_dn for negative genes in the set
- the final score is their difference s_up-s_dn
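The three steps above can be sketched as follows. This is only an illustration, assuming a plain mean of the expression values as the per-direction score; `mean_score`, `signed_score` and the toy gene names are hypothetical and not part of the package.

```python
import numpy as np


def mean_score(expr, genes, gene_index):
    """Mean expression of `genes` in one cell (hypothetical helper)."""
    idx = [gene_index[g] for g in genes if g in gene_index]
    if not idx:
        return 0.0
    return float(np.mean(expr[idx]))


def signed_score(expr, genes_up, genes_dn, gene_index):
    """Difference s_up - s_dn for a single cell."""
    s_up = mean_score(expr, genes_up, gene_index)
    s_dn = mean_score(expr, genes_dn, gene_index)
    return s_up - s_dn


# Toy data: genes A and B are UP, genes C and D are DN.
gene_index = {"A": 0, "B": 1, "C": 2, "D": 3}
genes_up, genes_dn = ["A", "B"], ["C", "D"]

cell_match = np.array([5.0, 7.0, 1.0, 1.0])  # UP genes high, DN genes low
cell_anti = np.array([1.0, 1.0, 5.0, 7.0])   # UP genes low, DN genes high

score_match = signed_score(cell_match, genes_up, genes_dn, gene_index)
score_anti = signed_score(cell_anti, genes_up, genes_dn, gene_index)
```

With this toy data the cell that matches the signature ends up with the higher score, and the cell with the opposite pattern gets a negative one.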


Gibbsdavidl · 2023-05-25T18:31:20Z

Yeah, there are some gene sets that are defined as ‘down’, and some are defined in two parts, one up, one down. Currently, if you have a gene set defined as DN and its genes are low ranked, then the score is high. For example, in the Hallmark genesets at MSigDB, there’s a UV response gene set that’s defined in two parts, UP and DN, and it was my goal to make the package compatible with MSigDB: https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/HALLMARK_UV_RESPONSE_DN.html It might not be important in the future, but there’s a use case anyway.


-dave


redst4r · 2023-05-25T19:19:07Z

It's definitely a good thing to account for: a bunch of those "newer" genesets are weighted (some with negative weights), e.g. Progeny.

I just wasn't sure if the scoring functions handle it accordingly. It seems to be done in score_fun.

This is for the non-ranked version

        if (gs.mode == 'UP') and (ranked == False):
            res0 = method_selector(gs, x, 'counts', gs.genes_up, method, method_params)

        elif (gs.mode == 'DN') and (ranked == False):
            res0 = method_selector(gs, x, 'counts', gs.genes_dn, method, method_params)

        elif (gs.mode == 'BOTH') and (ranked == False):
            res0_up = method_selector(gs, x, 'counts', gs.genes_up, method, method_params)
            res0_dn = method_selector(gs, x, 'counts', gs.genes_dn, method, method_params)
            res0 = (res0_up + res0_dn)

Here, it looks like UP and DN get handled the same way, whereas I'd have expected the sign of DN to be flipped, i.e. a cell with high expression of a DN gene gets a low score.
The same applies to BOTH!

For ranked=True:

        elif (gs.mode == 'UP') and (ranked == True):
            res0 = method_selector(gs, x, 'uprank', gs.genes_up, method, method_params)

        elif (gs.mode == 'DN') and (ranked == True):
            res0 = method_selector(gs, x, 'dnrank', gs.genes_dn, method, method_params)

        elif (gs.mode == 'BOTH') and (ranked == True):
            res0_up = method_selector(gs, x, 'uprank', gs.genes_up, method, method_params)
            res0_dn = method_selector(gs, x, 'dnrank', gs.genes_dn, method, method_params)
            res0 = (res0_up + res0_dn)

Looks like it's inverting the ranks for the DN genes by using x['dnrank'] (didn't confirm though).
A cell with high expression of a DN gene would be at the bottom of x['dnrank'] and get a low score, which is correct.
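To make the rank argument concrete, here is a minimal sketch of what an up-rank and its inverted down-rank could look like for a single cell. How the package actually builds x['uprank'] and x['dnrank'] was not checked, so `up_and_down_ranks` and its tie-breaking are assumptions made purely for illustration.

```python
import numpy as np


def up_and_down_ranks(expr):
    """Return (uprank, dnrank) for one cell (hypothetical helper).

    uprank: 1 = lowest expression, n = highest expression.
    dnrank: the reverse, so highly expressed genes get low ranks.
    Ties are broken by position, which is an arbitrary choice.
    """
    n = len(expr)
    order = np.argsort(expr, kind="stable")
    uprank = np.empty(n, dtype=int)
    uprank[order] = np.arange(1, n + 1)
    dnrank = n + 1 - uprank
    return uprank, dnrank


# Gene at index 1 is the most highly expressed gene in this cell.
expr = np.array([0.5, 9.0, 3.0, 1.0])
uprank, dnrank = up_and_down_ranks(expr)
```

In this toy cell the most highly expressed gene gets the top up-rank and the bottom down-rank, so if it belongs to the DN set it contributes little to a rank-based score.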

Conclusion

Seems to work correctly for ranked=True, but does the wrong thing for ranked=False.
I'm not quite sure how to deal with it in the ranked=False case. Can we just flip the sign or is there something else needed?

Gibbsdavidl · 2023-07-24T23:15:04Z

Yeah, thanks for thinking about this. Agreed that in the ranked case we're OK, and in testing it seems to work.
However, in the counts case, maybe s_up-s_dn is a good idea! A low s_dn is what's desired for a DN geneset. If s_dn is high, that means the gene expression is also high, which is contrary to what we want.

proposal:
'counts': s_up - s_dn
'ranked': s_up + s_dn
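A rough sketch of how the proposal could look for the BOTH case, assuming the two partial scores have already been computed. `combine_scores` is a hypothetical name, and the sketch deliberately leaves open how a DN-only geneset should be handled in the counts case.

```python
def combine_scores(s_up, s_dn, ranked):
    """Combine UP and DN partial scores for a 'BOTH' geneset (hypothetical)."""
    if ranked:
        # The DN score is assumed to come from already-inverted ranks,
        # so a high s_dn supports the signature and a plain sum is fine.
        return s_up + s_dn
    # On raw counts a high s_dn means the DN genes are highly expressed,
    # which argues against the signature, so it is subtracted.
    return s_up - s_dn


counts_score = combine_scores(6.0, 1.0, ranked=False)
ranked_score = combine_scores(0.75, 0.5, ranked=True)
```

Keeping the sign logic in one place like this would let the `'BOTH'` branches of score_fun stay symmetric between the two modes.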
