up vs downregulated genes in a geneset #18

Open
redst4r opened this issue May 25, 2023 · 3 comments

@redst4r
Collaborator

redst4r commented May 25, 2023

the geneset-class supports sets where some genes are positively associated with a signature (UP) and others are negatively associated (DN).

Do we even utilize that in the scoring?

One way (seen in some papers):

  • calculate a score s_up for positive genes in the set
  • calculate a score s_dn for negative genes in the set
  • the final score is their difference s_up-s_dn
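
A minimal sketch of the three steps above, assuming the per-direction score is simply the mean expression of the genes in that direction (the papers may use other statistics). The function names and the dict-based input format are hypothetical and not part of this package:

```python
def mean_expr(expr, genes):
    """Mean expression over the genes that are present in `expr` (0.0 if none)."""
    vals = [expr[g] for g in genes if g in expr]
    return sum(vals) / len(vals) if vals else 0.0


def signed_score(expr, genes_up, genes_dn):
    """Hypothetical helper: s_up - s_dn for a single cell.

    expr: dict mapping gene name -> expression value (assumed input format).
    """
    s_up = mean_expr(expr, genes_up)   # score of the positively associated genes
    s_dn = mean_expr(expr, genes_dn)   # score of the negatively associated genes
    return s_up - s_dn                 # high DN expression lowers the final score


# Toy cells: cell_a matches the signature, cell_b contradicts it.
cell_a = {"G1": 5.0, "G2": 3.0, "G3": 0.0}
cell_b = {"G1": 1.0, "G2": 1.0, "G3": 6.0}

score_a = signed_score(cell_a, ["G1", "G2"], ["G3"])  # 4.0 - 0.0
score_b = signed_score(cell_b, ["G1", "G2"], ["G3"])  # 1.0 - 6.0
```

With this definition, a cell that expresses the DN genes strongly ends up with a lower score than a cell that does not.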
@Gibbsdavidl
Collaborator

Gibbsdavidl commented May 25, 2023 via email

@redst4r
Collaborator Author

redst4r commented May 25, 2023

it's definitely a good thing to account for: a bunch of those "newer" genesets are weighted (some with negative weights), e.g. Progeny.

I just wasn't sure if the scoring functions handle it accordingly. It seems to be done in score_fun.

This is for the non-ranked version (ranked=False):

        if (gs.mode == 'UP') and (ranked == False):
            res0 = method_selector(gs, x, 'counts', gs.genes_up, method, method_params)

        elif (gs.mode == 'DN') and (ranked == False):
            res0 = method_selector(gs, x, 'counts', gs.genes_dn, method, method_params)

        elif (gs.mode == 'BOTH') and (ranked == False):
            res0_up = method_selector(gs, x, 'counts', gs.genes_up, method, method_params)
            res0_dn = method_selector(gs, x, 'counts', gs.genes_dn, method, method_params)
            res0 = (res0_up + res0_dn)

Here, it looks like UP and DN get handled the same way, whereas I'd have expected the sign of DN to be flipped, i.e. a cell with high expression of a DN gene gets a low score.
The same applies to BOTH, where the two scores are simply added.

For ranked=True:

        elif (gs.mode == 'UP') and (ranked == True):
            res0 = method_selector(gs, x, 'uprank', gs.genes_up, method, method_params)

        elif (gs.mode == 'DN') and (ranked == True):
            res0 = method_selector(gs, x, 'dnrank', gs.genes_dn, method, method_params)

        elif (gs.mode == 'BOTH') and (ranked == True):
            res0_up = method_selector(gs, x, 'uprank', gs.genes_up , method, method_params)
            res0_dn = method_selector(gs, x, 'dnrank', gs.genes_dn, method, method_params)
            res0 = (res0_up + res0_dn)

Looks like it's inverting the ranks for the DN genes by using x['dnrank'] (didn't confirm though).
A cell with high expression of a DN gene would be at the bottom of x['dnrank'] and get a low score, which is correct.
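
To illustrate the rank inversion described above, here is a small sketch. It assumes that 'uprank' ranks genes in ascending order of expression and 'dnrank' in descending order; how the package actually builds these layers was not confirmed here, and the `ranks` helper is hypothetical:

```python
def ranks(values, descending=False):
    """Return 1-based ranks of `values`; ties are broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    out = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank
    return out


genes = ["G1", "G2", "G3", "G4"]
expr = [0.5, 9.0, 2.0, 4.0]  # expression of one cell; G2 is the highest

# Assumed 'uprank': the most highly expressed gene gets the largest rank.
uprank = dict(zip(genes, ranks(expr)))
# Assumed 'dnrank': the most highly expressed gene gets the smallest rank.
dnrank = dict(zip(genes, ranks(expr, descending=True)))
```

Under these assumptions a highly expressed DN gene contributes a small value from the inverted ranking, so adding the UP and DN rank scores already penalizes it.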

Conclusion

Seems to work correctly for ranked=True, but does the wrong thing for ranked=False.
I'm not quite sure how to deal with it in the ranked=False case. Can we just flip the sign or is there something else needed?

@Gibbsdavidl
Collaborator

Yeah, thanks for thinking about this. Agreed that in the ranked case we're OK, and with testing it seems to work.
However, in the counts case, maybe s_up - s_dn is a good idea! A low s_dn is what's desired with a geneset_DN. If s_dn is high, the gene expression is also high, which is contrary to what we want.

proposal:
'counts': s_up - s_dn
'ranked': s_up + s_dn
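
A rough sketch of how the proposal could look as a single combining step, given already computed s_up and s_dn. The function name `combine_scores` is hypothetical, and the sign flip for a DN-only geneset in the counts case is an assumption carried over from the earlier comment rather than part of the proposal itself:

```python
def combine_scores(mode, ranked, s_up=0.0, s_dn=0.0):
    """Hypothetical helper: combine UP and DN scores according to the proposal.

    counts (ranked=False): s_up - s_dn
    ranked (ranked=True):  s_up + s_dn  (assumes the DN ranks are already inverted)
    """
    # In the ranked case the DN score is assumed to be sign-corrected already.
    dn_sign = 1.0 if ranked else -1.0
    if mode == "UP":
        return s_up
    if mode == "DN":
        # Assumption: a DN-only set on counts also gets its sign flipped.
        return dn_sign * s_dn
    if mode == "BOTH":
        return s_up + dn_sign * s_dn
    raise ValueError(f"unknown mode: {mode!r}")


counts_both = combine_scores("BOTH", ranked=False, s_up=2.0, s_dn=0.5)
ranked_both = combine_scores("BOTH", ranked=True, s_up=2.0, s_dn=0.5)
```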
