Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

number of genes in tl.rank_velocity_genes result #858

Closed
natallah opened this issue Apr 12, 2022 · 5 comments
Closed

number of genes in tl.rank_velocity_genes result #858

natallah opened this issue Apr 12, 2022 · 5 comments
Labels
bug Something isn't working

Comments

@natallah
Copy link

...

scv.tl.rank_dynamical_genes(adata, groupby='clusters',n_genes=2000)

I want scVelo to output the top 2000 genes for each cluster, however only 706 genes were output for each cluster. There is no error message and it seems to run fine and work well, so I don't know that this is a bug and is likely moreso that I do not understand how scVelo is selecting the genes to output (I understand the ranking and statistics, but not why the list is cut at 706 rather than 2000 genes). Of note, I also computed velocities on every gene using scv.tl.recover_dynamics(adata, var_names='all') so the issue shouldnt be that some genes do not have velocities associated with them.

@natallah natallah added the bug Something isn't working label Apr 12, 2022
@WeilerP
Copy link
Member

WeilerP commented Apr 12, 2022

@natallah would you please provide the full pipeline you are running including the traceback and package versions are asked for in the issue template? Thanks!
Note also that passing var_names='all' to recover_dynamics does not mean that one model per gene can always be fitted as there might be insufficient number of counts. You can, for example, check for how many genes a model was fit by counting the number of entries in adata.var['fit_alpha'] wich are not NaN, e.g. something along the lines of adata.var['fit_var_names'].isnull() would give you the NaN entries.

@natallah
Copy link
Author

natallah commented Apr 12, 2022

scvelo version 0.2.4 was used.

I do not have the traceback anymore, but there was no error when I ran it (sorry!). I recognize that you will not be able to look into my specific question/use case. I assume that the other entries are NaN, which I'll check. So then is the way that rank_velocity_genes works to output all genes with velocity measurements regardless of statistical significance, ranked (up to the max number of genes)? Is there any way to see which were statistically significant?

@WeilerP
Copy link
Member

WeilerP commented Apr 12, 2022

I do not have the traceback anymore, but there was no error when I ran it (sorry!).

Sorry, did not necessarily mean error stack trace but also the logging statements (along with the corresponding code) provided by scvelo/scanpy which can be helpful to pinpoint why a certain behaviour is being observed (see e.g. #775 where the number of selected highly variable genes differed from the specified number).

So then is the way that rank_velocity_genes works to output all genes with velocity measurements regardless of statistical significance, ranked (up to the max number of genes)?

No, there is a bit of gene selection going on (see here) which could also explain why might also contribute to the number of genes being smaller than 2000.

Is there any way to see which were statistically significant?

No, unfortunately not yet (see #154).

@natallah
Copy link
Author

natallah commented Apr 14, 2022

Sorry for the delay! Attached are the logging statements and code. I separated the commands into small chunks so that I could troubleshoot easier. I also commented out irrelevant lines (for the purposes of this issue).

2_velocity_analysis_loom_macs_all.txt
5_differential_velocity_macs.txt
Velocity_compute-14593416-natallah.txt
Velocity_diff-14735181-natallah.txt
theislab/scvelo/files/8490567/5_differential_velocity_macs.txt)

Also I appreciate your comments! I think many of the genes simply did not have enough cells to compute the dynamics and then possibly for others the gene selection that you pointed out may be causing the discrepancy between the number I requested and the number I am getting in the output. This is totally fine, but I wanted to make sure that I understood the output that I was getting prior to drawing any conclusions.

@WeilerP
Copy link
Member

WeilerP commented Oct 23, 2022

@natallah, I'm closing this for now. Please provide reproducible code snippets I can simply copy and paste, here on GitHub. Additionally, please provide the accompanying traceback. Sorry, I do not have the time to go through different, random text files that do not contain merely the essential information.

@WeilerP WeilerP closed this as completed Oct 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants