number of genes in tl.rank_velocity_genes result #858

natallah · 2022-04-12T11:20:08Z

...

scv.tl.rank_dynamical_genes(adata, groupby='clusters',n_genes=2000)

I want scVelo to output the top 2000 genes for each cluster, however only 706 genes were output for each cluster. There is no error message and it seems to run fine and work well, so I don't know that this is a bug and is likely moreso that I do not understand how scVelo is selecting the genes to output (I understand the ranking and statistics, but not why the list is cut at 706 rather than 2000 genes). Of note, I also computed velocities on every gene using scv.tl.recover_dynamics(adata, var_names='all') so the issue shouldnt be that some genes do not have velocities associated with them.

WeilerP · 2022-04-12T11:54:48Z

@natallah would you please provide the full pipeline you are running including the traceback and package versions are asked for in the issue template? Thanks!
Note also that passing var_names='all' to recover_dynamics does not mean that one model per gene can always be fitted as there might be insufficient number of counts. You can, for example, check for how many genes a model was fit by counting the number of entries in adata.var['fit_alpha'] wich are not NaN, e.g. something along the lines of adata.var['fit_var_names'].isnull() would give you the NaN entries.

natallah · 2022-04-12T12:38:38Z

scvelo version 0.2.4 was used.

I do not have the traceback anymore, but there was no error when I ran it (sorry!). I recognize that you will not be able to look into my specific question/use case. I assume that the other entries are NaN, which I'll check. So then is the way that rank_velocity_genes works to output all genes with velocity measurements regardless of statistical significance, ranked (up to the max number of genes)? Is there any way to see which were statistically significant?

WeilerP · 2022-04-12T13:15:22Z

I do not have the traceback anymore, but there was no error when I ran it (sorry!).

Sorry, did not necessarily mean error stack trace but also the logging statements (along with the corresponding code) provided by scvelo/scanpy which can be helpful to pinpoint why a certain behaviour is being observed (see e.g. #775 where the number of selected highly variable genes differed from the specified number).

So then is the way that rank_velocity_genes works to output all genes with velocity measurements regardless of statistical significance, ranked (up to the max number of genes)?

No, there is a bit of gene selection going on (see here) which could also explain why might also contribute to the number of genes being smaller than 2000.

Is there any way to see which were statistically significant?

No, unfortunately not yet (see #154).

natallah · 2022-04-14T15:53:13Z

Sorry for the delay! Attached are the logging statements and code. I separated the commands into small chunks so that I could troubleshoot easier. I also commented out irrelevant lines (for the purposes of this issue).

2_velocity_analysis_loom_macs_all.txt
5_differential_velocity_macs.txt
Velocity_compute-14593416-natallah.txt
Velocity_diff-14735181-natallah.txt
theislab/scvelo/files/8490567/5_differential_velocity_macs.txt)

Also I appreciate your comments! I think many of the genes simply did not have enough cells to compute the dynamics and then possibly for others the gene selection that you pointed out may be causing the discrepancy between the number I requested and the number I am getting in the output. This is totally fine, but I wanted to make sure that I understood the output that I was getting prior to drawing any conclusions.

WeilerP · 2022-10-23T10:42:08Z

@natallah, I'm closing this for now. Please provide reproducible code snippets I can simply copy and paste, here on GitHub. Additionally, please provide the accompanying traceback. Sorry, I do not have the time to go through different, random text files that do not contain merely the essential information.

natallah added the bug Something isn't working label Apr 12, 2022

WeilerP closed this as completed Oct 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

number of genes in tl.rank_velocity_genes result #858

number of genes in tl.rank_velocity_genes result #858

natallah commented Apr 12, 2022

WeilerP commented Apr 12, 2022

natallah commented Apr 12, 2022 •

edited

Loading

WeilerP commented Apr 12, 2022

natallah commented Apr 14, 2022 •

edited

Loading

WeilerP commented Oct 23, 2022

number of genes in tl.rank_velocity_genes result #858

number of genes in tl.rank_velocity_genes result #858

Comments

natallah commented Apr 12, 2022

WeilerP commented Apr 12, 2022

natallah commented Apr 12, 2022 • edited Loading

WeilerP commented Apr 12, 2022

natallah commented Apr 14, 2022 • edited Loading

WeilerP commented Oct 23, 2022

natallah commented Apr 12, 2022 •

edited

Loading

natallah commented Apr 14, 2022 •

edited

Loading