I recently realized that the gene sets that you are using in the tutorial differ quite a bit from the gene sets that are used in Seurat's cc.genes (https://satijalab.org/seurat/reference/cc.genes.html). For a dataset that I am not allowed to show because it is not published yet, I get very different results depending on the gene set. And the results that use the cc.genes gene set look much more convincing.
Unfortunately, I do not have the time to test whether there is a difference in the dataset that you looked at. But maybe there should be some words on the importance of the choice of the gene sets in the tutorial.
Hey @fabianrost84,
Thanks for highlighting this. We're just looking into a large update of this tutorial and will make sure to include this note.
For context: there are two widely used gene sets for CC scoring. One is from Macosko et al. (used here), and one is from Tirosh et al. (used in the Seurat and Scanpy tutorials). I've used both, and since publishing I have noticed that the Tirosh set works better in some cases. I have also found that you can introduce an offset in S vs G2M scores to make the Macosko et al. gene set work better again, but that's not done anywhere yet, so the Tirosh gene set might just be more useful.
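To make the offset idea concrete, here is a minimal sketch of one way it could look. This is an assumption on my part, not something implemented in the tutorial or in scanpy: `assign_phase` and `s_offset` are hypothetical names, and the rule simply mirrors the usual phase call (G1 if both scores are negative, otherwise whichever score is larger) after shifting the S score by a constant.

```python
def assign_phase(s_score, g2m_score, s_offset=0.0):
    """Call a cell-cycle phase from S and G2M scores.

    s_offset is a hypothetical constant added to the S score before
    comparison; s_offset=0.0 gives the plain rule.
    """
    s_adj = s_score + s_offset
    if s_adj < 0 and g2m_score < 0:
        return "G1"   # neither program is active
    if g2m_score > s_adj:
        return "G2M"
    return "S"


# The same cell can switch phase once the S score is shifted.
print(assign_phase(0.1, 0.3))                # G2M
print(assign_phase(0.1, 0.3, s_offset=0.3))  # S
```

A suitable offset would have to be chosen per dataset, for example by looking at the joint distribution of the two scores.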
In case someone comes looking for the Tirosh gene set (formatted to mouse gene names), here is the file: Tirosh_cell_cycle_genes_mouse.txt
The notebook code would just have to be replaced by the relevant code in the scanpy tutorial:
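# Assumes one gene per line with the 43 S-phase genes first, as in the scanpy tutorial's list.
cell_cycle_genes = [x.strip() for x in open("Tirosh_cell_cycle_genes_mouse.txt")]
s_genes = cell_cycle_genes[:43]
g2m_genes = cell_cycle_genes[43:]
# Keep only the genes that are present in the dataset.
cell_cycle_genes = [x for x in cell_cycle_genes if x in adata.var_names]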
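Since the choice of gene set matters this much, it may also be worth checking how many genes of each set are actually found in the dataset before scoring. The following is a rough sketch of such a check; `split_cell_cycle_genes` is a hypothetical helper, the toy gene order is made up for illustration, and `n_s_genes=43` assumes the S-phase genes come first in the list.

```python
def split_cell_cycle_genes(genes, var_names, n_s_genes=43):
    """Split an ordered gene list into S and G2M sets.

    Only genes found in var_names are kept; genes that are
    missing from the dataset are returned separately.
    """
    present = set(var_names)
    s_genes = [g for g in genes[:n_s_genes] if g in present]
    g2m_genes = [g for g in genes[n_s_genes:] if g in present]
    missing = [g for g in genes if g not in present]
    return s_genes, g2m_genes, missing


# Toy example: 2 S-phase genes followed by 2 G2M genes.
genes = ["Mcm5", "Pcna", "Hmgb2", "Cdk1"]
var_names = ["Pcna", "Cdk1", "Actb"]
s_genes, g2m_genes, missing = split_cell_cycle_genes(genes, var_names, n_s_genes=2)
print(s_genes, g2m_genes, missing)
```

With a real dataset, `var_names` would be `adata.var_names`, and the filtered lists can be passed on to `sc.tl.score_genes_cell_cycle(adata, s_genes=s_genes, g2m_genes=g2m_genes)`.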