How to set more permissive parameters? #24

Open
jmfa opened this issue Dec 14, 2020 · 3 comments

jmfa commented Dec 14, 2020

Hi,
I’m currently running SCIPhI on some simulated data sets, but I’m getting some peculiar patterns and I’m not sure whether this is a bug in the tool, a misspecification of the parameter settings, or something else.
So, the simulated data I’m using consists of mpileups with 40 cells, ~10k sites (all variable) and sequencing depth ~ 5X.

My idea is to lower the SCIPhI filter settings to their minimum, making the tool as permissive as possible so that most sites are picked up for phylogenetic reconstruction.
I tried the following command line (in which I tried to set the parameters controlling depth to 0):

sciphi -o test --in sampleNames -u 0 --ncf 0 --mff $minfreq --md 0 --mmw 4 --mnp 1 --ms 0 --mc 0 --unc true --mf 0 -l 200000 --seed $RANDOM ${sim}.mpileup

However, this is what I get:

Reading the config file: ... done!
Reading the mpileup file: num Samples: 41
total # mut: 0 currently used: 0
normal - freq: 0.135070376076846 tmp: 0.135070376076846 SD: 0.01 count: 0 trails: 0
normal - overDis: 100 tmp: 100 SD: 5 count: 0 trails: 0
normla - alpha: 13.5070376076846 beta: 86.4929623923154
mutation - overDis: 2 tmp: 2 SD: 0.1 count: 0 trails: 0
mutation - alpha: 0.819906165230872 beta: 1.27014075215369
drop: 0.9 SD: 0.01 count: 0 trails: 0
lambda: 0 SD: 0.01 count: 0 trails: 0
1
done!
numUniqMuts: 0
dataUsage<0>: 0.1 0.1
369 0
newDataSize: 37 36.9
The new best score is: -367258.331190865
num Samples: 41
total # mut: 369 currently used: 37
[…]

As you can see, the total number of mutations identified is much lower than 10,000.
So my question is:
Is this just a question of low power for detection (and therefore expected), or am I setting the parameters wrong?
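One way to separate the two explanations would be to summarise the mpileup independently of SCIPhI and see how much alternative-allele support the sites actually carry. Below is a minimal sketch, assuming the standard samtools mpileup layout (chromosome, position, reference base, then depth/bases/qualities for each sample). The function names `count_alt` and `summarize` and the `min_alt` threshold are made up for illustration, and this does not reproduce SCIPhI's own site filters.

```python
import re
import sys
from typing import Dict, Iterable


def count_alt(bases: str) -> int:
    """Count mismatching base calls in one mpileup base string."""
    alt = 0
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            # Read-start marker; the next character is a mapping quality.
            i += 2
            continue
        if c in "+-":
            # Indel: sign, length digits, then that many inserted/deleted bases.
            m = re.match(r"\d+", bases[i + 1:])
            if m:
                i += 1 + len(m.group()) + int(m.group())
            else:
                i += 1
            continue
        if c in "ACGTacgt":
            alt += 1
        i += 1
    return alt


def summarize(lines: Iterable[str], min_alt: int = 1) -> Dict[str, float]:
    """Report site count, mean per-sample depth, and sites with alt support.

    A site counts as supported if at least one sample has >= min_alt
    mismatching reads (an illustrative threshold, not a SCIPhI parameter).
    """
    n_sites = 0
    sites_with_alt = 0
    depth_sum = 0
    depth_n = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        n_sites += 1
        samples = fields[3:]
        supported = False
        # Each sample occupies three columns: depth, bases, qualities.
        for k in range(0, len(samples) - 2, 3):
            depth_sum += int(samples[k])
            depth_n += 1
            if count_alt(samples[k + 1]) >= min_alt:
                supported = True
        if supported:
            sites_with_alt += 1
    mean_depth = depth_sum / depth_n if depth_n else 0.0
    return {
        "n_sites": n_sites,
        "mean_depth": mean_depth,
        "sites_with_alt": sites_with_alt,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        print(summarize(handle))
```

Run as a script with the mpileup path as its only argument. If most of the ~10k sites show alternative reads in at least one cell while only a few hundred survive, that would point towards filtering rather than a lack of signal in the data.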

Thank you very much in advance,
J

Contributor

winni2k commented Dec 14, 2020

Hi Joao,
It looks like sciphi has excluded almost all mutations before running tree reconstruction. There only appear to be 369 mutations left after filtering.

I have tried running sciphi on low coverage data as well, and observed similar oddities. I hacked around in the source code a bit, and I have come to the conclusion that the site filters tend to filter out many or all sites when average coverage gets near 3x. I tried disabling the filters, but I could not get sciphi to run quickly after that. I can share my version of sciphi with disabled site filters if you are interested in trying it. Perhaps you will have more luck.

Contributor

winni2k commented Dec 14, 2020

Also, you might try https://github.com/raphael-group/SBMClone instead. It performs surprisingly well in my hands on simulated data.

Author

jmfa commented Dec 14, 2020

Thanks @winni2k!
Will definitely take a look at SBMClone. :)
