multiple alleles: bug in mapAllelesCFtable? #143

Closed
cecileane opened this issue Sep 25, 2020 · 3 comments

@cecileane
Member

@crsl4: would you have time to look into this? Unfortunately, I don't...

See the Google group question and the file attached to it (thanks to @brunoasm). It seems that mapAllelesCFtable! returns an empty list of repeated species, so mergeRows() is not called during readTableCF!, and the resulting DataCF object still has 13,757,814 4-taxon sets instead of a few thousand:

CF = PhyloNetworks.mapAllelesCFtable("pnetworks_pops.txt","all_cf.csv") # 13,757,814 rows × 14 columns data frame
dataCF_summarized = PhyloNetworks.readTableCF!(CF) # DataCF, number of quartets: 13757814

yet mergeRows worked:

dataCF_summarized = PhyloNetworks.readTableCF!(PhyloNetworks.mergeRows(CF, [1,2,3,4,5,8,11,14])) # 2375 unique 4-taxon sets were found.

(used PhyloNetworks v0.11.0 and julia v1.5.1)
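
To check whether a table still needs merging, one can compare its number of rows with its number of distinct 4-taxon sets. The snippet below is a minimal sketch in plain Julia, not PhyloNetworks code: the toy `rows` and the helper `count_sets` are hypothetical, and each row is assumed to list its four species names in a consistent order.

```julia
# Hypothetical toy rows: each tuple holds the four species names of one row,
# after alleles were mapped to species (names are made up for illustration).
rows = [
    ("A", "B", "C", "D"),
    ("A", "B", "C", "D"),   # same species set, coming from different alleles
    ("A", "B", "C", "E"),
    ("A", "B", "C", "D"),
]

# Count how many rows share each 4-taxon set (names compared in the given order).
function count_sets(rows)
    counts = Dict{NTuple{4,String},Int}()
    for r in rows
        counts[r] = get(counts, r, 0) + 1
    end
    return counts
end

counts = count_sets(rows)
println("rows: ", length(rows), ", unique 4-taxon sets: ", length(counts))
```

If the two numbers differ, some species sets are repeated and the rows have not been merged yet.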

@crsl4
Member

crsl4 commented Sep 27, 2020

Sorry, I cannot find the attached file. Could you forward it to me so that I can check?
I am spread too thin, but I'll try to carve out some time.
Thanks!

@cecileane
Member Author

The file is attached to the Google group question. The example is not easy to reproduce (the data set is huge: 13,757,814 four-taxon sets), so I copy-pasted the key lines above. The problem is that dataCF_summarized still has 13,757,814 rows instead of a few thousand.

@brunoasm

Hi @cecileane and @crsl4, thanks for looking into this!

Here I am attaching a subset of the data to reproduce the problem and help solve it.
pnetworks_test.zip

I also added a row in which all alleles map to the same species. readTableCF!() removes that one, but does not average rows with the same species.

Explicitly calling PhyloNetworks.mergeRows() does average over rows.
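
For reference, the expected behavior described above (drop the row where all alleles map to one species, then average rows that share the same species) can be sketched in plain Julia. This is only an illustration under stated assumptions, not the PhyloNetworks implementation: the `QuartetRow` type, the `merge_rows` helper and the toy values are hypothetical, and rows with the same species are assumed to list them in the same order.

```julia
# Hypothetical row type: four species names plus the three concordance
# factors of the quartet (one per resolution: 12|34, 13|24, 14|23).
struct QuartetRow
    taxa::NTuple{4,String}
    cf::NTuple{3,Float64}
end

# Made-up toy data for illustration only.
rows = [
    QuartetRow(("A", "B", "C", "D"), (0.6, 0.2, 0.2)),
    QuartetRow(("A", "B", "C", "D"), (0.8, 0.1, 0.1)),
    QuartetRow(("A", "A", "A", "A"), (0.4, 0.3, 0.3)),   # all alleles from one species
    QuartetRow(("A", "B", "C", "E"), (0.5, 0.25, 0.25)),
]

# Average the concordance factors over rows that share the same four species.
function merge_rows(rows)
    sums = Dict{NTuple{4,String},Vector{Float64}}()
    n = Dict{NTuple{4,String},Int}()
    for r in rows
        # Skip rows in which all four names are identical.
        length(unique(r.taxa)) == 1 && continue
        s = get!(sums, r.taxa, zeros(3))
        s .+= r.cf                      # accumulate in place
        n[r.taxa] = get(n, r.taxa, 0) + 1
    end
    return Dict(k => v ./ n[k] for (k, v) in sums)
end

merged = merge_rows(rows)
println(length(rows), " rows merged into ", length(merged), " 4-taxon sets")
```

With this toy input, the two A, B, C, D rows collapse into a single averaged row and the all-A row is discarded, which is the outcome the attached subset should also show once the bug is fixed.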
