Ploidy #19

dramanica · 2024-03-25T19:18:09Z

We could support multiple ploidy. A column can only contain one vector, so storing the ploidy information there is difficult. However, we could simply add a ploidy column to the $fam slot of the bigSNP object stored in the attributes of the genotypes column. To implement that, we would need to:

1. Add a ploidy attribute to the genotypes column whenever we create a gen_tibble, holding an integer equal to the ploidy of all individuals (if it is the same) or '0' to indicate multiple ploidy. Functions that rely on diploids should then check for ploidy=2.
2. Figure out how to read multiple ploidy (parsing a VCF) so that ploidy information is stored when the data is imported.
3. Create a couple of example functions that operate on mixed ploidy to illustrate how it is done.
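
A minimal sketch of the ploidy attribute and the diploid check, in plain R rather than the package's actual code. The helper names `summarise_ploidy()` and `stop_if_not_diploid()` are hypothetical, and the genotypes column is stood in for by an ordinary list carrying a `ploidy` attribute:

```r
# Hypothetical helper: collapse per-individual ploidy into a single code,
# the shared ploidy if all individuals agree, or 0L for mixed ploidy.
summarise_ploidy <- function(indiv_ploidy) {
  indiv_ploidy <- as.integer(indiv_ploidy)
  if (length(unique(indiv_ploidy)) == 1L) indiv_ploidy[1] else 0L
}

# Hypothetical guard for functions that are only valid for diploids.
stop_if_not_diploid <- function(genotypes) {
  ploidy <- attr(genotypes, "ploidy")
  if (is.null(ploidy) || ploidy != 2L) {
    stop("this function only works on diploid data")
  }
  invisible(TRUE)
}

# Stand-in for a genotypes column: a list with a ploidy attribute.
genotypes <- structure(list(1L, 2L, 3L), ploidy = summarise_ploidy(c(2, 2, 2)))
stop_if_not_diploid(genotypes)  # passes silently
summarise_ploidy(c(2, 4, 4))    # 0L, i.e. mixed ploidy
```

Keeping the guard in one place would let each diploid-only function opt in with a single call, and drop it once the function has been tested on other ploidies.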


dramanica · 2024-03-25T19:40:44Z

A package to check out
https://cran.r-project.org/web/packages/StAMPP/index.html

dramanica · 2024-04-01T19:26:59Z

A possible mixed ploidy dataset that looks pretty suitable as an example is:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6758580/
We should see how easy it is to put together the dataset, but it could work as a test set for mixed ploidy.

dramanica · 2024-04-20T16:49:31Z

An interesting paper detailing tools and theory
https://pubmed.ncbi.nlm.nih.gov/36720820/

dramanica · 2024-04-20T20:14:29Z

There is now a ploidy branch. It implements storing the ploidy value, which is currently set to 2 by default in all cases. There are also checks in most basic functions that stop them from operating if ploidy is not 2 (some might be fine, but they need to be tested properly).

dramanica · 2024-04-22T07:48:45Z

We now have loci_maf(), loci_alt_freq() and loci_missingness() working with mixed ploidy. The unit tests (not complete) are bunched up in test_show_ploidy, but later it would be better to move them under each function. I envision that, for each function, we would have one testthat section for diploids and one for polyploids.
In terms of implementation, I am of a mind to use an optimised version for diploids, and a more generic function for polyploids (working on both homogeneous and mixed ploidy). Right now, I am coding the latter mostly in R; we can then think about whether some functions would be best moved to C for speed. But I think it makes sense to get functionality first, and then focus on the bottlenecks for optimisation.
Finally, @Euphrasiologist we should probably have a look together at vcf reading to think about how to bring in polyploids (again, probably with an optimised diploid version like we have now, and then a more general option for polyploids). I think we could look at the first locus, count the alleles for each individual, and then use that to decide how to process the vcf. In terms of parsing, if we need a generic function, it would take the characters at odd positions and sum them: for 1/0/1/1, that means summing the 1st, 3rd, 5th and 7th characters.
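
A rough sketch of the first-locus idea, assuming the GT field has already been extracted as one string per individual. The data are made up, and `positional_dosage()` is a hypothetical name, not an existing function:

```r
# Genotype strings for the first locus, one per individual (made-up data).
first_locus <- c("0/1", "1|1", "0/0/1/1", "./.")

# Ploidy = number of allele fields, i.e. number of separators + 1.
n_sep <- lengths(regmatches(first_locus, gregexpr("[/|]", first_locus)))
ploidy <- n_sep + 1L

# Choose a parsing path: the optimised diploid reader if everyone is
# diploid, otherwise the generic polyploid one.
parser <- if (all(ploidy == 2L)) "diploid" else "generic"

# Positional dosage: alleles sit at odd character positions (1, 3, 5, ...),
# which only holds for single-character allele codes.
positional_dosage <- function(gt) {
  chars <- strsplit(gt, "", fixed = TRUE)[[1]]
  alleles <- chars[seq(1, length(chars), by = 2)]
  if (any(alleles == ".")) return(NA_integer_)
  sum(as.integer(alleles))
}
positional_dosage("1/0/1/1")  # 3L
```

The positional shortcut avoids a regex split, but it would break on allele indices of 10 or more, so a split on the separators is the safer generic fallback.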

dramanica · 2024-04-22T08:05:58Z

A quick polyploid parser for a vector of genotypes:

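genotypes <- c("1/0/1/1", "0/0/0/1", "1|1|0|1", "./././.")
# split each genotype into its alleles (phased "|" or unphased "/")
alleles <- strsplit(genotypes, "[/|]")
# get ploidy for each individual
lengths(alleles)
# get dosage (count of alternate alleles) for each individual;
# assumes biallelic sites (alleles coded 0/1), and returns NA
# if any allele in the genotype is missing (".")
poly_dosage <- function(x) {
  if (any(x == ".")) {
    return(NA_integer_)
  }
  sum(as.integer(x))
}
vapply(alleles, poly_dosage, integer(1))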

dramanica · 2024-05-13T20:21:15Z

We now have the full ploidy infrastructure in main. We don't do anything with multiple ploidy data, but in principle we do have the infrastructure for it.

dramanica · 2024-05-23T06:48:51Z

A lot of pop gen formulae for mixed ploidy rely on computing the pop frequencies as mean of individual frequencies (i.e. standardising the impact of ploidy so that the individual is the unit of replication, rather than the allele; this distinction is not important when dealing with uniform ploidy). If we adapt a couple of the cpp functions to compute frequencies from individual frequencies, then we should be able to easily adapt a lot of functions to multiple ploidy.
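
To make the distinction concrete, here is a small worked example in plain R with made-up numbers for a single locus (not the package's cpp code), comparing the two ways of computing the alternate allele frequency:

```r
# Made-up data for one locus: alternate-allele dosage and ploidy per individual.
dosage <- c(1, 2, 4, NA)
ploidy <- c(2, 2, 4, 4)

# Allele as the unit: pool all allele copies, so a tetraploid carries
# twice the weight of a diploid.
ok <- !is.na(dosage)
freq_pooled <- sum(dosage[ok]) / sum(ploidy[ok])

# Individual as the unit: average the per-individual frequencies,
# so every genotyped individual carries the same weight.
freq_indiv <- mean(dosage / ploidy, na.rm = TRUE)

freq_pooled  # 7 / 8 = 0.875
freq_indiv   # (0.5 + 1 + 1) / 3, roughly 0.833
```

With uniform ploidy the two estimates coincide, which is why the choice only matters for mixed-ploidy data.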

Euphrasiologist · 2024-11-27T11:46:09Z

I'm unable to compile on my work Mac... but the changes look fairly simple in the cpp code? Do you think you can add these in please? We should absolutely make the full push to multiple ploidy - the plant genetics/genomics community will be grateful!

dramanica · 2024-11-29T14:46:12Z

Compiling on Mac requires some set up: https://cran.r-project.org/bin/macosx/tools/
We have the infrastructure for ploidy, but adapting all functions (and testing them) is a fair amount of work, so not sure when that will happen unless we get some more manpower. But happy to advise anyone who wants to help.

Euphrasiologist · 2024-12-02T10:07:21Z

It's more of a permissions thing on my Mac at ARU... I'll see what I can do! If I can get it working maybe we can meet and chat about what would need to be done :) Cheers!

dramanica assigned dramanica and Euphrasiologist Mar 25, 2024

dramanica added the enhancement New feature or request label Mar 25, 2024

dramanica mentioned this issue Apr 22, 2024: Reading in VCF's #6 (Closed)

dramanica mentioned this issue May 13, 2024: Ploidy #36 (Merged)
