Ancient_DNA_simulations

Simulations for manuscript: Pandey D, Harris M, Garud N and Narasimhan V. Understanding natural selection in Holocene Europe using multi-locus genotype identity scans (https://www.biorxiv.org/content/10.1101/2023.04.24.538113v1)

We include the files used to run the hard and soft sweep simulations, as well as the code used to compute false discovery rate values from neutral simulations ("NeutralSims" folder).

We simulated the Tennessen et al. (2012) demographic model, which describes the ancestral human population in Africa, followed by the out-of-Africa event and two periods of European population growth.

Figure 1. Tennessen et al. model (Fu et al. 2013, Fig S5).

Selective sweep simulations

We modified the stdpopsim code (https://popsim-consortium.github.io/stdpopsim-docs/stable/catalog.html#sec_catalog_HomSap_models) for the two-population out-of-Africa demographic model (Tennessen et al. 2012) to include positive selection. The corresponding code for the hard and soft sweep models is in Tennessen_HardSweeps.slim and Tennessen_SoftSweeps.slim, respectively.

We varied the time of onset of selection, the sampling generation, and the selection coefficient of the sweeps. We assumed a mean generation time of 28 years and sampled 177 individuals at each sampling time point. For the hard sweep simulations, a single beneficial mutation was introduced at the midpoint of the chromosome of a random individual from the European population. For the soft sweep simulations, we introduced K beneficial mutations at the onset of selection, for K = 5, 10, 25, and 50. All sweep simulations are conditional on the sweep not being lost.
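The parameter combinations described above can be enumerated with a short script. The following is only a sketch and not the code in slim_parametersHardSweep.py or slim_parametersSoftSweep.py: the function names are hypothetical, and the onset times, sampling times, and selection coefficients are illustrative placeholders. Only the generation time, the sample size, and the values of K come from this README.

```python
from itertools import product

GENERATION_TIME_YEARS = 28      # mean generation time stated in this README
SAMPLE_SIZE = 177               # individuals per sampling time point (README)
SOFT_SWEEP_K = (5, 10, 25, 50)  # number of beneficial mutations (README)


def years_to_generations(years_ago, generation_time=GENERATION_TIME_YEARS):
    """Convert a time in years before present to a whole number of generations."""
    return round(years_ago / generation_time)


def parameter_grid(onset_years, sample_years, selection_coeffs, k_values=(1,)):
    """Return one dict per parameter combination.

    k_values=(1,) corresponds to a hard sweep (a single beneficial mutation).
    """
    grid = []
    for onset, sample, s, k in product(
        onset_years, sample_years, selection_coeffs, k_values
    ):
        grid.append(
            {
                "onset_generations_ago": years_to_generations(onset),
                "sample_generations_ago": years_to_generations(sample),
                "selection_coefficient": s,
                "K": k,
                "sample_size": SAMPLE_SIZE,
            }
        )
    return grid


# Hypothetical placeholder values, NOT the values used in the manuscript.
example_onsets = (2800, 5600)   # years before present
example_samples = (0, 1400)     # years before present
example_s = (0.01, 0.05)

hard_grid = parameter_grid(example_onsets, example_samples, example_s)
soft_grid = parameter_grid(
    example_onsets, example_samples, example_s, k_values=SOFT_SWEEP_K
)
```

Each dict in `hard_grid` or `soft_grid` could then be passed to a SLiM run as command-line constants, one run per replicate.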

Missing data and pseudo-haploidization

Based on the missingness observed in the data, we added missing data to our simulated datasets with a mean missingness rate of 0.55 per SNP and a standard deviation of 0.23. We then pseudo-haploidized the data by randomly selecting one of the two alleles at each heterozygous genotype. Building on previous haplotype-based statistics (Garud et al. 2015; Harris et al. 2018), we define the multilocus-genotype-based statistic for aDNA data as:

$$G12_{ancient}= (p_1 + p_2)^2 + \sum_{i>2}p_i^2,$$

where $p_i$ is the frequency of the i-th most common pseudo-haplotype in a sample.
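As an illustration of the steps in this section, the sketch below adds missing data, pseudo-haploidizes a genotype matrix, and computes the statistic defined above. It is not the code in Tennesen_Parse.py. It makes several assumptions that this README does not state: genotypes are coded as 0/1/2 counts of the derived allele, per-SNP missingness rates are drawn from a normal distribution clipped to [0, 1], missing calls are stored as -1, and a missing call is treated as its own symbol when pseudo-haplotypes are compared. All function names are hypothetical.

```python
from collections import Counter

import numpy as np

MISSING = -1  # sentinel for a missing call (assumption of this sketch)


def add_missing(genotypes, rng, mean=0.55, sd=0.23):
    """Mask calls using a per-SNP missingness rate.

    Rates are drawn from a normal distribution and clipped to [0, 1];
    the choice of distribution is an assumption.
    """
    n_ind, n_snp = genotypes.shape
    rates = np.clip(rng.normal(mean, sd, size=n_snp), 0.0, 1.0)
    mask = rng.random((n_ind, n_snp)) < rates  # rates broadcast over rows
    out = genotypes.astype(np.int8)            # astype returns a copy
    out[mask] = MISSING
    return out


def pseudo_haploidize(genotypes, rng):
    """Keep one allele per site: 0 -> 0, 2 -> 1, heterozygote -> random 0 or 1."""
    out = np.full(genotypes.shape, MISSING, dtype=np.int8)
    out[genotypes == 0] = 0
    out[genotypes == 2] = 1
    het = genotypes == 1
    coin = rng.integers(0, 2, size=genotypes.shape)
    out[het] = coin[het]
    return out  # missing calls stay MISSING


def g12_ancient(pseudo_haplotypes):
    """(p1 + p2)^2 + sum_{i>2} p_i^2 over pseudo-haplotype frequencies."""
    counts = Counter(map(tuple, np.asarray(pseudo_haplotypes).tolist()))
    n = sum(counts.values())
    freqs = sorted((c / n for c in counts.values()), reverse=True)
    return sum(freqs[:2]) ** 2 + sum(p * p for p in freqs[2:])


# Toy run on random genotypes: 177 individuals, 20 SNPs (window size is arbitrary).
rng = np.random.default_rng(1)
toy_genotypes = rng.integers(0, 3, size=(177, 20), dtype=np.int8)
toy_haplotypes = pseudo_haploidize(add_missing(toy_genotypes, rng), rng)
toy_g12 = g12_ancient(toy_haplotypes)
```

For example, a sample with pseudo-haplotype frequencies 1/2, 1/3, and 1/6 gives $(1/2 + 1/3)^2 + (1/6)^2 = 26/36 \approx 0.72$.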

The code to run 100 simulations for each combination of parameters can be found in the run_HardSweeps_1000.sh and run_SoftSweeps_1000.sh files for hard and soft sweeps, respectively.

