feat: Sampling without replacement option for simgenotype #194

mlamkin7 · 2023-02-17T20:19:43Z

Added a sampling-without-replacement flag, --no_replacement, to simgenotype. When the flag is set and simgenotype outputs the newly simulated variants, the reference VCF is sampled without replacement per region.

Code adapted from Amy Williams's PR, which originally used pandas; this version uses numpy and lists instead.
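
To make "sampled without replacement per region" concrete, here is a minimal sketch, not the code in this PR. It assumes each reference haplotype keeps a list of the half-open (start, end) intervals already drawn from it, and that a haplotype is only eligible for a new segment if none of its used intervals overlap that segment. The names overlaps and pick_haplotype are hypothetical.

```python
import random


def overlaps(seg, used):
    """Return True if half-open interval seg=(start, end) overlaps any interval in used."""
    start, end = seg
    return any(start < u_end and u_start < end for u_start, u_end in used)


def pick_haplotype(seg, haps_used, rng):
    """Randomly pick a haplotype whose used intervals do not overlap seg, then record seg."""
    candidates = [i for i, used in enumerate(haps_used) if not overlaps(seg, used)]
    if not candidates:
        raise ValueError("no unused reference haplotype left for this region")
    choice = rng.choice(candidates)
    haps_used[choice].append(seg)
    return choice


rng = random.Random(0)
n_samples = 2  # hypothetical reference panel: 2 diploid samples, so 4 haplotypes
haps_used = [[] for _ in range(n_samples * 2)]

# Four draws over the same region must use four different haplotypes.
picks = [pick_haplotype((0, 100), haps_used, rng) for _ in range(4)]
```

A fifth draw over an overlapping region raises ValueError in this sketch, while a draw over a disjoint region such as (100, 200) still succeeds because usage is tracked per region.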

aryarm

looks great! I think the tests are pretty comprehensive
and the code seems pretty straightforward
great job!

aryarm · 2023-02-20T22:20:12Z

haptools/sim_genotype.py

@@ -113,6 +122,9 @@ def output_vcf(
    for ind, sample in enumerate(vcf.samples):
        sample_dict[sample] = ind

+    # initialize hap_used array which should contain list of lists for each reference sample's genome and what segment has been used for each
+    haps_used = [[] for samp in range(len(vcf.samples)*2)]


Does this scale well for large numbers of samples? I recall that Python dynamically allocates memory for list items, even within list comprehensions, so it might have to repeatedly make copies whilst constructing this list?

Honestly, it might not even matter in terms of time, compared to other things in our code. I just wanted to confirm
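
For reference on the scaling question, a small sketch under the assumption that CPython is the interpreter: CPython lists over-allocate as they grow, so building the outer list in a comprehension is amortized linear in the number of haplotypes rather than quadratic. The sample count below is hypothetical and stands in for len(vcf.samples).

```python
import sys

n_samples = 100_000  # hypothetical; stands in for len(vcf.samples)

# One empty list per haplotype (two per diploid sample), as in the diff above.
haps_used = [[] for _ in range(n_samples * 2)]

# Each inner list is a distinct object, so appending to one does not affect the others.
n_distinct = len({id(inner) for inner in haps_used})

# Size of the outer list only (pointers to the inner lists), in bytes.
outer_bytes = sys.getsizeof(haps_used)

# By contrast, multiplying a list of one empty list aliases the same inner list.
aliased = [[]] * 3
aliased[0].append(1)
```

The comprehension form matters here: the [[]] * n shortcut would be faster to build but would make every haplotype share a single "used segments" list.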

haptools/sim_genotype.py

amythewilliams

Thanks for adapting all this to not rely on Pandas! Overall this seems great. The primary concern is with regard to how random the haplotypes will be. Also, in admix-simu, we ensured that the haplotype changes at each segment breakpoint. It seems like this code may not do that? Look forward to this being added to haptools!
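
One way to guarantee that the haplotype changes at each segment breakpoint is to exclude the previous choice from the candidates for the next segment. The sketch below illustrates that idea only; it is not taken from admix-simu or haptools, and the function name and parameters are hypothetical.

```python
import random


def choose_haps_per_segment(n_segments, n_haps, rng):
    """Pick one haplotype index per segment so that consecutive segments never match."""
    if n_segments > 1 and n_haps < 2:
        raise ValueError("need at least two haplotypes to switch at every breakpoint")
    choices = []
    prev = None
    for _ in range(n_segments):
        # Every haplotype except the one used for the previous segment is eligible.
        candidates = [h for h in range(n_haps) if h != prev]
        prev = rng.choice(candidates)
        choices.append(prev)
    return choices


rng = random.Random(42)
choices = choose_haps_per_segment(n_segments=10, n_haps=4, rng=rng)
```

Because the draw is uniform over the remaining haplotypes, every segment is still chosen at random, just never equal to its immediate predecessor.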


mlamkin7 added 2 commits February 16, 2023 11:54

Start of no replacement option for simgenotype

6a914b2

Added --no-replacement flag

9a5e342

mlamkin7 requested a review from aryarm February 17, 2023 20:20

main and test_outputvcf files reformatted

d18ed06

aryarm reviewed Feb 20, 2023

amythewilliams reviewed Feb 23, 2023

haptools/sim_genotype.py (outdated, resolved)

amythewilliams reviewed Feb 23, 2023

haptools/sim_genotype.py (resolved)

amythewilliams reviewed Feb 23, 2023

haptools/sim_genotype.py (resolved)

amythewilliams reviewed Feb 23, 2023

mlamkin7 added 4 commits February 23, 2023 15:19

Update sim_genotype.py

56d771d

Update haplotype choices for replacement so it doesn't switch when it isn't changing segments

Update sim_genotype.py

5e3f81b

Ensured that every segment we grab is randomly selecting a sample and not just the first sample in the list over and over again

Fixed typo

c1f2072

Fixed typo interlen -> inter_len

199dc93

mlamkin7 merged commit 85bd494 into main Feb 23, 2023

github-actions bot mentioned this pull request Feb 23, 2023

chore(main): release 0.2.0 #192

Merged

aryarm deleted the feat/outvcf_no_replacement branch February 24, 2023 00:46

aryarm mentioned this pull request Feb 24, 2023

feat: code to sample genetic data from input VCF without replacement #190

Closed
