Skip to content

estimate from the highest peak

Kamil S. Jaron edited this page Dec 9, 2019 · 2 revisions

The estimate from the highest peak (estimate_1n_coverage_highest_peak function) is utilized with either 1n estimate from subsets or the user defined 1n coverage (if specified).

The function uses isolated individual smudges (tiles adjacent to local maxima of smudgeplot) to find which of the peaks is the highest (or which smudge is brightest). Then only kmer pairs with relative coverage +- 0.01 from the center of the brighest peak are subselected. The subselected kmers are fit with a kernel smoothing function and only the highest peak is then used a normalized by one of the previous 1n estimates (as specified in the last paragraph).

The pseudocode could look like this:

highest_peak = which.max(kmers_in_smudges)

if no_1n_estimate_exist
    # if the 1n estimate does not exist, we just assume that the brighest smudge is diploid 
    # as that wss usually the case in empirical data where 1n estimate did not converge
    draft_1n = coverage_of_the_highest_peak / 2
else:
    if used_defined_1n_est exists:
        draft_1n = used_defined_1n_est
    else:
        draft_1n = subset_1n_est

highest_min_cov = rounded relative converage of highest_peak
kmer_subset = kmer_coverages that minor_variant_rel_cov < (highest_min_cov + 0.01) & minor_variant_rel_cov > (highest_min_cov - 0.01)
major_cov = highest_peak_est_by_kernel_smoothing(kmer_subset)
major_cov / round(major_cov / draft_1n)