Skip to content

Local realignment

Ryan Wick edited this page Nov 29, 2021 · 14 revisions

Polypolish does not perform local realignment. However, the concept is relevant and I thought it deserved a wiki page. So if you're interested, read on...

What is local realignment?

Most read alignment strategies align each read independently. I.e. a read's alignment doesn't depend on other reads in the read set. However, there can often be multiple ways of aligning reads around indels, and some alignments will make the 'wrong' choice. This is especially common when an indel is near the end of a read.

For example, consider these alignments, where there is a 4-bp deletion in the reads relative to the reference:

   read 5:                                 GTGCGTGTGCGAGGAACCTT
   read 4:                          TACG----TGCGTGTGCGAGGAAC
   read 3:                     GCGACTACG----TGCGTGTGCGA
   read 2:               TACTGTGCGACTACG----TGCGT
   read 1:           CTTTTACTGTGCGACTACGT
           ——————————————————————————————————————————————————————————————
reference: AATTCGTGGACTTTTACTGTGCGACTACGACCATGCGTGTGCGAGGAACCTTAGACTGTCAG

Reads 1 and 5 didn't get the alignment quite right, because due to the scoring scheme, a 1-bp mismatch was preferable to a 4-bp deletion. When looking at just read 1 or just read 5, their alignment looks good. It's only when looking at all read alignments together that it becomes clear that there's a 4-bp deletion and reads 1 and 5 could be aligned better.

Local realignment adjusts read alignments by taking each other into account. In our example, this should yield the following, where reads 1 and 5 are in agreement with the rest:

   read 5:                             G----TGCGTGTGCGAGGAACCTT
   read 4:                          TACG----TGCGTGTGCGAGGAAC
   read 3:                     GCGACTACG----TGCGTGTGCGA
   read 2:               TACTGTGCGACTACG----TGCGT
   read 1:           CTTTTACTGTGCGACTACG----T
           ——————————————————————————————————————————————————————————————
reference: AATTCGTGGACTTTTACTGTGCGACTACGACCATGCGTGTGCGAGGAACCTTAGACTGTCAG

Note that the words 'local' and 'realignment' can have different meanings in different contexts, so the term 'local realignment' might be a bit confusing:

  • Usually in the context of alignment, 'local' means not 'global' (e.g. Smith-Waterman vs Needleman-Wunsch). But in 'local realignment', 'local' refers to the fact that the realignment only aims to make small changes to each read's alignment in the same vicinity of the reference (i.e. locally).
  • 'Realignment' can also refer to the act of taking reads aligned to one reference genome and adjusting the alignments to a different reference genome. E.g. human short reads might be aligned to GRCh37 but you need them to be aligned to GRCh38. This is a different kind of 'realignment' and not what we're talking about here.

You can read more about local realignment in these papers:

Local realignment before Polypolish?

Since Polypolish uses the read pileup to call variants, including indels, local realignment could be beneficial. So if you can locally realign your alignments prior to giving those alignments to Polypolish, you should do so!

However, I have found it to be very difficult in practice. GATK v3 could do local realignment, but this was removed from GATK in v4. Even if you're willing to use an old version of GATK, there are a lot of roadblocks along the way - it's not an easy process! SRMA doesn't support the all-per-read alignments that Polypolish needs (it outputs only one alignment per read). CLC Genomics Workbench seems to have a tool for local realignment, but it isn't open source so I haven't been able to try it. And some other tools like glia only realign around specific variants in a VCF - not the entire set of alignments. Samtools/Bcftools avoid local realignment, instead relying on their BAQ system to address the problem in a different way.

Since I can't find a good way to do it, I don't use local realignment before running Polypolish. But if there is a method that I've missed, I would love to hear about it! So if you know of one, please send me an email 😄