human_customized_genome_lib
Certain customizations are applied to both the human reference genome and its annotations to facilitate detection of certain types of fusion transcripts. They are present in the default human CTAT genome libs provided, and they are applied when prep_genome_lib.pl is run to build a human lib with the option '--human_gencode_filter'.
These modifications include the following:
-
Readthrough transcripts with long introns (at least 100kb) are discarded.
-
IGH and IGL gene annotations are augmented with IG superloci that span each entire locus on both strands. These appear like so:
- IGH.g@-ext and IGH-.g@-ext
- IGL.g@-ext and IGL-.g@-ext
-
The boundaries of the following genes are extended at each end by the number of bases shown:
- CRLF2, 50kb
- MALT1, 40kb
- DUX4, 10kb
-
Genomic regions homologous to DUX4 and SEPTIN14 that correspond to paralogs or pseudogenes are masked out. These regions are identified by searching the reference transcript sequences against the reference genome sequence with blastn, which is done when the CTAT genome lib is built.
-
Pseudoautosomal regions (PAR) on chrY are masked, including 50kb of flanking sequence on either side of each PAR feature.
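The coordinate arithmetic behind the boundary extensions and the padded masking can be sketched in a few lines of Python. This is only an illustration and is not the code used by prep_genome_lib.pl: the function names are hypothetical, coordinates are assumed to be 1-based and inclusive, and "masking" is assumed here to mean replacing bases with N.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (assumed convention)

# Padding values taken from the list above (bases added at each end).
GENE_PADDING: Dict[str, int] = {"CRLF2": 50_000, "MALT1": 40_000, "DUX4": 10_000}
PAR_PADDING = 50_000


def extend_interval(start: int, end: int, pad: int, chrom_len: int) -> Interval:
    """Extend an interval by `pad` bases at each end, clamped to the chromosome."""
    return max(1, start - pad), min(chrom_len, end + pad)


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or directly adjacent intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: grow it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def mask_sequence(seq: str, intervals: List[Interval]) -> str:
    """Replace the bases inside each interval with 'N' (assumed masking style)."""
    chars = list(seq)
    for start, end in intervals:
        # Convert to 0-based indices and stay within the sequence.
        for i in range(start - 1, min(end, len(chars))):
            chars[i] = "N"
    return "".join(chars)


def padded_mask_regions(features: List[Interval], pad: int, chrom_len: int) -> List[Interval]:
    """Pad each feature on both sides, then merge the padded intervals."""
    return merge_intervals([extend_interval(s, e, pad, chrom_len) for s, e in features])
```

For example, with made-up coordinates, `extend_interval(100_000, 200_000, GENE_PADDING["CRLF2"], 1_000_000)` widens a gene by 50kb at each end, and `padded_mask_regions(par_features, PAR_PADDING, chrom_len)` gives the merged regions that `mask_sequence` would then overwrite. Merging before masking simply avoids visiting overlapping padded regions twice.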