Missing all IGH fusions #3

Open
ndaniel opened this issue Dec 26, 2017 · 2 comments

Comments

@ndaniel

ndaniel commented Dec 26, 2017

It looks like all IGH fusions are missed. IGH fusions are found, for example, in lymphoblastic leukemias.

For example, the IGH-DUX4 fusion is missed in the NALM-6 cell line (using this RNA-seq data from CCLE: https://gdc-portal.nci.nih.gov/legacy-archive/files/6fa77b04-bb16-49c5-8033-79dd76860c97 ).

@friend1ws
Member

Thank you very much for your interest in our software.
Yes, the IGH-DUX4 fusion is one of the few examples that our approach misses.
Our approach accepts a list of chimeric reads generated by an aligner (e.g., *.Chimeric.out.sam from STAR) and filters them to identify highly reliable fusions (so it is mostly similar to the approach of STAR-Fusion).
With STAR, there are no chimeric reads supporting the IGH-DUX4 fusion at the *.Chimeric.out.sam stage, and therefore IGH-DUX4 cannot be found by our software.
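
One way to check this for a given sample is to count the read names in the chimeric SAM file that have alignments in both loci. The following is a minimal sketch and not part of this software: the function names, the toy records, and the regions (`chrA`, `chrB`) are made up for illustration, and real IGH and DUX4 coordinates for the reference build in use would have to be substituted.

```python
from collections import defaultdict


def in_region(chrom, pos, region):
    """Return True if a 1-based position lies inside (chrom, start, end)."""
    r_chrom, start, end = region
    return chrom == r_chrom and start <= pos <= end


def count_supporting_reads(sam_lines, region_a, region_b):
    """Count read names with at least one alignment in each of two regions."""
    hits = defaultdict(lambda: [False, False])
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # a SAM record has 11 mandatory fields
            continue
        name, chrom, pos = fields[0], fields[2], int(fields[3])
        if in_region(chrom, pos, region_a):
            hits[name][0] = True
        if in_region(chrom, pos, region_b):
            hits[name][1] = True
    return sum(1 for a, b in hits.values() if a and b)


# Toy records (hypothetical): r1 spans both regions, r2 only one.
toy_sam = [
    "@HD\tVN:1.4",
    "r1\t0\tchrA\t150\t255\t50M\t*\t0\t0\tACGT\tIIII",
    "r1\t2048\tchrB\t5020\t255\t50M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchrA\t180\t255\t50M\t*\t0\t0\tACGT\tIIII",
]
region_a = ("chrA", 100, 200)    # placeholder for the IGH locus
region_b = ("chrB", 5000, 6000)  # placeholder for the DUX4 locus
n_support = count_supporting_reads(toy_sam, region_a, region_b)
print(n_support)
```

A count of zero on real data would be consistent with the observation above that the aligner emits no supporting chimeric reads, so no downstream filter can recover the fusion.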

I guess that either IGH or DUX4 is a highly repetitive sequence and STAR fails to accurately align the short reads covering these genes...

I'm considering resolving this issue with other approaches.

@ndaniel
Author

ndaniel commented Dec 27, 2017

> IGH-DUX4 fusion is one of few example which our approach miss.

There are more than 40 known IGH fusions (see: http://atlasgeneticsoncology.org/Genes/GC_IGH.html ), so a lot of fusion genes are missed! I would guess that the CIC-DUX4 fusion is missed too (see: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0099439 ).

STAR and STAR-Fusion are known to miss almost all IGH fusions. There are fusion callers that are well known to be able to find IGH fusions in RNA-seq data, for example CICERO and FusionCatcher.

> I guess either IGH or DUX4 is highly repetitive sequence and STAR miss to accurately align short reads covering these genes...

I guess that the IGH-DUX4 fusion is missed because of all three of these reasons together:

  • the IGH annotation that is used is not optimal, because there is high variation at IGH@ from one individual to another (i.e. the standard GTF file is not good enough for finding IGH fusion genes);
  • DUX4 has a very large number of pseudogenes with very high sequence similarity;
  • IGH fusions in general may have a random sequence of 5 bp to 40 bp inserted at the fusion junction (e.g. STAR and STAR-Fusion cannot handle this kind of alignment).
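
To make the third point concrete, here is a toy sketch of a junction read with a non-templated insertion between the two partners. It is an illustration only and not how any of the tools named here work: the sequences are invented, `split_with_insertion` is a hypothetical helper, and exact substring matching stands in for real alignment.

```python
from typing import Optional, Tuple


def split_with_insertion(
    read: str, ref_a: str, ref_b: str, min_anchor: int = 10
) -> Optional[Tuple[int, str, int]]:
    """Split a read into (prefix length on ref_a, inserted bases, suffix length on ref_b).

    The longest read prefix found in ref_a and the longest remaining read
    suffix found in ref_b are used as anchors; whatever lies between them is
    reported as the inserted sequence. Returns None if either anchor is
    shorter than min_anchor.
    """
    prefix_len = 0
    for k in range(len(read), 0, -1):
        if read[:k] in ref_a:
            prefix_len = k
            break
    suffix_len = 0
    for k in range(len(read) - prefix_len, 0, -1):
        if read[len(read) - k:] in ref_b:
            suffix_len = k
            break
    if prefix_len < min_anchor or suffix_len < min_anchor:
        return None
    inserted = read[prefix_len:len(read) - suffix_len]
    return prefix_len, inserted, suffix_len


# Invented sequences: 12 bases from partner A, 5 inserted bases, 12 from partner B.
ref_a = "GATTACAGATTACACCGGTTAA"
ref_b = "TTGCATGCCATGGACTAGCTAG"
read = "TACACCGGTTAA" + "GGGGG" + "TTGCATGCCATG"
result = split_with_insertion(read, ref_a, ref_b)
print(result)
```

An approach that expects the two anchors to be directly adjacent in the read would find no junction here, because the inserted bases belong to neither partner.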

Here is a small FASTQ test set that contains 17 known fusions; it can be used to quickly assess which fusions are missed by a fusion caller:
https://sourceforge.net/projects/fusioncatcher/files/test/
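
Such an assessment comes down to comparing a caller's output against the list of expected fusions. A minimal sketch follows, assuming both lists are already available as gene-pair tuples; the `missed_fusions` helper and the example pairs are illustrative and are not the actual contents of the test set.

```python
def fusion_key(pair):
    """Order-insensitive, case-insensitive key for a gene pair."""
    return frozenset(gene.upper() for gene in pair)


def missed_fusions(known, called):
    """Return the known gene pairs that do not appear among the called pairs."""
    called_keys = {fusion_key(pair) for pair in called}
    return [pair for pair in known if fusion_key(pair) not in called_keys]


# Illustrative input only: the two fusions discussed in this thread.
known = [("IGH", "DUX4"), ("CIC", "DUX4")]
called = [("DUX4", "CIC")]  # hypothetical caller output, partner order swapped
missed = missed_fusions(known, called)
print(missed)
```

Gene order is ignored here on purpose, since callers differ in how they report the 5' and 3' partners; a stricter comparison would keep the order.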
