High NDR values when using a specific annotation #459

N-Hoffmann · 2025-01-24T19:53:37Z

Hello Bambu team, I hope that you are doing well in Singapore :)

I am currently working on a Nextflow pipeline that uses Bambu. Our goal is to extend user-provided reference annotations with novel genes and transcripts identified by Bambu. In one of the pipeline steps, we filter novel Bambu transcripts by their NDR; by default, we keep only transcripts with an NDR < 0.2.
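For readers following along, the filtering step described above can be sketched roughly as follows. This is only an illustration, not the pipeline's actual code: the function name `filter_by_ndr`, the dictionary input, and the transcript IDs are all hypothetical, and the NDR values are made up for the example.

```python
from typing import Dict


def filter_by_ndr(ndr_by_transcript: Dict[str, float],
                  threshold: float = 0.2) -> Dict[str, float]:
    """Keep only transcripts whose NDR is strictly below the threshold."""
    return {tx: ndr for tx, ndr in ndr_by_transcript.items() if ndr < threshold}


# Hypothetical transcript IDs and illustrative NDR values (not real results).
example = {
    "tx_novel_1": 0.0022,
    "tx_novel_2": 0.19,
    "tx_novel_3": 0.2,   # equal to the threshold, so it is dropped
    "tx_novel_4": 0.51,
}

kept = filter_by_ndr(example)
print(sorted(kept))
```

With a strict `<` comparison, a transcript sitting exactly at 0.2 is discarded. If every novel transcript has an NDR of 0.51 or higher, as in the problematic combination, this step returns nothing.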

To test the pipeline with our Nanopore data (8 canine samples and 2 human samples), I am running it on different genome assemblies (human and canine), each with several reference annotations. The pipeline runs smoothly for every genome and annotation combination except one. In this combination, I use a canine Ensembl annotation (https://ftp.ensembl.org/pub/release-113/gtf/canis_lupus_familiarisgsd/Canis_lupus_familiarisgsd.UU_Cfam_GSD_1.0.113.chr.gtf.gz), and the lowest NDR I get is 0.51 (out of 198027 transcripts). Using the same reference genome with a RefSeq annotation, the lowest NDR I get is 0.0022, and 1583 of 177385 transcripts have an NDR below 0.2. Every other combination yields transcripts with an NDR below this threshold.

We checked the annotation file itself but didn't find anything unusual. I also ran my pipeline with two Ensembl annotations for other genome assembly versions, and it works as expected. I remapped my samples to be sure, but that didn't change the NDR values. I also checked Bambu's output, and it appears to run without errors. I am using Bambu version 3.4.1.

I was wondering if you have any idea why I get these higher NDR values for this specific annotation?
Thanks for any help!
Nicolaï

cying111 · 2025-01-31T07:15:38Z

Hi @N-Hoffmann,

Thanks for reporting the issue, and sorry for the late reply.

I had a look at the annotations, and they all look good to me. The results do seem a bit suspicious, but this can happen when reads are of low quality or low depth. I know you have checked already, but what are the number of reads and the mapping quality of the sample aligned with this particular annotation? When you say it runs fine with RefSeq, did you use the same sample?
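As a rough sketch of the read-count and mapping-quality check asked about above, the snippet below summarizes SAM-format text lines (for example, the text output of `samtools view`). It is not part of Bambu; the function name `summarize_sam` and the example records are hypothetical. It relies only on the standard SAM columns: FLAG in column 2 and MAPQ in column 5.

```python
from statistics import median
from typing import Dict, Iterable, Optional


def summarize_sam(lines: Iterable[str]) -> Dict[str, Optional[float]]:
    """Count primary mapped and unmapped reads and report the median MAPQ."""
    mapqs = []
    unmapped = 0
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        if flag & 0x900:
            continue  # skip secondary (0x100) and supplementary (0x800) records
        if flag & 0x4:
            unmapped += 1  # read is unmapped
        else:
            mapqs.append(mapq)
    return {
        "primary_mapped": len(mapqs),
        "unmapped": unmapped,
        "median_mapq": median(mapqs) if mapqs else None,
    }


# Made-up records for illustration only.
example_lines = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\t*",
    "r2\t16\tchr1\t200\t20\t10M\t*\t0\t0\tACGTACGTAC\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\t*",
    "r1\t2048\tchr2\t50\t30\t5H5M\t*\t0\t0\tCGTAC\t*",
]

summary = summarize_sam(example_lines)
print(summary)
```

Comparing these numbers between the sample that gives high NDR values and one that behaves as expected would show whether depth or alignment quality differs between them.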

If you don't mind, you can also send me your data (bam file, genome fasta) for this particular sample, and I will investigate further for you.

Thank you
Warm regards,
Ying
