barcode identification requires randomised fastq #43

Psy-Fer · 2018-01-30T06:56:02Z

While running Porechop, I got unexpected results from the barcode identification step, which looks at a sample of 10k reads.

It seems to take the first 10k reads. However, I had concatenated the Albacore output reads after demultiplexing, so they were ordered barcode01 -> barcode12 -> unclassified.
As a result, the first 10k reads were all barcode01.

I wrote a quick script to shuffle a fastq file (Python 2.7ish), shuffle_fastq.py.
See here: https://github.com/Psy-Fer/bioinf_tools
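
For readers who want the general idea without opening the repository, here is a minimal Python 3 sketch of shuffling a FASTQ file. It is not the linked shuffle_fastq.py; the function names `read_fastq_records` and `shuffle_fastq` are hypothetical, and it assumes uncompressed input with strict 4-line records that fits in memory.

```python
import random


def read_fastq_records(lines):
    """Group an iterable of lines into 4-line FASTQ records (assumed layout)."""
    lines = [line.rstrip("\n") for line in lines]
    if len(lines) % 4 != 0:
        raise ValueError("line count is not a multiple of 4; not a simple FASTQ")
    return [tuple(lines[i:i + 4]) for i in range(0, len(lines), 4)]


def shuffle_fastq(in_path, out_path, seed=None):
    """Write the records of in_path to out_path in random order.

    Loads every record into memory, so it only suits files that fit in RAM.
    Returns the number of records written.
    """
    with open(in_path) as handle:
        records = read_fastq_records(handle)
    # A dedicated Random instance keeps the shuffle reproducible when seeded.
    random.Random(seed).shuffle(records)
    with open(out_path, "w") as out:
        for record in records:
            out.write("\n".join(record) + "\n")
    return len(records)
```

Something like `shuffle_fastq("all_reads.fastq", "all_reads.shuffled.fastq", seed=42)` would then produce a file whose first 10k reads are a mix of all barcodes (the file names here are placeholders).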

When I ran Porechop on this new shuffled file, it detected all the correct barcodes (better than Albacore, I might add) and seems to be running smoothly.

A feature request would be to modify the barcode detection function to randomly sample the input fastq. Otherwise, a note in the docs would do :)

cheers.

rrwick · 2018-01-31T06:13:48Z

Yes, this one could be solved either by specifying barcodes (#42) or by randomly subsampling the input reads.
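
As an illustration of what randomly subsampling the input reads could look like, here is a sketch using reservoir sampling, which draws a uniform sample of `k` records in a single pass without loading the whole file. This is not Porechop's code; `iter_fastq` and `reservoir_sample` are hypothetical names, and 4-line FASTQ records are assumed.

```python
import random
from itertools import islice


def iter_fastq(handle):
    """Yield 4-line FASTQ records from an open text handle (assumed layout)."""
    while True:
        record = tuple(line.rstrip("\n") for line in islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def reservoir_sample(records, k, seed=None):
    """Return up to k records chosen uniformly at random from an iterable.

    Classic reservoir sampling (Algorithm R): keep the first k records, then
    replace a random slot with probability k / (number of records seen so far).
    """
    rng = random.Random(seed)
    sample = []
    for i, record in enumerate(records):
        if i < k:
            sample.append(record)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = record
    return sample
```

With an ordered, concatenated file, `reservoir_sample(iter_fastq(handle), 10000)` would give a sample spread across all barcodes rather than only the first one.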

In the meantime, don't forget that if you give Porechop a directory as input, it will look for all read files in that directory, and then it samples from each of them to avoid this issue. And as a bonus, if the directory looks like an Albacore directory with demultiplexing, Porechop will note the Albacore barcode and put reads in the 'none' bin if it and Albacore disagree. I find this useful for reducing mis-binned reads.

Ryan

Psy-Fer · 2018-01-31T06:16:54Z

Ahh, thanks for that.
I was trying to compare the two algorithms without either being aware of the other.
So probably a low priority fix :) My shuffle script fixes the issue for now.

Cheers

rrwick added the enhancement label Feb 2, 2018

MChiaraC mentioned this issue Jan 15, 2019

Porechop demultiplexing error #80

Closed
