barcode identification requires randomised fastq #43

Open
Psy-Fer opened this issue Jan 30, 2018 · 2 comments

Comments

@Psy-Fer

Psy-Fer commented Jan 30, 2018

When running porechop, I came across unexpected output when identifying the barcodes on 10k sample reads.

It seems to take the first 10k reads. However, I had concatenated the Albacore output after demultiplexing, so the reads were ordered barcode01->barcode12->unclassified.
So the first 10k reads were all barcode01.

I wrote a quick script, shuffle_fastq.py (Python 2.7-ish), to shuffle a FASTQ file.
See here: https://github.com/Psy-Fer/bioinf_tools
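
For readers who just want the idea, here is a minimal sketch of shuffling a FASTQ file. It is not the linked shuffle_fastq.py; the function names `read_fastq` and `shuffle_fastq` are made up for illustration, and it assumes plain (uncompressed) FASTQ with strict 4-line records that fit in memory.

```python
import random
from itertools import islice


def read_fastq(handle):
    """Yield FASTQ records as 4-tuples of lines (assumes unwrapped 4-line records)."""
    while True:
        record = tuple(line.rstrip("\r\n") for line in islice(handle, 4))
        if not record or not any(record):
            return  # end of file, or only trailing blank lines
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def shuffle_fastq(in_handle, out_handle, seed=None):
    """Write the records of in_handle to out_handle in random order."""
    records = list(read_fastq(in_handle))  # whole file is held in memory
    random.Random(seed).shuffle(records)
    for record in records:
        out_handle.write("\n".join(record) + "\n")
    return len(records)
```

Both arguments are open text-mode file handles, so a call might look like `shuffle_fastq(fin, fout)` inside a `with open(...)` block. For files too large to hold in memory, sampling a subset is probably the better route.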

When I ran Porechop on the shuffled file, it detected all the correct barcodes (better than Albacore, I might add) and seems to be running smoothly.

A feature request would be to modify the barcode detection function to randomly sample the ingested fastq. Otherwise, a note in the docs would do :)
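
As a rough illustration of what random sampling could look like, here is a sketch using reservoir sampling (Algorithm R), which draws a uniform sample of `k` records in one pass without knowing the total read count up front. This is only an assumption about one possible approach, not Porechop's code; `reservoir_sample` is a hypothetical name and `records` can be any iterable of reads.

```python
import random


def reservoir_sample(records, k, seed=None):
    """Return up to k records chosen uniformly at random from an iterable."""
    rng = random.Random(seed)
    sample = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            sample.append(record)
        else:
            # Keep record i with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = record
    return sample


if __name__ == "__main__":
    # Toy input ordered by barcode, like concatenated demultiplexed output.
    reads = [("barcode%02d" % b, n) for b in range(1, 13) for n in range(1000)]
    picked = reservoir_sample(reads, 10000 // 10, seed=0)
    print(len({barcode for barcode, _ in picked}), "barcodes in the sample")
```

Because every record has the same chance of ending up in the sample, input that is sorted by barcode no longer biases detection toward the first bin.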

cheers.

@rrwick
Owner

rrwick commented Jan 31, 2018

Yes, this one could be solved either by specifying barcodes (#42) or by randomly subsampling the input reads.

In the meantime, don't forget that if you give Porechop a directory as input, it will look for all read files in that directory, and then it samples from each of them to avoid this issue. And as a bonus, if the directory looks like an Albacore directory with demultiplexing, Porechop will note the Albacore barcode and put reads in the 'none' bin if it and Albacore disagree. I find this useful for reducing mis-binned reads.
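
To show why sampling from each file sidesteps the ordering problem, here is a small sketch that takes an equal share of reads from the start of every input. It is a guess at the general idea only, not Porechop's actual logic; `sample_across_sources` and the equal-share rule are assumptions made for illustration.

```python
from itertools import islice


def sample_across_sources(sources, total):
    """Take roughly total reads overall, split evenly across the given sources.

    sources is a sequence of iterables, one per read file.
    """
    sources = list(sources)
    if not sources:
        return []
    per_source = max(1, total // len(sources))  # assumed equal share per file
    sample = []
    for records in sources:
        # Short files simply contribute fewer reads.
        sample.extend(islice(records, per_source))
    return sample
```

With one file per barcode bin, each bin contributes to the sample even though the reads within each file are never shuffled.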

Ryan

@Psy-Fer
Author

Psy-Fer commented Jan 31, 2018

Ahh thanks for that.
I was trying to compare the algorithms without either one being aware of the other's results.
So probably a low priority fix :) My shuffle script fixes the issue for now.

Cheers
