Blast/sortmerna head of fastq file when amplicon target gene is ambiguous? #3

Open
adswafford opened this issue Jun 18, 2021 · 2 comments

Comments

@adswafford
Contributor

@antgonza when downloading wastewater studies with amplicon data, almost all of them are coming back ambiguous even when it's clear from the study page that it is 16S, e.g. https://www.ebi.ac.uk/ena/browser/view/PRJDB6476

What do you think about taking the head of the fastq data and testing to see if it is from a target gene we support (16S, ITS, 18S) instead of just calling it ambiguous?
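A minimal sketch of the first half of this idea, assuming plain four-line FASTQ records (optionally gzipped): it copies the first reads into a FASTA file that could then be handed to BLAST or SortMeRNA against 16S/ITS/18S references. The function name `fastq_head_to_fasta` and the default of 1000 reads are hypothetical, not part of qebil, and the classification step itself is left out.

```python
import gzip
from itertools import islice


def fastq_head_to_fasta(fastq_path, fasta_path, n_reads=1000):
    """Copy up to the first n_reads records of a FASTQ file into a FASTA file.

    Assumes each record is exactly four lines (header, sequence, '+', quality).
    Returns the number of reads written.
    """
    # Transparently handle gzipped input based on the file extension.
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    written = 0
    with opener(fastq_path, "rt") as fq, open(fasta_path, "w") as fa:
        while written < n_reads:
            record = list(islice(fq, 4))
            if len(record) < 4:
                break  # end of file or truncated final record
            header = record[0].rstrip("\n")
            seq = record[1].rstrip("\n")
            if not header.startswith("@"):
                raise ValueError(f"Unexpected FASTQ header: {header!r}")
            fa.write(f">{header[1:]}\n{seq}\n")
            written += 1
    return written
```

The resulting FASTA is small enough to search quickly, so the check would add little time per study compared with the download itself.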

@antgonza

Yeah, that study has no preparation/experiment metadata to accurately select the target gene. However, as you mentioned, it's in the study description. What about parsing that and assigning the target gene when only one is mentioned? Like 16S in the example.

@adswafford
Contributor Author

I suppose it's a matter of how much we trust the data in the files to match what's in the abstract/description? I think it's worth a shot along with some warning text that we can add to the analytical notes section. What's the best way to convey that info? We can't put it in the qebil_status file since then it won't just have the line 'complete'.

So just to confirm, the short-term plan is to search the abstract and description for any of our known target genes. If only one appears and the library_strategy is amplicon, we'll assign the inferred target gene from the abstract, and the user will have to sort it out after processing if it's incorrect.
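The short-term plan above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `infer_target_gene`, the regex patterns, and the wording of the warning are all hypothetical and not taken from qebil. The warning string is returned separately so it could go into the analytical notes rather than the qebil_status file.

```python
import re

# Hypothetical patterns for the supported target genes (assumption).
# ITS is matched case-sensitively so the ordinary word "its" is not a hit.
TARGET_GENE_PATTERNS = {
    "16S": re.compile(r"\b16S\b", re.IGNORECASE),
    "18S": re.compile(r"\b18S\b", re.IGNORECASE),
    "ITS": re.compile(r"\bITS[12]?\b"),
}


def infer_target_gene(abstract, description, library_strategy):
    """Return (gene, warning) if exactly one known target gene is mentioned
    and the library strategy is amplicon; otherwise return (None, None)."""
    if (library_strategy or "").strip().lower() != "amplicon":
        return None, None
    text = f"{abstract or ''} {description or ''}"
    found = sorted(
        gene for gene, pattern in TARGET_GENE_PATTERNS.items()
        if pattern.search(text)
    )
    if len(found) != 1:
        # Zero or several genes mentioned: leave the study as ambiguous.
        return None, None
    gene = found[0]
    warning = (
        f"Target gene {gene} was inferred from the study abstract/description; "
        "verify after processing."
    )
    return gene, warning
```

For example, `infer_target_gene("16S rRNA gene sequencing of wastewater", "", "AMPLICON")` would yield `"16S"` plus the warning, while a text mentioning both 16S and 18S stays ambiguous.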
