Output "NA" as subtype for samples that fail QC with no subtype result or no targets found #112
Comments
Hi @glabbe
Could you or @dankein give an example of the subtype result metadata that would be returned with a null subtype result?
Hi @peterk87, I tried doing what you suggested with the metadata file before mentioning this issue to @glabbe, but it doesn't appear to work in either the command-line or Galaxy versions. We're using the metadata to reformat the "tech results" into a format that can be pasted line by line into our LIMS for reporting. This includes version numbers of the scheme, metadata, and Galaxy tool, as well as custom comments for reports and/or instructions not to report certain species without repeats, that sort of thing. Below is a partial example of what we're using. In the case of "No subtype result" we would still like to attach the version metadata and include an instruction not to report the test.
Thanks for your help!
FYI the development branch version of biohansel (v2.3.0) outputs |
Thanks for the heads up @peterk87, the Galaxy version of biohansel is still v2.2.0 and will need to be updated.
Darian has started a pull request (#152) to update biohansel in Galaxy:
@dankein I will talk with @Takadonet in the coming days about how to update the biohansel version in Galaxy to fix this issue.
@peterk87 Actually, I just found that the fix implemented in PR #81 only outputs '#N/A' if no k-mer match is found. If only negative k-mers are found, and therefore no subtype is assigned, the subtype field is still left blank. See the attached output file, which I got when using a truncated MTB sequence.
I used So, I'm going to hazard a guess that when a kmer is found but no subtype is given, the column is filled with a ' ' or something similar that prevents |
@peterk87 @schonfju Justin found a fix: pandas treats an empty string differently from a missing value. @DarianHole's fix handles the case where the subtype is a missing value. We also need to handle the case where it's an empty string. I will open a pull request that adds the following line under Darian's line in Main.py: dfsummary['subtype'].fillna(value='#N/A', inplace=True)
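To illustrate the distinction described above, here is a minimal sketch of how pandas handles a missing value versus an empty string in a subtype column. The sample names and values are made up for illustration, and this is not the actual code from Main.py; only the column name `subtype` and the '#N/A' placeholder come from this thread.

```python
import numpy as np
import pandas as pd

# Hypothetical summary table (sample names and subtype values are invented).
df_summary = pd.DataFrame({
    "sample": ["no_kmers", "negative_kmers_only", "typed"],
    "subtype": [np.nan, "", "2.2.1"],
})

# fillna only replaces missing values (NaN/None); the empty string is untouched.
filled = df_summary["subtype"].fillna("#N/A")

# To cover both cases, flag values that are missing OR blank/whitespace-only,
# then overwrite the flagged rows with the placeholder.
subtype = df_summary["subtype"]
is_blank = subtype.isna() | (subtype.astype(str).str.strip() == "")
normalized = subtype.where(~is_blank, "#N/A")

print(filled.tolist())
print(normalized.tolist())
```

After `fillna` alone, the "negative_kmers_only" row still has an empty subtype, whereas the combined mask fills both rows. Whether whitespace-only values should also count as blank is an assumption made here, not something confirmed in the thread.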
You were right @DarianHole, the field changes after the results are merged with the metadata (which happens by default for the tb_lineage and Typhi schemes). If no k-mer matches are found, there is a bypass in subtyper.py, so the results are not merged with the metadata and the dataframe ends up different from the one produced when k-mers are found.
Merged in #120. The fixes worked for all the cases I could think of and that were tested. If something else comes up, reopen this issue or create a new one.
@dankein noticed that it is not possible to link metadata to the results using the biohansel metadata option if the "subtype" field is empty, as is the case for a QC FAIL due to "NO TARGETS FOUND" or "NO SUBTYPE RESULT". A possible solution would be to output "NA" in the subtype column in these cases, so that metadata can be returned with the results when "NA" is the subtype.
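As a rough sketch of the idea (not biohansel's actual merge code), the following shows how a placeholder subtype lets a metadata row attach to a failed sample in a plain pandas left join. The column names other than `subtype`, and all the values, are hypothetical. One detail worth noting: `pandas.read_csv` parses strings such as "NA" and "#N/A" as missing values by default, so the sketch passes `keep_default_na=False` to keep them as literal strings.

```python
import io

import pandas as pd

# Hypothetical results table: s2 failed QC and carries the "NA" placeholder.
results = pd.DataFrame({
    "sample": ["s1", "s2"],
    "subtype": ["2.2.1", "NA"],
    "qc_status": ["PASS", "FAIL"],
})

# Hypothetical tab-delimited metadata with a dedicated row for "NA".
metadata_tsv = (
    "subtype\tscheme_version\treport_instruction\n"
    "2.2.1\tv1\treport\n"
    "NA\tv1\tdo not report\n"
)

# keep_default_na=False stops pandas from turning "NA" into a missing value;
# dtype=str keeps subtype strings such as "2.2.1" from being reinterpreted.
metadata = pd.read_csv(
    io.StringIO(metadata_tsv), sep="\t", keep_default_na=False, dtype=str
)

# Left join on subtype: every result row is kept, metadata attached on a match.
merged = results.merge(metadata, on="subtype", how="left")
print(merged)
```

With the placeholder in place, the failed sample picks up the version information and the "do not report" instruction from the metadata row keyed on "NA".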